General Overview of Probabilistic Surveys
This web page provides a non-technical overview of the background and general principles underlying probability-based aquatic resource surveys, which are used to determine the condition of aquatic resources. It is intended for resource managers who may have had little statistical training but who feel they would benefit from a better understanding of the statistical and scientific underpinnings of survey sampling. Familiarity with these concepts is helpful for understanding the kinds of information survey sampling can provide.
Status, Change, Trends
Principles of Survey Designs
Ecological Resource Types
Types of Statistical Designs
Steps for Implementing a Sample Survey
Developing a Sampling Frame
Selecting a Representative Sample
Collecting Data for Sampled Sites
Summarizing the Data
Communicating the Results
The Aquatic Resource Monitoring strategies and methodologies on this web site were largely developed and tested over the past decade within the research program of EPA's Office of Research and Development (ORD). The Environmental Monitoring and Assessment Program (EMAP) is a long-term research program designed to develop the scientific basis for monitoring programs that measure the current and changing conditions of the nation's ecological resources. EMAP achieves this goal by using statistical survey methods that allow scientists to assess the condition of large areas from data collected at a representative sample of locations. Statistical survey methods are efficient because sampling relatively few locations is enough to make valid scientific statements about the condition of large areas (e.g., all wadeable streams within an EPA Region).
Regional-EMAP (R-EMAP) is a partnership between EMAP and EPA Regional Offices to adapt EMAP's broad-scale approach to produce ecological assessments at regional, state, and watershed levels. R-EMAP is based on the same statistical survey techniques used in EMAP, which have proven successful in many disciplines of science. Applying these techniques effectively requires recognizing several key principles of survey sampling and using specialized, although not difficult, design and data analysis methods.
Many states, other federal agencies, and other countries have increasingly asked EMAP to help them design monitoring programs that are consistent with EMAP's goals. The Aquatic Resource Monitoring Design Team typically provides expertise in clarifying an organization's monitoring objectives, selecting sites using a statistical survey design, applying the necessary statistical analysis algorithms, and evaluating alternative field measurement protocols. The Design Team cannot always provide all the support requested, but it can provide information to further an organization's development of a probability-based monitoring program.
Status, Change, Trends
It is useful to distinguish among these terms and understand how they influence survey approaches and designs. Status is often seen as a "snapshot" of resource condition at a certain time, e.g., the number of stream kilometers in Region III that meet their designated uses during the 2001 survey. Providing information on change requires comparing resource status at two different times, e.g., the estimated number of stream kilometers that meet their designated uses in 2001 compared with 1990. Trend questions require several estimates of resource status, often over longer time periods, e.g., the trend in annual status for nitrate concentration in the Santiam River at its confluence with the Willamette River between 1980 and 2001. Survey designs and strategies differ in how well they provide estimates of status, change, and trend, so studies that seek to address all three often require trade-offs among competing objectives.
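The three kinds of question can be made concrete with a small sketch. All numbers below are made up for illustration (they are not survey results), and the simple difference and least-squares slope shown here ignore the sampling error that a real design-based analysis would have to account for.

```python
from typing import Sequence


def status(meets_use: Sequence[bool]) -> float:
    """Status: fraction of sampled units meeting designated uses at one time."""
    return sum(meets_use) / len(meets_use)


def change(earlier: float, later: float) -> float:
    """Change: difference between two status estimates (later minus earlier)."""
    return later - earlier


def trend(years: Sequence[float], values: Sequence[float]) -> float:
    """Trend: ordinary least-squares slope of a series of estimates vs. year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Hypothetical samples of 40 sites in each survey year (illustrative only).
status_1990 = status([True] * 22 + [False] * 18)   # 0.55
status_2001 = status([True] * 26 + [False] * 14)   # 0.65
print(f"change: {change(status_1990, status_2001):+.2f}")  # change: +0.10

# Hypothetical annual nitrate values (mg/L) at a single location.
years = [1980, 1985, 1990, 1995, 2000]
nitrate = [1.0, 1.2, 1.4, 1.6, 1.8]
print(f"trend: {trend(years, nitrate):.3f} mg/L per year")  # trend: 0.040 mg/L per year
```

Status needs one sample, change needs two comparable samples, and trend needs a series, which is why a design tuned for one of these questions may serve the others less well.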
Principles of Survey Designs
There are two generally accepted data collection schemes for studying the characteristics of a population. The first is a census, which entails examining every unit in the population of interest. For most ecological studies, however, a census is impractical. For example, measuring fish assemblages everywhere to assess conditions within a watershed that has 1000 kilometers of stream would be prohibitively expensive.
A more practical approach for studying an extensive resource, such as a watershed, is to examine parts of it through probability (or random) sampling. Studies based on statistical samples rather than complete coverage (or enumeration) are referred to as sample surveys. Sample surveys are highly cost-effective, and the principles underlying such surveys are well developed and documented. The principles of survey design provide the basis for (a) selecting a subset of sampling units from which to collect data, and (b) choosing methods for analyzing the data.
One example of a sample survey is an opinion poll to estimate the percentage of eligible voters who plan to vote Democratic in a presidential election. Such polls are based on interviews with only a small fraction of all eligible voters. Nevertheless, with statistically sound survey methods, interviewing a representative sample of only about 1200 voters yields accurate estimates, with a margin of error of about 3 percentage points at 95 percent confidence. If 700 of the polled voters plan to vote Democratic, then the fraction 700/1200, or 58 percent, is a reliable estimate of the percentage of all voters who plan to vote Democratic.
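The poll arithmetic, together with the sampling error that makes about 1200 interviews sufficient, can be sketched as follows. This assumes a simple random sample and uses the usual normal-approximation 95% interval (z = 1.96); it ignores the finite population correction because the sample is a tiny fraction of all voters. Real polls typically use more complex designs and weighting.

```python
import math


def proportion_with_ci(successes: int, n: int, z: float = 1.96):
    """Sample proportion with an approximate confidence interval.

    Assumes a simple random sample and uses the normal approximation
    p +/- z * sqrt(p * (1 - p) / n), without a finite population correction.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se


p, low, high = proportion_with_ci(700, 1200)
print(f"estimate {p:.1%}, 95% CI {low:.1%} to {high:.1%}")
# estimate 58.3%, 95% CI 55.5% to 61.1%
```

The half-width of the interval, about 2.8 percentage points, shows how precise a well-drawn sample of 1200 can be even though it is a tiny fraction of the electorate.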
The approach used in a stream sample survey is basically the same as in an opinion poll. Instead of collecting the opinions of a sample of people, a watershed or ecoregion project might collect data about fish assemblages from a representative sample of point locations along the stream length to determine the percentage of stream kilometers in which ecological conditions are degraded. If data are collected at each of 40 randomly selected sites, using a plot whose length is, say, 40 times the stream width, and 16 of the 40 sites exhibit degraded conditions, then the estimated proportion of degraded stream kilometers in the watershed or ecoregion would be 40% (i.e., 16/40).
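Because each site is a randomly located point along the stream length, the fraction of degraded sites estimates the fraction of stream length that is degraded. Multiplying by the total length in the sampling frame converts that fraction to kilometers. This is a minimal sketch that assumes equal-probability selection of sites and, purely for illustration, a frame containing 1000 km of stream (the size of the hypothetical watershed mentioned earlier).

```python
def degraded_extent(degraded: list[bool], frame_length_km: float) -> tuple[float, float]:
    """Estimate the proportion and the length (km) of degraded stream.

    Assumes every site was selected with equal probability per kilometer of
    stream, so each site represents the same share of the total length.
    """
    proportion = sum(degraded) / len(degraded)
    return proportion, proportion * frame_length_km


sites = [True] * 16 + [False] * 24   # 16 of 40 sites degraded, as in the text
proportion, km = degraded_extent(sites, frame_length_km=1000.0)  # assumed frame length
print(proportion, km)  # 0.4 400.0
```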
Ecological Resource Types
An important perspective in the design process is that of resource type. This perspective differs from an ecological one (estuaries, lakes, streams) in that it focuses on the spatial nature and extent of the resource. The Design Team commonly addresses three ecological resource types, each with its own survey design implications, to meet aquatic resource monitoring objectives.
Types of Statistical Designs
Information on a population (of people or of a natural resource) is usually gathered by following a defined strategy and methodology, or type of design. Statistically based survey designs are one such approach, and the strategy and methodology developed for aquatic resource monitoring are based on probability surveys.
Steps for Implementing a Sample Survey
The survey design is a plan for selecting the sample so that it provides valid data for developing accurate estimates for the entire population or area of interest. Planning and executing a sample survey involves several steps: (1) completing a Survey Design Process (see Design Process Overview), which includes establishing objectives and design requirements, defining the target population and sampling frame, selecting the survey design, and selecting a random sample of units; (2) specifying a response design to be followed when collecting data from the selected units; (3) summarizing the data with statistical analysis procedures appropriate for the survey design (see Analysis Process Overview); and (4) communicating the results. The same kinds of techniques used to select the sample of people to interview in an opinion poll are used to select the sample of sites from which to collect field data.
Developing a Sampling Frame
A key task in developing a sample survey is establishing a clear, concise description of the target population. In statistical terminology, the target population (often shortened to "population") does not necessarily refer to a population of people. It can be a population of schools, area units of farmland, freshwater lakes, or a stream network. The description of the target population must include enough detail and criteria to classify every potential unit correctly as target or non-target. Note that the target population description does not contain information on specific units, such as their location, size, or condition.
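One way to see what "enough detail and criteria" means is to write the target population description as a rule that can classify any candidate unit. The sketch below assumes the stream example used on this page (perennial, wadeable streams); the record type and its field names are hypothetical. The rule names no specific reach and says nothing about condition; it only states criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateReach:
    """Hypothetical record for a potential unit of a stream population."""
    reach_id: str
    flow_regime: str   # e.g. "perennial", "intermittent", "ephemeral"
    wadeable: bool


def is_target(reach: CandidateReach) -> bool:
    """Target population rule: all perennial, wadeable streams.

    Uses only general criteria, so every candidate unit can be classified
    as target or non-target without knowing anything about its condition.
    """
    return reach.flow_regime == "perennial" and reach.wadeable


candidates = [
    CandidateReach("A", "perennial", True),
    CandidateReach("B", "intermittent", True),
    CandidateReach("C", "perennial", False),
]
for reach in candidates:
    print(reach.reach_id, "target" if is_target(reach) else "non-target")
# A target
# B non-target
# C non-target
```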
The specific information, usually a list or map, that identifies every unit within the population of interest is the sampling frame. Such information is needed so that every individual member of the target population can be identified unambiguously. The individual members of the target population whose characteristics are to be measured are the sampling units.
For example, if we were conducting a sample survey to estimate the percentage of students at a university who participate in intramural sports, the target population would consist of all the enrolled students. The individual students would be the sampling units, and the registrar's office could provide a list of students to serve as the sampling frame. We could draw a representative (random) sample of students from this list and interview them about their participation in sports. Each response would be "yes" or "no." The percentage of interviewed students who participate in intramural sports would yield an estimate of the "true" percentage for all students.
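The student example maps directly onto a computer-drawn simple random sample. In this sketch the registrar's list is simulated, the sample size of 200 is arbitrary, and each student's answer is generated at random (about 30% "yes") only so the estimate can be compared with a known truth; none of these values come from a real survey.

```python
import random

rng = random.Random(42)  # fixed seed so the draw can be reproduced

# Hypothetical sampling frame: a registrar's list of 10,000 enrolled students.
# Each student's (normally unknown) answer is simulated for illustration.
frame = [f"S{k:05d}" for k in range(10_000)]
participates = {sid: rng.random() < 0.30 for sid in frame}

# Simple random sample without replacement: every student is equally likely.
sample = rng.sample(frame, 200)

# "Interview" each sampled student: the response is yes (True) or no (False).
yes = sum(participates[sid] for sid in sample)
estimate = 100 * yes / len(sample)
true_pct = 100 * sum(participates.values()) / len(frame)
print(f"estimated {estimate:.1f}% vs. true {true_pct:.1f}%")
```

Rerunning with other seeds gives estimates that scatter around the true percentage; that scatter is the sampling error a survey analysis quantifies.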
For a stream survey, the target population might be all perennial, wadeable streams in a watershed. The sampling unit is a point (longitude and latitude) along the stream length, together with an associated plot, e.g., one whose length is 40 times the stream width. The response variable might be "degraded" or "non-degraded" based on measures of water quality. Conceptually, the collection of all possible point locations along these streams serves as the sampling frame, similar to the list of students in the previous example. The sampling frame for streams would typically be built in a geographic information system (GIS) from the U.S. River Reach File (RF3) or the National Hydrography Dataset (NHD). Thus, a map of the watershed with a spatial representation of all perennial, wadeable streams is an example of a sampling frame.
Selecting a Representative Sample
Survey sampling is intended to characterize the entire population of interest; therefore, every member of the target population must have a known, nonzero chance of being included in the sample. Conducting an election poll by asking only your neighbors' opinions probably would not enable you to predict the outcome of a national election accurately.
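For a stream frame, the population is the continuum of points along the network, so a "known chance" can be expressed per kilometer of stream. This minimal sketch assumes the frame has been reduced to a list of reach lengths (hypothetical values below). It draws points independently and uniformly along the total length, so every kilometer has the same known chance of receiving a site and longer reaches receive proportionally more sites. It is plain random sampling along length for illustration, not a reproduction of any particular EPA design tool.

```python
import bisect
import itertools
import random


def sample_stream_points(reach_lengths_km, n, rng):
    """Draw n points uniformly along a stream network.

    reach_lengths_km: lengths of the reaches in the (hypothetical) frame.
    Returns (reach_index, km_from_reach_start) pairs. A point falls in
    reach i with probability reach_lengths_km[i] / total_length.
    """
    cumulative = list(itertools.accumulate(reach_lengths_km))
    total = cumulative[-1]
    points = []
    for _ in range(n):
        u = rng.random() * total                  # position along the whole network
        i = bisect.bisect_right(cumulative, u)    # reach containing that position
        i = min(i, len(cumulative) - 1)           # guard against u rounding up to total
        start = cumulative[i - 1] if i > 0 else 0.0
        points.append((i, u - start))
    return points


rng = random.Random(7)
lengths = [2.0, 5.0, 3.0]   # hypothetical frame: three reaches, 10 km in total
for reach, offset in sample_stream_points(lengths, 5, rng):
    print(f"reach {reach}: {offset:.2f} km from the start of the reach")
```

With this scheme the 5 km reach receives, on average, half of all sites, which is exactly what equal probability per kilometer requires.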
Simple random selection makes the sample representative because every member of the population has an equal chance of being selected, so no part of the population is systematically favored. Random selection can be thought of as a kind of lottery drawing to determine which stream reaches, for example, are included in the sample. The selection does not favor any particular reach or group of reaches. One way to make a random selection would be to place uniquely numbered ping-pong balls (one for each sampling unit) into a drum, thoroughly mix them, and then blindly draw as many balls as there are stream reaches (i.e., sampling units) from which data are to be collected. In practice, selecting the sampling units is more complex, and computers are used to make the random selections. Either way, the result is a subset of sampling units randomly selected from the sampling frame.
Collecting Data for Sampled Sites
Before data are collected on the selected units, procedures for obtaining the information must be defined. In sample surveys of human populations, the procedures may be a questionnaire or a telephone interview form. How questions are phrased, the order in which they are presented, and the time it takes to complete the questionnaire or interview can, and do, influence how people answer. Two different versions of the same question can lead to different answers from an individual, potentially invalidating the survey.
How data are collected on an aquatic resource, such as a stream, also affects the interpretation of the survey results. An explicit, precise protocol is needed for each type of data to be collected. The protocol may specify what type of instrument to use, how to preserve a field sample, and the type and size of the field plot, among many other items. The collection of procedures is termed the response design. Its goal is to ensure that consistent information is collected at all sampling units, even when the data are gathered by different crews, at different times, and in different years.
Summarizing the Data
Once the units have been selected and data obtained, a statistical summary is needed to provide information that meets the survey objectives. For a public opinion poll, the summary may be as simple as the percentage of people who state that they plan to vote for a particular candidate. How the percentage is calculated depends on the survey design used to collect the data. When the survey design is a simple random sample, the percentage is simply the number of respondents who plan to vote for the candidate divided by the total number of respondents, multiplied by 100. If the survey design is more complex (and most are), then the calculation is more complex but conceptually the same. The same is true for aquatic probability survey designs: the analysis cannot be done independently of the design.
Communicating the Results
No sample survey is useful until its results are communicated promptly and without bias to the intended audience. A poorly written report can negate a carefully conducted probability survey. Reports of a public opinion poll may rephrase the question asked on the questionnaire, giving readers a different impression of the question's meaning. Results may be reported only as percentages rather than actual numbers, giving the impression that a problem is minor when it affects millions of people. Graphs may be constructed to emphasize a particular point of view. Similar issues arise with aquatic monitoring. Numerous examples of communicating aquatic resource survey results are included on this web site.