Sampling Design (continued)
Free information is cost-effective
Why sample randomly? The key point is free information. If site selection is random and all sites are sampled with an equal or known probability, then information from the sampled sites can be used to infer the condition of sites not sampled. Thus, results based on a random sample of sites can be scaled up to the entire population of sites within a region, as long as each site in the region could have been included in the sample. The knowledge obtained through random sampling about the unsampled sites is essentially free information. The only other way to know something about every site in a region would be to sample each one. Sampling all the sites is much more expensive than randomly sampling a few.
If sites are selected according to any method other than randomization, then statistics and conclusions derived for those sites can be safely applied only to those sites sampled. Generalizing from a nonrandom sample to the larger unsampled population of sites typically yields biased conclusions. The statistical truth of this fact may not be compelling; in fact, it seems somewhat counterintuitive that a large enough sample size would fail to be representative of general conditions in a region. Nonetheless, numerous large-scale studies based on real data have demonstrated the inevitable bias in nonrandom sampling schemes, even when hundreds of sites were sampled (Paulsen et al., 1998; Stoddard et al., 1998; Peterson et al., 1999).
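The contrast between random and nonrandom site selection can be illustrated with a small simulation. The sketch below is not drawn from EMAP data: the population size, the `accessibility` score, and the assumed link between accessibility and impairment are all hypothetical, chosen only to show how a convenience sample can mislead even when it contains hundreds of sites.

```python
import random

def simulate(n_sites=10_000, n_sample=200, seed=1):
    """Compare a random sample with a convenience sample of sites.

    Hypothetical setup: each site has an accessibility score in [0, 1],
    and more accessible sites are assumed more likely to be impaired.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        accessibility = rng.random()
        # Assumed relationship, for illustration only.
        p_impaired = 0.2 + 0.6 * accessibility
        sites.append((accessibility, rng.random() < p_impaired))

    # True proportion of impaired sites in the whole population.
    true_prop = sum(imp for _, imp in sites) / n_sites

    # Random sample: every site has the same inclusion probability.
    random_sample = rng.sample(sites, n_sample)
    random_est = sum(imp for _, imp in random_sample) / n_sample

    # Convenience sample: only the most accessible sites are visited.
    convenient = sorted(sites, key=lambda s: s[0], reverse=True)[:n_sample]
    convenience_est = sum(imp for _, imp in convenient) / n_sample

    return true_prop, random_est, convenience_est

true_prop, random_est, convenience_est = simulate()
print(f"true: {true_prop:.2f}  random: {random_est:.2f}  "
      f"convenience: {convenience_est:.2f}")
```

Under these assumptions the random estimate lands close to the population value, while the convenience estimate overstates impairment no matter how many of the most accessible sites are added.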
In the debate over whether probabilistic sampling meets state needs, the coarse scale of sampling is often cited as inadequate at the state level (White and Merritt, 1998; Yoder and Rankin, 1998). One relevant concept that is often missed in the discussion is that the scale of probabilistic sampling can be altered to fit a state's more local assessment needs. USEPA and state programs ask the same types of questions, but they ask them at different scales. USEPA wants to know if surface waters in a state or region are getting better or worse. A state manager or citizen group wants to know if a specific stream, or basin, is getting better or worse. Same question, different scale. The EMAP concept of random sampling and free information could be applied at a smaller spatial scale, such as a basin or a watershed, to infer the condition of other sites or reaches not sampled within that basin or watershed. The EMAP method used to define and select sampling units is flexible and was intentionally designed to be applicable at different spatial scales, including the conterminous U.S., geographic regions covering multiple states, or smaller regions defined within state boundaries (Herlihy et al., 2000).
Our statistical ability to detect change in regional condition is also independent of the size of the region when regional condition is summarized as the percentage of sites meeting (or failing to meet) specific criteria. For example, suppose that 50 sites are randomly selected and sampled in the first year. Of these, 50% (25 sites) fail to support their designated uses and are considered impaired. If 50 new sites are sampled the next year, a change of more than 12 percentage points in either direction represents a significant change (at 90% confidence) in stream condition within the sampled region (USEPA, 2003). Summary statistics of this type, based on proportions and reported with their estimates of precision, would apply to a probabilistic survey of any 50 sites, whether sampled within a county or within a state.
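The text cites USEPA (2003) for the 12% figure and does not show the calculation. One plausible reading, offered here as an assumption rather than as the method actually used, is the normal-approximation margin of error for a proportion estimated from a simple random sample. The function name and its defaults below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.90):
    """Half-width of a two-sided normal-approximation interval for a proportion.

    Assumes simple random sampling from a population much larger than n,
    so no finite population correction is applied.
    """
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

# 50 sites, half of them impaired, 90% confidence.
moe = margin_of_error(p=0.5, n=50)
print(f"{moe:.3f}")  # about 0.116, i.e. roughly 12 percentage points
```

The total number of sites in the region does not appear in the formula, which is consistent with the statement that the same precision applies to 50 sites in a county or in a state, provided the sample is small relative to the population.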
Reference sites did not always meet criteria for reference condition
Expectations for biological indicators such as multimetric indexes are based on observed conditions at undisturbed or minimally disturbed locations. These reference sites represent a standard for what the biological assemblage would look like in the absence of human influence (Hughes, 1995). During the MAIA pilot, the concern arose that a probabilistic sampling design would capture only moderately disturbed sites, because a moderate level of human influence was typical throughout the region. As a consequence, examples from the ends of the spectrum, that is, sites with minimal human influence and sites with extreme degradation, would be missing from the data. Therefore, to supplement the EMAP design data, local biologists helped select reference and impaired sites based on their experience and knowledge of the region, i.e., according to their best professional judgment (Gerritsen et al., 1994). The idea was to use these "hand-picked" sites to test metric response to disturbance and then use the probabilistic sites to estimate regional status and trends.
In the meantime, researchers developed a concept of reference condition and defined specific criteria for the Mid-Atlantic (Hughes, 1995; Waite et al., 2000; Klemm et al., 2002). Ironically, most of the hand-picked reference sites (44 out of 60, or 73%) failed to meet the independently established criteria for reference condition. The lesson learned was that best professional judgment should always be confirmed with objective criteria.
The criteria for reference condition included acid-neutralizing capacity (ANC) > 50 µeq/L, total P < 20 µg/L, total N < 750 µg/L, chloride (Cl-) < 100 µeq/L, sulfate (SO4) < 400 µeq/L, and mean RBP habitat scores > 15 (based on USEPA's Rapid Bioassessment Protocol; Barbour et al., 1999). Reference sites had to satisfy all six criteria. Five of the six criteria were based on water chemistry and were selected for their close association with specific human activities in the watersheds. Chloride increased along with development, total N and P were associated with agricultural intensity, low levels of ANC indicated acid rain, and SO4 was related to mine drainage (Herlihy et al., 1990; Herlihy et al., 1993; Herlihy et al., 1998). Therefore, extreme values of these chemical measures also indicated other potential stressors and disturbances associated with these human activities.
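Because a site must satisfy every criterion, the screening rule amounts to a simple conjunction of threshold checks. The sketch below uses the thresholds listed above; the dictionary keys, function names, and example site values are hypothetical and do not come from the MAIA data.

```python
# Thresholds as listed in the text; the keys are hypothetical field names.
REFERENCE_CRITERIA = {
    "anc_ueq_l":      lambda v: v > 50,    # acid-neutralizing capacity
    "total_p_ug_l":   lambda v: v < 20,    # total phosphorus
    "total_n_ug_l":   lambda v: v < 750,   # total nitrogen
    "chloride_ueq_l": lambda v: v < 100,   # chloride
    "so4_ueq_l":      lambda v: v < 400,   # sulfate
    "rbp_habitat":    lambda v: v > 15,    # mean RBP habitat score
}

def failed_criteria(site):
    """Return the names of the criteria that a site does not satisfy."""
    return [name for name, passes in REFERENCE_CRITERIA.items()
            if not passes(site[name])]

def is_reference(site):
    """A site qualifies as reference only if it satisfies every criterion."""
    return not failed_criteria(site)

# Made-up example: a hand-picked site with elevated nitrogen and chloride.
site = {
    "anc_ueq_l": 120, "total_p_ug_l": 12, "total_n_ug_l": 3000,
    "chloride_ueq_l": 320, "so4_ueq_l": 150, "rbp_habitat": 17,
}
print(is_reference(site), failed_criteria(site))
```

Reporting which criteria failed, rather than only a pass or fail flag, makes it easier to see why a site chosen by professional judgment was rejected.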
The "hand-picked" reference sites selected according to best professional judgment included sites with strong indications of intense human disturbance (Figure 1). For example, values for total N > 3000 µeq/L are usually considered high and are typical of urban or agricultural land use. Chloride values > 300 µeq/L also represented some of the highest values in the entire data set and were associated with relatively high levels of urban development. Similarly, RBP values below 12 represented obvious signs of human influence nearby or within the sample reach.
Although the first years of random site selection may not have included a large enough sample of reference sites, successive years of sampling did provide a broad enough range of human influence to test and develop biological indicators. In fact, over time, researchers moved away from the idea of a simple comparison of reference vs. impaired sites and adopted a more sophisticated approach that evaluated metric response across multiple gradients of human disturbance. Thus, the probabilistic survey design yielded an adequate range of conditions for testing metrics. Furthermore, objective definitions of site condition and impairment represented a more reliable approach for testing and selecting biological indicators than did definitions of site condition based on best professional judgment.
Figure 1. Comparison of reference sites selected on the basis of "best professional judgment" (BPJ; light bars) with sites selected according to reference condition criteria for total N, total P, chloride, and mean RBP score (dark bars). Arrows indicate the criteria defined for reference condition. Higher values to the right of the arrows indicate greater disturbance, except for RBP scores, for which higher values indicate better condition. The broad range of values for BPJ sites indicates that highly disturbed sites were initially included as reference sites.
