CADDIS Volume 4. Data Analysis: Selecting an Analysis Approach

How Can I Use My Data?

The Stressor Identification approach (described in Volume 1) does not require a minimum data set. Existing data often are sufficient to determine the cause of impairment. When data are insufficient to support an assessment, Stressor Identification can be used to identify priority data needs. When available, you also can use larger monitoring datasets for quantitative causal analysis.

Analysis of State and Regional Monitoring Data

Analysis of state and regional monitoring data can inform two questions relevant to causal assessment:

Do Environmental Conditions or Biological Characteristics at a Test Site Differ from Expectations?
In most causal assessments, we start with information that observed biota at a test site are impaired. Often, this means the biota differ from reference expectations for that site. Data analysis can often determine whether certain stressors also differ from reference expectations.

Answering this basic question can support evaluation of spatial/temporal co-occurrence. To establish co-occurrence, we assess whether a stressor is present when the biological effect is observed. We can do this by comparing stressor levels at impaired and reference sites.
This same question often underlies evaluation of the verified prediction type of evidence. We hypothesize that a particular stressor caused the observed impairment. Then, based on this hypothesis, we make a prediction regarding biological characteristics at the impaired site. Suppose increased bedded sediment is a possible stressor. We might predict that clinger taxa are reduced at the impaired site (see the Helpful link for traits). We then might assess whether clinger taxa richness differs from reference expectations at the impaired site.

What is the Relationship Between a Stressor and a Response in a Particular Region?
Stressor-response relationships can provide an estimate of effect magnitude for a given stressor level. Different statistical approaches can be used to estimate stressor-response relationships with varying degrees of confidence. Stressor-response relationships may be derived using data from the case or from other field studies. Each of these represents a different type of evidence, and each may raise different issues. For example, other field studies may require consideration of covarying factors across the larger study area.

Establishing Differences from Expectations

Establishing that biological or environmental characteristics at a test site differ from expected values is a key analysis for causal assessment. Expectations regarding site characteristics can be based on a single reference site (e.g., upstream of the test site) or on a set of comparable, regional reference sites. Analytical approaches range from a simple comparison of measurements to formal statistical tests.

  • Comparison of Values
    If only a single measurement is available at the test site and at the reference site, then one can only compare these two values. Interpretation of whether the difference between the two values is meaningful requires an understanding of the inherent variability of the measurements and an understanding of ecologically meaningful differences in value.
  • Box Plots
    Box plots graphically represent the distribution of a set of samples, providing a visual means of assessing whether a test site value deviates from the range of conditions observed at a single reference site or a set of similar reference sites.
  • Tests of Significant Difference
    Given enough data from different samples at a reference site, or data from several different reference sites, one can explicitly calculate the probability that the observation from the test site could have been observed at the reference site(s). A low probability (e.g., less than 5%) would suggest that the observation at the test site likely differs from expectations defined by reference sites.
  • Natural Variations
    In cases where expectations are defined using a set of regional reference sites, the characteristic being compared is likely to vary naturally among those sites. Analyzing this natural variation can help identify which differences between test and reference sites are meaningful.
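
The comparisons described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed CADDIS method: the richness values are hypothetical, the function names are invented for the example, and the probability is a simple empirical estimate (the share of reference values at or below the test value) rather than a formal parametric test.

```python
import statistics

# Hypothetical clinger taxa richness at 20 regional reference sites
# (illustrative values, not real monitoring data).
reference_richness = [12, 14, 15, 11, 13, 16, 14, 12, 15, 13,
                      17, 12, 14, 13, 15, 11, 16, 14, 13, 12]
test_site_richness = 6  # hypothetical single observation at the test site


def box_plot_fences(values):
    """Return the (lower, upper) Tukey fences, 1.5 IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def empirical_lower_tail_probability(reference, observation):
    """Estimate P(reference value <= observation) from the reference sample.

    The +1 terms count the observation itself, so the estimate is never zero.
    """
    at_or_below = sum(1 for value in reference if value <= observation)
    return (at_or_below + 1) / (len(reference) + 1)


lower_fence, upper_fence = box_plot_fences(reference_richness)
outside_reference_range = not (lower_fence <= test_site_richness <= upper_fence)
p_lower = empirical_lower_tail_probability(reference_richness, test_site_richness)

print(f"Reference fences: {lower_fence} to {upper_fence}")
print(f"Test site outside fences: {outside_reference_range}")
print(f"Empirical lower-tail probability: {p_lower:.3f}")
```

With only a handful of reference sites, an empirical probability of this kind cannot fall below 1/(n + 1), so a small reference set may be unable to show a low probability even when the test site is clearly unusual.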

Estimating Stressor-Response Relationships

Stressor-response relationships estimated from field data can potentially inform two types of evidence: stressor-response from the case and stressor-response from other field studies.

Stressor-Response From the Field (Using Data from the Case)

For this type of evidence, we examine measurements collected from the same stream for an association between stressor and response. An association in which the magnitude of the biological response decreases as stressor levels decrease would be consistent with a causal relationship. This relationship between stressor and response can be shown simply with a scatterplot. In cases in which the variability in the measured response data is too high to discern a response, a regression fit to the data may help assess whether the biological response changes as hypothesized.
More information about the analytical tools used to support this type of evidence can be found on the pages describing scatterplots and regression analyses.
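
As a rough illustration of a regression fit to data from the case, the sketch below fits an ordinary least-squares line to paired stressor and response measurements. The measurements, variable names, and the `fit_line` helper are all hypothetical, chosen only to show the calculation. A negative slope here means clinger richness is lower where bedded sediment is higher, which is the direction hypothesized earlier on this page.

```python
import math

# Hypothetical paired measurements from sites along one stream
# (illustrative values, not real monitoring data).
percent_fines = [10, 20, 30, 40, 50]   # stressor: bedded sediment
clinger_richness = [14, 12, 9, 8, 5]   # response: clinger taxa richness


def fit_line(x, y):
    """Return (slope, intercept, r) for an ordinary least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = fit_line(percent_fines, clinger_richness)
print(f"richness = {intercept:.2f} + ({slope:.3f}) * percent_fines, r = {r:.3f}")
```

A fitted slope describes an association only. Whether it supports a causal interpretation still depends on the other types of evidence weighed in the assessment.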

Stressor-Response From Other Field Studies

For this type of evidence, we use data collected from a larger study area to quantify the effects of the stressor on the biological response. Accurate estimates of effects can be difficult to obtain because of the strong possibility of covarying factors in field-collected data. In many cases, a more attainable analysis goal may be to simply determine whether the stressor of interest causes effects in the biological response.
A methodical approach to analysis can be helpful, including the following steps:
  1. Explore Associations Between Variables in the Data Set
    1. Identify extreme observations and autocorrelation that may influence estimates of relationships.
    2. Calculate correlations and view scatterplots to reveal relationships between pairs of variables.
    3. Use multivariate approaches to reveal relationships among groups of variables.
    4. Identify variables that may confound the estimated relationship.
  2. Estimate Effects
    1. Classification and regression trees can suggest possible discontinuities in relationships of interest.
    2. Regression analysis provides an estimate of the mean relationship between the biological response and stressor of interest. In some cases, the effects of possible confounding variables can be controlled by including them in the regression model, but estimates of effect may be unreliable when variables covary too strongly.
    3. Quantile regression provides a way to estimate the upper bound of the relationship between a stressor and a biological response. Under certain assumptions, this upper bound may provide a reasonably accurate estimate of the stressor-response relationship.
    4. Propensity score analysis provides a powerful means of controlling for the effects of covarying variables and of estimating effects more accurately.
  3. Interpret Results
    1. Have most potential confounders been treated in the analysis?
    2. What do the significance test results mean?
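
Part of the approach outlined above (checking correlations between variables, then including a possible confounder in the regression model) can be sketched as follows. This is an illustration under stated assumptions, not the CADDIS implementation: the data are synthetic and noise-free, generated from known coefficients so the result can be checked, and the covariate and helper functions (`pearson`, `solve_linear_system`, `ols`) are hypothetical.

```python
import math

# Synthetic regional data set (hypothetical, noise-free by construction).
percent_fines = [10, 20, 30, 40, 50, 60]   # stressor of interest
covariate = [2, 3, 5, 4, 7, 8]             # hypothetical covarying factor
# The response is generated from known coefficients so the fit can be checked.
clinger_richness = [20 - 0.2 * s - 0.5 * c
                    for s, c in zip(percent_fines, covariate)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; effects cannot be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(predictors, response):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Step 1: explore associations. A strong stressor-covariate correlation
# flags the covariate as a possible confounder.
r_stressor_covariate = pearson(percent_fines, covariate)

# Step 2: estimate effects with and without the covariate in the model.
naive = ols([percent_fines], clinger_richness)
adjusted = ols([percent_fines, covariate], clinger_richness)

print(f"stressor-covariate correlation: {r_stressor_covariate:.3f}")
print(f"stressor slope, covariate ignored:  {naive[1]:.4f}")
print(f"stressor slope, covariate included: {adjusted[1]:.4f}")
```

Because the synthetic response has no noise, the adjusted model recovers the generating coefficients almost exactly, while the model that ignores the covariate attributes part of the covariate's effect to the stressor. Field data will not behave this cleanly: when variables covary strongly, the adjusted estimates can become unstable, as noted in the regression step above.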

Volume 4: Authors
