# CADDIS Volume 4. Data Analysis: Selecting an Analysis Approach

## How Can I Use My Data?

### Analysis of State and Regional Monitoring Data

Analysis of state and regional monitoring data can inform two questions relevant to causal assessment:

**Do Environmental Conditions or Biological Characteristics at a Test Site Differ from Expectations?**

In most causal assessments, we start with information that observed biota at a test site are impaired. Often, this means the biota differ from reference expectations for that site. Data analysis can often determine whether certain stressors also differ from reference expectations.

**What is the Relationship Between a Stressor and a Response in a Particular Region?**

Stressor-response relationships can provide an estimate of effect magnitude for a given stressor level. Different statistical approaches can be used to estimate stressor-response relationships with varying degrees of confidence. Stressor-response relationships may be derived using data from the case or from other field studies. Each of these represents a different type of evidence, and each may raise different issues. For example, analyses of data from other field studies may require accounting for covariates across the larger study area.

## Establishing Differences from Expectations

Establishing that biological or environmental characteristics at a test site differ from expected values is a key analysis for causal assessment. Expectations regarding site characteristics can be based on a single reference site (e.g., upstream of the test site) or on a set of comparable, regional reference sites. Analytical approaches range from a simple comparison of measurements to formal statistical tests.

**Comparison of Values**

If only a single measurement is available at both the test site and the reference site, then the only possible analysis is a comparison of the two values. To judge whether the difference between them is meaningful, one needs two things: an understanding of the inherent variability of the measurements, and a sense of how large a difference is ecologically meaningful.

**Box Plots**

Box plots graphically represent the distribution of a set of samples, providing a visual means of assessing whether a test site value deviates from the range of conditions observed at a single reference site or a set of similar reference sites.

**Tests of Significant Difference**

Given enough samples from a reference site, or data from several reference sites, one can explicitly calculate the probability of observing a value at least as extreme as the test-site observation under reference conditions. A low probability (e.g., less than 5%) suggests that the test site likely differs from the expectations defined by the reference sites.

**Natural Variations**

When expectations are defined using a set of regional reference sites, the characteristic being compared will likely vary naturally among those sites (for example, with stream size). Accounting for this natural variation helps separate meaningful differences between test and reference sites from differences that natural factors alone would produce.
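The box-plot and significance-test comparisons described in this section can be sketched in a few lines of Python. The code below is a minimal sketch. It summarizes a set of reference-site samples with quartiles and whiskers, then estimates how often reference conditions produce a value at least as high as the test-site value. Several things here are hypothetical assumptions: the data, the variable names, and the one-sided direction, which assumes that higher values indicate departure from expectations. The whisker rule, which extends whiskers to 1.5 times the interquartile range (IQR), is one common box-plot convention; plotting software may use others.

```python
import statistics

def box_plot_summary(values):
    """Quartiles and Tukey-style whisker ends (1.5 x IQR) for reference samples."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers reach the most extreme samples that lie within 1.5 * IQR of the box.
    low = min(v for v in values if v >= q1 - 1.5 * iqr)
    high = max(v for v in values if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "median": median, "q3": q3, "low": low, "high": high}

def exceedance_probability(reference, test_value):
    """One-sided empirical probability of a reference value >= the test value.

    Counting the test observation itself (the +1 terms) keeps the estimate
    above zero when no reference value is as high as the test value.
    """
    at_least = sum(1 for v in reference if v >= test_value)
    return (at_least + 1) / (len(reference) + 1)

# Hypothetical total nitrogen (mg/L) at 24 regional reference sites.
reference_tn = [0.41, 0.38, 0.52, 0.47, 0.35, 0.60, 0.44, 0.39, 0.50, 0.43,
                0.55, 0.36, 0.48, 0.42, 0.58, 0.40, 0.46, 0.51, 0.45, 0.37,
                0.49, 0.53, 0.44, 0.62]
test_tn = 0.95  # hypothetical test-site value

summary = box_plot_summary(reference_tn)
outside_whiskers = not (summary["low"] <= test_tn <= summary["high"])
p = exceedance_probability(reference_tn, test_tn)
print(f"whiskers: {summary['low']}-{summary['high']} mg/L; test outside: {outside_whiskers}")
print(f"empirical exceedance probability: {p:.2f}")  # 1/25 = 0.04 here
```

Because none of the 24 hypothetical reference values reaches the test value, the estimate is 1/25 = 0.04. With n reference samples, this estimator cannot go below 1/(n + 1), so small reference sets limit how strong a conclusion the test can support.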

## Estimating Stressor-Response Relationships

Stressor-response relationships estimated from field data can potentially inform two types of evidence: stressor-response from the case and stressor-response from other field studies.

### Stressor-Response From the Field (Using Data from the Case)

### Stressor-Response From Other Field Studies

**Explore Associations Between Variables in the Data Set**

- Identify extreme observations and autocorrelation that may influence estimates of relationships.
- Calculate correlations and view scatterplots to reveal relationships between pairs of variables.
- Use multivariate approaches to reveal relationships among groups of variables.
- Identify variables that may confound the estimated relationship.

**Estimate Effects**

- Classification and regression trees can suggest possible discontinuities in relationships of interest.
- Regression analysis provides an estimate of the mean relationship between the biological response and stressor of interest. In some cases, the effects of possible confounding variables can be controlled by including them in the regression model, but estimates of effect may be unreliable when variables covary too strongly.
- Quantile regression provides a way to estimate the upper bound of the relationship between a stressor and a biological response. Under certain assumptions, this upper bound may provide a reasonably accurate estimate of the stressor-response relationship.
- Propensity score analysis provides a powerful means of controlling for the effects of covarying variables, allowing more accurate estimates of effects.
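The sketch below illustrates how including a covarying variable in a regression model can change the estimated effect of a stressor. It simulates hypothetical data in which a natural covariate drives both the stressor and the biological response. It then fits ordinary least squares with NumPy, once without the covariate and once with it. The simulated coefficients and names are assumptions chosen for illustration.

```python
import numpy as np

def fit_ols(predictors, response):
    """Ordinary least squares; returns coefficients with the intercept first."""
    design = np.column_stack([np.ones(len(response)), predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 200
# Hypothetical data: a natural covariate that also drives the stressor,
# so the two covary (correlation of about 0.8 by construction).
covariate = rng.normal(size=n)
stressor = 0.8 * covariate + rng.normal(scale=0.6, size=n)
# Simulated "true" effects: -1.0 per unit stressor, +1.5 per unit covariate.
response = 10.0 - 1.0 * stressor + 1.5 * covariate + rng.normal(scale=0.5, size=n)

naive = fit_ols(stressor, response)                                   # stressor only
adjusted = fit_ols(np.column_stack([stressor, covariate]), response)  # plus covariate
print(f"stressor slope, covariate ignored:  {naive[1]:+.2f}")
print(f"stressor slope, covariate included: {adjusted[1]:+.2f}")
```

By construction, the unadjusted slope is about -1.0 + 1.5 × 0.8 = +0.2 in expectation. Ignoring the covariate therefore reverses the sign of the effect here, while including it recovers a slope near -1.0. If the stressor and covariate were nearly collinear, the adjusted estimate would become much less precise. This loss of precision is the unreliability the text notes for variables that covary too strongly.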

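Propensity score analysis can take several forms. The sketch below shows one simple form, assuming a binary exposure (for example, a stressor above some hypothetical threshold). It has three steps:

1. Estimate each site's probability of exposure from its covariates with logistic regression.
2. Group sites into five strata of that probability.
3. Average the exposed-minus-unexposed differences within strata.

Continuous stressors call for extensions of this idea that are not shown here. All data are simulated, and the function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, z, iterations=25):
    """Logistic regression by Newton-Raphson.

    Returns the coefficients (intercept first) and the design matrix used.
    """
    design = np.column_stack([np.ones(len(z)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (z - p)
        hessian = design.T @ (design * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, gradient)
    return beta, design

def stratified_effect(covariates, exposed, response, n_strata=5):
    """Exposed-minus-unexposed difference in mean response within strata of the
    estimated propensity score, averaged with weights equal to stratum size."""
    beta, design = fit_logistic(covariates, exposed)
    score = 1.0 / (1.0 + np.exp(-design @ beta))
    edges = np.quantile(score, np.linspace(0, 1, n_strata + 1))
    stratum = np.clip(np.searchsorted(edges, score, side="right") - 1, 0, n_strata - 1)
    effects, weights = [], []
    for s in range(n_strata):
        in_s = stratum == s
        treated, control = in_s & (exposed == 1), in_s & (exposed == 0)
        if treated.any() and control.any():  # skip strata without overlap
            effects.append(response[treated].mean() - response[control].mean())
            weights.append(in_s.sum())
    return np.average(effects, weights=weights)

rng = np.random.default_rng(7)
n = 2000
covariates = rng.normal(size=(n, 2))  # hypothetical natural gradients
# Exposure (e.g., stressor above a hypothetical threshold) depends on both covariates.
logit = 1.2 * covariates[:, 0] - 0.8 * covariates[:, 1]
exposed = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
# Simulated "true" exposure effect is -3.0; the covariates also shift the response.
response = (20.0 - 3.0 * exposed + 4.0 * covariates[:, 0] - 2.0 * covariates[:, 1]
            + rng.normal(size=n))

naive = response[exposed == 1].mean() - response[exposed == 0].mean()
adjusted = stratified_effect(covariates, exposed, response)
print(f"naive difference: {naive:+.2f}, propensity-stratified: {adjusted:+.2f}")
```

Within a stratum, sites have similar propensity scores, so the covariates are roughly balanced between exposed and unexposed sites. With only five strata, some imbalance remains. The stratified estimate therefore moves most of the way, but usually not all of the way, from the biased naive difference toward the simulated effect of -3.0.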
**Interpret Results**

- Have most potential confounders been accounted for in the analysis?
- What do the significance test results mean?
