CADDIS Volume 4: Data Analysis

## Basic Principles & Issues

##### Interpreting Statistics

- Author: D. Farrar

### Confounding

The effect of a stressor on a measure of biological condition (i.e., the stressor-response relationship) may be misunderstood if other environmental variables or stressors that may affect the biological measure are ignored. In many cases, a simple relationship observed between a measure of biological condition and a single stressor may reflect the effects of additional stressors. For example, increased urban land use encompasses many different stressors (e.g., increased flow flashiness, increased concentrations of different pollutants, and degraded physical habitat), all of which can influence the aquatic biological community.

Analyses to estimate stressor-response relationships often must take measures to avoid attributing biological effects to a single stressor when observed effects are as readily attributable to simultaneous exposure to multiple, associated stressors. This issue is particularly important when estimating stressor-response relationships from large, regional data sets, in which multiple, associated stressors are common.

#### Identifying concomitant variables

We use the term *concomitant variables* for variables that might confound estimates of the effect of a stressor variable on a measure of biological condition. Conceptual diagrams showing linkages between sources, stressors, and biological responses can help identify a set of concomitant variables. In particular, one should look for variables that provide alternative pathways linking the stressor of interest and the biological effect, and include among the concomitant variables those that block these pathways (see Confounding: Details for more information).

#### One approach for controlling for confounding variables

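Table 1. Strata defined by percent substrate sand/fines (SED), with the within-stratum correlation (*r*) between SED and TN.

| Stratum | SED (%) | *r* (SED vs. TN) |
|---|---|---|
| 1 | 0 - 7 | 0.03 |
| 2 | 8 - 14 | 0.12 |
| 3 | 15 - 28 | 0.08 |
| 4 | 29 - 46 | 0.25 |
| 5 | 47 - 76 | 0.09 |
| 6 | 77 - 100 | 0.15 |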

For a basic data analysis tool that can address confounding to some degree, we emphasize scatterplots in combination with stratification. Stratification breaks the dataset into subsets (i.e., strata) that are relatively homogeneous with respect to one or more concomitant variables. If there is adequate variation within strata for the stressor of interest, one can evaluate the stressor-response relationship with concomitant variables approximately fixed, minimizing their influence on the estimated relationship. In a scatterplot, the strata may be labeled distinctively. A related approach is to use special symbols to flag points in the scatterplot that have relatively extreme values of concomitant variables.

We illustrate the use of stratification using data from streams of the
western United States, in which we are interested in estimating the
effects of total nitrogen (TN) on total macroinvertebrate richness.
TN and percent substrate sand/fines (SED) are strongly correlated (*r* =
0.65), and so the bivariate relationship we would estimate between TN
and total richness may be confounded by SED. To control for the
effects of SED, we first break the dataset into 6 strata, defined by
SED values (Table 1).

Within each stratum, the correlation between SED and TN is greatly reduced (Table 1), and thus the potential confounding effect of SED on the estimated effect of TN on total richness is also reduced. One could then use regression analysis within each stratum to estimate the TN-total richness relationship (Figure 4). In this example, the slope of the relationship between TN and total richness is similar across strata (colored lines), but noticeably less steep than the slope estimated using the full data set (black dashed line). This difference suggests that SED does indeed bias the simple estimate of the relationship between TN and total richness obtained from the full data set.
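The stratified analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis behind Table 1 or Figure 4: the data are synthetic, the function name `stratified_summary` is hypothetical, and the stratum boundaries are placed halfway between the integer SED ranges of Table 1. The synthetic TN variable is on an arbitrary (e.g., log-transformed) scale.

```python
import numpy as np

def stratified_summary(stressor, concomitant, response, edges):
    """For each stratum of `concomitant`, return a tuple of
    (stratum number, n, r between stressor and concomitant,
    slope of response regressed on stressor)."""
    stressor = np.asarray(stressor, dtype=float)
    concomitant = np.asarray(concomitant, dtype=float)
    response = np.asarray(response, dtype=float)
    # Interior edges give each sample a stratum index 0 .. len(edges) - 2.
    strata = np.digitize(concomitant, edges[1:-1])
    rows = []
    for s in range(len(edges) - 1):
        mask = strata == s
        if mask.sum() < 3:  # too few points to fit a line
            continue
        r = np.corrcoef(stressor[mask], concomitant[mask])[0, 1]
        slope = np.polyfit(stressor[mask], response[mask], 1)[0]
        rows.append((s + 1, int(mask.sum()), float(r), float(slope)))
    return rows

# Synthetic data (an assumption for illustration, NOT the western streams data):
# SED drives both TN and richness, so SED confounds the TN-richness relationship.
rng = np.random.default_rng(0)
n = 5000
sed = rng.uniform(0, 100, n)                    # percent sand/fines
tn = 0.01 * sed + rng.normal(0, 0.35, n)        # TN, arbitrary scale
richness = 40 - 5 * tn - 0.15 * sed + rng.normal(0, 3, n)

# Assumed boundaries matching the SED ranges of Table 1.
edges = [0, 7.5, 14.5, 28.5, 46.5, 76.5, 100]

overall_r = float(np.corrcoef(tn, sed)[0, 1])
overall_slope = float(np.polyfit(tn, richness, 1)[0])
rows = stratified_summary(tn, sed, richness, edges)

print(f"all data: r(TN, SED) = {overall_r:.2f}, slope = {overall_slope:.1f}")
for stratum, count, r, slope in rows:
    print(f"stratum {stratum}: n = {count}, r = {r:.2f}, slope = {slope:.1f}")
```

In this synthetic setting the within-stratum slopes should be less steep than the full-data slope, mirroring the pattern described for Figure 4. With real data, strata that contain few samples or little variation in the stressor would give unstable slope estimates and should be interpreted cautiously.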

If concomitant variables (e.g., other stressors, sources) are strongly correlated with the stressor of interest in the available data, the specific roles of individual stressors may be difficult to evaluate. In such cases, a group of correlated stressor variables may instead be combined into a single index. For example, concentrations of multiple toxic metals might be combined using a simple concentration addition model or a more mechanistic biotic ligand model. More definite conclusions about the roles of specific stressors might depend on additional types of information. Methods for evaluating associations among stressor variables may be helpful in planning an informative analysis.
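As one illustration of combining correlated stressors into an index, a concentration addition model is commonly expressed as a sum of toxic units, where each concentration is divided by an effect benchmark for that substance. The sketch below assumes that form. The function name `toxic_unit_sum` is hypothetical, and the benchmark and sample values are invented for illustration; they are not regulatory criteria or measured data.

```python
def toxic_unit_sum(concentrations, benchmarks):
    """Sum of toxic units: each concentration divided by its effect benchmark.

    Both arguments are dicts keyed by substance name, in matching units.
    """
    missing = set(concentrations) - set(benchmarks)
    if missing:
        raise KeyError(f"no benchmark for: {sorted(missing)}")
    return sum(concentrations[m] / benchmarks[m] for m in concentrations)

# Hypothetical benchmarks and one hypothetical sample (same units for both).
benchmarks = {"Cu": 10.0, "Zn": 120.0, "Cd": 2.0}
sample = {"Cu": 5.0, "Zn": 60.0, "Cd": 0.5}

index = toxic_unit_sum(sample, benchmarks)  # 0.5 + 0.5 + 0.25
print(index)
```

The resulting index can then be treated as a single stressor variable in a scatterplot or regression, at the cost of no longer distinguishing the contributions of the individual metals.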

#### More information

A useful modification of this stratification approach can be based on propensity scores. Propensity scores combine multiple concomitant variables into a single variable that can be treated in the same way as a single concomitant variable in various approaches to data analysis.
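The idea can be sketched as follows, under assumptions that go beyond the text above. For a continuous stressor, one common simplification takes the propensity score to be the predicted value from a regression of the stressor on the concomitant variables; samples are then stratified on that score. The function name `propensity_strata`, the second covariate (`urban`), and the synthetic data are all hypothetical, and ordinary least squares is only one of several ways the score could be modeled.

```python
import numpy as np

def propensity_strata(stressor, covariates, n_strata=6):
    """Regress the stressor on the covariates (ordinary least squares) and
    split samples into quantile strata of the fitted values.

    `covariates` is an (n_samples, n_covariates) array.
    Returns (score, labels), with labels in 0 .. n_strata - 1.
    """
    stressor = np.asarray(stressor, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    design = np.column_stack([np.ones(len(stressor)), covariates])
    coef, *_ = np.linalg.lstsq(design, stressor, rcond=None)
    score = design @ coef  # fitted values serve as the score
    cuts = np.quantile(score, np.linspace(0, 1, n_strata + 1)[1:-1])
    return score, np.digitize(score, cuts)

# Synthetic data (assumption for illustration only).
rng = np.random.default_rng(1)
n = 3000
sed = rng.uniform(0, 100, n)    # percent sand/fines
urban = rng.uniform(0, 50, n)   # hypothetical second concomitant variable
tn = 0.01 * sed + 0.02 * urban + rng.normal(0, 0.35, n)

score, labels = propensity_strata(tn, np.column_stack([sed, urban]))

overall_r = float(np.corrcoef(tn, score)[0, 1])
within_r = [float(np.corrcoef(tn[labels == s], score[labels == s])[0, 1])
            for s in range(6)]
print(f"overall r(TN, score) = {overall_r:.2f}")
print("within-stratum r:", [round(r, 2) for r in within_r])
```

Within each score stratum, the stressor is less strongly associated with the combined concomitant variables than in the full data set, so the strata can be used in the same way as the SED strata in the earlier example.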

Details regarding statistical approaches for identifying potential confounding variables and for controlling their effects are available here.