CADDIS Volume 4: Data Analysis

Advanced Analyses

Propensity Score Analysis

Analysis of the role of a given stressor may be inaccurate if confounding variables are not taken into account. One method for developing a deconfounded analysis of observational data uses a type of balancing score called the propensity score.

What is a propensity score?

Propensity score analysis was first proposed by Rosenbaum and Rubin (1983) to infer cause and effect from studies in which experimental treatments cannot be randomly assigned to subjects. In epidemiologic research, a propensity score is an estimate of the probability that a subject would have received a ‘treatment’ (e.g., exposure to second-hand smoke), given information about his or her background. When two subjects have the same background but differ in their levels of exposure, they may be considered to have been randomly assigned to a treatment group (exposed) or control group (not exposed). Early applications tended to focus on biomedical and epidemiological issues, but applications are presently common in many disciplines.

How are propensity scores used in causal analysis?

The original methodology dealt with a binary independent variable, e.g., comparing "treated" patients to "control" patients. In ecological causal analysis, we typically have continuous stressors rather than discrete treatment groups. In the example below, we apply stratified propensity score analysis (Rosenbaum and Rubin 1984; Imai and van Dyk 2004), which uses a generalization of the propensity score for continuous treatments. With this approach, the propensity score is estimated as the predicted level of a stressor, given the values of the measured covariates at a site. When used as a stratifying variable, the propensity score creates strata in which covariates are approximately independent of the stressor of interest. Regression analysis can then be used within each stratum to evaluate the effect of the stressor of interest (X) on the biological response variable (Y) with reduced confounding by the measured covariates. This approach simplifies causal analysis of observational data by controlling for multiple covariates at once through a single stratifying variable.

In the example below, we estimate the effect of total nitrogen on macroinvertebrate taxon richness in streams in the western United States, from the U.S. EPA's western EMAP dataset. We use propensity scores to control for five potentially confounding covariates: catchment area, sediment, agricultural land use, annual precipitation, and chloride.

Example: excess nutrients

There are three steps in a stratified propensity score analysis:

  1. Estimating the propensity score
  2. Stratifying the dataset by the propensity score
  3. Evaluating the stressor-response relationship within strata

The first step, estimating the propensity score, produces the metric that will be used to stratify the dataset. Stratifying a dataset on a single variable (e.g., elevation, stream size) prior to analysis is a common technique for control of covariates, familiar to most ecologists. However, defining strata becomes increasingly difficult as the number of covariates grows. Strata of reasonable size are inevitably somewhat heterogeneous with regard to some covariates. Stratification by propensity score solves this problem via a balancing approach, in which a single metric (the propensity score) combines the effects of multiple original covariates.

In our example the stressor of interest (total nitrogen) is a continuous variable. We will calculate a propensity score for each site by modeling log-transformed total nitrogen as a linear function of the five covariates:

    ps = E[log(TN)] = b0 + b1 log(Area) + b2 Ag + b3 Sed + b4 log(Precip) + b5 log(Cl)        [Eq. 1]


where ps = the propensity score; E[log(TN)] = the modeled expected value of log total nitrogen; Area = catchment area; Ag = the percentage of agricultural land use in the catchment; Sed = sediment; Precip = annual precipitation; Cl = chloride concentration; b0 is the fitted intercept and b1 through b5 are fitted regression coefficients.

Figure 1. Scatterplot of observed and predicted total nitrogen. Predicted TN is the propensity score for each sample, estimated in step 1. The vertical lines are quartiles calculated in step 2. The 4 strata defined by the propensity scores are labeled (A, B, C, D).

From [Eq. 1] it can be seen that the propensity score is the predicted level of total nitrogen at a site given the values of the covariates. For a given value of the propensity score (predicted TN), the actual value of TN will vary (Figure 1).
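
Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code behind this page: it fits Eq. 1 by ordinary least squares on synthetic stand-in data. The covariate distributions and the coefficients in `true_b` are invented for the example, and `fit_ols` and `predict` are hypothetical helper names. The fitted value for each site is its propensity score.

```python
import random

def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]           # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    """Evaluate b0 + b1*x1 + ... for one site."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

# Synthetic stand-in data (assumption: NOT the EMAP dataset).
rng = random.Random(0)
true_b = [0.5, 0.2, 0.03, 0.4, -0.6, 0.5]         # invented b0..b5
covariates, log_tn = [], []
for _ in range(400):
    row = [rng.gauss(3, 1),      # log(Area)
           rng.uniform(0, 60),   # Ag, percent agricultural land use
           rng.gauss(0, 1),      # Sed
           rng.gauss(6, 0.5),    # log(Precip)
           rng.gauss(1, 1)]      # log(Cl)
    covariates.append(row)
    log_tn.append(predict(true_b, row) + rng.gauss(0, 0.5))

coefs = fit_ols(covariates, log_tn)               # fitted b0..b5 of Eq. 1
ps = [predict(coefs, row) for row in covariates]  # propensity score per site
```

In practice a statistics package would normally be used for the regression; the hand-written solver only keeps the sketch free of external dependencies.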

In the second step, stratification, the dataset is stratified on the propensity score. Sites within a stratum are more similar in the values of the covariates and predicted nitrogen levels when compared with sites across the full dataset. The number of strata typically ranges from 4 to 8. Rosenbaum and Rubin (1984) state that five strata are sufficient for eliminating bias in most datasets [but see Lunceford and Davidian (2004)]. In this example we split the dataset into four strata (quartiles shown in Figure 1), each containing approximately 240 observations.
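
The stratification step might look like the following sketch. The propensity scores here are random placeholders rather than fitted values, and the sample size of 960 is chosen only so that each quartile holds 240 sites, similar to the example above. The cut points come from `statistics.quantiles` with its default method; other quantile definitions would shift the boundaries slightly.

```python
import bisect
import random
import statistics

rng = random.Random(1)
# Placeholder propensity scores (assumption: stand-ins for fitted values)
ps = [rng.gauss(0, 1) for _ in range(960)]

cuts = statistics.quantiles(ps, n=4)   # three quartile cut points
labels = "ABCD"
# bisect_right returns 0..3: the number of cut points at or below the score
stratum = [labels[bisect.bisect_right(cuts, p)] for p in ps]

counts = {s: stratum.count(s) for s in labels}
print(counts)
```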

Within strata, the correlations of total nitrogen with the five measured covariates are substantially reduced (Table 1), an illustration of the balancing concept. It follows that a stressor-response evaluation for TN is less subject to confounding when restricted to a single stratum.

Figure 2. Scatterplot of chloride and total nitrogen in the original (unstratified) dataset. r=0.65

Figure 3. Scatterplot of chloride and total nitrogen in the stratified dataset. Clockwise from top left: r=0.11, r=0.11, r=0.15, r=0.10

Table 1. Correlation of total nitrogen with 5 covariates before and after propensity score stratification

    Covariate        Before: r    After stratification: r (min, max)
    %Agriculture      0.61        (0.06, 0.39)
    Sediment          0.64        (0.07, 0.13)
    log(Area)         0.51        (0.01, 0.09)
    log(Precip)      -0.51        (0.04, 0.27)
    log(Chloride)     0.65        (0.10, 0.15)

To visualize this reduction in correlation strength, compare the correlation of nitrogen and one covariate, chloride, before and after stratification (Figures 2 and 3).
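
A balance check of this kind can be sketched as follows. The data are synthetic and the setup is deliberately simplified: two invented covariates `c1` and `c2` drive the stressor `x`, and the true linear predictor `c1 + c2` is used as a stand-in for a fitted propensity score. The numbers it prints are not comparable to Table 1; the point is only the pattern of a weaker stressor-covariate correlation inside each stratum than in the full dataset.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(2)
n = 960
c1 = [rng.gauss(0, 1) for _ in range(n)]                 # invented covariate
c2 = [rng.gauss(0, 1) for _ in range(n)]                 # invented covariate
x = [a + b + rng.gauss(0, 1) for a, b in zip(c1, c2)]    # stressor
ps = [a + b for a, b in zip(c1, c2)]   # assumption: stand-in for a fitted score

# Four equal-size strata from the sites sorted by propensity score
order = sorted(range(n), key=lambda i: ps[i])
size = n // 4
strata = [order[k * size:(k + 1) * size] for k in range(4)]

r_before = pearson(x, c1)
r_within = [pearson([x[i] for i in s], [c1[i] for i in s]) for s in strata]
print(f"before: r = {r_before:.2f}")
print("within strata:", ", ".join(f"{r:.2f}" for r in r_within))
```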

The appropriate number of strata depends on the dataset. As the number of strata increases, the number of observations in each stratum decreases, reducing the sample size and statistical power for analyzing the stressor-response relationship stratum by stratum. However, as the number of strata increases, propensity scores within each stratum span a narrower range of values, and covariate effects are minimized to a greater degree.
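
The trade-off can be made concrete with a small sketch on placeholder scores (random values, not fitted propensity scores). For each candidate number of strata it reports the number of sites per stratum and the average width of the propensity score range within a stratum; both shrink as strata are added.

```python
import random

rng = random.Random(3)
# Placeholder propensity scores, sorted so strata are contiguous slices
ps = sorted(rng.gauss(0, 1) for _ in range(960))

summary = {}
for k in (2, 4, 8, 16):
    size = len(ps) // k                      # sites per stratum
    chunks = [ps[j * size:(j + 1) * size] for j in range(k)]
    mean_range = sum(ch[-1] - ch[0] for ch in chunks) / k
    summary[k] = (size, mean_range)
    print(f"{k:2d} strata: {size} sites each, mean score range {mean_range:.2f}")
```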

Figure 4. The effect of total nitrogen on macroinvertebrate richness varies across the 4 strata (A,B,C,D) defined by the propensity scores, shown in Figure 1. However, these results indicate that the reduction of macroinvertebrate richness may be causally related to increasing total nitrogen, after controlling for the confounding effects of watershed area, sediment, agricultural land use, precipitation, and chloride. The reader is reminded that statistical significance (p-value) may or may not indicate biological effects large enough in magnitude to be of practical concern. For more information on p-values, see the CADDIS page on interpreting statistics.

In the third and final step, evaluating the stressor-response relationship within strata, we use regression analysis to estimate the effects of total nitrogen on macroinvertebrate richness by fitting linear regressions within each stratum (Figure 4). By using propensity scores to stratify the dataset, the bias introduced by covariates is effectively reduced, and observed effects within each stratum can be more confidently attributed to the stressor of interest. Nonparametric regressions within strata could be helpful for an exploratory approach to evaluating stressor-response relationships. In other applications, a further step would be to estimate the population effect by pooling stratum effects.
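
The third step, together with a simple pooled estimate, can be sketched as below. Everything here is an assumption made for illustration: the data are synthetic, a single invented confounder `c` affects both the stressor `x` and the response `y`, the true effect of `x` is set to -2, and `c` itself stands in for a fitted propensity score. Pooling by a sample-size-weighted mean of the stratum slopes is one simple choice, not necessarily the method used in other applications.

```python
import random

def slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(4)
n = 960
TRUE_EFFECT = -2.0                                   # invented true effect
c = [rng.gauss(0, 1) for _ in range(n)]              # invented confounder
x = [ci + rng.gauss(0, 1) for ci in c]               # stressor
y = [30 + TRUE_EFFECT * xi - 3 * ci + rng.gauss(0, 1)
     for xi, ci in zip(x, c)]                        # response
ps = list(c)   # assumption: stand-in for the fitted propensity score

# Four equal-size strata from the sites sorted by propensity score
order = sorted(range(n), key=lambda i: ps[i])
size = n // 4
strata = [order[k * size:(k + 1) * size] for k in range(4)]

naive = slope(x, y)                                  # ignores the confounder
stratum_slopes = [slope([x[i] for i in s], [y[i] for i in s]) for s in strata]
pooled = sum(len(s) * b for s, b in zip(strata, stratum_slopes)) / n

print(f"naive slope:  {naive:.2f}")
print("stratum slopes:", ", ".join(f"{b:.2f}" for b in stratum_slopes))
print(f"pooled slope: {pooled:.2f} (true effect {TRUE_EFFECT})")
```

With only four strata some confounding remains inside each stratum, so the stratum slopes in this sketch move toward the true effect without reaching it exactly.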

More information

Stratification by propensity scores yields a simple scatterplot modification that can account for multiple covariates. With appropriate data, propensity score analysis is a method for evaluating general trends in the causal relationships between specific environmental stressors and observed biological responses. The most serious drawback of the method is that it only controls for measured covariates. For covariates that are unmeasured, but possibly important, some effort should be made, aided by a causal diagram, to evaluate possible effects on the conclusions. Other drawbacks of the method are that it requires a large number of sites, with intensive sampling of both biotic and environmental variables, and modest programming skills. Stratification by individual covariates may sometimes reveal biologically interesting interactions, in the form of a dependence of the stressor-response relationship on a covariate.

Technical details for confounding, balancing, and propensity scores are available here. The nutrient analysis example on this page is based on Yuan (2010).
