CADDIS Volume 4: Data Analysis

Exploratory Data Analysis

Correlation Analysis

Authors: G.W. Suter II, P. Shaw-Allen, S.M. Cormier

Correlation analysis is a method for measuring how strongly two random variables in a matched data set vary together. This covariation is usually expressed as the correlation coefficient of two variables X and Y, a unitless number that ranges from -1 to +1. The magnitude of the correlation coefficient is the standardized degree of association between X and Y. The sign is the direction of the association, which can be positive or negative.
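As a worked illustration of "standardized", the most common coefficient, Pearson's r, can be written as the sample covariance of X and Y divided by the product of their sample standard deviations. The symbols below (x_i, y_i for the n matched observations, bars for sample means, s_X and s_Y for sample standard deviations) are notation assumed here and do not appear in the original text.

```latex
r = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Dividing by the standard deviations removes the units of X and Y, which is why the result is unitless and bounded by -1 and +1.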

Pearson's product-moment correlation coefficient (r) measures the degree of linear association between two variables. Spearman's rank-order correlation coefficient (ρ) uses the ranks of the data rather than the raw values, and can provide a more robust estimate of the degree to which two variables are associated. Kendall's tau (τ) has the same underlying assumptions as Spearman's ρ, but is based on the difference between the probability that a pair of observations is ordered the same way on both variables (concordant) and the probability that it is ordered in opposite ways (discordant).
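The three coefficients can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the method used by CADStat or any particular package: the function names are hypothetical, the data are made up, and the Kendall function computes the simple tau-a form, which assumes no tied values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: covariation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson_r(average_ranks(x), average_ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs (assumes no ties)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical matched observations (illustrative values only).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r, rho, tau = pearson_r(x, y), spearman_rho(x, y), kendall_tau(x, y)
print(f"r = {r:.3f}, rho = {rho:.3f}, tau = {tau:.3f}")
```

In this example the values of x and y are already ranks, so r and ρ coincide. τ is smaller because it counts only whether each pair is ordered consistently, not by how much the ranks differ.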

A value of r, ρ, or τ is interpreted as follows:

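  • A coefficient at or near 0 indicates that the coefficient detects no association between the variables: no linear association for r, and no monotonic association for ρ or τ (Figure 1, left).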
  • A negative coefficient indicates that as one variable increases, the other decreases (Figure 1, center).
  • A positive coefficient indicates that as one variable increases, the other also increases (Figure 1, right).
  • Larger absolute values of coefficients indicate stronger associations (e.g., Figure 1, right and center). However, small Pearson coefficients may be due to a nonlinear relationship (Figure 2).
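The caveat about nonlinear relationships can be made concrete with a small sketch. The data here are invented for illustration: Y is completely determined by X (Y = X squared), yet Pearson's r is zero because the relationship is symmetric rather than linear. The helper name pearson_r is hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Illustrative data: a perfect but U-shaped dependence of y on x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # no linear association despite exact dependence
```

A scatterplot of such data would reveal the relationship immediately, which is one reason to plot the data alongside any coefficient.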

Figure 1.   Examples of different correlations between two variables, X and Y.
    Left:     r = -0.04. The points are diffusely scattered, indicating no association of X and Y.
    Center: r = -0.37. The plot indicates a weak negative association in which Y decreases as X increases.
    Right:   r =  0.86. The scatterplot indicates a linear increase in Y with increasing values of X.

Examples of different behaviors of Pearson's and Spearman's correlations are shown in Figure 2. Pearson's r does not accurately represent the strength of the non-linear association in Figure 2 (left plot). Pearson's r and Spearman's ρ provide different estimates of correlation depending upon the distribution of the data (Figure 2, right plot).
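The two behaviors described above can be reproduced with a short sketch. The data below are hypothetical and are not the data plotted in Figure 2; the helper names are also assumptions, and the rank function assumes no tied values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6, 7, 8]

# Case 1: a monotonic but curved relationship (illustrative).
y_curved = [value ** 3 for value in x]
r_curved, rho_curved = pearson_r(x, y_curved), spearman_rho(x, y_curved)

# Case 2: a linear relationship with one extreme outlier (illustrative).
y_outlier = [1, 2, 3, 4, 5, 6, 7, -20]
r_outlier, rho_outlier = pearson_r(x, y_outlier), spearman_rho(x, y_outlier)

print(f"curved:  r = {r_curved:.3f}, rho = {rho_curved:.3f}")
print(f"outlier: r = {r_outlier:.3f}, rho = {rho_outlier:.3f}")
```

In the curved case ρ equals 1 because the ordering of the points is perfectly preserved, while r falls below 1. In the outlier case the single extreme point is enough to make r negative, whereas ρ remains positive because ranking limits how far one observation can pull the estimate.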

Figure 2.   Left plot: Non-linear association measured by r and ρ. Right plot:  Linear association with outliers.

How do I calculate correlations?

A tool for calculating correlations is available in CADStat, and almost any spreadsheet or statistical software package will compute them as well.

How do I use correlation in causal analysis?

Correlation analysis is used primarily as a data exploration technique to reveal the degree of association in a set of matched data. This information can inform subsequent analyses of relationships between variables. In particular, correlation can indicate possible factors that confound a relationship of interest. For many data sets, pairwise correlations alone may not provide enough insight, and multivariate exploratory analyses are recommended.
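A pairwise screen of this kind might look like the following sketch. Everything specific in it is an assumption made for illustration: the variable names, the data values, and the 0.7 cutoff used to flag pairs are hypothetical and are not CADDIS guidance.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical matched measurements from six sites (made-up values).
data = {
    "stressor": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "covariate": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "response": [9.0, 8.1, 6.8, 6.1, 4.9, 4.2],
}

FLAG_THRESHOLD = 0.7  # arbitrary illustrative cutoff, not a recommended value

# Correlation for every pair of variables.
pairs = {
    (a, b): pearson_r(data[a], data[b])
    for a, b in combinations(data, 2)
}

for (a, b), r in pairs.items():
    note = "  <- strongly associated" if abs(r) > FLAG_THRESHOLD else ""
    print(f"{a:>10} vs {b:<10} r = {r:+.3f}{note}")
```

Here the stressor and the covariate are themselves strongly correlated, so a strong correlation between the stressor and the response cannot, on its own, separate the stressor's influence from the covariate's. That is the kind of possible confounding a pairwise screen can bring to attention before further analysis.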
