CADDIS Volume 4: Data Analysis
Exploratory Data Analysis
 What is EDA?
 Variable
Distributions  Scatterplots
 Correlation
Analysis  Conditional
Probability  Multivariate
Approaches  Mapping Data
 References
Topics in Scatterplots
 Data issues that can be revealed by scatterplots
 How can I use scatterplots in causal analysis?
 More information
Examples
Scatterplots
Scatterplots are graphical displays of matched data plotted with one variable on the horizontal axis and the other variable on the vertical axis. Data are usually plotted with measures of an influential parameter on the horizontal axis (independent variable) and measures of an attribute that may respond to the influential parameter on the vertical axis (dependent variable). Scatterplots are a useful first step in any analysis because they help visualize relationships and identify possible issues (e.g., outliers) that can influence subsequent statistical analyses.
Data issues that can be revealed by scatterplots
Different characteristics of a particular data set are readily apparent from scatterplots. Relationships between variables may be nonlinear (Figure 1a) or the variance about the mean relationship may not be constant (Figure 1b). In both of these cases, simple linear regression may not be appropriate, so identifying these features early can help one select more appropriate analytical techniques. For nonlinear relationships, a different functional form (e.g., quadratic) may be appropriate, and when variances are not constant, one might opt for quantile regression or generalized linear models.
How can I use scatterplots in causal analysis?
Scatterplots are useful primarily for data exploration, but can inform evaluations of stressorresponse from the case. An example of this latter application is available here.
More information

A set of scatter plots showing pairwise relationships between several variables can be conveniently displayed as scatterplot matrix (Figure 2).

One limitation of scatterplots is that one can only examine relationships between two variables. In cases in which many different variables interact, multivariate approaches for exploring data may provide greater insights.