Using R for Non-Parametric Regression

How to Fit Non-Parametric Regressions

Helpful Links
Topics In R Scripts
  • Overview
  • Download R Scripts and Sample Data
  • Loading Data
  • Central Tendencies
  • Environmental Limits
  • Parametric Regressions
  • Non-Parametric Regressions
  • Significance Tests
  • Area Under the ROC Curve
  • Curve Shape
  • Weighted Average Inference
  • Estimate Taxon-Environment Relationships Using taxon.env()
Other Pages And Websites
  • Non-Parametric Regressions

Non-parametric regressions (see the Non-Parametric Regression page, Equation 8) can be computed with a set of commands similar to those used for parametric regressions (see the Parametric Regressions page in the Helpful Links Box). In this case, generalized additive models (GAMs) are used to fit non-parametric curves to the data.

First, install the gam package in R. Type at the R prompt:

install.packages("gam")

You will then need to select a mirror site from the provided list, and the package should install automatically.

Next, make sure that you have loaded the sample biological and environmental data and merged them into a single data frame called dfmerge (see the Download Scripts and Sample Data page in the Helpful Links Box).

Also make sure that you have selected the taxa for which you wish to fit non-parametric regressions and saved them in the vector taxa.names (see the description on the Central Tendencies page in the Helpful Links Box).
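Before fitting, it can help to confirm that the inputs have the expected shape. The following is a minimal sketch, assuming a small made-up data frame in place of the real dfmerge. The names dfmerge.toy and taxa.toy, the taxon names TAXON.A and TAXON.B, and all values are hypothetical; only the column name temp comes from the scripts on this page.

```r
# Hypothetical stand-in for the merged data frame (illustration only).
# Substitute dfmerge and taxa.names when checking real data.
dfmerge.toy <- data.frame(
  temp    = c(12.1, 15.4, 18.9, 21.3, 9.8, 16.7),
  TAXON.A = c(0, 3, 5, 0, 0, 2),
  TAXON.B = c(1, 0, 0, 4, 2, 0)
)
taxa.toy <- c("TAXON.A", "TAXON.B")

# Every selected taxon must be a column of the data frame.
missing.taxa <- setdiff(taxa.toy, names(dfmerge.toy))
if (length(missing.taxa) > 0) {
  stop("Taxa not found in data frame: ", paste(missing.taxa, collapse = ", "))
}

# The environmental variable must be present and numeric.
stopifnot("temp" %in% names(dfmerge.toy), is.numeric(dfmerge.toy$temp))

# Count missing temperature values.
n.missing.temp <- sum(is.na(dfmerge.toy$temp))

# Count the sites where each taxon is present (abundance > 0).
n.present <- colSums(dfmerge.toy[, taxa.toy, drop = FALSE] > 0)
print(n.present)
```

A taxon that is present at very few sites, or absent from very few, gives the regression little information to work with, so these counts are worth a glance before fitting.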

# Load GAM library
library(gam)
# Pre-allocate a list to hold one fitted model per taxon
modlist.gam <- vector("list", length(taxa.names))
for (i in seq_along(taxa.names)) {
  # Create a logical vector that is TRUE if the taxon is
  #   present and FALSE if the taxon is absent.
  resp <- dfmerge[, taxa.names[i]] > 0

  # Fit the regression model, specifying two degrees of freedom
  # for the smooth curve.
  modlist.gam[[i]] <- gam(resp ~ s(temp, df = 2), data = dfmerge,
              family = "binomial")

  print(summary(modlist.gam[[i]]))
}
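To see what a fitted model returns without loading the sample data, the same kind of fit can be tried on simulated data. The following is a rough sketch, not the sample analysis: the data frame sim, the column TAXON.SIM, and the assumed unimodal response curve are all invented for illustration. It also checks that predictions of type "response" agree with back-transformed predictions of type "link".

```r
# Simulated data for illustration only; not the sample data from this site.
set.seed(42)
n <- 300
sim <- data.frame(temp = runif(n, min = 5, max = 30))

# Assumed unimodal response: occurrence probability peaks near 18 degrees.
true.prob <- plogis(2 - 0.05 * (sim$temp - 18)^2)
sim$TAXON.SIM <- rbinom(n, size = 1, prob = true.prob)

if (requireNamespace("gam", quietly = TRUE)) {
  library(gam)

  # Presence/absence response, as in the fitting script
  resp <- sim$TAXON.SIM > 0
  mod <- gam(resp ~ s(temp, df = 2), data = sim, family = "binomial")

  # Fitted probabilities of occurrence at the observed temperatures
  fit.prob <- predict(mod, type = "response")

  # Same values via the logit (link) scale followed by back-transformation
  fit.link <- predict(mod, type = "link")
  print(all.equal(as.numeric(fit.prob), as.numeric(plogis(fit.link))))
  print(range(fit.prob))
} else {
  message("Package 'gam' is not installed; run install.packages(\"gam\") first.")
}
```

Working on the link scale matters mainly when standard errors are needed, since confidence limits computed there and then transformed cannot fall outside the range 0 to 1.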

To plot model results (similar to those shown in Figure 7 on the Non-Parametric Regression page), run the following script.

# Specify 3 plots per page
par(mfrow = c(1,3), pty = "s")
for (i in 1:length(taxa.names)) {

  # Compute mean predicted probability of occurrence
  # and standard errors about this predicted probability.
  predres <- predict(modlist.gam[[i]], type = "link", se.fit = TRUE)

  # Compute approximate upper and lower 90% confidence limits
  up.bound.link <- predres$fit + 1.65*predres$se.fit
  low.bound.link <- predres$fit - 1.65*predres$se.fit
  mean.resp.link <- predres$fit

  # Convert from logit transformed values to probability.
  up.bound <- exp(up.bound.link)/(1+exp(up.bound.link))
  low.bound <- exp(low.bound.link)/(1+exp(low.bound.link))
  mean.resp <- exp(mean.resp.link)/(1+exp(mean.resp.link))

  # Sort the environmental variable.
  iord <- order(dfmerge$temp)

  # Define the number of bins used to summarize observational data
  # as probabilities of occurrence
  nbin <- 20

  # Define bin boundaries so each bin has approximately the same
  # number of observations. nbin bins require nbin + 1 boundaries.
  cutp <- quantile(dfmerge$temp,
                   probs = seq(from = 0, to = 1, length.out = nbin + 1))

  # Compute the midpoint of each bin
  cutm <- 0.5*(cutp[-1] + cutp[-(nbin + 1)])

  # Assign a factor to each bin
  cutf <- cut(dfmerge$temp, cutp, include.lowest = TRUE)

  # Compute the mean of the presence/absence data within each bin.
  vals <- tapply(dfmerge[, taxa.names[i]] > 0, cutf, mean)

  # Plot binned observational data as symbols.
  plot(cutm, vals, xlab = "Temperature",
       ylab = "Probability of occurrence", ylim = c(0,1),
       main = taxa.names[i])
  # Plot mean fit as a solid line
  lines(dfmerge$temp[iord], mean.resp[iord])
 
  # Plot confidence limits as dotted lines.
  lines(dfmerge$temp[iord], up.bound[iord], lty = "dotted")
  lines(dfmerge$temp[iord], low.bound[iord], lty = "dotted")
}
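The plotting script relies on three small calculations that can be checked on their own in base R: the normal multiplier for a 90% interval, the back-transformation from the logit scale, and equal-count binning. The sketch below uses hypothetical link-scale values, standard errors, and temperatures; none of these numbers come from the sample data.

```r
# Multiplier for a two-sided 90% interval under a normal approximation
z90 <- qnorm(0.95)   # close to the 1.65 used in the script

# Back-transformation from the logit scale: plogis() matches the explicit formula
eta <- c(-3, -1, 0, 1, 3)              # hypothetical link-scale fitted values
se  <- rep(0.4, length(eta))           # hypothetical standard errors
manual  <- exp(eta) / (1 + exp(eta))
builtin <- plogis(eta)
stopifnot(isTRUE(all.equal(manual, builtin)))

# Limits are formed on the link scale and then transformed,
# so they always stay between 0 and 1.
up.bound  <- plogis(eta + z90 * se)
low.bound <- plogis(eta - z90 * se)

# Equal-count binning: nbin bins need nbin + 1 boundaries
set.seed(1)
x <- runif(200, min = 5, max = 30)     # hypothetical temperatures
nbin <- 20
cutp <- quantile(x, probs = seq(0, 1, length.out = nbin + 1))
cutf <- cut(x, cutp, include.lowest = TRUE)
bin.counts <- table(cutf)
print(length(bin.counts))
print(range(bin.counts))
```

If the environmental variable has many tied values, some quantiles can coincide and cut() will stop with an error about non-unique breaks; reducing nbin is one simple way around that.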

Last updated on February 7, 2025