Virtual Beach (VB)

This model is being distributed, maintained and actively supported by EPA.

Virtual Beach is a software package designed for developing site-specific statistical models for the prediction of pathogen indicator levels at recreational beaches.

On this page:

Audience
Abstract
Applications and Possible Uses
Software History
Technical Support and Training
Quality Assurance and Quality Control

Virtual Beach Releases

Version	Release Date
3.0.7	February 2019
3.0.6	February 2016
3.0.4	December 2014
2.4.3	September 2013
2.3	July 2012

Audience

VB is primarily designed for beach managers responsible for making decisions regarding beach closures due to pathogen contamination. However, researchers, scientists, engineers, and students interested in studying relationships between water quality indicators and ambient environmental conditions will find VB useful.

Abstract

Virtual Beach version 3.0.4 (VB_3.0.4) has been added to this website. Virtual Beach facilitates the development of statistical models of pathogen indicator levels at recreational beaches. VB_3.0.4 reads input data from a text or Excel file, assists the user in preparing the data for statistical analysis, and provides three analytical techniques for model development: multiple linear regression (MLR), partial least squares regression (PLS), and a gradient boosting machine (GBM). Using an integrated mapping component to determine the geographic orientation of the beach, the software can automatically decompose wind and current speed and direction into along-shore and onshore/offshore components. VB_3.0.4 can produce new variables from sets of variables in the input file (e.g., means, minimums, maximums, differences, sums, products), and it can test an array of transformations on the independent variables (IVs) to maximize the linearity of the relationship between the response and those variables. In the MLR module, models with a high degree of multicollinearity are automatically excluded during the selection process. The PLS and GBM modules use 5-fold cross-validation during model development to avoid over-specification. The prediction module of VB_3.0.4 has a direct link to the USGS EnDDaT system to automatically retrieve data for beach sites in the Great Lakes region.
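
The wind/current decomposition mentioned above can be illustrated with a minimal sketch. This is not VB's code: the function name, the angle convention (both angles in degrees clockwise from north), and the sign of each component are assumptions made for illustration only, and VB's own conventions may differ.

```python
import math

def decompose(speed, direction_deg, beach_orientation_deg):
    """Split a speed/direction pair into shore-relative components.

    Assumed convention (hypothetical, not taken from VB): both angles are
    in degrees clockwise from north, and beach_orientation_deg is the
    bearing of the shoreline axis.
    """
    angle = math.radians(direction_deg - beach_orientation_deg)
    along_shore = speed * math.cos(angle)   # component parallel to shore
    cross_shore = speed * math.sin(angle)   # component perpendicular to shore
    return along_shore, cross_shore

# A 10 m/s wind on a bearing 30 degrees off the shoreline axis.
a, o = decompose(10.0, 120.0, 90.0)
print(f"along-shore = {a:.2f}, onshore/offshore = {o:.2f}")
```

Whatever convention is used, the two components preserve the original magnitude (the squares of the components sum to the square of the speed), which gives a quick sanity check on any implementation.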

Applications and Possible Uses

The most common use of VB is to generate statistical models for predicting pathogen indicator levels at freshwater and saltwater beach sites. Analyses have been performed at these locations:

Predicting E. coli levels at Huntington Beach, OH (2000-2010).
Predicting enterococci levels (culturable and qPCR) at various Great Lakes beaches: West Beach, Porter, IN; Washington Park, Michigan City, IN; Silver Beach, St. Joseph, MI; Huntington Beach, Bay Village, OH; South Shore, Milwaukee, WI.
Predicting enterococci levels (culturable and qPCR) at various marine beaches: Goddard Beach, West Warwick, RI; Edgewater Beach, Biloxi, MS; Fairhope Beach, Mobile, AL; Hobie Beach, Miami, FL; La Monserratte, Puerto Rico; Boqueron Beach, Puerto Rico; Surfside Beach, Myrtle Beach, SC.

Software History

VB₃ is a direct descendant of VB₂ (the most recent release of this version is VB_2.4.3). The original Virtual Beach Model Builder application (VB₁) was developed by Walter Frick and Zhongfu Ge at the U.S. EPA in Athens, GA. VB₁ can be characterized as a linear regression model-building tool that supports a primarily manual analysis of data sets via visual inspection of data plots and manipulation of variables (e.g., transformations, creating interaction terms), followed by an iterative process of testing, comparing, and evaluating models. The fitness of developed models is computed and tracked, allowing for comparison and eventual selection of a “best” model for the dataset under consideration. This model can then produce estimates of pathogen indicator levels using current or forecasted environmental data from the site.

VB₂ enhanced the functionality of its predecessor, performing similar functions (visual inspection of univariate data plots, manual transformations of individual variables, MLR model building, prediction, etc.), but also automated and extended functionality in several ways:

The Map component provided users with information on the location and availability of local data sources through the map interface. These sources include the USGS National Water Information System (NWIS), the National Climatic Data Center (NCDC), and the U.S. EPA STORET database (STORET). These sources provide recently collected and/or forecasted data for generating predictions by a chosen MLR model.
The Map component provided a convenient method for defining beach orientation by overlaying the beach on current shoreline layers (satellite images, Google Maps, MS Virtual Earth, etc.). Given this orientation, VB₂ could calculate wind, wave, or current components (the A-component is parallel to shore and the O-component is perpendicular to shore), which can be important predictor variables.
Although manual processing and analysis of imported data (visual inspection of univariate data plots and the transformations/interactions of variables) was retained, the data processing component of VB₂ provided automated generation of all possible second-order interaction terms among a set of independent variables (IVs), formation of more complex functions of multiple columns, and automated testing of a suite of variable transformations for improved model linearity. This functionality increased the number of models to evaluate during later selection routines and removed the burden of manual assessment placed on users of VB₁.
Within the linear regression analysis component, multicollinearity among predictor variables was handled automatically. Any model containing an IV with a high degree of correlation with other IVs (as measured by a large Variance Inflation Factor [VIF]) was removed from consideration during model selection.
During MLR model selection, models were ranked by a user-selected evaluation criterion. Possible criteria include R², Adjusted R², Akaike Information Criterion (AIC), Corrected AIC, Predicted Error Sum of Squares (PRESS), Bayesian Information Criterion (BIC), Accuracy, Sensitivity, Specificity, or the model’s Root Mean Square Error (RMSE). Regardless of which criterion is chosen, the software records the ten best models in terms of that criterion. In comparison, VB₁ had only a single comparative criterion, Mallows’ Cp.
As the number of IVs in a dataset increases, the number of possible MLR models increases exponentially (considering transforms/interactions), resulting in trillions of possible models from a modest number (12-13) of IVs. VB₂ implemented a Genetic Algorithm (GA) that effectively and efficiently searched for the best possible MLR model. Alternatively, VB₂ users could perform an exhaustive calculation in which all possible combinations of IVs were tested if the number of possible models was reasonably small (< 500,000 or so). Both the GA and exhaustive approaches greatly expanded the model-building capabilities of VB₂ compared to VB₁.
Users no longer had to enter data values in transformed, interacted, or component-decomposed form to make a prediction with a chosen MLR model. On the VB₂ MLR Prediction tab, a user-selected model is coded into an input grid with data entry columns matching the main effects of the model. Any mathematical manipulation of these IVs is then automatically performed prior to making predictions.
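
The VIF-based screening described in the list above can be sketched in a few lines. This is an illustrative sketch, not VB's implementation: the function names, the sample data, and the VIF limit of 5.0 are all assumptions (the source does not state the threshold VB uses). The VIF of a predictor is computed as 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors in the candidate model.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design))
           for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def max_vif(columns):
    """Largest variance inflation factor among the predictor columns."""
    if len(columns) < 2:
        return 1.0  # a single predictor has nothing to be collinear with
    vifs = []
    for j, target in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        r2 = r_squared(others, target)
        vifs.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return max(vifs)

def screen_models(predictors, vif_limit=5.0):
    """Keep every subset of predictors whose largest VIF is within the limit.

    vif_limit=5.0 is a hypothetical value chosen for this sketch.
    """
    names = sorted(predictors)
    kept = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if max_vif([predictors[name] for name in subset]) <= vif_limit:
                kept.append(subset)
    return kept

# Made-up sample data: air and water temperature track each other closely.
predictors = {
    "water_temp": [18.0, 19.5, 21.0, 22.5, 20.0, 23.5, 24.0, 19.0],
    "air_temp": [18.4, 19.3, 21.5, 22.4, 20.3, 23.3, 24.4, 19.1],
    "rainfall": [0.0, 5.0, 1.0, 0.0, 12.0, 3.0, 0.0, 8.0],
}
for model in screen_models(predictors):
    print(model)
```

In this made-up data set, any candidate model that contains both temperature variables is dropped, while models pairing either temperature with rainfall survive. The sketch enumerates every subset, which is only practical for a small number of IVs; it does not attempt to reproduce the genetic algorithm search.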

VB₃ builds primarily on VB₂ by adding statistical methods that give users more flexibility in modeling their datasets. In addition to MLR, users can now use Partial Least Squares (PLS) regression and a Gradient Boosting Machine (GBM) to fit their data and make predictions. The redesigned software architecture (using DotSpatial libraries) can now easily accommodate future expansions of the suite of modeling tools. The Prediction tab of VB₃ also allows direct interaction with the USGS data acquisition system, EnDDaT, for automated dataset construction and easier prediction of fecal indicator bacteria (FIB) from web-accessible data.

Technical Support and Training

Contact Mike Cyterski (cyterski.mike@epa.gov) for questions regarding the Virtual Beach application and its supporting software and documents.

The Virtual Beach 3.0.6 User Guide is currently available. As training materials (including video tutorials) become available, they will be posted here.

Quality Assurance and Quality Control

VB₂ and VB₃ have undergone quality assurance testing to ensure their computations are consistent with other statistical packages (R and SAS), and the user manuals for each version are internally reviewed.