Data Validation
Note: EPA no longer updates this information, but it may be useful as a reference or resource.
Introduction: Why is data validation important?
Objectives
Definitions
Example Quality Control Flags
AIRS Null Data Reason Codes
Data Validation Procedures and Results
Examples of Problems Encountered in Databases (and Validation Actions)
Summary
References
Definition
- "The purpose of data validation is to detect and then verify any data values that may not represent actual air quality conditions at the sampling station. Effective data validation procedures usually are handled completely independently from the procedures of initial data collection. Moreover, it is advisable that the individuals responsible for data validation not be directly involved with data collection." (U.S. EPA, 1984, Sec. 2.0.3, p.10)
Why is Data Validation Important?
- Data validation is necessary to identify data with errors, biases, and physically unrealistic values before they are used for identification of exceedances, for analysis, or for modeling.
Figure 1

Outliers
Data that are physically, spatially, or temporally inconsistent.
Level 0 Data Validation
Conversion of instrument output voltages to their scaled scientific units using nominal calibrations. May incorporate data logger inserted flags.
Level 1 Data Validation
Observations have received quantitative and qualitative reviews for accuracy, completeness, and internal consistency. Final audit reviews required.
Level 2 Data Validation
Measurements are compared for external consistency against other independent data sets (e.g., comparing surface ozone concentrations with ozone concentrations from nearby aircraft flights, intercomparing rawinsonde and radar profiler winds, etc.).
Level 3 Data Validation
Continuing evaluation of the data as part of the data interpretation process.
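Level 0 conversion can be sketched as a nominal linear calibration applied to each raw reading, with any logger-inserted flag carried alongside the converted value. The names `Level0Record` and `to_level0`, the linear form, and the 0 to 1 V output spanning 0 to 500 ppb are hypothetical assumptions for illustration; real instruments and data loggers define their own calibration equations.

```python
from typing import NamedTuple, Optional


class Level0Record(NamedTuple):
    value: float                # reading in scientific units (e.g., ppb)
    logger_flag: Optional[str]  # flag inserted by the data logger, if any


def to_level0(volts: float, slope: float, intercept: float = 0.0,
              logger_flag: Optional[str] = None) -> Level0Record:
    """Convert a raw output voltage using a nominal linear calibration.

    Hypothetical model: value = slope * volts + intercept.
    """
    return Level0Record(slope * volts + intercept, logger_flag)


# Hypothetical ozone analyzer: 0 to 1 V output spanning 0 to 500 ppb.
NOMINAL_SLOPE_PPB_PER_V = 500.0
record = to_level0(0.08, NOMINAL_SLOPE_PPB_PER_V)
```

Because the calibration is only nominal, values at this stage still need the Level 1 reviews described above.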
| Flag | Description | Explanation |
|---|---|---|
| 0 | Valid | Observations judged accurate within the performance limits of the instruments. |
| 1 | Estimated | Observations required additional processing because original values were suspect, invalid, or missing. |
| 7 | Suspect | Values judged to be in error because they violate reasonable physical criteria or do not exhibit reasonable consistency, but a specific cause of the problem is not identified. |
| 8 | Invalid | Values judged to be inaccurate or in error, with the cause of the inaccuracy or error known. |
| 9 | Missing | Observations not collected. Values assigned -999. |
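One way to carry this flag scheme in code is an integer enumeration plus the -999 missing-value sentinel. The names `QCFlag`, `flag_missing`, and `usable`, and the rule that only Valid and Estimated values are used without further review, are assumptions of this sketch rather than part of the flag definitions.

```python
from enum import IntEnum

MISSING_VALUE = -999  # value assigned to observations that were not collected


class QCFlag(IntEnum):
    VALID = 0
    ESTIMATED = 1
    SUSPECT = 7
    INVALID = 8
    MISSING = 9


def flag_missing(values):
    """Pair each raw value with an initial flag.

    None (not collected) becomes (-999, MISSING); every other value starts
    as VALID pending review, which is an assumption of this sketch.
    """
    return [(MISSING_VALUE, QCFlag.MISSING) if v is None else (v, QCFlag.VALID)
            for v in values]


def usable(value, flag):
    """Hypothetical rule: use only non-missing values flagged Valid or Estimated."""
    return flag in (QCFlag.VALID, QCFlag.ESTIMATED) and value != MISSING_VALUE
```

Using `IntEnum` keeps the numeric flag values from the table, so flags can be written to and read from files as plain integers.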
| Code | Description |
|---|---|
| 9973 | SAMPLE TIME OUT OF LIMITS |
| 9974 | SAMPLE FLOW RATE OUT OF LIMITS |
| 9975 | INSUFFICIENT DATA (CAN'T CALCULATE) |
| 9976 | FILTER DAMAGE |
| 9977 | FILTER LEAK |
| 9978 | VOIDED BY OPERATOR |
| 9979 | MISCELLANEOUS VOID |
| 9980 | MACHINE MALFUNCTION |
| 9981 | BAD WEATHER |
| 9982 | VANDALISM |
| 9983 | COLLECTION ERROR |
| 9984 | LAB ERROR |
| 9985 | POOR QUALITY ASSURANCE RESULTS |
| 9986 | CALIBRATION |
| 9987 | MONITORING WAIVED |
| 9988 | POWER FAILURE (POWR) |
| 9989 | WILDLIFE DAMAGE |
| 9990 | PRECISION CHECK (PREC) |
| 9991 | Q C CONTROL POINTS (ZERO/SPAN) |
| 9992 | Q C AUDIT (AUDT) |
| 9993 | MAINTENANCE/ROUTINE REPAIRS |
| 9994 | UNABLE TO REACH SITE |
| 9995 | MULTI-POINT CALIBRATION |
| 9996 | AUTO CALIBRATION |
Source: AIRS User's Guide, Volume III: AIRS Codes and Values, 1989.
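The null data reason codes can be held in a simple lookup so that null records are decoded and tallied by reason, which helps show, for example, how much data was lost to calibration or power failures. The dictionary entries are transcribed from the code table; the helper names and the "UNKNOWN CODE" fallback are hypothetical.

```python
from collections import Counter

# AIRS null data reason codes, transcribed from the code table.
AIRS_NULL_CODES = {
    9973: "SAMPLE TIME OUT OF LIMITS",
    9974: "SAMPLE FLOW RATE OUT OF LIMITS",
    9975: "INSUFFICIENT DATA (CAN'T CALCULATE)",
    9976: "FILTER DAMAGE",
    9977: "FILTER LEAK",
    9978: "VOIDED BY OPERATOR",
    9979: "MISCELLANEOUS VOID",
    9980: "MACHINE MALFUNCTION",
    9981: "BAD WEATHER",
    9982: "VANDALISM",
    9983: "COLLECTION ERROR",
    9984: "LAB ERROR",
    9985: "POOR QUALITY ASSURANCE RESULTS",
    9986: "CALIBRATION",
    9987: "MONITORING WAIVED",
    9988: "POWER FAILURE (POWR)",
    9989: "WILDLIFE DAMAGE",
    9990: "PRECISION CHECK (PREC)",
    9991: "Q C CONTROL POINTS (ZERO/SPAN)",
    9992: "Q C AUDIT (AUDT)",
    9993: "MAINTENANCE/ROUTINE REPAIRS",
    9994: "UNABLE TO REACH SITE",
    9995: "MULTI-POINT CALIBRATION",
    9996: "AUTO CALIBRATION",
}


def describe_null_code(code):
    """Return the description for an AIRS null data reason code."""
    return AIRS_NULL_CODES.get(code, "UNKNOWN CODE")


def tally_null_reasons(codes):
    """Count null records by reason description."""
    return Counter(describe_null_code(c) for c in codes)
```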
- Assemble Level I database.
- Place data in a common data format with descriptive information concerning variables, validation level, QC codes, and standard units.
- Ensure that results of and suggestions from final audit reports have been incorporated into the database.
- Review simple statistics for unrealistic maxima or minima and for consistency with nearby stations (still Level I).
- Perform spatial and temporal comparisons of the data (begin Level II).
- Perform intercomparisons of the data (e.g., from two different instruments). Data now Level III.
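The "simple statistics" step of Level I review can be sketched as a summary of each variable plus a plausible-range check. The helper names, the sample values, and the 0 to 300 ppb bounds for hourly ozone are hypothetical; appropriate bounds depend on the pollutant, averaging time, and instrument.

```python
import statistics

MISSING = -999  # missing-value sentinel from the QC flag scheme


def summarize(values, missing=MISSING):
    """Simple statistics on the non-missing values of one variable."""
    data = [v for v in values if v != missing]
    return {
        "n": len(data),
        "min": min(data),
        "max": max(data),
        "mean": statistics.fmean(data),
    }


def out_of_range(values, lower, upper, missing=MISSING):
    """Indices of non-missing values outside the plausible range [lower, upper]."""
    return [i for i, v in enumerate(values)
            if v != missing and not lower <= v <= upper]


# Hypothetical hourly ozone (ppb) screened against hypothetical 0 to 300 ppb bounds.
ozone = [40, 55, -5, 420, MISSING, 61]
stats = summarize(ozone)
suspect_indices = out_of_range(ozone, 0, 300)
```

Values caught this way are candidates for the Suspect flag; they still need review against nearby stations before being invalidated.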
- Air quality data reported during calibration runs. For example, ozone data with values of 0 ppb reported when instruments are known to be automatically calibrated.
- Nitrogen oxides data found to have a constant offset based on comparisons of NOx to NO+NO2. Data were adjusted.
- Some data were physically consistent, and thus passed statistical checks, but were spatially inconsistent. For example, calm winds were observed at a site when all nearby sites measured strong winds; the calm winds were changed to suspect.
- Surface pressure at stations not reduced to sea level. Adjustments made to reduce pressure to sea level.
- Propane contamination from tank in the sampler shelter (an RV) at one site. Propane concentrations invalid.
- Cold trap failure on the auto-GC was identified with a scatter plot of ethane versus benzene. Species below C4, species group totals, and total NMHC were invalidated.
- Ozonesonde surface measurements consistently lower than measurements from collocated surface monitor. Adjusted for measurement bias.
- Comparisons of rawinsonde, radar profiler, and sodar wind speeds and wind directions.
- Ozone concentrations from one instrument in an aircraft found to be significantly biased low.
- Winds computed with an inertial navigation system were noisy.
- Ground clutter, migrating birds, and precipitation affected radar profiler measurements.
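The NOx example above can be illustrated by comparing reported NOx with the sum of NO and NO2 over paired observations: a mean difference well away from zero with little scatter points to a constant offset. The helper names, the choice to subtract the mean difference from the NOx channel, and the sample values are hypothetical; the source states only that a constant offset was found and the data were adjusted.

```python
import statistics


def nox_offset(nox, no, no2):
    """Mean and standard deviation of NOx - (NO + NO2) over paired observations.

    A mean far from zero with a small standard deviation suggests a constant
    offset rather than random noise. Requires at least two pairs.
    """
    diffs = [x - (a + b) for x, a, b in zip(nox, no, no2)]
    return statistics.fmean(diffs), statistics.stdev(diffs)


def remove_offset(nox, offset):
    """Subtract a constant offset from NOx (assumes the offset is in the NOx channel)."""
    return [x - offset for x in nox]


# Hypothetical hourly values (ppb).
nox = [25.0, 32.0, 41.0]
no = [5.0, 10.0, 15.0]
no2 = [15.0, 17.0, 21.0]
mean_diff, sd_diff = nox_offset(nox, no, no2)
adjusted = remove_offset(nox, mean_diff)
```

The same pattern (estimate a mean difference against a collocated reference, then adjust) applies to the ozonesonde bias example, provided the reference instrument is itself validated.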
Examples of Problems Encountered in Databases (and Validation Actions)
Figure 2

Example of identification of suspect data values from the Northeast (NESCAUM, 1993). The ozone concentration of 139 ppb reported at Cape Elizabeth on May 26, 1992 at 4:00 a.m. appears erroneous when viewed in a spatial and temporal context.
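The combined spatial and temporal view used in Figure 2 can be sketched as two checks that must both fail before a value is flagged: the value is compared with the mean of the adjacent hours at the same site and with the median of nearby sites at the same hour. The function name, the 40 ppb tolerances, and the test values are hypothetical and are not taken from the NESCAUM data.

```python
import statistics


def spatially_and_temporally_suspect(value, prev_hour, next_hour, neighbor_values,
                                     temporal_tol=40.0, spatial_tol=40.0):
    """Flag an hourly value that stands out from both its own time series and nearby sites.

    Temporal check: distance from the mean of the adjacent hours at the same site.
    Spatial check: distance from the median of nearby sites at the same hour.
    Tolerances (ppb) are hypothetical.
    """
    temporal_gap = abs(value - (prev_hour + next_hour) / 2)
    spatial_gap = abs(value - statistics.median(neighbor_values))
    return temporal_gap > temporal_tol and spatial_gap > spatial_tol
```

Requiring both checks to fail avoids flagging a genuine local event that persists for several hours at one site, or a regional episode that appears at every site at once.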
Figure 3


Examples of identification of suspect data values from the Northeast (NESCAUM, 1993). At the top, two values are anomalously high when inspected both temporally and spatially. At the bottom, reported isolated low values were probably the result of misplaced decimal points.
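A possible misplaced decimal point can be screened for by testing whether shifting the value by a factor of ten brings it in line with reference values such as adjacent hours or nearby sites. The function name, the 25% relative tolerance, and the test values are hypothetical; a match only suggests a decimal error and should lead to review, not automatic correction.

```python
import statistics


def decimal_shift_factor(value, reference_values, rel_tol=0.25):
    """Return 10.0 or 0.1 if shifting the decimal point brings `value` within
    rel_tol of the median reference value; otherwise return None.

    Reference values might be adjacent hours or nearby sites; rel_tol is hypothetical.
    """
    ref = statistics.median(reference_values)
    if ref == 0 or abs(value - ref) <= rel_tol * abs(ref):
        return None  # no usable reference, or the value is already consistent
    for factor in (10.0, 0.1):
        if abs(value * factor - ref) <= rel_tol * abs(ref):
            return factor
    return None
```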
Figure 4

Figure 5
Example of problems encountered with AIRS data. The figure shows an abnormally large number of zero ozone concentrations at a site during a few years, possibly indicating monitor or reporting problems (Level 0, AIRS data).
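A count of exact zeros by year, as in Figure 5, can be sketched as follows. The function name, the `(year, value)` record layout, and the test values are hypothetical; years whose zero fraction is far above that of other years are candidates for review of monitor or reporting problems.

```python
from collections import defaultdict

MISSING = -999  # missing-value sentinel from the QC flag scheme


def zero_fraction_by_year(records, missing=MISSING):
    """Fraction of non-missing values that are exactly zero, per year.

    records: iterable of (year, value) pairs.
    """
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for year, value in records:
        if value == missing:
            continue  # missing values are excluded from both counts
        totals[year] += 1
        if value == 0:
            zeros[year] += 1
    return {year: zeros[year] / totals[year] for year in sorted(totals)}
```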
Figure 6

Plot of surface winds on June 27, 1991 at 1900 CDT. The calm wind at Bloomington, Illinois was identified as suspect (SUS) during the data validation process. (Roberts et al., 1994)
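The check behind the Bloomington example can be sketched as a comparison of a reported calm with the median wind speed at nearby sites. The function name and the thresholds (0.5 m/s for calm, 5 m/s for strong winds) are hypothetical; a True result suggests changing the flag to Suspect, as was done in the study.

```python
import statistics


def calm_among_windy_neighbors(speed, neighbor_speeds, calm_max=0.5, windy_min=5.0):
    """True when a site reports calm while nearby sites (median) report strong winds.

    Speeds are in m/s; both thresholds are hypothetical.
    """
    return speed <= calm_max and statistics.median(neighbor_speeds) >= windy_min
```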
Figure 7

Example of questionable data identified during the data validation: (a) constant wind directions measured at Cocodrie, Louisiana from July 31 - August 2, 1993 and (b) high surface winds at a surface station in Grand Isle, Louisiana on August 29, 1993 at 0800 CST (SAI et al., 1995).
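Constant wind directions like those at Cocodrie can be found by searching for long runs of unchanged readings, a common sign of a stuck wind vane. The function names, the tolerance, and the minimum run length are hypothetical; with hourly data, a `min_length` of 12 or 24 would be one plausible choice. Directions are compared as angles so that 359 and 1 degrees count as 2 degrees apart.

```python
def angular_difference(a, b):
    """Smallest difference between two directions in degrees (e.g., 359 vs 1 gives 2)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def constant_direction_runs(directions, min_length, tol=0.0):
    """(start, end) index pairs, end inclusive, of runs where every direction stays
    within tol degrees of the run's first value for at least min_length points."""
    runs = []
    start = 0
    for i in range(1, len(directions) + 1):
        # A run ends at the end of the series or when a value drifts beyond tol.
        if i == len(directions) or angular_difference(directions[i], directions[start]) > tol:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = i
    return runs
```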
- Thorough data validation is critical prior to use of the data.
| Analysis/Procedure | Objectives |
|---|---|
| Level 0 Data Validation | Convert instrument output to scaled scientific units. |
| Level 1 Data Validation | Review for accuracy, completeness, and internal consistency. |
| Level 2 Data Validation | Review/compare for external consistency against other independent data sets. |
| Level 3 Data Validation | Ongoing evaluation as part of the data interpretation process. |
Tools and methods include:
Spreadsheets, statistical software, Voyager, VOCDat, LapG; time series, scatter, and spatial plots; correlations among species.
LADCO (1995) Lake Michigan Ozone Study. 1994 data analysis report, version 1.1. Report prepared by Lake Michigan Air Directors Consortium, Des Plaines, IL, May.
NESCAUM (1993) 1992 regional ozone concentrations in the northeastern United States. Report prepared by the Ambient Monitoring and Assessment Committee and the Data Management Committee of the Northeast States for Coordinated Air Use Management, Boston, MA.
NESCAUM (1995) Preview of the 1994 ozone precursor concentrations in the northeastern U.S. 5/1/94 draft report prepared by the Ambient Monitoring and Assessment Committee of the Northeast States for Coordinated Air Use Management, Boston, MA.
Roberts P.T., Dye T.S., Korc M.E., and Main H.H. (1994) Air quality data analysis for the 1991 Lake Michigan Ozone Study. Final report prepared for Lake Michigan Air Directors Consortium, Des Plaines, IL by Sonoma Technology, Inc., Santa Rosa, CA, STI-92022-1410-FR, September.
Stoeckenius T.E., Ligocki M.P., Shepard S.B., and Iwamiya R.K. (1994a) Analysis of PAMS data: application to summer 1993 Houston and Baton Rouge data. Draft report prepared by Systems Applications International, San Rafael, CA, SYSAPP94-94/115d, November.
Stoeckenius T.E., Ligocki M.P., Cohen B.L., Rosenbaum A.S., and Douglas S.G. (1994b) Recommendations for analysis of PAMS data. Final report prepared by Systems Applications International, San Rafael, CA, SYSAPP94-94/011r1, February.
Systems Applications International, Sonoma Technology Inc., Earth Tech, and Alpine Geophysics (1995) Gulf of Mexico Air Quality Study. Vol 1: Summary of data analysis and modeling. Final report prepared for U.S. Department of the Interior, Minerals Management Service, Gulf of Mexico OCS Region, New Orleans, LA, OCS Study, MMS 95-0038.
U.S. Environmental Protection Agency (1984) Quality assurance handbook for air pollution measurement systems, volume II: ambient air specific methods (interim edition), EPA/600/R-94/0386, April.
U.S. Environmental Protection Agency (1989) AIRS user's guide volume III: AIRS codes and values. Office of Air Quality Planning & Standards Technical Support Division, Research Triangle Park, NC, June.