The AQS Data Mart is a database containing all of the information from the AQS system. The AQS Data Mart was built as a storehouse of air quality information that allows users to make queries of unlimited quantities of data. The main AQS system must maintain constant readiness to accept data, and thus is limited in the number and size of queries it can respond to. The Data Mart has no such limitation, other than the “wall clock” time it takes for a query to run. The Data Mart also includes information from the EPA’s substance and facility registry systems to allow for cross-media integration. Starting in the summer of 2007, it will also contain information from AirNow (the real time air quality reporting system) that participating agencies allow to be shared with the public.
The intended users of the AQS Data Mart are air quality data analysts in the regulatory, academic, and health research communities. It is intended for those who need to download large volumes of detailed technical data stored at EPA and does not provide any interactive analytical tools. If you are interested in summary data or graphical displays, there are other applications available:
Currently the AQS Data Mart has nearly 1.6 billion values – every measured, daily aggregate and annual aggregate value collected and calculated by EPA since January 01, 1980. The spreadsheet AQS_Data_Mart_Contents.xls profiles the number of measured and NAAQS averaging period calculated values based on the parameter and sampling / averaging duration. Below is a complete list of variables available from the Data Mart.
Daily Summary Values (each monitor has the following calculated each day)
- The value as reported to AQS
- The value converted to standard units of measure, where applicable
- NAAQS average values (8- and 24-hour averages)
- Measurement uncertainty, where known
- Regulatory Certification status of data
- Any exceptional events affecting the data
Annual Summary Values (each monitor has the following calculated each year)
- Observation count
- Observation per cent (of expected observations)
- Arithmetic mean of observations
- Max observation and hour of max
- AQI (air quality index)
- Observations > Standard where applicable
- Observation count and per cent
- Valid days
- Required observation count
- Null observation count
- Exceptional values count
- Minimum value
- Arithmetic Mean and Standard Deviation
- Geometric Mean and Standard Deviation
- 1st – 4th maximum (highest) observations
- Percentiles (99, 98, 95, 90, 75, 50)
- Values > Standards
- Observations below half of the method detectable limit (MDL)
- Days > alert level
- Estimated days > Standards
- Missing days assumed < Standards
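As an illustration of a few of the annual statistics listed above, the following minimal sketch computes the observation count, minimum, arithmetic and geometric mean and standard deviation, the four highest observations, and the listed percentiles for one monitor-year. It is not EPA's calculation: the function name `annual_summary`, the nearest-rank percentile rule, and the use of sample standard deviation are all assumptions made for the example, and the Data Mart's own methods may differ.

```python
import math
import statistics


def annual_summary(values):
    """Summarize one monitor-year of strictly positive observations.

    Illustrative only: the percentile rule (nearest rank) and the sample
    standard deviation are assumptions, not the documented AQS methods.
    """
    if not values:
        raise ValueError("no observations")
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank rule: the value at position ceil(p% of n), 1-based.
        rank = max(1, math.ceil(p / 100 * n))
        return ordered[rank - 1]

    # Geometric statistics are computed on the natural logs (needs v > 0).
    logs = [math.log(v) for v in ordered]
    return {
        "count": n,
        "minimum": ordered[0],
        "arithmetic_mean": statistics.fmean(ordered),
        "arithmetic_sd": statistics.stdev(ordered) if n > 1 else 0.0,
        "geometric_mean": math.exp(statistics.fmean(logs)),
        "geometric_sd": math.exp(statistics.stdev(logs)) if n > 1 else 1.0,
        "top_four": ordered[::-1][:4],  # 1st to 4th maximum observations
        "percentiles": {p: percentile(p) for p in (99, 98, 95, 90, 75, 50)},
    }
```

Before relying on any such calculation, compare its output with the annual summary values returned by the Data Mart for the same monitor and year.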
The Data Mart also contains the physical descriptions of monitors as the owners have entered them into AQS. This includes the geographic location (latitude, longitude, elevation, and datum), the political entity where the monitor resides (tribal land or state + county), administrative information about the monitor (the agencies responsible for its operation and its schedule), and information related to the monitoring protocol: the substance (parameter) measured, the sampling method, the sampling schedule, the duration of each sample taken, the analysis method, and whether the methods are approved or equivalent reference methods.
The AQS Data Mart has approximately 1.6 billion measurement values (raw measurements, NAAQS averages, daily summary, and annual summary) available for query. There are also many millions more values related to facilities, monitoring methods, and monitoring operations. Great pains have been taken to structure the data and web service queries to perform as efficiently as possible. However, you have unlimited access to this data, and if you are not prudent in selecting reasonably sized queries, they may run for a very long time and affect the operations of the Data Mart.
In the initial phases of release we ask that you limit your queries to the extents listed.
|Values Query Data Type|Geographic Extent|Parameter Extent|Time Extent|
|---|---|---|---|
|Raw / NAAQS|National|1 Parameter|1 Year|
|Raw / NAAQS|1 State|1 Parameter|10 Years|
|Daily Summary|National|1 Parameter|10 Years|
|Daily Summary|National|5 Parameters|1 Year|
|Annual Summary|National|1 Parameter|10 Years|
In our benchmark testing, it took about 2 hours to retrieve raw data for 1 parameter, for 1 year, for the nation. This is a rule of thumb you can use to anticipate your query wait times. We will add more detail about performance as we gather more statistics.
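The recommended extents in the table above can be encoded as a small pre-flight check that a script runs before submitting a query. This is a minimal sketch: the function name, the data-type keys, and the choice to let a single-state query also use the national limits (because it is a subset of a national query) are assumptions made for the example, not part of the Data Mart.

```python
# Recommended maximum extents from the table above, keyed by
# (data type, geographic extent) -> list of (max parameters, max years).
RECOMMENDED_LIMITS = {
    ("raw_naaqs", "national"): [(1, 1)],
    ("raw_naaqs", "state"): [(1, 10)],
    ("daily_summary", "national"): [(1, 10), (5, 1)],
    ("annual_summary", "national"): [(1, 10)],
}


def within_recommended_extent(data_type, geography, n_parameters, n_years):
    """Return True if the query fits at least one recommended extent.

    Assumption: a one-state query is narrower than a national one, so the
    national limits for the same data type also apply to it.
    """
    if geography not in ("national", "state"):
        raise ValueError(f"unknown geography: {geography!r}")
    limits = list(RECOMMENDED_LIMITS.get((data_type, "national"), []))
    if geography == "state":
        limits += RECOMMENDED_LIMITS.get((data_type, "state"), [])
    if not limits:
        raise ValueError(f"unknown data type: {data_type!r}")
    return any(n_parameters <= p and n_years <= y for p, y in limits)
```

For example, `within_recommended_extent("daily_summary", "national", 5, 2)` returns `False`, which suggests splitting the request into two one-year queries.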
The data in the AQS Data Mart has been extensively tested to ensure it matches what is in AQS. The data in AQS is considered to be of the highest quality. It is submitted by tribal, state, and local agencies and must pass several quality assurance tests before it can be saved. Also, each submitting agency annually certifies that the data they submit is correct. Data from AirNow (not available in the Data Mart yet) is submitted in real time and has not had the vetting and review of the AQS data. However, it must also pass many quality assurance tests designed to filter out bad data.
The certification status and system of origin (AQS or AirNow) of each data value is contained in Data Mart metadata that is available with your queries. It is imperative for the user to understand the meaning of the data elements and methods for averaging, as they are not always straightforward.
If you find data that you think to be in error, please identify it to us by sending an email to firstname.lastname@example.org. Or you may contact the state, local, or tribal government that provided EPA with the data.
The data in the AQS Data Mart is updated nightly, 5 times per week, from the AQS database so it is the latest available. Eventually we will bring in the real-time AirNow data each night as well.
Most data in AQS is required to be submitted by the end of the calendar quarter after the quarter in which it was collected. Some types of data (using non-continuous methods) are allowed an additional calendar quarter to be assembled and reported. However, AQS is updated practically every day as reporting agencies have data ready to submit.
Historical data can change at any time. Many quality assurance reviews are performed on an entire year's worth of data, so it might not be until the middle of this year that a submitter has completed the final review of, and changes to, last year's data. Also, historical monitoring or calculation methods may be found to be problematic and require that older data be changed. Finally, there is no "versioning" or freezing of data in the Data Mart, so if other people may need your data exactly as you have it to verify or continue your analysis, you must keep a copy of it.
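Because the Data Mart keeps no versions, one way to keep a verifiable copy is to store each download alongside a small manifest that records the retrieval time and a checksum. The sketch below is only one possible approach; the function names, file naming scheme, and manifest fields are hypothetical and not part of any EPA tool.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_snapshot(data: bytes, directory, label):
    """Write downloaded bytes plus a JSON manifest with time and SHA-256."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    retrieved = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    data_path = directory / f"{label}_{retrieved}.dat"
    data_path.write_bytes(data)
    manifest = {
        "file": data_path.name,
        "retrieved_utc": retrieved,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # The manifest sits next to the data file, with a .json suffix.
    data_path.with_suffix(".json").write_text(json.dumps(manifest, indent=2))
    return data_path


def verify_snapshot(data_path):
    """Return True if the stored file still matches its recorded checksum."""
    data_path = Path(data_path)
    manifest = json.loads(data_path.with_suffix(".json").read_text())
    actual = hashlib.sha256(data_path.read_bytes()).hexdigest()
    return actual == manifest["sha256"]
```

Sharing the data file together with its manifest lets a collaborator confirm they are working from exactly the same extract, even if the values in the Data Mart have since changed.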
The air quality data collected by EPA, once relatively simple, has grown into a complex structure of data and concepts as statutes, regulations, technology, and our understanding of health effects have evolved. There are many monitoring networks with different structures and goals that report data. Different pollutants can only be measured using certain methods and this may dictate a certain sample collection time. Health standards have been set that require measurements to be averaged over times different than sample collection times. Monitoring methods have been found to have temperature sensitivity, and some of the reported data is corrected for this and some is not. Learning, understanding, and keeping up to date with changes in the data is a challenge for the analyst. To help with these tasks, the EPA has developed a document that describes all of the data elements related to air quality, shows where they may be found, and tries to offer usage caveats. This document is called the Field Guide to Air Quality Data. (See the Documentation section for more information about this document.)
The AQS Data Mart is updated nightly, after every business day (that is, Monday through Friday nights). During this process, all data that was loaded into AQS during that day is copied into the Data Mart. Also, all new information in the substance and facility registries is updated. You cannot access the Data Mart to begin new jobs during the update time; however, if you have a large job running, it will not be terminated but will complete normally. The update begins at 2:00 a.m. eastern time and usually lasts about 40 minutes. The database is also taken off-line each Monday morning at 4:00 a.m. eastern for a backup; the goal is to have it operational again by open-of-business.
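A scheduled job can avoid starting during these windows with a simple time check. The sketch below takes a naive `datetime` that the caller has already expressed in US Eastern time. Several details are assumptions: the update after a Monday through Friday night is taken to fall on Tuesday through Saturday mornings, the 40-minute duration is treated as fixed even though it is only typical, and the 4-hour backup length is a hypothetical placeholder because the source states only that the goal is to be back by open-of-business.

```python
from datetime import datetime, time, timedelta

UPDATE_START = time(2, 0)               # 2:00 a.m. eastern
UPDATE_LENGTH = timedelta(minutes=40)   # "usually about 40 minutes" (assumed fixed)
BACKUP_START = time(4, 0)               # Mondays, 4:00 a.m. eastern
BACKUP_LENGTH = timedelta(hours=4)      # hypothetical; true duration not stated


def maintenance_reason(eastern_now):
    """Return "update", "backup", or None for a naive US Eastern datetime."""
    day = eastern_now.weekday()  # Monday == 0 ... Sunday == 6
    # Assumption: updates after Mon-Fri nights run on Tue-Sat mornings.
    if 1 <= day <= 5:
        start = datetime.combine(eastern_now.date(), UPDATE_START)
        if start <= eastern_now < start + UPDATE_LENGTH:
            return "update"
    if day == 0:
        start = datetime.combine(eastern_now.date(), BACKUP_START)
        if start <= eastern_now < start + BACKUP_LENGTH:
            return "backup"
    return None
```

A job scheduler could call `maintenance_reason` just before submitting a query and wait until it returns `None`.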
EPA has established an RSS Feed for news related to air quality data and the AirData system (under which the Data Mart operates). If you use the Data Mart, we suggest you subscribe to this feed. We post notifications of system enhancements, down times, data changes, etc. The RSS feed is here: http://www.epa.gov/airquality/airdata/rssairdata.xml
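For users who prefer to check the feed from a script rather than a feed reader, the following minimal sketch extracts the title, publication date, and link of each item from RSS text. It assumes the feed follows the common RSS 2.0 layout (`item` elements with `title`, `pubDate`, and `link` children), which has not been verified against the actual feed; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def rss_items(xml_text):
    """Return a list of dicts with title, published, and link per item.

    Assumes standard RSS 2.0 element names; missing fields become "".
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


# Made-up sample used only to demonstrate the function; not real feed content.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>Scheduled downtime</title>
<pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate>
<link>https://example.org/notice</link></item>
</channel></rss>"""

notices = rss_items(SAMPLE_FEED)
```

In practice the text passed to `rss_items` would be downloaded from the feed URL given above.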