EPA Open Data Metadata Editor Guidance

The EPA Open Data Metadata Editor tool allows data stewards to author, save, and submit metadata records for non-geospatial data that comply with EPA's Metadata Technical Specification. Extramural researchers may use the tool without logging in, but their metadata submissions will be subject to approval by their EPA sponsor.

This Guide provides recommendations and instructions for using EPA’s Open Data Metadata Editor to create non-geospatial metadata conforming to EPA’s Metadata Technical Specification.

For additional assistance, please contact the EDC Team (edc@epa.gov).

How this Style Guide is Organized

For convenience, this Style Guide includes “Quick Navigation” tables that list each of the EPA metadata fields found in the EPA Open Data Metadata Editor. You may use these tables to quickly navigate to the guidance for each field, or you may reference the guidance directly within the EPA Open Data Metadata Editor.

Quick Navigation - Required fields for Extramural metadata submission

*Not Required for EPA

Field Name	Description	Required?
EPA Agreement	Enter the grant number, contract title, cooperative agreement, interagency agreement or other relationship under which you performed the research associated with this dataset.	Mandatory if Applicable
EPA Contact Email	Enter the email address of your EPA grant manager, sponsor, or other point of contact who can verify your relationship with the EPA.	Mandatory if Applicable

Quick Navigation - Core EPA Metadata Fields

Field Name	Description	Required?
Title	Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery.	Always
Description	Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest.	Always
Place Keywords	One or more geographic feature names describing the range of spatial applicability of the dataset.	Always
ISO Keywords	The ISO 19115 Topic Category is a general categorization of data resources that is intended to provide data classification consistency across agencies.	Always
EPA Keywords	Keywords selected from the EPA keyword list provided in the editor, used to describe EPA-produced assets.	Always
General Keywords	EPA requires the use of keywords from several different standard thesauri; please also include terms that technical and non-technical users would use. A minimum of three keywords is required.	Always
Publishing Organization	The name of the organization responsible for publishing the dataset.	Always
Publishing Individual	The name of a contact responsible for the dataset.	Always
Publisher Email	The email address where questions about this dataset should be sent.	Always
Distribution	One or more URLs providing access to the dataset	Always
Identifier	This element is a unique identifier for the metadata record. It may be a UUID or DOI.	Always
Access Level	The degree to which this dataset could be made publicly available, regardless of whether it has been made available. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public).	Always
Rights	An explanation for the selected “accessLevel” including instructions for how to access a restricted file, if applicable, or explanation for why a “non-public” or “restricted public” data asset is not “public,” if applicable. Free text, maximum 255 characters.	Required if Access Level is not public
Data License	The URL to a web page describing the data license governing the use of this dataset.	Always
Temporal Extent	The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data).	Always
Last Update	Most recent date on which the dataset was changed, updated or modified, or for continually updated data, the frequency with which the data are updated.	Always
Update Frequency	The frequency with which the dataset is published or updated.	Optional
Release Date	Date of formal issuance.	Optional
Language	The language of the metadata document	Required for geospatial data
Data Quality	Whether the dataset meets an organization's Information Quality Guidelines (true/false).	Optional
Conforms To	URL used to identify a standardized specification the dataset conforms to.	Optional
Described By	URL for the JSON Schema file that defines the schema used.	Optional
Landing Page	URL of a human-friendly landing page for the dataset (not an agency homepage).	Optional
References	Related documents such as technical information about a dataset, developer documentation, related publications, etc.	Optional

List of Fields found in the EPA Open Data Metadata Editor:

For each field, instructions and suggestions for appropriate field entries are provided. Many fields are designed with pre-populated values that may be selected from a dropdown menu while others are free text.

EPA Agreement
Guidance: Please enter the grant number, contract title, cooperative agreement, interagency agreement or other relationship under which you performed the research associated with this dataset. If you are not an EPA Grantee, please describe the nature of your affiliation with the EPA.

EPA Contact Email
Guidance: Please enter the email address of your EPA grant manager, sponsor, or other point of contact who can verify your relationship with the EPA.

Title
Guidance: Titles should be succinct yet descriptive, including the topic and, where relevant, temporal information, geography, and related programs, in a way that distinguishes the resource from other, similar resources.
Suggested Text: {Subject, Geographic Extent, Relevant Time Period, Data Owner/Provider, Office/Region/Research Lab of Data Owner}
Examples:

Toxics Release Inventory (TRI) Locations, Oklahoma, 2012, EPA OEI, EPA REG 06 WQPD
Tribal Lands, Idaho, 2000, Bureau of Indian Affairs, EPA REG 10
Potomac River Basin Boundary, Chesapeake Bay Program

Description (Abstract)
Guidance: Descriptions should be used to provide a brief summary of the resource.
Suggested Text: This geospatial dataset contains {describe the dataset layer or layers' general content and features, geographic coverage, time period of content, and any special data characteristics or limitations}.
Examples:

This raster GIS dataset contains 100-meter-resolution cells depicting mean surface salinity (parts per thousand) in the Chesapeake Bay and its tidal tributaries during the fall season. Salinity was measured annually from 1985 to 2006.
This geospatial dataset contains point and polygon layers that depict National Register of Historic Places locations in the US compiled by the National Park Service in 2015. Layers include cultural resource sites, buildings, districts, objects, and structures.

Place Keywords
Guidance: Inclusion of one or more place names is required for minimum EPA compliance and supports discovery of assets by location.
Expected value: Plain Text
Example: "United States"

ISO Keywords
Guidance: Inclusion of one or more ISO keywords is required for minimum EPA compliance and supports interagency sharing initiatives. The ISO 19115 Topic Category is a general categorization of data resources that is intended to provide data classification consistency across agencies. Select one or more ISO 19115 topic categories from the checklist that correspond to the dataset being described, to give a general classification of your data or application.
Expected value: Plain Text
Example: "environment"

EPA Keywords
Guidance: This field is not required for minimum EPA compliance but is recommended for EPA personnel and contractors in order to promote metadata consistency across the Agency. Select one or more EPA keywords from the list provided in the editor to describe your data resource where applicable. EPA Keywords should be used to describe EPA-produced assets. The provided list of keywords represents a snapshot of EPA's Web Taxonomy from some years ago. Efforts are underway to provide guidance and tools that allow EPA metadata stewards to take advantage of the full complement of EPA's Terminology Services, Taxonomies and Controlled Vocabularies.
Expected value: Plain Text

General Keywords
Guidance: A minimum of three keywords is required. Any helpful and descriptive keywords may be chosen, but they should complement the other keywords selected. Multi-word terms are acceptable as keywords.
Expected value: Plain Text
Example: vegetables, veggies, greens, leafy, spinach, kale, nutrition
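Before submitting, the three-keyword minimum can be checked with a few lines of code. The Python sketch below uses a hypothetical helper (check_general_keywords is not part of the editor) and assumes that blank entries and case-insensitive repeats should not count toward the minimum.

```python
from typing import List

MIN_GENERAL_KEYWORDS = 3  # minimum stated in this guide


def check_general_keywords(keywords: List[str]) -> List[str]:
    """Return the cleaned keyword list; raise ValueError if fewer than three distinct terms remain.

    Hypothetical helper, not part of the EPA Open Data Metadata Editor.
    """
    seen = set()
    cleaned = []
    for keyword in keywords:
        term = keyword.strip()
        # Assumption: blanks and case-insensitive duplicates do not count.
        if term and term.lower() not in seen:
            seen.add(term.lower())
            cleaned.append(term)  # multi-word terms are kept whole
    if len(cleaned) < MIN_GENERAL_KEYWORDS:
        raise ValueError(
            f"At least {MIN_GENERAL_KEYWORDS} general keywords are required; found {len(cleaned)}"
        )
    return cleaned


print(check_general_keywords(["vegetables", "veggies", "greens", "leafy", "spinach", "kale", "nutrition"]))
```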

Publishing Organization
Guidance: This element should contain the full standardized name of the organization responsible for publishing the dataset. If EPA is not the publishing entity, this element may be free text, but should still contain the name of the publishing entity (the metadata record will not be contributed to data.gov as an EPA record).
Expected value: Plain Text. All EPA organization names should begin with "U.S. EPA" or "U.S. Environmental Protection Agency".
Example: "U.S. EPA Office of Mission Support (OMS) - Office of Information Collection (OIC)"
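The naming rule for EPA organizations amounts to a prefix check. The Python sketch below uses a hypothetical helper (is_valid_epa_org_name); it should only be applied when EPA is the publishing entity, since non-EPA publisher names may be free text.

```python
# Approved prefixes for EPA organization names, as stated in this guide.
EPA_ORG_PREFIXES = ("U.S. EPA", "U.S. Environmental Protection Agency")


def is_valid_epa_org_name(name: str) -> bool:
    """True if an EPA publishing organization name begins with an approved prefix.

    Hypothetical helper; str.startswith accepts a tuple and tests each prefix.
    """
    return name.strip().startswith(EPA_ORG_PREFIXES)


print(is_valid_epa_org_name("U.S. EPA Office of Mission Support (OMS) - Office of Information Collection (OIC)"))  # True
print(is_valid_epa_org_name("EPA Office of Mission Support"))  # False
```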

Publishing Individual
Guidance: This element should contain the full name of a person responsible for the publication of a dataset. An actual human being is preferred, but if a helpdesk is responsible for providing support to dataset users, the name of the helpdesk team is acceptable.
Expected value: Full Name
Example: "Jane Doe"

Publisher Email
Guidance: This element should contain the email address of a person responsible for the publication of a dataset. An actual human being is preferred, but if a helpdesk is responsible for providing support to dataset users, the email address of the helpdesk team is acceptable.
Expected value: Email address
Example: "doe.jane@agency.gov"

Distribution
Guidance: Within a dataset, distribution is used to aggregate the metadata specific to a dataset’s resources (Access URL and Download URL), which may be described using the following fields. Ideally, each individual file, API, or other key resource that comprises the data asset should have a separate distribution entry, but if the number of endpoints is impractically high, it is acceptable to instead provide a link to an overview or entry page. Each distribution should contain one URL (Access or Download). A Download URL should always be accompanied by Media Type. An Access URL may also be used to provide a link to a journal article or other publication integral to the data asset.

Title: Mandatory human-readable name of the specific file or resource available at the URL as distinct from the overall data asset.
Description: Optional human-readable description of the distribution.
URL Type: An Access URL provides indirect access to a dataset, for example via an API or a graphical interface, while a Download URL provides direct access to a downloadable file of a dataset. If an Access URL is provided, specify whether it is an API or some alternative such as a web page, request form, journal article, or query tool. If a Download URL is provided, also specify the Media Type: the machine-readable file format (IANA Media Type or MIME Type) of the distribution’s Download URL, which usually matches the file extension.
Data Standard: URI used to identify a standardized specification the distribution conforms to.
Data Dictionary: URL to the data dictionary for the distribution found at the Download URL.
Data Dictionary Type: The machine-readable file format (IANA Media Type or MIME Type) of the distribution’s Data Dictionary.
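The distribution rules above can be expressed as a small check. The Python sketch below assumes the entries are stored with Project Open Data (POD) v1.1 distribution key names (title, accessURL, downloadURL, mediaType, format); the URLs are hypothetical placeholders, and distribution_problems is a hypothetical helper. It reads "Each distribution should contain one URL" as exactly one of the two URL types per entry.

```python
from typing import Dict, List

# Hypothetical example entries using POD v1.1 distribution key names.
distributions: List[Dict[str, str]] = [
    {
        "@type": "dcat:Distribution",
        "title": "Vegetable data (CSV)",
        "downloadURL": "https://www.agency.gov/vegetables/data.csv",
        "mediaType": "text/csv",  # IANA media type matching the .csv extension
    },
    {
        "@type": "dcat:Distribution",
        "title": "Vegetable data API",
        "accessURL": "https://www.agency.gov/vegetables/api",
        "format": "API",  # states what kind of Access URL this is
    },
]


def distribution_problems(dist: Dict[str, str]) -> List[str]:
    """Return rule violations for one distribution entry (an empty list means it passes)."""
    problems = []
    if not dist.get("title"):
        problems.append("title is mandatory")
    has_access = bool(dist.get("accessURL"))
    has_download = bool(dist.get("downloadURL"))
    if has_access == has_download:  # both present or both missing
        problems.append("provide exactly one of accessURL or downloadURL")
    if has_download and not dist.get("mediaType"):
        problems.append("a downloadURL must be accompanied by mediaType")
    return problems


for dist in distributions:
    print(dist["title"], distribution_problems(dist) or "OK")
```

Keeping one URL per entry means a dataset with a CSV file and an API simply gets two distribution entries, as the guidance recommends.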

File Identifier
Guidance: This element must contain a unique identifier for the metadata record. This identifier may be a UUID string or a DOI URL. The unique identifier provides the link between the metadata associated with a resource (e.g., a feature class) in your local system and the metadata uploaded and published with the EDG. If a computer (or person) knows the unique identifier of a metadata record published with the EDG, then a generic URL + unique identifier can be used to view that record (e.g., https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={0CA7A9AE-68DB-4F92-8566-2003C8BF41AB}). Many applications use this generic URL + unique identifier pattern. For example, when services are published, users place the EDG unique identifier URL inside the MXD for each data layer, so that the metadata for a given data resource can be accessed via a REST endpoint (or web application).
Expected value: Plain Text; UUID or DOI
Example: "8B7D82BE-B130-44B8-99FB-192973508BE8"
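A new UUID identifier can be generated, and an existing one checked, with Python's standard uuid module. The sketch below is hypothetical: it reuses the EDG URL pattern quoted in the guidance above (assuming that pattern still resolves), upper-cases UUIDs only to match this guide's example, and does not handle DOI identifiers.

```python
import uuid

# EDG record-details URL pattern quoted in this guide; "{{" and "}}" emit literal braces.
EDG_DETAILS_URL = "https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid={{{}}}"


def new_identifier() -> str:
    """Generate a random (version 4) UUID, upper-cased to match this guide's example."""
    return str(uuid.uuid4()).upper()


def is_uuid(value: str) -> bool:
    """True if value parses as a UUID (surrounding braces are tolerated)."""
    try:
        uuid.UUID(value.strip().strip("{}"))
    except ValueError:
        return False
    return True


def edg_details_url(identifier: str) -> str:
    """Build the EDG details URL for a UUID identifier (DOIs are not handled here)."""
    if not is_uuid(identifier):
        raise ValueError(f"Not a UUID: {identifier!r}")
    return EDG_DETAILS_URL.format(identifier.strip().strip("{}").upper())


print(edg_details_url("8B7D82BE-B130-44B8-99FB-192973508BE8"))
```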

Access Level
Guidance: This element is mandatory per Project Open Data requirements, and guidelines for non-geospatial records follow the POD specification. Since geospatial metadata standards do not have the same three-choice domain as the POD specification, geospatial records are assumed to be public if they contain no legal, security, or general constraints, and are considered restricted public or non-public if they do have security constraints listed: 'non-public' when security constraints code is sensitive, confidential, secret, or topSecret, 'restricted public' when security constraints code is restricted or official use only, and 'public' when the security constraint code is unclassified.
Expected value: Plain Text; "public", "restricted public", "non-public"
Example: "public"
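The mapping from security constraint codes to access levels can be sketched as a lookup. In the Python example below, the exact code spellings (for example "officialUseOnly") are assumptions to be checked against your metadata standard's code list, as is the rule that the most restrictive code wins when several are present; access_level is a hypothetical helper, and it considers security constraint codes only.

```python
from typing import Iterable

# Security constraint codes grouped as described in this guide.
# Assumption: spellings follow camelCase code-list values; verify against your code list.
NON_PUBLIC_CODES = {"sensitive", "confidential", "secret", "topSecret"}
RESTRICTED_CODES = {"restricted", "officialUseOnly"}
PUBLIC_CODES = {"unclassified"}


def access_level(security_codes: Iterable[str]) -> str:
    """Derive the POD access level from a geospatial record's security constraint codes."""
    codes = set(security_codes)
    unknown = codes - NON_PUBLIC_CODES - RESTRICTED_CODES - PUBLIC_CODES
    if unknown:
        raise ValueError(f"Unrecognized security constraint codes: {sorted(unknown)}")
    # Assumption: when several codes are present, the most restrictive one decides.
    if codes & NON_PUBLIC_CODES:
        return "non-public"
    if codes & RESTRICTED_CODES:
        return "restricted public"
    # No security constraints, or only "unclassified", maps to public.
    return "public"


print(access_level([]))  # public
print(access_level(["restricted"]))  # restricted public
```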

Rights
Guidance: If a dataset has an access level of public, this field can be populated with general informational free text about accessing the dataset (note that the limit is 255 characters - EPA's legacy disclaimer is longer than this and may cause the record to be rejected from data.gov). Please include disclaimers in the 'General Use Limitations' field or at URLs. If the dataset is not public, the value must be chosen from an approved list of Controlled Unclassified Information (CUI) categories describing why the dataset may not be made public.
Expected value: Plain Text; one of the following approved values:
EPA Category: Mission Sensitive, NARA Category: Critical Infrastructure
EPA Category: Drinking Water Vulnerability Assessments, NARA Category: Critical Infrastructure-Water Assessments
EPA Category: Sensitive Drinking Water Related, NARA Category: Critical Infrastructure-Water Assessments
EPA Category: IT Security, NARA Category: Information Systems Vulnerability Information
EPA Category: Law Enforcement Sensitive, NARA Category: Law Enforcement
EPA Category: Attorney Client Privilege, NARA Category: Legal-Privilege
EPA Category: Attorney Work Product, NARA Category: Legal-Privilege
EPA Category: Deliberative Process Privilege, NARA Category: Legal-Privilege
EPA Category: Personally Identifiable Information (PII), NARA Category: Privacy
EPA Category: Proprietary, NARA Category: Proprietary
EPA Category: Confidential Business Information, NARA Category: Proprietary-Manufacturer
EPA Category: Source Selection Information, NARA Category: Proprietary-Source Selection
Example: "EPA Category: Mission Sensitive, NARA Category: Critical Infrastructure"
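The two Rights rules (a 255-character limit, and an approved CUI value whenever the dataset is not public) can be checked together. The Python sketch below copies the approved values from the list above; check_rights is a hypothetical helper, and it treats both "non-public" and "restricted public" as not public.

```python
from typing import List, Optional

RIGHTS_MAX_CHARS = 255  # limit stated in this guide

# Approved values for datasets that are not public, copied from the list above.
APPROVED_RIGHTS_VALUES = {
    "EPA Category: Mission Sensitive, NARA Category: Critical Infrastructure",
    "EPA Category: Drinking Water Vulnerability Assessments, NARA Category: Critical Infrastructure-Water Assessments",
    "EPA Category: Sensitive Drinking Water Related, NARA Category: Critical Infrastructure-Water Assessments",
    "EPA Category: IT Security, NARA Category: Information Systems Vulnerability Information",
    "EPA Category: Law Enforcement Sensitive, NARA Category: Law Enforcement",
    "EPA Category: Attorney Client Privilege, NARA Category: Legal-Privilege",
    "EPA Category: Attorney Work Product, NARA Category: Legal-Privilege",
    "EPA Category: Deliberative Process Privilege, NARA Category: Legal-Privilege",
    "EPA Category: Personally Identifiable Information (PII), NARA Category: Privacy",
    "EPA Category: Proprietary, NARA Category: Proprietary",
    "EPA Category: Confidential Business Information, NARA Category: Proprietary-Manufacturer",
    "EPA Category: Source Selection Information, NARA Category: Proprietary-Source Selection",
}


def check_rights(access_level: str, rights: Optional[str]) -> List[str]:
    """Return problems with a Rights value (an empty list means it passes)."""
    problems = []
    text = (rights or "").strip()
    if len(text) > RIGHTS_MAX_CHARS:
        problems.append(f"Rights exceeds {RIGHTS_MAX_CHARS} characters ({len(text)})")
    if access_level != "public" and text not in APPROVED_RIGHTS_VALUES:
        problems.append("Non-public datasets must use an approved CUI category value")
    return problems


print(check_rights("non-public", "EPA Category: Proprietary, NARA Category: Proprietary"))  # []
```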

Data License
Guidance: The license or non-license (i.e., public domain) status with which the dataset or API has been published.
Non-EPA publishers are strongly encouraged to select a common open data or public domain license (e.g., https://creativecommons.org/publicdomain/zero/1.0/). An explanation of the importance of specifying a license, with recommendations for open licenses, is available here: https://project-open-data.cio.gov/open-licenses/. The Creative Commons CC0 public domain dedication is the least restrictive option.
Expected value: Plain Text; URL
Example: https://edg.epa.gov/EPA_Data_License.html or https://creativecommons.org/publicdomain/zero/1.0/

Temporal Extent
Guidance: This field should contain an interval of time defined by start and end dates for the applicability of the data, i.e., the period of time the data represent in ground truth or during which the data were collected and compiled, as distinct from when the data were or will be published. If data are updated on an ongoing basis, leave this element blank and use Update Frequency instead.
Expected value: ISO 8601 dates. Dates should be formatted as pairs of {start datetime/end datetime} in the ISO 8601 format. ISO 8601 allows datetimes to be formatted in a number of ways, ranging from a simple four-digit year (e.g., 2013) to a much more specific YYYY-MM-DDTHH:MM:SSZ, where the T separates the date from the time and the time is expressed in 24-hour notation in the UTC (Zulu) time zone (e.g., 2011-02-14T12:00:00Z/2013-07-04T19:34:00Z). Use a solidus ("/") to separate the start and end times.
Example: 2000-01-15/2010-01-15
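A Temporal Extent value can be split on the solidus and each half parsed with Python's standard datetime module. The sketch below handles only the forms named in this guide (YYYY, YYYY-MM-DD, and YYYY-MM-DDTHH:MM:SSZ), not the full ISO 8601 grammar; the function names are hypothetical, and treating a bare year as January 1 and date-only values as UTC are simplifying assumptions.

```python
from datetime import datetime, timezone
from typing import Tuple


def parse_instant(text: str) -> datetime:
    """Parse YYYY, YYYY-MM-DD, or YYYY-MM-DDTHH:MM:SSZ into a UTC datetime."""
    text = text.strip()
    if len(text) == 4 and text.isdigit():
        # A bare year is treated as January 1 of that year (a simplification).
        return datetime(int(text), 1, 1, tzinfo=timezone.utc)
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: date-only values and naive datetimes are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def parse_temporal_extent(value: str) -> Tuple[datetime, datetime]:
    """Split a "start/end" Temporal Extent on the solidus and check the order."""
    start_text, separator, end_text = value.partition("/")
    if not separator or not start_text.strip() or not end_text.strip():
        raise ValueError("Temporal Extent must have the form start/end")
    start, end = parse_instant(start_text), parse_instant(end_text)
    if start > end:
        raise ValueError("Temporal Extent start is after its end")
    return start, end


print(parse_temporal_extent("2000-01-15/2010-01-15"))
```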

Last Update
Guidance: This element is the most recent date on which the data resource was changed, updated, or modified. Dates should be in the format YYYY-MM-DD. If a dataset is updated very frequently or continuously, leave this field blank and specify a recurring interval in the Update Frequency element; the recurring interval will be reported in place of a date. Otherwise, specify a date.
Expected value: Plain Text
Example: "2012-01-15"
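The Last Update rules (a YYYY-MM-DD date, or a blank value only when a recurring interval is given instead) can be checked as follows. This Python sketch uses a hypothetical helper, check_last_update; the regular expression is there because recent Python versions of date.fromisoformat also accept the compact form YYYYMMDD, which this guide does not ask for.

```python
import re
from datetime import date
from typing import List, Optional

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD


def check_last_update(last_update: str, update_frequency: Optional[str] = None) -> List[str]:
    """Return problems with a Last Update value (an empty list means it passes)."""
    value = last_update.strip()
    if not value:
        # Blank is acceptable only when a recurring interval is given instead.
        return [] if update_frequency else ["Last Update is blank and no Update Frequency is given"]
    if not DATE_PATTERN.fullmatch(value):
        return ["Last Update must use the YYYY-MM-DD format"]
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2012-02-30
    except ValueError:
        return ["Last Update is not a valid calendar date"]
    return []


print(check_last_update("2012-01-15"))  # []
```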

Update Frequency
Guidance: The frequency with which changes and additions are made to the data resource after the initial data resource is completed. If the data are static and will not be updated, this element may be left blank.
Expected value: ISO 8601 Interval or ISO Code
Example: weekly or R/P1W
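The example above pairs the word "weekly" with the ISO 8601 repeating interval R/P1W. The Python sketch below normalizes either form to the repeating interval; the other word-to-code pairs and the simplified pattern (a single-unit duration after "R/") are assumptions, and this guide does not say whether the editor stores the word or the code.

```python
import re

# Hypothetical mapping from plain-language terms to ISO 8601 repeating intervals.
FREQUENCY_CODES = {
    "hourly": "R/PT1H",
    "daily": "R/P1D",
    "weekly": "R/P1W",
    "monthly": "R/P1M",
    "quarterly": "R/P3M",
    "annually": "R/P1Y",
}

# Simplified pattern: "R/" plus a single-unit duration, e.g. R/P1W, R/P3.5D, R/PT1H.
REPEATING_INTERVAL = re.compile(r"R/P(?:\d+(?:\.\d+)?[YMWD]|T\d+(?:\.\d+)?[HMS])")


def to_repeating_interval(value: str) -> str:
    """Normalize an Update Frequency entry to an ISO 8601 repeating interval."""
    text = value.strip()
    if text.lower() in FREQUENCY_CODES:
        return FREQUENCY_CODES[text.lower()]
    if REPEATING_INTERVAL.fullmatch(text):
        return text
    raise ValueError(f"Unrecognized Update Frequency: {value!r}")


print(to_repeating_interval("weekly"))  # R/P1W
```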

Release Date
Guidance: The date the data was or will be published or otherwise made available, if it differs from when the last update was performed.
Example: 2001-01-15

Language
Guidance: Include as many languages as are appropriate for the dataset.
Expected value: Language Code
Example: English (language code "en-US")

Data Quality
Guidance: Indicates whether a dataset conforms to the publishing organization's information quality guidelines. This would only be set to false if the data are being published on a raw or 'as-is' basis and have not been reviewed. While this is not usually a best practice, there are scenarios where publishing raw data does serve the public interest.
Expected value: Must be a boolean value of true or false (not contained within quote marks).
Example: true
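The difference between a boolean and a quoted string shows up when a record is serialized to JSON. In the Python sketch below, parse_data_quality is a hypothetical helper for converting form input, and "dataQuality" is assumed to be the Project Open Data v1.1 key for this field.

```python
import json


def parse_data_quality(raw: str) -> bool:
    """Convert form input such as "true" or "False" into a real boolean."""
    normalized = raw.strip().lower()
    if normalized in ("true", "false"):
        return normalized == "true"
    raise ValueError(f"Data Quality must be true or false, not {raw!r}")


# Assumption: "dataQuality" is the POD v1.1 key for this field.
record = {"dataQuality": parse_data_quality("true")}
print(json.dumps(record))  # {"dataQuality": true}
print(json.dumps({"dataQuality": "true"}))  # {"dataQuality": "true"} is a quoted string: avoid
```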

Conforms To
Guidance: This is used to specify the URL of a standardized specification that the dataset conforms to. A data dictionary or schema that defines the fields or column headings in the dataset belongs in Described By, and other documentation belongs in References.
Example: https://project-open-data.cio.gov/v1.1/schema/catalog.json

Described By
Guidance: This is used to specify a data dictionary or schema that defines fields or column headings in the dataset. If this is a machine-readable file, it is recommended to be specified at the distribution level. At the dataset level it is assumed to be a human-readable HTML webpage or PDF document. Documentation that is not specifically a data dictionary belongs in References.
Example: https://project-open-data.cio.gov/v1.1/schema/catalog.json

Landing Page
Guidance: This field is not intended for an agency's homepage (e.g., www.agency.gov). Use it only if the dataset has a human-friendly hub or landing page to which users can be directed for all resources tied to the dataset.
Example: http://www.agency.gov/vegetables

References
Guidance: Include as many references as applicable, but also consider using Distribution for references where more context can be provided alongside each URL.
Expected value: Array of strings (URLs)
Example: http://www.agency.gov/legumes/legumes_data_documentation.html