Computational Toxicology Research Program
How to use DSSTox
DSSTox databases are compilations and reformulations of public databases, made freely available on this website for any public use. The DSSTox project has nonetheless placed considerable emphasis on implementing data and documentation standards intended to encourage consistency in the use and reporting of such data. These standards not only create common public expectations and understanding of the data, but also facilitate study reproducibility and greater community awareness and improvement of the data. The topic areas below provide further general guidelines for the use of DSSTox files.
Downloading DSSTox database files
Merging DSSTox database files
Updating a DSSTox database
Using DSSTox file names
Referencing and citation of DSSTox databases
Reporting chemistry with DSSTox databases
A DSSTox NAMEID (e.g., EPAFHM, NCTRER) is associated with a set of DSSTox SDF data and documentation files. The main SDF file is the most complete reproduction of the original Source data file, with the tested form of the chemicals provided in the STRUCTURE field, if known (e.g., parent, salt, or complex), and mixtures and unknowns without a structure also included. The Field Definition and Log files provide essential documentation for all the NAMEID SDF data files and should be downloaded at the same time as the SDF data file(s). A user should consider these to be primary reference documents for the DSSTox SDF files. For more information on the nature of these files, see Templates & Sample Files and More on SDF.
Since 7 March 2005, all files can also be accessed from EPA's Public FTP site. See the FTP Download Instructions page for more information.
Upon download, a user can merge or incorporate DSSTox SDF files into a larger central database. The DSSTox SDF file name is lost when files are merged; however, the DSSTox_FileID field in each record retains the identity of the original SDF file. The original record order of an SDF file may also be lost when records are merged and re-sorted in a larger database according to different criteria (e.g., if all records are relisted alphabetically by chemical name, or by increasing CAS number). However, using the DSSTox_FileID field, a merged record can always be traced back to the original DSSTox SDF file and record.
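The traceability described above can be sketched in Python. This is a minimal illustration rather than DSSTox tooling. It assumes the usual SDF layout (each record terminated by a `$$$$` line, data items introduced by `> <FieldName>` and ended by a blank line). The helper names, the `ChemicalName` field, the `EXAMPLE_...` file name, and the form of the sample DSSTox_FileID values (a record number followed by the file name) are all hypothetical.

```python
import re

def split_sdf_records(sdf_text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records

def read_fields(record_lines):
    """Collect the '> <FieldName>' data items of one record into a dict."""
    fields, name = {}, None
    for line in record_lines:
        header = re.match(r">\s*<([^>]+)>", line)
        if header:
            name = header.group(1)
            fields[name] = ""
        elif name is not None:
            if line.strip() == "":
                name = None  # a blank line ends the current data item
            else:
                fields[name] = (fields[name] + "\n" + line).strip()
    return fields

def merge_sdf(sdf_texts):
    """Merge several SDF texts into one flat list of field dicts."""
    return [read_fields(rec)
            for text in sdf_texts
            for rec in split_sdf_records(text)]

# Hypothetical sample data: molblocks omitted, DSSTox_FileID format assumed.
FILE_A = """
> <DSSTox_FileID>
1_DBPCAN_v4a_209_15Jun2007

> <ChemicalName>
compound C

$$$$
> <DSSTox_FileID>
2_DBPCAN_v4a_209_15Jun2007

> <ChemicalName>
compound A

$$$$
"""
FILE_B = """
> <DSSTox_FileID>
1_EXAMPLE_v1a_1_01Jan2000

> <ChemicalName>
compound B

$$$$
"""

merged = merge_sdf([FILE_A, FILE_B])
# Re-sorting destroys the original record order and mixes the two files...
merged.sort(key=lambda rec: rec["ChemicalName"])
# ...but every record still carries its file and record of origin.
origins = [rec["DSSTox_FileID"] for rec in merged]
```

After the sort, `origins` still identifies the source file and record for each entry, even though the file names themselves are no longer attached to the merged data.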
If DSSTox SDF files are merged into an individual or corporate database, it is still possible to easily retrieve all records corresponding to a particular DSSTox SDF file by using the DSSTox_FileID field. A global search for all records containing the same FileName portion of the DSSTox_FileID entry allows one to easily segregate and delete all records associated with a DSSTox SDF file. If a new version or revision of that file has been posted on the DSSTox website, it becomes a simple matter to then download and incorporate a new version of an SDF file into a merged individual or corporate database. Depending on the nature of the file modifications in a new DSSTox SDF version, it may also be advisable to replace the FieldDefFile and LogFile documentation files with updated versions.
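The update procedure above could look roughly like the following Python sketch. It assumes the merged database is held as a list of dictionaries and that the file name appears verbatim inside each DSSTox_FileID value. The function name, the `ChemicalName` field, and all file names and IDs shown are hypothetical.

```python
def replace_file_records(database, old_file_name, new_records):
    """Drop every record that came from old_file_name, then add new_records.

    Matching is a plain substring test on DSSTox_FileID (an assumption about
    how the file name is embedded in the ID).
    """
    kept = [rec for rec in database
            if old_file_name not in rec.get("DSSTox_FileID", "")]
    return kept + list(new_records)

# Hypothetical merged database containing records from two source files.
database = [
    {"DSSTox_FileID": "1_EXAMPLE_v1a_2_01Jan2000", "ChemicalName": "compound A"},
    {"DSSTox_FileID": "1_OTHER_v1a_1_01Jan2000", "ChemicalName": "compound B"},
    {"DSSTox_FileID": "2_EXAMPLE_v1a_2_01Jan2000", "ChemicalName": "compound C"},
]

# Hypothetical records parsed from a newer version of the same file.
new_version = [
    {"DSSTox_FileID": "1_EXAMPLE_v1b_2_01Feb2000", "ChemicalName": "compound A"},
    {"DSSTox_FileID": "2_EXAMPLE_v1b_2_01Feb2000", "ChemicalName": "compound C"},
]

updated = replace_file_records(database, "EXAMPLE_v1a_2_01Jan2000", new_version)
```

Because the substring test is deliberately simple, a stricter comparison may be preferable if one file name could occur inside another.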
All DSSTox files follow a standard naming convention that communicates some information pertaining to content (NAMEID, FieldDefFile, etc.), date of file creation, and, in the case of SDF files, file size (#records). See More on DSSTox File Names. It is strongly recommended that these file names be retained in their original form by the user to allow for easy correspondence to appropriate documentation, and for subsequent reporting and updating.
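As an illustration of how a retained file name supports reporting, the Python sketch below splits an SDF file name into its parts. The layout it assumes (NAMEID, version, number of records, creation date, separated by underscores) is inferred from the example name DBPCAN_v4a_209_15Jun2007 and may not cover every DSSTox file, in particular the documentation files. The function and class names are hypothetical.

```python
import re
from datetime import date
from typing import NamedTuple

class SdfFileName(NamedTuple):
    name_id: str
    version: str
    n_records: int
    created: date

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}

# Assumed layout: NAMEID_version_records_DDMonYYYY, optional .sdf extension.
PATTERN = re.compile(
    r"^(?P<name_id>[A-Za-z0-9]+)_(?P<version>[A-Za-z0-9]+)_(?P<records>\d+)_"
    r"(?P<day>\d{1,2})(?P<month>[A-Za-z]{3})(?P<year>\d{4})(?:\.sdf)?$"
)

def parse_sdf_file_name(file_name):
    """Split an SDF file name into NAMEID, version, record count, and date."""
    match = PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized file name: {file_name!r}")
    if match["month"] not in MONTHS:
        raise ValueError(f"unrecognized month in: {file_name!r}")
    created = date(int(match["year"]), MONTHS[match["month"]], int(match["day"]))
    return SdfFileName(match["name_id"], match["version"],
                       int(match["records"]), created)

info = parse_sdf_file_name("DBPCAN_v4a_209_15Jun2007")
```

Here `info.name_id` is `"DBPCAN"`, `info.n_records` is `209`, and `info.created` is 15 June 2007, which is only recoverable if the original file name has been kept unchanged.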
Listed on the main information page (i.e., the Source SDF Download Page) for each DSSTox database are the Main Citation(s) and the DSSTox Citation for that database. The Main Citation (or Citations) is the original Source publication that serves as the primary literature reference for the DSSTox NAMEID database. Any use of DSSTox data files that draws significantly on the content of those files and results in literature publication should cite the Main Citation as the primary source of the original data. The DSSTox Citation refers more specifically to the DSSTox reformulation of the Source database and the associated DSSTox documentation files, and should be used as a primary or secondary reference when the DSSTox reformulation played a significant role in the investigation. Equally important is the reporting of the full DSSTox database file name in any publication involving modeling or use of that DSSTox data file. This enables anyone to locate the exact data file used in the model development or study. If a user-modified form of the DSSTox database was used in the study, the nature of the modifications should also be reported. Examples of each type of citation for the DSSTox DBPCAN database are shown below:
Main Citation: Woo, Y.T., D. Lai, J.L. McLain, M.K. Manibusan, and V. Dellarco (2002) Use of mechanism-based structure-activity relationships analysis in carcinogenic potential ranking for drinking water disinfection by-products, Environ Health Perspect, 110 Suppl 1: 75-87.
DSSTox Citation: Woo, Y.T., C.R. Williams, N. Fields, and A.M. Richard (2003) DSSTox EPA Water Disinfection By-Products with Carcinogenicity Estimates Database (DBPCAN): SDF Files and Documentation, Updated version DBPCAN_v4a_209_15Jun2007, www.epa.gov/ncct/dsstox/
DSSTox File Name: DBPCAN_v4a_209_15Jun2007
DSSTox database file names and standard chemical fields were chosen to annotate public toxicity data so that the relevant chemistry can be reported more transparently. For instance, the STRUCTURE_ChemicalType and TestSubstance_Description fields convey important information about the nature of the actual chemical tested (e.g., parent, salt, complex), which should be specified in any modeling study. The ChemicalNote field contains potentially important details concerning the structure (e.g., tautomers, stereochemistry, replicates).
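One way to report the tested chemical forms in a modeling study is to tally records by STRUCTURE_ChemicalType, as in the short Python sketch below. It assumes records have already been read into dictionaries. The function name, the sample records, and the exact spelling of the field values are hypothetical.

```python
from collections import Counter

def summarize_chemical_types(records):
    """Count records per STRUCTURE_ChemicalType value.

    Records lacking the field are counted under 'not specified'.
    """
    return Counter(rec.get("STRUCTURE_ChemicalType", "not specified")
                   for rec in records)

# Hypothetical records; the values follow the examples named in the text.
records = [
    {"STRUCTURE_ChemicalType": "parent"},
    {"STRUCTURE_ChemicalType": "salt"},
    {"STRUCTURE_ChemicalType": "parent"},
    {"ChemicalNote": "stereochemistry unspecified"},
]

summary = summarize_chemical_types(records)
```

A summary of this kind can accompany the reported DSSTox file name so that readers know which chemical forms the study covered.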