Computational Toxicology Research Program
DSSTox Master File Information
The DSSTox Master File consists of a series of relationally linked data tables that serve to consolidate, manage, and ensure quality and uniformity of the chemical and substance information spanning all DSSTox Structure Data Files, including those in development (see Work in Progress) but not yet published separately on this website. The DSSTox Master File contains only the chemical-content, or structure-index file, portion of DSSTox files, i.e., the DSSTox Standard Chemical Fields common to all DSSTox Structure Data Files. Details of DSSTox Master File organization and use are provided on this page. The DSSTox Master File is continually being updated with new and corrected content (see Chemical Information Quality Review Procedures) and, due to its increasing size and complexity, is no longer offered as a single SDF for download. (August 2007)
All DSSTox Master File content has been incorporated into a Microsoft Access relational database consisting of a series of linked data tables as shown in the figure below. Each of these four types of data tables is indexed by a different DSSTox ID that serves as the primary key and unique identifier for that table and its associated content. Also shown in this figure is the relationship between the various ID tables in terms of 1 to 1 [1 → 1] or many to 1.
The DSSTox_FileID field provides a unique record identifier across all published DSSTox Data Files, past and present, since it includes specification of DSSTox Data File version. It provides precise 1:1 mapping of the most current DSSTox data file version contents in the DSSTox Master File inventory to the record identifier, DSSTox_RID. If past DSSTox Data File versions are considered, this ID field is the most specific and, hence, most populated ID field in the DSSTox Master File inventory, past and present, i.e., each and every record in each and every DSSTox Data File, past and present, has a different DSSTox_FileID value. Due to this specificity to file version, the field also provides unambiguous linkage to a particular DSSTox Data File version's Source Content fields and is the most appropriate ID field for reporting errors. For discussion of how the DSSTox_FileID is used to initially establish and maintain linkages between DSSTox Standard Chemical Fields and Source Content fields during DSSTox Data File construction, refer to Chemical Information Quality Review Procedures: Flowchart and linked discussion.
A sample DSSTox_FileID entry for record number 15 of the file DBPCAN_v4a_209_15Jun2007 is:
The DSSTox_FileID field is useful for sorting records within a DSSTox Data File, during and after file manipulations. At the final stage of DSSTox Data File construction, after DSSTox Standard Chemical Fields and Source Content fields are merged prior to final SDF construction (see Flowchart), the contents of the DSSTox_FileID are regenerated to ensure an uninterrupted (1:n) sequential numbering of records within the Data File. Depending on whether Data File records were added or deleted, this numbering may or may not precisely correspond to the same chemical substance file record (DSSTox_RID) as in a previous or future Data File version, e.g.:
15_DBPCAN_v4a could change to 14_DBPCAN_v5a if a record were deleted in DBPCAN_v5a, whereas the DSSTox_RID would remain the same.
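The renumbering described above can be illustrated with a minimal Python sketch. The `Record` class, the `regenerate_file_ids` helper, the record count, and the specific RID values are all hypothetical and are not part of any DSSTox tooling; only the FileID pattern (record number, file abbreviation, version) is taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class Record:
    rid: int           # DSSTox_RID: stays the same across file versions
    file_id: str = ""  # DSSTox_FileID: regenerated for each file version


def regenerate_file_ids(records, file_abbrev, version):
    """Assign uninterrupted 1..n FileIDs in the current record order."""
    for position, record in enumerate(records, start=1):
        record.file_id = f"{position}_{file_abbrev}_{version}"
    return records


# Hypothetical version v4a with 20 records and made-up RID values.
v4a = regenerate_file_ids([Record(rid=20000 + i) for i in range(20)], "DBPCAN", "v4a")
tracked = v4a[14]  # record number 15

# Hypothetical version v5a: one earlier record (arbitrarily, the third) is deleted.
v5a_records = [Record(rid=r.rid) for i, r in enumerate(v4a) if i != 2]
v5a = regenerate_file_ids(v5a_records, "DBPCAN", "v5a")
same_substance = next(r for r in v5a if r.rid == tracked.rid)

print(tracked.file_id, tracked.rid)
print(same_substance.file_id, same_substance.rid)
```

In this sketch the FileID of the tracked record shifts down by one after the deletion, while its RID is unchanged, which is why the RID rather than the FileID is the stable key for following a record across versions.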
Note: This field has been modified from the previous DSSTox_ID_FileName (June 2007), and the field contents now contain a truncated version of the file name (without spaces) to better serve current needs.
The DSSTox_RID field is the primary, unique record identifier for the current DSSTox Master File inventory. Each record in currently published DSSTox Data Files is assigned a unique DSSTox_RID integer value, regardless of the record's Structure or Test Substance content. A DSSTox_RID value has no intrinsic meaning other than for indexing all records in all DSSTox Data Files, so any numerical value can be assigned (numbering was started at 20000). If DSSTox Data File records are deleted for any reason from the DSSTox Master File inventory, the particular DSSTox_RID is "retired" from further use, i.e. a DSSTox_RID once used is never reassigned to a different Data File, Structure, or Test Substance. Unlike the DSSTox_FileID, however, if a DSSTox Data File record's test substance and structure characteristics do not change in a substantive way from one file version to another, the DSSTox_RID remains the same across file versions.
Note: This field is designed to be useful for "registering" the full published DSSTox Data File inventory in any external chemical database application. In particular, the DSSTox_RID values will be used to register all current published DSSTox Data File inventory in PubChem and will map 1:1 to the PubChem-assigned Source-specific Substance ID (PubChem SIDs assigned to the EPA DSSTox "Source").
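The "retire, never reassign" rule for DSSTox_RID values can be sketched as a small allocator. This is an illustrative assumption about how such a rule might be enforced, not a description of the actual Master File implementation; the `RidRegistry` class and its method names are hypothetical. Only the starting value of 20000 comes from the text above.

```python
class RidRegistry:
    """Issues integer record IDs and never reuses a retired one."""

    def __init__(self, start=20000):
        self._next = start
        self._active = set()
        self._retired = set()

    def assign(self):
        rid = self._next
        self._next += 1  # monotonic counter, so earlier values are never issued again
        self._active.add(rid)
        return rid

    def retire(self, rid):
        if rid not in self._active:
            raise KeyError(f"RID {rid} is not active")
        self._active.remove(rid)
        self._retired.add(rid)

    def is_retired(self, rid):
        return rid in self._retired


registry = RidRegistry()
first = registry.assign()
second = registry.assign()
registry.retire(first)     # record deleted from the inventory
third = registry.assign()  # gets a fresh value, not the retired one
print(first, second, third)
```

A strictly increasing counter is the simplest way to guarantee that a retired value is never handed to a different record; the retired set is kept only so that old identifiers can still be recognized.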
The DSSTox Generic Substance ID, or DSSTox_Generic_SID field, provides a unique identifier for Generic Test Substances and TestSubstance content-related fields across the DSSTox Master File inventory. The term "Generic" is used to denote a test substance distinction at the level of general compound substance characteristics, including, e.g., salt or complex form, stereochemical specificity (if provided), mixture characteristics, or purity grade. The DSSTox_Generic_SID does not, however, distinguish among substances to the level of experimental sample, e.g., lot, batch, plate position, or manufacturer. A DSSTox_Generic_SID most closely coincides with the level of chemical substance distinction provided by a CAS Registry Number (as in TestSubstance_CASRN), although CAS Registry Numbers do not generally distinguish to the level of purity grade. CAS Registry Numbers are also less suitable as a general substance identifier, in our view, given that they are tied to a commercial registry system (CAS SciFinder), are occasionally "retired" or replaced, and are not available for all substances.
The DSSTox_Generic_SID table in the DSSTox Master File indexes and stores all associated TestSubstance content-related fields, i.e. DSSTox Standard Chemical Fields pertaining to test substance that "travel" with the DSSTox_Generic_SID into new DSSTox Data File construction (see Use of the Master File in Chemical Quality Review Procedures). The DSSTox_Generic_SID has no intrinsic meaning other than for indexing test substances across all DSSTox Data Files, so any numerical value can be assigned (numbering was started at 20000). If Test Substance records are deleted from DSSTox Data Files, the information on the Generic Test Substance is retained in the DSSTox Master File inventory for possible future use.
Note: This field is designed to provide a "look-across" capability for common test substances in the DSSTox Data File inventory, grouping toxicological data that are most appropriately compared at the level of common test substance, such as implemented in the DSSTox Structure-Browser. DSSTox_Generic_SID has no equivalent ID mapping in PubChem, and multiple DSSTox_Generic_SIDs with somewhat different substance characteristics can potentially map to a single representative chemical structure, DSSTox_CID, or PubChem chemical ID (CID).
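As a rough illustration of the "look-across" idea, the sketch below groups records from different data files by a shared DSSTox_Generic_SID. The row layout, file names, and all ID values are invented for the example, and `group_by_generic_sid` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical rows: (DSSTox_RID, data file name, DSSTox_Generic_SID)
rows = [
    (20001, "FILE_A", 30001),
    (20002, "FILE_B", 30001),
    (20003, "FILE_B", 30002),
]


def group_by_generic_sid(rows):
    """Collect (RID, data file) pairs under each generic test substance ID."""
    groups = defaultdict(list)
    for rid, data_file, generic_sid in rows:
        groups[generic_sid].append((rid, data_file))
    return dict(groups)


look_across = group_by_generic_sid(rows)
for generic_sid, members in look_across.items():
    print(generic_sid, members)
```

Here the two records sharing one Generic_SID come from different data files, which is the case in which comparing their toxicological data at the test-substance level becomes possible.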
The DSSTox Chemical ID (DSSTox_CID) provides a unique identifier for the STRUCTURE and STRUCTURE content-related fields in the DSSTox Master File inventory. The DSSTox_CID table in the DSSTox Master File indexes and stores all associated STRUCTURE content-related fields, i.e. DSSTox Standard Chemical Fields pertaining to Structure that "travel" with the DSSTox_CID into new DSSTox Data File construction (see Use of the Master File in Chemical Quality Review Procedures). The DSSTox_CID has no intrinsic meaning other than for indexing structure across all DSSTox Data Files, so any numerical value can be assigned (numbering was started at 1). If Structure records are deleted from DSSTox Data Files, the information pertaining to Structure is retained in the DSSTox Master File inventory for possible future use.
Note: This field is designed to provide a "look-across" capability for common structures in the DSSTox Data File inventory, such as is provided by the structure-searching capability in the DSSTox Structure-Browser. DSSTox_CID has a 1:1 mapping to PubChem chemical ID (CID). One DSSTox_CID, PubChem CID, or structure can potentially represent and map to multiple DSSTox_Generic_SIDs having somewhat different substance characteristics.
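The four ID tables and their linkages can be approximated with a small in-memory SQLite schema. This is a sketch under stated assumptions, not the actual Access database design: the table and column names are hypothetical, the many-to-1 link from DSSTox_RID to DSSTox_Generic_SID is inferred from the descriptions above, and all inserted values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cid (
    dsstox_cid INTEGER PRIMARY KEY            -- one row per structure
);
CREATE TABLE generic_sid (
    dsstox_generic_sid INTEGER PRIMARY KEY,   -- one row per generic test substance
    dsstox_cid INTEGER REFERENCES cid(dsstox_cid)  -- many substances to 1 structure
);
CREATE TABLE rid (
    dsstox_rid INTEGER PRIMARY KEY,           -- one row per data file record
    dsstox_generic_sid INTEGER NOT NULL
        REFERENCES generic_sid(dsstox_generic_sid)  -- assumed many-to-1 link
);
CREATE TABLE file_id (
    dsstox_file_id TEXT PRIMARY KEY,          -- version-specific record ID
    dsstox_rid INTEGER NOT NULL REFERENCES rid(dsstox_rid)
);
""")

# Invented sample data: one structure, two generic substances, three records.
conn.executemany("INSERT INTO cid VALUES (?)", [(1,)])
conn.executemany("INSERT INTO generic_sid VALUES (?, ?)", [(20000, 1), (20001, 1)])
conn.executemany(
    "INSERT INTO rid VALUES (?, ?)",
    [(20000, 20000), (20001, 20001), (20002, 20000)],
)
conn.executemany(
    "INSERT INTO file_id VALUES (?, ?)",
    [("1_FILEA_v1a", 20000), ("2_FILEA_v1a", 20001), ("1_FILEB_v1a", 20002)],
)

# Look across from one structure to its generic substances and records.
summary = conn.execute("""
SELECT COUNT(DISTINCT g.dsstox_generic_sid), COUNT(DISTINCT r.dsstox_rid)
FROM cid AS c
JOIN generic_sid AS g ON g.dsstox_cid = c.dsstox_cid
JOIN rid AS r ON r.dsstox_generic_sid = g.dsstox_generic_sid
WHERE c.dsstox_cid = 1
""").fetchone()
print(summary)
```

The `file_id` table deliberately has no uniqueness constraint on `dsstox_rid`, since the 1:1 correspondence with DSSTox_RID is described only for the most current file version, and past versions would add further FileID rows for the same record.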