

Computational Toxicology Research Program

Standard Chemical Field Definition Table

The table below is a detailed reference document for the definition and use of the DSSTox Standard Chemical Fields in DSSTox Structure Data SDF files. See More on DSSTox Standard Chemical Fields for additional notes pertaining to the purpose and use of these fields in DSSTox data files and applications.

Abbreviated versions of the following field definitions are included in each DSSTox SDF Field Definition File, and are also listed in the DSSTox Central Field Definition Table.

Field Name
Allowable Values
Description
Comments

STRUCTURE

 

 

 


Molecule represented as molfile

2D (or 3D) "mol" file coordinates for defined molecular structure.

STRUCTURE_Shown field relates content of STRUCTURE field to actual tested substance and TestSubstance_... fields. STRUCTURE field directly corresponds to, and is used to generate the content of the remaining STRUCTURE_... fields.

STRUCTURE field entry is blank only when no reasonable or representative 2D structure can be provided, as in some cases when TestSubstance_Description entry is "mixture or formulation" or "unspecified or multiple forms".

Chemical structure shown may be a single molecular entity, or a salt or complexed molecular species. Structures are obtained from a variety of public databases and sources and are verified to be consistent with CASRNs, SMILES, and chemical names whenever possible. Details of file construction and structure data review are provided in the LogFile for each DSSTox database, available for viewing and download from the DSSTox Source SDF Download Page. See also DSSTox Chemical Information Quality Review Procedures.

STRUCTURE field directly corresponds to, and is used to generate the content of each of the following fields:
DSSTox_CID
STRUCTURE_Formula
STRUCTURE_MolecularWeight
STRUCTURE_ChemicalType
STRUCTURE_TestedForm_DefinedOrganic
STRUCTURE_SMILES
STRUCTURE_ChemicalName_IUPAC
STRUCTURE_Parent_SMILES
STRUCTURE_InChI

DSSTox_CID is a unique chemical identification number with a 1:1 correspondence to the contents of the STRUCTURE and STRUCTURE-content fields. The same DSSTox_CID is assigned to all instances of the same STRUCTURE across all DSSTox files.

Given that STRUCTURE field entry may be a simplified, idealized, or representative version of the substance actually tested or studied, the STRUCTURE_Shown field serves to relate the information shown in the STRUCTURE field to what was actually tested (i.e., to the TestSubstance_... fields). If available, additional information on the test substance, mixture components, or purity of defined mixtures is provided in the ChemicalNote field.

If STRUCTURE_Shown entry contains the modifier "simplified to parent", it signifies that a "salt" or "complex" is represented in its desalted, neutral or protonated form, i.e. without counter ions or complexed chemical entities, in the STRUCTURE field. An exception is quaternary ammonium ions, which are represented in their positively charged state but without counter ions when in "simplified to parent" form.

SDF format supports display of stereochemistry with triangular bonds (solid and hashed) and cis/trans orientations of double bonds in most SDF viewing applications. SDF format can also support full 3D structures and coordinates. A number of commercial applications support batch conversion of 2D structures to 3D. See More on SDF.

Original DSSTox field name: Structure (modified August 2005).

DSSTox_RID

 

#

(integer)

DSSTox Record ID (RID) is a number uniquely assigned to each DSSTox record across all DSSTox files, regardless of Test Substance characteristics or STRUCTURE field content, i.e. no two DSSTox records share a DSSTox_RID. It is used to centrally manage DSSTox data file information and to register DSSTox data file records in PubChem.

 

This field was added to aid in the central management of DSSTox Standard Chemical Fields. It is used to provide a unique record identifier across DSSTox data files. When a record is deleted from the DSSTox Master inventory, the DSSTox_RID is retired from further use.

In all cases, there will be a 1:1 correspondence between the DSSTox_RID and the PubChem Substance ID (SID) assigned to all DSSTox substances. See http://pubchem.ncbi.nlm.nih.gov/ and Searching DSSTox Files in PubChem.

For more information on DSSTox IDs, see DSSTox Master File Information page.

DSSTox Standard Chemical Field added - June 2007.

DSSTox_CID

 

#

(integer)

DSSTox Chemical ID number uniquely assigned to a particular STRUCTURE and "STRUCTURE-content" fields across all DSSTox databases (see More on DSSTox Standard Chemical Fields). Different CID numbers will be assigned if two STRUCTURE records are substantively different, e.g., different chemical, salt or complex form, or stereochemical isomer.

DSSTox records with the same CID number will share the contents of all DSSTox STRUCTURE-content fields, except for STRUCTURE_Shown, which depends on the relationship to the TestSubstance_Description.

This field was added to aid in the central management of DSSTox Standard Chemical Fields. It is used to ensure consistency of chemical and structural information across DSSTox data files and for the population of new DSSTox databases. DSSTox_CID also can be used to locate structure duplicates throughout DSSTox data files.
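As an illustration of locating structure duplicates by DSSTox_CID, the following is a minimal sketch that reads the data items of an SDF file with plain Python and groups DSSTox_RID values by DSSTox_CID. The function names, the sample text, and the ID values in it are hypothetical; the parser assumes the usual SDF layout (a `>  <FieldName>` header line, value lines, a blank line, and `$$$$` between records) and ignores the molfile block.

```python
from collections import defaultdict


def parse_sdf_data_items(sdf_text):
    """Return one dict of data items per SDF record (molfile block ignored)."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue
        fields = {}
        current = None
        for line in block.splitlines():
            if line.startswith(">"):
                # Data header line, e.g. ">  <DSSTox_CID>"
                start, end = line.find("<"), line.rfind(">")
                current = line[start + 1:end] if 0 <= start < end else None
                if current is not None:
                    fields[current] = []
            elif current is not None:
                if line.strip():
                    fields[current].append(line.strip())
                else:
                    current = None  # a blank line ends the data item
        records.append({name: "\n".join(vals) for name, vals in fields.items()})
    return records


def duplicate_structures(records):
    """Map each DSSTox_CID that occurs more than once to its DSSTox_RID values."""
    groups = defaultdict(list)
    for record in records:
        cid = record.get("DSSTox_CID")
        if cid:
            groups[cid].append(record.get("DSSTox_RID"))
    return {cid: rids for cid, rids in groups.items() if len(rids) > 1}


# Hypothetical two-record sample; the ID values are made up for illustration.
SAMPLE = """
  molfile omitted
M  END
>  <DSSTox_RID>
101

>  <DSSTox_CID>
7

$$$$

  molfile omitted
M  END
>  <DSSTox_RID>
102

>  <DSSTox_CID>
7

$$$$
"""

duplicates = duplicate_structures(parse_sdf_data_items(SAMPLE))
```

Because the same DSSTox_CID is assigned to every instance of a STRUCTURE, grouping on this one field is enough; no structure comparison is needed.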

In most cases, there is a 1:1 correspondence to the PubChem Chemical ID (CID) based on the ChemID Plus identifiers. See http://pubchem.ncbi.nlm.nih.gov/ and Searching DSSTox Files in PubChem. [Note: in rare cases, a stereochemical distinction is made in the DSSTox_CID but not in the ChemID Plus ID.]

For more information on DSSTox IDs, see DSSTox Master File Information page.

DSSTox Standard Chemical Field added - August 2005.

DSSTox_Generic_SID


#

(integer)

Records with the same DSSTox_Generic_SID (Generic Substance ID) will share all DSSTox Standard Chemical Fields, including STRUCTURE. Field distinguishes at the level of "Test Substance" across all DSSTox data files, most often corresponding to the level of CASRN distinction, but not always.

Different DSSTox_Generic_SID numbers will be assigned to the same STRUCTURE record if, e.g., the TestSubstance_Description differs in the data record, i.e. one is "single chemical compound", the other is "mixture or formulation", or in cases where explicit information on Test Substance grade or purity is available (e.g., technical grade, etc). DSSTox_Generic_SID does not, however, distinguish DSSTox test substance records that differ in experimental settings only by lot/batch/plate location, etc.

This field was added to aid in the central management of DSSTox structures and Standard Chemical Fields, and to provide look-across capability for common Test Substances across DSSTox files. It is used to ensure consistency of chemical information across DSSTox data files and for the population of new DSSTox data files. Given its non-unique nature, this SID field is no longer being used as the SID for PubChem submissions (DSSTox_RID is used for this purpose).

For more information on DSSTox IDs, see DSSTox Master File Information page.

Reformulated DSSTox Standard Chemical Field - June 2007.

DSSTox_FileID

# (integer) Text

Sequential ID number is assigned to each record in data file, with values ranging from 1 to n, where n=total # of records in the data file. ID number is followed by an underscore and then the abbreviated DSSTox SDF standard file name with version, e.g., 1_CPDBAS_v4a.

Field entry provides a unique record identifier for every DSSTox data record and is updated whenever a new version or revision of DSSTox SDF data file is generated.

A numerical counter field coupled with the DSSTox file name of the file containing the record of interest provides for unique record identification and location, with a 1:1 correspondence to DSSTox_RID. Whereas DSSTox_FileID changes with DSSTox file version updates, the DSSTox_RID remains the same.
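The DSSTox_FileID layout described above (sequential number, underscore, abbreviated file name with version) can be split with a few lines of Python. This is a minimal sketch; the function name `parse_file_id` is hypothetical, and the list of IDs reuses the file name from the example entry 1_CPDBAS_v4a purely for illustration.

```python
def parse_file_id(file_id):
    """Split a DSSTox_FileID such as '1_CPDBAS_v4a' into (record_number, file_name)."""
    number, sep, file_name = file_id.partition("_")
    if not sep or not number.isdigit() or not file_name or int(number) < 1:
        raise ValueError(f"not a DSSTox_FileID: {file_id!r}")
    return int(number), file_name


# Sorting on the parsed tuple orders records numerically rather than as text.
ids = ["10_CPDBAS_v4a", "2_CPDBAS_v4a", "1_CPDBAS_v4a"]
ordered = sorted(ids, key=parse_file_id)
```

Sorting on the parsed record number avoids the text ordering in which "10_..." would come before "2_...".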

Field can be used for error reporting and sorting, as well as for referencing, and file updating and replacement when DSSTox data files are consolidated into centralized databases. See More on DSSTox FileNames

Modified from DSSTox_FileName_ID (June 2007).

STRUCTURE_Formula

Text

Empirical formula of displayed STRUCTURE.

blank if STRUCTURE_Shown entry is "no structure"

Field entry is automatically generated from the STRUCTURE field entry using commercial software (e.g., CambridgeSoft ChemFinder or ACD Labs ChemFolder).

Original DSSTox field name: Formula (modified August 2005).

STRUCTURE_MolecularWeight


#

Molecular weight or molar mass (atomic mass units) of displayed STRUCTURE.

blank if STRUCTURE_Shown entry is "no structure"

Molecular weight field entry is automatically generated from the STRUCTURE field entry using commercial software (e.g., CambridgeSoft ChemFinder or ACD Labs ChemFolder).
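To show how a molecular weight follows from an empirical formula, here is a rough sketch that sums atomic weights over a simple formula string. It is not a description of the commercial software named above: the function name is hypothetical, the atomic weight table is a small rounded subset, and the parser assumes a plain formula such as C6H6 without dots, charges, or parentheses.

```python
import re

# Rounded standard atomic weights for a few common elements (illustrative subset).
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "Na": 22.990, "S": 32.06, "Cl": 35.45, "K": 39.098,
}

ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C6H6' or 'H2O'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in ELEMENT_COUNT.findall(formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for element {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * int(count or 1)
    return total


benzene = molecular_weight("C6H6")  # 6 * 12.011 + 6 * 1.008
```

A check like this can flag records whose STRUCTURE_Formula and STRUCTURE_MolecularWeight entries disagree, within whatever rounding tolerance the generating software uses.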

Original DSSTox field name: MolWeight (modified August 2005).

STRUCTURE_ChemicalType


defined organic/

inorganic/

organometallic/

no structure/

Nature of chemical displayed in STRUCTURE field:

"defined organic" = defined chemical structure containing carbon but not organometallic, i.e. containing no metal or metalloid atom other than simple salt alkali (I) or alkaline earth (II) metals (Na, K, Mg, Ca, etc.);

"inorganic" = defined chemical structure containing no carbon;

"organometallic" = operationally defined as a chemical structure containing carbon and any metal or metalloid atom other than alkali (I) or alkaline earth (II) metals that occur in simple salts;

"no structure" indicates STRUCTURE field is blank; only used when TestSubstance_Description = "undefined mixture" , "unspecified or multiple forms" , or "macromolecule".

Inferred directly from STRUCTURE field entry.

Allowable entries are operational definitions that delineate chemical content of the DSSTox SDF to facilitate relational searching, data segregation, and future analysis and use.

Structures classified as "inorganic" or "organometallic" are generally more difficult to characterize and incorporate into structure-activity modeling. Hence, simple operational definitions are employed that allow for easy identification and segregation of these compounds. A well-accepted, more general meaning of the term "organometallic" is employed here that includes compounds containing both carbon and a metal or metalloid, but not necessarily containing an explicit carbon-metal bond.
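The operational definitions above can be expressed as a short decision rule over the set of elements present in a structure. The sketch below is one possible reading, not the procedure used to populate the field: the function name is hypothetical, and the nonmetal set (which decides what counts as a "metal or metalloid") is an assumption, since the definitions do not list the elements explicitly.

```python
# Alkali (I) and alkaline earth (II) metals, which do not make a structure organometallic.
GROUP_I_II = {"Li", "Na", "K", "Rb", "Cs", "Fr", "Be", "Mg", "Ca", "Sr", "Ba", "Ra"}

# Assumption: every element outside this nonmetal set is treated as a metal or metalloid.
NONMETALS = {
    "H", "C", "N", "O", "F", "P", "S", "Cl", "Se", "Br", "I",
    "He", "Ne", "Ar", "Kr", "Xe", "Rn",
}


def classify_chemical_type(elements):
    """Return a STRUCTURE_ChemicalType value for a collection of element symbols."""
    elements = set(elements)
    if not elements:
        return "no structure"  # blank STRUCTURE field
    if "C" not in elements:
        return "inorganic"
    other_metals = elements - NONMETALS - GROUP_I_II
    return "organometallic" if other_metals else "defined organic"


sodium_acetate = classify_chemical_type({"C", "H", "O", "Na"})
```

Note that the rule follows the broad meaning of "organometallic" used here: it looks only at which elements are present, not at whether an explicit carbon-metal bond exists.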

 

Original DSSTox field name: SubstanceType (modified August 2005).

STRUCTURE_TestedForm_DefinedOrganic


parent/

salt,
complex/

, Na, K, HCl, Cl, H2O, Ca, H2SO4, acetate, bis, etc.

Tested form of chemical displayed in STRUCTURE field only for STRUCTURE_ChemicalType = "defined organic".

Operational definitions of allowable entries as follows:

"parent" = single defined organic chemical entity, without counter ions or complexed chemical entities;

"salt" = simple ionic salts of defined organics with alkali (I) or alkaline earth (II) metal (e.g., Na, K, Mg, Ca) or halide (e.g., Cl) counter ions;

"complex" = any defined organic with associated acid, base, or hydrate.

blank if STRUCTURE_Shown entry is "no structure" or if STRUCTURE_ChemicalType entry is other than "defined organic".

Following the field entry "salt" or "complex", the counter ion or complexed chemical moiety is listed in abbreviated form, e.g.: Na, K, HCl, Cl, H2O, Ca, H2SO4, acetate, etc.; "bis" signifies that the parent structure occurs twice in the complex.

Field applies exclusively to the subset of STRUCTURE_ChemicalType = "defined organic". Organometallics and inorganics are often difficult to clearly label salt or complex due to the more complicated coordinated binding patterns of metals and metalloids and, hence, are not labeled in this field (i.e., field entry is left blank). Field is intended for use in segregating data records for more focused SAR model study since "salt" and "complex" structures often must be further processed, i.e. desalted or simplified to parent form. If a corresponding desalted DSSTox data file is created, this field also retains potentially important and relevant information on the original tested form of the chemical and associated counter ions and complexed species.

Chemicals classified as "salt" or "complex" are represented as dissociated, multiple chemical entities in the STRUCTURE field unless the STRUCTURE_Shown field includes the modifier "simplified to parent" in which case the parent form of the salt or complex is portrayed in the STRUCTURE field. It is possible for two different salt or complex forms to share the same "simplified to parent" or "parent" structure; in these cases where the modifier "simplified to parent" occurs, these are labeled "replicate parent" in the ChemicalNote field.

Original DSSTox field names combined: TestedForm and AddToParent (modified August 2005).

STRUCTURE_Shown


tested chemical/

active ingredient in formulation/

representative isomer in mixture/

representative component in mixture/

monomer of polymer/

general form of chemical/

no structure/

, simplified to parent

Identifies relationship of the graphical structure displayed in the STRUCTURE field to the actual tested chemical substance:

"tested chemical" - structure displayed is the actual form of the chemical tested;

"active ingredient in formulation" - the tested form of the chemical substance was a mixture or formulation and only the active ingredient is displayed in the STRUCTURE field;

"representative isomer in mixture" - the structure shown is one isomer in a test substance consisting of a mixture of isomers (e.g., cis, trans, Z, E);

"representative component in mixture" - the structure shown is a major component in a test substance consisting of a mixture of distinct chemical substances;

"monomer of polymer" - the structure shown is a small repeating subunit of a polymer or macromolecule;

"general form of chemical" - chemical record contains toxicity data fields summarized from multiple experiments, where either multiple tested forms (e.g., salts or complexes) of the chemical were evaluated, or where the tested form of the chemical is not specified or ambiguous;

"no structure" - when no reasonable or representative structure can be provided, as when TestSubstance_Description entry is "mixture or formulation" or "unspecified or multiple forms".

", simplified to parent" - for desalted files, only occurs as a comma-separated modifier to another field entry; used when STRUCTURE_ChemicalType="defined organic" and STRUCTURE_TestedForm_DefinedOrganic ="salt" or "complex"; and signifies that STRUCTURE is being represented in its desalted, neutral or protonated forms, without counter ions or complexed chemical entities. An exception is quaternary ammonium ions, which are represented in positively charged state with salt counter ion removed.

Intended to provide a linkage between information in the STRUCTURE_... fields and information in the TestSubstance_... fields, relating what is shown in the STRUCTURE field, where the displayed structure may be a simplified, idealized, or representative version of the test substance, to what was actually tested in the experiments summarized in the DSSTox data fields.
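A STRUCTURE_Shown entry consists of one of the allowable values listed above, optionally followed by the comma-separated modifier "simplified to parent". The following is a minimal sketch of splitting and validating such an entry; the function name is hypothetical, and the assumption is that the modifier is separated by a single comma exactly as shown in the allowable values.

```python
ALLOWED_SHOWN = {
    "tested chemical",
    "active ingredient in formulation",
    "representative isomer in mixture",
    "representative component in mixture",
    "monomer of polymer",
    "general form of chemical",
    "no structure",
}
MODIFIER = "simplified to parent"


def parse_structure_shown(entry):
    """Return (base_value, simplified_to_parent) for a STRUCTURE_Shown entry."""
    base, _, modifier = (part.strip() for part in entry.partition(","))
    if base not in ALLOWED_SHOWN:
        raise ValueError(f"unknown STRUCTURE_Shown value: {base!r}")
    if modifier and modifier != MODIFIER:
        raise ValueError(f"unknown modifier: {modifier!r}")
    return base, modifier == MODIFIER


desalted = parse_structure_shown("tested chemical, simplified to parent")
```

Keeping the modifier as a separate boolean makes it easy to select only the records of a desalted file whose STRUCTURE differs from the tested form.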

In a major departure from earlier DSSTox treatment (prior to August 2005), many more representative structures for defined mixtures (i.e., where chemical components of the mixtures are known) are being added to DSSTox databases. This information allows a DSSTox record corresponding to a defined mixture to be located by chemical relational structure searching. A number of entries in the STRUCTURE_Shown field are primarily intended to help clarify this STRUCTURE information and its relationship to the actual tested substance.

For databases using the field entry "active ingredient in formulation" (e.g., pesticides or pharmaceuticals), the original TestSubstance_ChemicalName and TestSubstance_CASRN provided by the Source may correspond to either the formulation or to the active ingredient.

The field entry "general form of chemical" is used for databases reporting summary toxicity results, i.e. where test results were combined into a single summary result for multiple parent/salt/complex forms of a chemical.

The modifier "simplified to parent" entry is intended to be used for the annotation of "desalted" SDF files derived by the user from the original DSSTox SDF file for use in structure-activity modeling studies.

If available, additional information on substance purity or mixture components and composition are provided in the ChemicalNote field. This may include CASRN and/or SMILES of additional mixture components.

Original DSSTox field name: StructureShown (modified August 2005).

 

TestSubstance_ChemicalName


Text

Common or trade name of chemical. Field entry corresponds to TestSubstance_CASRN.

If STRUCTURE_Shown = "tested chemical", field entry corresponds directly to contents of STRUCTURE field.

A chemical name is the most frequent chemical identifier provided in Source toxicity databases and is often, but not always, accompanied by a CAS number. When CAS number or SMILES is provided, these are used to determine and verify STRUCTURE and to check for consistency among all structure identifiers. If inconsistencies are found, additional review is undertaken to resolve discrepancies (see DSSTox Chemical Information Quality Review Procedures). Any unresolved discrepancies are documented in the ChemicalNote field.

Prior to January 2009, chemical names were usually provided in the DSSTox SDF exactly as listed in the original Source database, with the exception of symbols, which are converted to text (e.g., alpha), and unless an obvious error in the Source listing is detected and confirmed. Chemical names are non-unique structural identifiers. They are often trade or common names conveying imprecise chemical information, and frequently have many acceptable synonyms. A chemical name is provided in DSSTox data files as a reference index to the original Source database content, but their importance is largely superseded by the information content of the STRUCTURE field. Because its contents are Source-dependent, this is the only standard chemical field whose content is not consistent across the entire DSSTox inventory.

After January 2009, this field reverts to providing a single accepted chemical name (common, generic, or standard) throughout all DSSTox files, linked to the DSSTox_Generic_SID. Henceforth, Source-provided chemical names, if needed for cross-correspondence, will be provided in the optional field "Source_ChemicalName".

Alternate chemical names or synonyms are generally not provided in DSSTox data files unless they were provided explicitly in the Source database or publication, or unless they were found to be essential for locating accurate STRUCTURE information or another name is more commonly used. A user wishing to search on chemical name synonyms should consult other public sources specializing in such lists. See DSSTox Chemical Information Quality Review Procedures.

Original DSSTox field name: ChemName (modified August 2005).

TestSubstance_CASRN


######-##-#/

NOCAS/

Chemical Abstracts Service (CAS) Registry Number of the tested substance, formatted 000000-00-0. In general, corresponds to TestSubstance_ChemicalName.

If STRUCTURE_Shown = "tested chemical", field entry corresponds directly to STRUCTURE.

"NOCAS" indicates CAS registry number was unavailable from original Source data table or was not found.

 

A CAS Registry Number includes up to 9 digits separated into 3 groups by hyphens. The first part of the number, starting from the left, has up to 6 digits; the second part has 2 digits. The final part consists of a single check digit.

If CAS numbers were not provided in the original Source database, these are retrieved from various public sources based on the original Source chemical name (TestSubstance_ChemicalName) and any corresponding chemical structure information (e.g., SMILES provided by Source). For a more complete discussion of the various public and commercial sources used and the review procedures used for CASRN, structure, chemical name verification, see DSSTox Chemical Information Quality Review Procedures

The final digit of every valid CAS number is a check digit that is computed from the previous digits using a standard formula. The standard calculation formula provided by CAS is used to verify CAS numbers (see CAS Check Digit Verification).
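The check digit calculation can be sketched in a few lines: the digits before the check digit are weighted 1, 2, 3, ... starting from the rightmost digit, the products are summed, and the sum modulo 10 must equal the check digit. The function names below are hypothetical, and the pattern limits the first group to 6 digits to match the format described above; it would need widening if longer registry numbers must be accepted.

```python
import re

# First group limited to 6 digits, following the 9-digit description above.
CASRN_PATTERN = re.compile(r"(\d{2,6})-(\d{2})-(\d)")


def casrn_check_digit(body_digits):
    """Weight digits 1, 2, 3, ... from the right, sum, and take modulo 10."""
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_valid_casrn(entry):
    """True if entry is formatted like a CASRN and its check digit matches."""
    match = CASRN_PATTERN.fullmatch(entry)
    if match is None:
        return False  # includes the placeholder value "NOCAS"
    first, second, check = match.groups()
    return casrn_check_digit(first + second) == int(check)


water_ok = is_valid_casrn("7732-18-5")  # 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105
```

A passing check digit only shows that the number is well formed; it does not confirm that the CASRN corresponds to the named chemical or structure.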

Original DSSTox field name: CAS (modified August 2005).

TestSubstance_Description


single chemical compound/

macromolecule/

mixture or formulation/

unspecified or multiple forms/

"single chemical compound" = pure, neat or approximately pure single chemical compound (could be parent, salt or complex) with defined molecular structure;

"macromolecule" = polymer, protein, DNA, or other large biomolecular species;

"mixture or formulation" = test substance consists of more than one chemical compound, which may be fully or partially characterized, or consists of an active ingredient in an unspecified formulation, or the individual chemical components are not known;

"unspecified or multiple forms" = either the exact nature of the test substance is unknown or the test results refer to more than a single form of the test substance (e.g., multiple salt forms or derivatives of a parent chemical).

When entry is "single chemical compound", TestSubstance_... fields correspond directly to STRUCTURE fields. Only exception is when STRUCTURE_Shown entry contains "simplified to parent", in which case a desalted parent version of tested substance is shown.

Note: replaced "defined mixture or formulation" and "undefined mixture" with single entry "mixture or formulation" (June 2007).

ChemicalNote


Text,

ammonium, stereochem, tautomers, parent [CASRN], CAS replicate, replicate 2D, replicate parent,

etc.

 

Note provides additional information related to tested substance, e.g., when uncertainty exists in chemical name or CAS number, parent structure is "ammonium" ion, tautomeric forms are known to exist, mixture characteristics are known, "stereochem" information is known (e.g., cis, trans, Z, E, R, S), CAS of parent salt or complex is known, common chemical name synonym, etc.

This is a catch-all text note field used for a variety of purposes related to augmenting the standard chemical information contained in the database record. As of June 2007, the use and content of this field has been purged of Source-specific content and does not vary from database to database; Source-specific content pertaining to the tested chemical substance has been moved to the Source-specific field, Note_NAMEID.

This field is no longer used to indicate instances of "replicate" information pertaining to CAS or structure for another record within the database, or to document discrepancies between CASRN, Chemical name and structure from the Source data. This information has been moved to the Source-specific field, Note_NAMEID.

DSSTox field content modified (June 2007)

 

TestSubstance_ChemicalName_Other


Text

Synonym or alternate common or trade name of tested chemical listed in original Source database.

Alternate chemical names or synonyms are generally not provided in DSSTox data files unless they were provided explicitly in the Source database or publication, or unless they were found to be essential for locating accurate STRUCTURE information or another name is more commonly used. A user wishing to search on chemical name synonyms should consult other public sources specializing in such lists. See DSSTox Chemical Information Quality Review Procedures

Original DSSTox field name: ChemName_Other (modified August 2005).

TestSubstance_CASRN_Other


######-##-#/

Additional CAS registry numbers for the tested substance or for closely related derivative forms of the tested substance, formatted 000000-00-0; multiple CAS numbers are comma separated.

blank entry means no additional CAS numbers were available.

Field used in databases where multiple CAS numbers are provided for a significant proportion of database records, either for exact match chemicals or for closely related chemical forms or derivatives that have been grouped under the same toxicity test results.

Original DSSTox field name: CAS_Other (modified August 2005).

STRUCTURE_ChemicalName_IUPAC


Text

IUPAC (International Union of Pure and Applied Chemistry) refers to standardized nomenclature of organic chemistry. IUPAC chemical names are generated automatically from STRUCTURE using the ACD/Name generation software (ACD Labs, see LogFile for version) or obtained as a systematic name from other chemical sources (see DSSTox Chemical Information Quality Review Procedures).

blank if STRUCTURE_Shown entry is "no structure"

IUPAC names are provided in DSSTox files to provide a systematic nomenclature for chemical structures and because IUPAC names contain chemical information content, content that can be the basis for text searching of common chemical features and can be used to faithfully regenerate structures. An IUPAC name refers to a unique chemical structure, but there may be more than one acceptable IUPAC name for a given structure.

If an IUPAC name could not be generated from the ACD Labs Structure-to-Name application, other sources for systematic names were consulted, such as National Library of Medicine's TOXNET ChemID Plus and CambridgeSoft ChemFinder.com.

Original DSSTox field name: ChemName_IUPAC (modified August 2005).

STRUCTURE_SMILES


Text

SMILES (Simplified Molecular Input Line Entry System) molecular text code of displayed STRUCTURE.

blank if STRUCTURE_Shown entry is "no structure"

SMILES is a widely used linear text code for representing 2D molecular structures that is employed by a wide range of commercial and public applications. SMILES codes are obtained from the original DSSTox Source in some cases, but in most cases standard, non-unique SMILES are automatically generated from STRUCTURE using available commercial software (i.e., the most recent versions of either the CambridgeSoft ChemFinder or the ACD ChemFolder applications in most cases).

Where cis/trans information is known, this is explicitly represented in the SMILES (with slashes). When the isomer form is not specified or both isomers are present, the SMILES may be represented in general form. In some cases, 3D information is indicated in 2D structures by use of triangular hatched or bolded bonds; if they appear in the STRUCTURE and corresponding mol file, these may also be represented in the SMILES code. See also More on SMILES

Original DSSTox field name: SMILES (modified August 2005).

STRUCTURE_Parent_SMILES


Text

SMILES (Simplified Molecular Input Line Entry System) molecular text code of displayed STRUCTURE unless STRUCTURE_TestedForm_DefinedOrganic entry is either "salt" or "complex", in which case field entry corresponds to parent structure in desalted or neutralized (protonated) form, without salt counter ions or complexed moieties.

STRUCTURE_Parent_SMILES only provided for STRUCTURE_ChemicalType = "defined organic".

blank if STRUCTURE_Shown entry is "no structure"

Field provided to aid users who wish to generate a file containing "simplified to parent" or "desalted" defined organic structures for structure-activity modeling. Commercial applications are available that automatically generate "desalted" structures (see CRD Applications). See also More on SMILES

Original DSSTox field name: SMILES_Parent (modified August 2005).

STRUCTURE_InChI


Text

InChI = IUPAC (International Union of Pure and Applied Chemistry) NIST (National Institute of Standards and Technology) Chemical Identifier, a unique, standardized, text-based code for molecular structure. InChI codes were generated automatically from the STRUCTURE using the publicly available NIST/IUPAC InChI code generator program (see Log File of DSSTox database for code version).

InChI codes encapsulate essential chemical structural information and can be used for text, web-based, chemical structure searching.

If STRUCTURE_Shown entry is "no structure", InChI default entry is: InChI=1//

Field entries after 01Feb2009 conform to newly released InChI 1.02 Standards.

InChI codes are typically generated with a text string of additional information (AuxInfo) that can be used to regenerate the full molfile structure from the original SDF. Due to its redundancy with the SDF molfile information and character length (frequently exceeding 255 characters), AuxInfo strings are not included in DSSTox data files. This additional information can easily be generated by processing the DSSTox SDF file with the publicly available InChI software. See More on InChI.

InChI codes frequently exceed 200 characters in length. Strictly speaking, the MDL SDF format specifications, published in 1992 (Dalby et al., J. Chem. Inf. Comput. Sci., 1992, 32:244-255), require that a hard carriage return be inserted in any text field exceeding 200 characters in length. DSSTox SDF files do not adhere to this specification since STRUCTURE_InChI and STRUCTURE_SMILES fields (which also can exceed 200 characters in length for larger molecules) are considered essential chemical information fields. See More on InChI.

Original DSSTox field name: InChI (modified August 2005). Updated to InChI 1.02 Standards Feb 2009.

STRUCTURE_InChIKey


Text

"InChIKey" is a fixed-length (25-character) condensed digital representation of the InChI Identifier that can be used to facilitate structure look-up; the full InChI is required for structure-regeneration. For more information, see STRUCTURE_InChI and More on InChI.

If STRUCTURE_Shown entry is "no structure", InChIKey default entry is: MOSFIJXAXDLOML-UHFFFAOYAM
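Because the InChIKey is fixed-length, a simple pattern check can catch truncated or malformed entries and recognize the "no structure" default. The sketch below is hedged accordingly: the function name is hypothetical, and the layout (14 uppercase letters, a hyphen, 10 uppercase letters) is inferred from the 25-character default entry shown above rather than taken from a specification; keys produced by other versions of the InChI software may have a different length.

```python
import re

# Layout inferred from the 25-character default entry: 14 letters, hyphen, 10 letters.
INCHIKEY_25 = re.compile(r"[A-Z]{14}-[A-Z]{10}")
NO_STRUCTURE_KEY = "MOSFIJXAXDLOML-UHFFFAOYAM"


def check_inchikey(key):
    """Classify a STRUCTURE_InChIKey entry as 'malformed', 'no structure', or 'ok'."""
    if INCHIKEY_25.fullmatch(key) is None:
        return "malformed"
    return "no structure" if key == NO_STRUCTURE_KEY else "ok"


default_status = check_inchikey(NO_STRUCTURE_KEY)
```

Since the key cannot be used to regenerate a structure, a check like this only validates the form of the entry; structure look-up still requires matching the key against a table that also holds the full InChI.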

Field entries after 01Feb2009 conform to newly released InChI 1.02 Standards.

"InChIKey" cannot be used to regenerate structure, but is more manageable as a fixed-character length look-up tool. At the time of its incorporation into DSSTox files, the InChIKey generation software was in beta testing mode.

DSSTox Standard Chemical Field added Feb 2008. Updated to InChI 1.02 Standards Feb 2009.

Substance_modify_yyyymmdd


#=yyyymmdd

Sortable numeric date assigned to every unique substance in the DSSTox inventory (i.e., every unique DSSTox_Generic_SID) indicating the most recent date of modification of the structure or Standard Chemical Fields associated with that ID.

Note that this date does not generally apply to changes in mapping of the DSSTox_RID to the DSSTox_Generic_SID within a file, only to corrections within substance-related fields.

yyyymmdd = year, month, day (e.g., 20081021 = 21 October 2008)

Field is used in DSSTox Master file DSSTox_Generic_SID table and was introduced to allow users to easily sort and locate recently modified or corrected DSSTox substance records (structure or substance descriptions), independent of updates to source-specific toxicity records.

Changes affecting the mapping of DSSTox_RID to DSSTox_Generic_SID within a file, such as when an incorrect structure was applied, do not affect this date if the DSSTox_Generic_SID information is correct, but all such changes will be documented in the Note_NAMEID field.
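Because the yyyymmdd value is numeric and sorts chronologically, recently modified substances can be found by plain integer comparison, and the value can be converted to a calendar date when needed. A minimal sketch follows; the function names and the DSSTox_Generic_SID values in the sample list are hypothetical.

```python
from datetime import date, datetime


def parse_modify_date(value):
    """Convert a yyyymmdd value (int or str) to a datetime.date."""
    text = str(value)
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected yyyymmdd, got {value!r}")
    return datetime.strptime(text, "%Y%m%d").date()


def modified_since(records, cutoff):
    """Return (DSSTox_Generic_SID, yyyymmdd) pairs on or after cutoff, newest first."""
    recent = [(sid, stamp) for sid, stamp in records if int(stamp) >= cutoff]
    return sorted(recent, key=lambda pair: int(pair[1]), reverse=True)


# Hypothetical substance IDs paired with modification dates.
records = [(1001, 20070615), (1002, 20081021), (1003, 20090201)]
recent = modified_since(records, 20081001)
```

The integer comparison works only because the format puts year, month, and day in that order with fixed widths; parsing to a date is needed for anything beyond ordering, such as computing intervals.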

New field added October 2008 (will be added to all files published after this date)
