Computational Toxicology Research Program
Choose a question from the quick links below:
- How do I find out more about the new DSSTox Structure-Browser?
- Why did you change your Standard Chemical Fields?
- Why did you keep changing DSSTox IDs in your Standard Chemical Fields?
- Why are you now including structures for mixtures in your files?
- Why does it take you so long to publish new DSSTox data files?
- Why do you include IUPAC chemical names, but not chemical name synonyms?
- Why are you including InChIs in DSSTox data files?
- Why did you discontinue offering the "desalted" DOP files?
- What do you mean by "distributed"?
- Will I be able to do a structure-search from the DSSTox website?
- How does a DSSTox SDF file differ from any other SDF file?
- Why did you choose SDF file format? Why not XML or CML?
- Are you attempting to standardize toxicity data?
- How will you be dealing with the issue of toxicity data quality?
- What quality control measures are applied to the standard chemical field content of DSSTox SDF files?
- How do I get started if I want to try to utilize DSSTox SDF files?
- Will I be able to search across multiple DSSTox SDF files and toxicities with available CRD applications?
- How do you plan to maintain and expand this public effort?
- Are you aware of other public initiatives to standardize and increase availability of toxicity data?
- How can I help your effort and show my support?
What do you mean by distributed? By distributed, we mean that DSSTox data files, along with their documentation files, are completely separate and distinct publication modules that could reside either on the DSSTox website or on a remote Source website, and can be modified, enhanced, and replicated on other sites for different purposes. Due to the inability of Source collaborators to maintain and host the DSSTox files, all current DSSTox data files are published on this website. We also provide on this site links to Other Public Databases that are toxicity databases in SDF format, and that have adopted some DSSTox standards or are otherwise coordinating with the DSSTox project.
Will I be able to do a structure-search from the DSSTox website? As of Aug 2007, with the launch of the DSSTox Structure-Browser (v1.0), a user can now structure-search (or chemical name, CAS, SMILES search) through all the published DSSTox Data Files. DSSTox also publishes standardized SDF data files that can be imported into virtually any structure-searching database application (More on CRDs). However, currently a user must provide their own chemical relational database (CRD) application. We offer links to commercial vendors offering CRDs for evaluation and purchase (see CRD Applications). In addition, we will be posting updated DSSTox published Data Files on PubChem giving a user access to all the search capabilities and expanded content of this large public resource.
How does a DSSTox SDF file differ from any other SDF file? Each DSSTox SDF includes, in addition to chemical structure and toxicity information, a set of standard chemical fields (More on DSSTox Standard Chemical Fields). Each DSSTox SDF file also is named according to a standard naming convention (More on DSSTox File Names), has some restrictions imposed on content, and adheres to "clean SDF" conventions (More on SDF). The largest difference, however, is in the toxicity data content and the stringently quality reviewed chemical information fields.
Why did you choose SDF file format? Why not XML or CML? SDF is already a de facto industry standard for data exchange between chemical relational database and modeling applications. It is a public standard with a very simple ASCII text format and flat field structure (i.e., no nested fields within fields) (More on SDF). Neither XML nor the chemically-inclusive version, CML (Chemical Mark-up Language), is at present completely standardized or in common use in chemical database or modeling applications. However, the creation of standardized DSSTox SDF files will greatly facilitate migration to these or other data formats in the future. In addition, efforts are currently underway to coordinate with a LeadScope ToxML workgroup proposing standards for storage of chemical toxicity data in XML format (see Coordinating Public Efforts). In coordination with this effort, we have begun incorporating Standard Toxicity Fields, and are adopting standardized vocabulary for field names and field entries wherever possible to enhance cross database information mining.
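To illustrate how simple the flat field structure is, the following is a minimal Python sketch that reads the data fields of an SDF file using only the standard library. It is not a DSSTox tool. The function name parse_sdf_fields is hypothetical, the sample record uses an empty placeholder molfile block, and the parser assumes the common SDF layout: a "> <FieldName>" header line, one or more value lines, a blank line, and "$$$$" ending each record.

```python
import re

# A data header line looks like "> <FieldName>"; capture the field name.
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def parse_sdf_fields(text):
    """Return one dict of data fields per SDF record (structure block is skipped)."""
    records = []
    fields = {}
    current = None  # name of the field whose value lines are being read
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record terminator
            records.append({name: "\n".join(vals) for name, vals in fields.items()})
            fields, current = {}, None
            continue
        match = FIELD_HEADER.match(line)
        if match:
            current = match.group(1)
            fields[current] = []
        elif current is not None:
            if line.strip() == "":
                current = None  # a blank line ends the field value
            else:
                fields[current].append(line)
    return records


# Placeholder record: the molfile block is empty and for illustration only.
sample = """\
acetaldehyde
  placeholder

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <TestSubstance_CASRN>
75-07-0

> <TestSubstance_SMILES>
CC=O

$$$$
"""

records = parse_sdf_fields(sample)
print(records[0]["TestSubstance_CASRN"])
```

Because every field is a flat name/value pair, each record maps directly onto one row of a table, which is what makes import into relational database applications straightforward.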
Are you attempting to standardize toxicity data? Only, at present, by virtue of promoting the use of some controlled vocabulary, i.e., common data field names and entry formats for the same types of toxicity data across DSSTox data files. These are centrally stored and indexed in the Central Field Definition Table. The primary focus of Phase I of the DSSTox effort is aimed at creating faithful representations of existing toxicity data, while introducing DSSTox Standard Chemical Fields. We are beginning to coordinate our effort, however, with others focusing more particularly on promoting standard toxicity data fields (see Coordinating Public Efforts). In coordination with the LeadScope ToxML effort, we have begun incorporating Standard Toxicity Fields, and are adopting standardized vocabulary for field names and field entries wherever possible to enhance cross database information mining.
How will you be dealing with the issue of toxicity data quality? We deal with issues of data quality currently only as they pertain to accurate representation of the structural and chemical information content of DSSTox data files. We do not quality review toxicity data from an outside Source unless an error is obvious; we only strive to faithfully reproduce the Source toxicity data. However, maintaining direct linkages to the Source and providing literature citations and adequate data description better arm a user to place the toxicity data in an appropriate context with respect to the issue of data quality. Also, by shining a brighter light on existing toxicity data, the DSSTox project has the potential to more broadly impact issues of data quality.
What quality control measures are applied to the standard chemical field content of DSSTox SDF files? Accurate representation of the structural and chemical information content of DSSTox data files is a primary objective of this project. The Log File listed on the main SDF Download Page for each DSSTox data file (see Templates & Sample Files) summarizes the procedures that were undertaken to ensure accuracy and internal consistency of chemical structure fields within each data file. Also listed in the Log File and used for QA review are summary counts within categories of chemicals. Numerous outside sources are used to retrieve chemical structures for inclusion in DSSTox data files, and these are cross-checked for internal consistency with other DSSTox Standard Chemical Fields, such as TestSubstance_SMILES and TestSubstance_CASRN. The ChemicalNote field is used to document any inconsistencies or uncertainties in the final displayed structure, and missing data (e.g., CAS) are listed in the Log File. We offer files of Technical Procedures, and have provided a mechanism for users to File Error Reports for DSSTox data files. Finally, each data file undergoes a rigorous set of visual and automated checks, and the data file content is reviewed by multiple persons listed in the Acknowledgements on the SDF Download Page. These quality control procedures are continually being strengthened, automated, and enhanced. For more information, see: DSSTox Chemical Information Quality Review Procedures.
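As one example of the kind of automated check a user could run on their own copy of a file, the sketch below validates the check digit of a CAS Registry Number such as those in the TestSubstance_CASRN field. This is not a description of the DSSTox review procedures. It only applies the published CAS check-digit rule (each digit before the check digit is weighted by its position counted from the right, and the sum modulo 10 must equal the check digit). The function name is_valid_casrn is hypothetical.

```python
import re

# CAS RN layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CASRN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_casrn(casrn):
    """Return True if the string has CAS RN format and a correct check digit."""
    match = CASRN_PATTERN.match(casrn.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the rightmost digit by 1, the next by 2, and so on.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


print(is_valid_casrn("75-07-0"))   # acetaldehyde
print(is_valid_casrn("75-07-1"))   # same digits, wrong check digit
```

A passing check digit only shows that the number is well-formed; it does not confirm that the number belongs to the structure in the record, so it complements rather than replaces cross-checking against the structure fields.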
How do I get started if I want to try to utilize DSSTox SDF files? First, we recommend that you either purchase or otherwise acquire a Chemical Relational Database (CRD) application (see More on CRDs). Low-cost or no-cost options are available for individual-license use (see SDF Viewers, Structure Browsers & CRD Applications). These applications allow the direct import of SDF files, visualization of chemical structure/data records, and relatively sophisticated search capabilities for structural analogs combined with data fields. We are working to incorporate DSSTox data files into off-site structure search capabilities currently available on-line (see Coordinating Public Efforts and Searching DSSTox SDF Files on PubChem). With the recent launch of the DSSTox Structure-Browser (v1.0), you can now structure-search DSSTox SDF files directly from this website.
Will I be able to search across multiple DSSTox SDF files and toxicities with available chemical relational database (CRD) applications? Some CRD applications (MS Access-based and others) require that DSSTox SDF files be imported and merged into a central file before cross-file searching is allowed (see More on CRDs). Other CRD applications allow SDF files to be imported and converted to the application format, and to retain their separate database identities for cross-file searching. Which CRD application is chosen will depend on a user's specific needs and preferences (see SDF Viewers, Structure Browsers & CRD Applications). See also Searching DSSTox SDF Files on PubChem.
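The "merge, then search" approach can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular CRD application works. It assumes each file has already been read into a list of field dictionaries, and the file labels (FILE_A, FILE_B), the Example_Field name and its values, and the function name build_cross_file_index are all hypothetical.

```python
from collections import defaultdict


def build_cross_file_index(files, key_field="TestSubstance_CASRN"):
    """Merge records from several files into one index keyed by a shared field.

    `files` maps a file label to a list of record dicts. Each index entry
    keeps the file label so a hit can be traced back to its source file.
    """
    index = defaultdict(list)
    for file_label, records in files.items():
        for record in records:
            key = record.get(key_field)
            if key:  # skip records with a missing or empty key field
                index[key].append((file_label, record))
    return dict(index)


# Hypothetical records standing in for two imported data files.
files = {
    "FILE_A": [{"TestSubstance_CASRN": "75-07-0", "Example_Field": "a1"}],
    "FILE_B": [
        {"TestSubstance_CASRN": "75-07-0", "Example_Field": "b1"},
        {"TestSubstance_CASRN": "50-00-0", "Example_Field": "b2"},
    ],
}

index = build_cross_file_index(files)
hits = [label for label, _ in index["75-07-0"]]
print(hits)
```

Keying the merged index on a standard chemical field is what makes the cross-file lookup possible; the same field must be present and populated consistently in every file being merged.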
How do you plan to maintain and expand this public effort? We cannot expand this public effort without help, and for this we will need to involve toxicity database Sources and DSSTox users more directly in the DSSTox collaboration. We hope to entice more Sources into "publishing" their database in this manner, and want to encourage the view that publishing a database on the DSSTox website, while adhering to DSSTox documentation and data file standards, will provide greater visibility, use, and impact for a Source database (see How to Publish a DSSTox Database). Towards this aim, we provide templates of all documentation files (see Templates & Sample Files) and offer assistance in populating DSSTox Standard Chemical Fields.
Are you aware of other public initiatives to standardize and increase availability of toxicity data? Yes, we are aware of many of these efforts and are attempting to coordinate our efforts with many others (see Coordinating Public Efforts). We would also appreciate it if users would publicize the DSSTox project to others, and Contact Us with any information on new public initiatives.
How can I help your effort and show my support? The single most important thing you can do is to consider publishing, or encourage others to publish a database on the DSSTox website, or a database adopting some or all DSSTox data standards on another public website. The more databases we are able to offer publicly, the more our proposed documentation and file standards gain acceptance and the more toxicity data becomes available for general use in predictive toxicology modeling. Even if you are developing databases for internal use, you might consider adopting DSSTox data and field standards. Other ways to help are to consider volunteering to be a DSSTox database reviewer, to report any errors in DSSTox data files or documentation files, and to provide missing data (e.g. CASRN). Check out the Support DSSTox Effort page on this site.
Why are you including InChIs in DSSTox data files? Since public software tools are currently available to generate InChI codes for an SDF file, one might ask why we provide them in our files. The answer is that we wish to support and endorse this public initiative and raise public awareness of the use and value of InChIs. Two aspects of InChI chemical identifier codes make them very attractive: 1) the technology (to convert structures to InChIs and, vice versa, InChIs back to structures) is XML-based, entirely in the public domain, and NOT a registry system; and 2) an InChI is a unique chemical identifier capable of encoding very nuanced and detailed chemical structure information, if available. We also want to follow the lead of others and encourage the use of InChIs to "tag" chemical information with chemical structures wherever such information is found on the internet. For an example of how a DSSTox Source collaborator used InChIs provided in our files to enhance public data offerings, see the new chemical data pages on the CPDB website, e.g. http://potency.berkeley.edu/chempages/ACETALDEHYDE.html; the InChI is included on each chemical data page, allowing the page to be located by a general web structure (InChI) search. Check out the More on InChI page on this site.
Why did you discontinue offering the "desalted" DOP files? For the initial launch-versions of 3 of the 4 DSSTox data files published on this website (i.e., CPDBRM_v1a, EPAFHM_v1a, NCTRER_v1a), we offered, in addition to the main SDF of "tested form" chemical structures, an SDF file containing only Defined Organic Parent (DOP) structures, i.e. "desalted", where salts and complexes are simplified to the neutral, uncomplexed form of the chemical. We originally had to do some contortions with our standard fields to consistently represent both the Main and DOP files with the same set of standard fields. We also found these DOP files time-consuming to create and maintain in addition to the Main files, and decided that our limited human resources were better spent on getting new data files completed and ready for publication. Another factor in this decision was the capability provided in a number of commercial CRDs for "desalting" an SDF file automatically. See, e.g., the ACD ChemFolder product on the SDF Viewers, Structure Browsers & CRD Applications page on this site.
Why did you change your Standard Chemical Fields? As this project has grown, so has our need to provide clarity and consistency of chemical information across the diverse content of all DSSTox Structure Data Files. For more explanation, see More on DSSTox Chemical Standard Fields.
Why did you keep changing DSSTox IDs in your Standard Chemical Fields? Initially, the motivation was to be able to interface with the PubChem Project, but eventually we saw the benefit of this type of data organization, i.e. where chemical structures and substance information are uniformly represented across all DSSTox data files. See also: More on DSSTox Chemical Standard Fields and Use of CIDs and SIDs in DSSTox Master File. Most recently, we have ported our DSSTox Master File inventory into a relational tabular environment that places strict constraints on use of IDs. Also, we have enlarged the scope of the DSSTox project to encompass structure-index files for High-throughput Screening Testing projects, such as NTPHTS and TOXCST, which require tracking of substances to the experimental sample level (lot, batch, manufacturer, etc). This latter use, and the need to uniquely index these in PubChem, necessitated implementation of the new DSSTox_RID unique record ID.
Why are you now including structures for mixtures in your files? Providing a STRUCTURE field for a defined mixture is a compromise; it does not fully represent the substance and yet allows at least some capability of locating the record with a structure or sub-structure search, where the data might be relevant. The STRUCTURE_Shown and TestSubstance content-linked fields provide further annotation on the nature of the substance.
Why does it take you so long to publish new DSSTox Structure Data Files? Many hurdles had to be overcome, and very few resources and personnel were devoted to this project early on. With the major reorganization and update of the website (Aug 2007), the migration of the DSSTox Master File into an MS Access relational environment, and many new automated procedures, we hope to accelerate the pace of DSSTox SDF publication in the future.
Why do you include IUPAC chemical names, but not chemical name synonyms? IUPAC chemical names are systematic chemical names with significant text information content. Common names and synonyms are numerous, unregulated, uninformative, and error-prone. The focus of the DSSTox Project is on structure-annotation, and on the migration of toxicology data into structure-annotated, field-delimited forms useful for relational searching, data mining and modeling. Other, much larger public efforts (such as NLM ChemID Plus) provide large libraries of synonyms and do this better than we could.
How do I find out more about the new DSSTox Structure-Browser? Go to the information page: DSSTox Structure-Browser Information.