Computational Toxicology Research Program
Coordinating Public Efforts
Listed below are external public efforts working toward standard ontologies and vocabularies for toxicity data, chemical data standards, open data access, and/or on-line structure-searchability. We are coordinating the DSSTox project with these efforts to the extent possible, and as these collaborations advance we will post updates and links on this page. We invite users to suggest additional ongoing or planned public efforts where coordination with the DSSTox project would be productive. Similarly, users are encouraged to publicize the DSSTox project and inform others about it to facilitate such collaborations. See also EPA Collaborations & Activities.
ChemSpider ** Update (Feb 2009)
ACD/Labs Dictionary ** Update (Feb 2009)
Carcinogenic Potency Project ** New DSSTox Structure-Browser capability
DEMETRA Pesticide Eco-Toxicity Database for QSAR
FDA Center for Drug Evaluation & Research ICSAS Programs and Activities
ILSI Developmental Toxicity Database for Improving SAR Models
IUPAC/NIST InChI Project
Leadscope ToxML Project
LHASA Limited, VITIC Project
NCI Public Data Outreach – Structure Web Browser
NCTR ArrayTrack - Structure Browser Collaboration
NIEHS CEBS Knowledge Base
NIH Molecular Libraries Roadmap - Chemical Genomics Center / DPI Small Molecule Repository
NIH PubChem Project
NTP Public Databases & High-Throughput Screening ** New DSSTox Structure-Browser capability
SRC PBT-Profiler and Analog Search Tools
WOMBAT - Database of Biological Activities of Pharmaceuticals
Future Planned Collaborations
ChemSpider
(contact: Tony Williams, email: email@example.com)
ChemSpider is a free-access service providing a structure-centric community for chemists, access to millions of chemical structures and computed properties, and integration with a multitude of other online services. DSSTox published substances have been incorporated into ChemSpider, and ChemSpider provides link-outs to DSSTox SDF Download Pages as well as direct links to the URLs of chemical data pages provided within some DSSTox files, such as NTPBSI or CPDBAS. As of Sept 2008, the DSSTox Structure-Browser provides link-outs to the corresponding chemical structure page in ChemSpider using InChI codes. Given the importance of this new public resource, DSSTox will continue to find ways to connect users to ChemSpider and to keep DSSTox information current within ChemSpider.
ACD/Labs (Advanced Chemistry Development)
ACD/Labs markets a variety of physicochemical property and spectroscopy prediction software modules, as well as systematic nomenclature generation and analytical data processing and management solutions. One of their modules, ACD/ChemFolder, is a Windows-based chemical relational database application (See More on CRDs) accompanied by a Dictionary of high-quality, curated chemical structures, synonyms and identifiers (e.g., CASRN). We make frequent use of this Dictionary, along with public sources, during the DSSTox file construction and annotation process as part of our Chemical Information Quality Review Procedures. To assist our public chemical curation and review effort, ACD/Labs has kindly provided the DSSTox project with the ACD Dictionary in SD format for internal EPA use (Feb 2009). The DSSTox Master file structure inventory is being provided to ACD/Labs for their use in expanding their Dictionary offerings.
Carcinogenic Potency Project
(contact: Lois Swirsky Gold, email: firstname.lastname@example.org)
A multi-year collaboration between the DSSTox project and the Berkeley Carcinogenic Potency Project (CPDB) has resulted in publication of several versions of the modified and consolidated CPDB Summary Tables with chemical structure content (see CPDBAS). The CPDB website additionally hosts a large number of detailed data tables and plots pertaining to chemical carcinogenicity studies from NTP reports and literature sources that were previously organized by chemical names and CASRNs, but not effectively indexed from a chemical structure-searching point of view.
Motivated by the DSSTox collaboration and the availability of additional chemical information from CPDBAS (such as InChIs, SMILES, etc.), central chemical indexing has been added to the CPDB website. Separate chemical data pages, each with a distinct URL, are now provided for each of the more than 1450 chemical substances in the CPDB; each page contains summary data, analyses of individual experiments, and links to information throughout the CPDB website (see Carcinogenic Potency Database, All Results for Each Chemical). In addition to more effective organization of chemical content, the CPDB website now has sufficient chemical structure identifier content (see, e.g., Acetaldehyde methylformylhydrazone) for its pages to be "structure-located" by a general Internet text search of InChI codes.
** Update (Aug 2007): CPDB Chemical Data page URLs have been incorporated into the structure-search capability provided by the new DSSTox Structure-Browser.
DEMETRA Pesticide Eco-Toxicity Database for QSAR
(contact: Emilio Benfenati, email: email@example.com)
The DEMETRA Project (Development of Environmental Modules for Evaluation of Toxicity of pesticide Residues in Agriculture) is sponsored by a European Commission Community Research grant, Quality of Life and Management of Living Resources QLK5-CT-2002-00691. The project aims to develop software that will give a quantitative prediction of the toxicity of a molecule, in particular pesticides, candidate pesticides, and their derivatives. The model input will be the chemical structure of the compound, and the software algorithms will use Quantitative Structure-Activity Relationships (QSARs). DEMETRA prediction tools can be found at DEMETRA Tool.
The DEMETRA Project has created a pesticide QSAR database, derived from EPA's Office of Pesticide Programs Eco-toxicity database, that has been reviewed, structure-annotated, and endpoint-consolidated to make it more appropriate for use in QSAR model development. This DEMETRA SDF was provided to the DSSTox project and modified to be consistent with DSSTox Standard Chemical Fields and DSSTox structure and test substance representations. We plan to publish this DSSTox data file as a companion database to the DEMETRA project, compatible with other DSSTox data files and comparable to other EPA pesticide data lists.
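For readers unfamiliar with the SD file (SDF) layout, the following is a minimal Python sketch of how the named data fields in an SDF record (such as the DSSTox Standard Chemical Fields) can be read. It assumes the common conventions that each field begins with a header line of the form "> <FieldName>", that its value ends at the next blank line, and that records are separated by "$$$$". It skips the structure (molblock) part of each record and is not a full SDF parser; the function name and the field values in the example are hypothetical.

```python
from typing import Dict, Iterator, List, Optional


def iter_sdf_fields(text: str) -> Iterator[Dict[str, str]]:
    """Yield one {field_name: value} dict per SDF record.

    Sketch only: the molblock (structure) part of each record is skipped, and
    only data items introduced by a "> <FieldName>" header line are collected.
    """
    for record in text.split("$$$$"):
        if not record.strip():
            continue  # ignore the empty chunk after the final "$$$$"
        fields: Dict[str, str] = {}
        name: Optional[str] = None
        value_lines: List[str] = []
        for line in record.splitlines():
            if line.startswith(">") and "<" in line:
                # Header such as "> <DSSTox_CID>": take the text inside <...>.
                start = line.index("<") + 1
                end = line.find(">", start)
                if end == -1:
                    continue  # malformed header; skip it
                name = line[start:end]
                value_lines = []
            elif name is not None:
                if line.strip() == "":
                    # A blank line ends the current field's value.
                    fields[name] = "\n".join(value_lines)
                    name = None
                else:
                    value_lines.append(line)
        if name is not None:  # last field not followed by a blank line
            fields[name] = "\n".join(value_lines)
        yield fields


# Hypothetical record: a stub molblock followed by two data fields.
sample = (
    "example\n  sketch\n\nM  END\n"
    "> <DSSTox_CID>\n1234\n\n"
    "> <DSSTox_RID>\n5678\n\n"
    "$$$$\n"
)
print(list(iter_sdf_fields(sample)))
```

Each record yields a dictionary keyed by field name, which makes it straightforward to check whether a file carries the expected standard fields.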
FDA (Food & Drug Administration)
Center for Drug Evaluation & Research:
Informatics and Computational Safety Analysis Staff (ICSAS) Programs and Activities
(contact: Daniel Benz, FDA CDER, email: BenzRD@cder.fda.gov)
The FDA/CDER Computational Safety Analysis Staff (D. Benz, E. Matthews, J. Contrera, N. Kruhlak) have been actively engaged in developing computational toxicology capabilities with a strong emphasis on the use of structure-activity relationships. This has involved integrating structure-searchable chemical relational databases with toxicity prediction algorithms, and collaborating with various predictive toxicology software and data-mining vendors. Since the goal is to estimate toxicity endpoints for pharmaceutical chemicals undergoing FDA review based largely on chemical structure, an essential component of this effort has been the creation of quality training databases of structure-linked toxicity information tailored to pharmaceuticals. In some cases, this has meant enriching public databases (largely consisting of industrial/environmental chemicals) with pharmaceuticals; in other cases, toxicity data for pharmaceuticals have been painstakingly extracted from FDA public archives. In conjunction with FDA public data offerings, and with FDA scientists, we hope to publish additional FDA databases in DSSTox format for public distribution as they become available (see e.g. FDAMDD).
ILSI (International Life Sciences Institute)
Developmental Toxicity Database for Improving SAR Models
(contact: Beth Julien, ILSI RSI, email: firstname.lastname@example.org)
ILSI Risk Science Institute is coordinating a project entitled "Improving the Use of Toxicity Data in Statistically Based Structure-Activity Relationship Models for Developmental Toxicity" to contribute to improved SAR models and prediction capabilities. The project goal is the design of a prototype developmental toxicity database via an interdisciplinary effort involving modelers, developmental toxicologists and database experts. The project is currently in Phase II. A panel of developmental toxicologists has been recruited to review the database design and test its adequacy per Phase II tasks. ILSI Research Foundation/Risk Science Institute anticipates moving to Phase III tasks during the summer of 2007, with completion of the project in early 2008. This project design is being coordinated with both the ToxML standardization effort, for toxicity data schema, and the DSSTox project, which will structure-annotate and provide a structure index file for the final prototype database. For description of early stages of this project, also see DSSTox CODDD publication.
IUPAC/NIST InChI Project
The IUPAC InChI Project is a public initiative led by scientists at the National Institute of Standards and Technology (NIST) in collaboration with IUPAC (International Union of Pure and Applied Chemistry) to establish a robust, public standard for the unique representation of chemical structures as a text string, termed InChI, along with tools for generating these InChI codes. A number of third-party freeware and commercial software packages and databases have incorporated InChI (see InChI Adopters), and we incorporate InChI codes as a DSSTox Standard Chemical Field into all DSSTox data files. MORE on InChI>
** Update (Sept 2007): InChI developers recently announced a new fixed-length (25-character) condensed digital representation of the identifier, known as the InChIKey (http://www.iupac.org/inchi/release102.html), which will be useful in web searching applications and in databases where the full InChI identifier had previously caused difficulties with line breaks and special-character handling. We will evaluate the InChIKey and determine whether to include it in future DSSTox files.
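To illustrate the special-character issue mentioned in the update above, the following Python sketch percent-encodes a full InChI and an InChIKey as they would appear in a URL query. The strings are for ethanol; note that the key shown uses the later 27-character standard InChIKey format rather than the 25-character 2007 key described above. The use of urllib.parse.quote is simply a convenient way to show which characters need escaping, not part of any InChI tool.

```python
from urllib.parse import quote

# Ethanol: standard InChI and its standard InChIKey.
inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
inchikey = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

# Percent-encode each identifier as a URL query value (no "safe" characters).
encoded_inchi = quote(inchi, safe="")
encoded_key = quote(inchikey, safe="")

print(encoded_inchi)  # "=", "/" and "," are escaped as %3D, %2F and %2C
print(encoded_key)    # letters and hyphens need no escaping
print(len(inchikey))  # fixed length, regardless of molecule size
```

The full InChI grows with molecular complexity and can wrap across lines, whereas the InChIKey stays the same length; the trade-off is that the key is a hash and cannot be converted back into a structure without a lookup.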
Leadscope ToxML Project
(contact: Chihae Yang, email: email@example.com)
ToxML is a public initiative led by scientists at Leadscope, Inc. to promote adoption and use of controlled vocabularies and XML schema for storing chemical toxicity data. Leadscope is a data-mining software company with a particular focus on toxicology, marketing a range of databases, exploration tools for developing structure-activity relationships, and predictive toxicity models. The LIST (Leadscope In Silico Toxicology) Focus Group consisted of industry, government and academic clients of Leadscope, and other invited and interested parties. Leadscope has recruited toxicity domain experts to develop relevant controlled vocabularies (initially in the areas of mutagenicity and carcinogenicity). Public funding of this effort through a NIST Advanced Technology Program grant ensures that a basic ToxML schema, viewer, and data entry form are being made freely available to promote public adoption of the standardized format and vocabularies. In addition to the freely downloadable ToxML schema, Leadscope now markets several FDA databases for Genetox, Developmental, and Chronic/Sub-chronic Toxicity based on the ToxML schema. Scripts for converting from ToxML to SDF have also been developed to facilitate coordination with DSSTox. Where possible, DSSTox will also coordinate with the ToxML effort to promote common field structures and controlled vocabularies for both chemical and toxicity information. For toxicity information, this coordination primarily concerns general study design characteristics that apply across the diverse-content DSSTox files, e.g., fields incorporated in 2005 such as StudyType, Species, and Endpoint (see More on DSSTox Standard Toxicity Fields), along with additional fields such as Sex, Tissue, and Benchmark_Dose that may be incorporated as new DSSTox databases are developed. MORE >
LHASA Limited, VITIC SAR Database Project
(contact: Philip Judson, email: firstname.lastname@example.org)
The Virtual International Toxicology Information Centre (VITIC) project of LHASA Limited is building a centralized, structure-searchable toxicity database for use in structure-activity relationship modeling, drawing on public data and member contributions. This project grew out of an earlier SAR Toxicity Database effort coordinated by the ILSI Health and Environmental Sciences Institute (HESI). VITIC is a chemical relational database template with standardized structure and toxicity data fields, and will benefit from the public availability of annotated, standardized toxicity databases such as those provided by DSSTox. In addition, VITIC plans to use the DSSTox publication vehicle to make publicly available those portions of its databases compiled from public domain data. For description of early stages of this project, also see DSSTox CODDD publication.
NCI (National Cancer Institute) Enhanced Database Browser
(contact: Marc Nicklaus, email: email@example.com)
NCI's Enhanced Database Browser, a project of the NCI Computer Aided Drug Discovery (CADD) group, currently provides open public access and structure-searchability for over a quarter million three-dimensional chemical structures residing in various NCI databases and the public domain. Complete databases or results of targeted searches can be downloaded in full as SDF files. The structure-browser is built upon the CACTVS system, and various CACTVS tools can be freely accessed by the academic, non-profit community. The NCI database includes InChI, IUPAC names, PASS Activity Spectra, and HASH codes for managing tautomer- and stereoisomer-invariant selections. The NCI website is posting and consolidating listings of various government public databases, including toxicity databases. DSSTox SDF files are enhanced and posted in searchable form, with relational query options. We are collaborating to coordinate offerings of the latest DSSTox database versions within the newest enhanced NCI Browser.
NCTR (National Center for Toxicological Research) ArrayTrack - Structure Web Browser
(contact: Weida Tong, FDA NCTR, email: firstname.lastname@example.org)
NCTR's ArrayTrack, developed by researchers at NCTR's Center for Toxicogenomics, is a public software tool with a wide array of capabilities for data analysis and pathway mapping of microarray data. ArrayTrack incorporates structure drawing and browser technology; the drawing component is based on the JChemPaint chemical drawing program, an open-source technology distributed through SourceForge.net. The structure browser and substructure CRD search capabilities were developed by NCTR researchers based on in-house fingerprints using Tanimoto similarity. Plans are to incorporate DSSTox databases and structure-searching capability into the larger pathway-mapping features of ArrayTrack.
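The Tanimoto coefficient mentioned above compares two binary fingerprints as the number of bits set in both divided by the number of bits set in either, T = c / (a + b - c), where a and b are the bit counts of each fingerprint and c is the count they share. The following Python sketch shows that calculation and a simple ranked similarity search. Fingerprints are represented here as sets of "on" bit positions; the bit positions, compound names, and function names are hypothetical and do not reflect NCTR's in-house fingerprint scheme.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # set of "on" bit positions


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared bits / bits set in either; 0.0 if both fingerprints are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(
    query: Fingerprint, library: Dict[str, Fingerprint], threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs with score >= threshold, most similar first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints for illustration only.
library = {
    "cmpd_A": frozenset({1, 4, 7, 9}),
    "cmpd_B": frozenset({1, 4, 8}),
    "cmpd_C": frozenset({2, 3}),
}
query = frozenset({1, 4, 7, 9})
print(similarity_search(query, library, threshold=0.3))
```

Here cmpd_A matches the query exactly (1.0), cmpd_B shares 2 of 5 distinct bits (0.4), and cmpd_C shares none and falls below the 0.3 threshold.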
NIEHS Chemical Effects in Biological Systems (CEBS) Knowledge Base
(contact: Jennifer Fostel, NIEHS CEBS project, email: email@example.com)
A consortium of academic, government and industry participants is working toward the goal of creating a central, consolidated knowledge base of toxicogenomics information. A key component of this effort is the development of the CEBS (Chemical Effects in Biological Systems) Knowledge Base. CEBS is a publicly accessible relational database hosting toxicogenomics and toxicology data, the latter including information about the biological effects of chemicals and other agents and their mechanisms of action. CEBS is designed to be fully searchable by compound, structure, toxicity, pathology, gene, gene group, SNP, pathway, and network. The DSSTox standard chemical fields are being incorporated as minimum annotation for the toxicogenomics experimental data. These fields will provide the structure-searchable component of the CEBS database, as well as effective linkage to the historical toxicity data published as DSSTox SDF files. The EPA portion of this collaboration has been funded through the new EPA Computational Toxicology Program as a New Start Award entitled: Chemoinformatics Enhancement of the NIEHS/National Center for Toxicogenomics (NCT) Chemical Effects in Biological Systems (CEBS) Knowledge-Base (see Abstract).
NIH Molecular Libraries Roadmap - Chemical Genomics Center / DPI Small Molecule Repository Collaborative
(contacts: Christopher Austin, NIH Chemical Genomics Center, email: firstname.lastname@example.org; Doug Livingston, Discovery Partners Int., email: DLivingston@discoverypartners.com)
NIH Molecular Libraries Roadmap, launched in mid-2003, is screening a very large library (>100K) of "small molecules" (Discovery Partners International - Small Molecules Repository) in hundreds of high-throughput bioassays and depositing these data into PubChem. The project is screening the DPI SMR to develop "chemical probes" of gene, pathway, and cellular functions, with the broad objective to advance understanding of the relationship between chemical structure and biological function. To the extent that this designed DPI SMR chemical library samples sufficient chemical space with associated reference toxicity data, there is also potential for the results to be relevant to toxicity inferences. The consolidated DSSTox Master Structure-Index file has been used to approximately assess the overlap of the DPI SMR with toxicity "chemical study space" and to inform the creation of new chemical inventories to be considered for inclusion in the DPI SMR or for screening by the NIH Chemical Genomics Center (NCGC). Initial chemoinformatics analysis of the DSSTox Master Structure Index File compared to the DPI SMR was performed by Tudor Oprea, Univ. of New Mexico (see Presentations). We are also providing the NCGC with DSSTox Structure-Index files NTPHTS and TOXCST to support collaborations between NCGC and the NIEHS National Toxicology Program - High Throughput Screening Initiative and EPA ToxCast Program.
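As a simple illustration of the kind of overlap assessment described above, the following Python sketch compares two chemical inventories that have each been reduced to a set of structure identifiers (for example, InChIKeys). It is not intended to reproduce the University of New Mexico analysis; the function name and identifiers are hypothetical placeholders. Exact-identifier matching is only approximate for this purpose, because salts, mixtures, and alternative structure representations of the same substance may not match.

```python
from typing import Dict, Set


def inventory_overlap(tox_inventory: Set[str], screening_library: Set[str]) -> Dict[str, float]:
    """Count shared and unique structure identifiers between two inventories."""
    shared = tox_inventory & screening_library
    return {
        "shared": len(shared),
        "tox_only": len(tox_inventory - screening_library),
        "library_only": len(screening_library - tox_inventory),
        # Fraction of the toxicity inventory also present in the screening library.
        "tox_coverage": len(shared) / len(tox_inventory) if tox_inventory else 0.0,
    }


# Hypothetical identifier sets standing in for, e.g., InChIKeys.
tox = {"KEY_A", "KEY_B", "KEY_C", "KEY_D"}
library = {"KEY_B", "KEY_D", "KEY_E"}
print(inventory_overlap(tox, library))
```

The "tox_only" set is the part of toxicity study space not covered by the screening library, i.e., candidates for new inventories of the kind mentioned above.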
NIH PubChem Project
PubChem is an on-line chemical data management model originally designed to serve as a large central public repository of chemical bioactivity data generated by the Molecular Libraries Screening Center Network (see NIH MLR). PubChem is also a user-depositor system that invites chemical structure-annotated data submissions, preferably of bioassay summary data. DSSTox Standard Chemical Fields have been further revised and expanded (August 2007) to add a source-specific record ID, DSSTox_RID, to the previous DSSTox chemical (structure) ID, DSSTox_CID, creating full 1:1 compatibility with the PubChem data management model. The new DSSTox_Generic_SID substance ID additionally ensures consistent representation and look-across capability of chemical substances and structures across DSSTox files (August 2007).
** Update (Sept 2007): Previously published versions of DSSTox SDF files (see PubChem Announcements, Nov 22, 2005) are being withdrawn from PubChem, to be replaced with updated DSSTox data files and identifiers and with better mapping of DSSTox data file content to the Bioassay (AID) organization in PubChem (est. Oct 2007). Once new PubChem Chemical IDs (CIDs) are obtained for deposited DSSTox data file content, we will directly link the DSSTox Structure-Browser Substance Results Page to the corresponding PubChem CID page. For announcements of updates and more information on accessing DSSTox files in PubChem, consult Searching DSSTox Files in PubChem.
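The look-across capability that DSSTox_Generic_SID provides can be pictured as grouping records from different DSSTox files by their shared substance ID. The following Python sketch does this for a few records. The field names DSSTox_RID, DSSTox_CID, and DSSTox_Generic_SID and the file names CPDBAS and NTPBSI come from this page, but the ID values, the record class, and the look_across function are hypothetical, and the sketch does not model PubChem's own identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DSSToxRecord:
    source_file: str         # DSSTox data file, e.g. "CPDBAS"
    dsstox_rid: str          # source-specific record ID
    dsstox_cid: str          # chemical (structure) ID
    dsstox_generic_sid: str  # substance ID shared across DSSTox files


def look_across(records: List[DSSToxRecord]) -> Dict[str, List[str]]:
    """Map each substance ID to the data files in which it appears."""
    files_by_substance: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        files_by_substance[rec.dsstox_generic_sid].append(rec.source_file)
    return dict(files_by_substance)


# Hypothetical ID values for illustration only.
records = [
    DSSToxRecord("CPDBAS", "RID_1", "CID_1", "SID_1"),
    DSSToxRecord("NTPBSI", "RID_2", "CID_1", "SID_1"),
    DSSToxRecord("CPDBAS", "RID_3", "CID_2", "SID_2"),
]
print(look_across(records))
```

Because each record keeps its own DSSTox_RID, the same substance can appear in several files without losing track of the source-specific record.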
NTP Public Databases & High-Throughput Screening
NIEHS's National Toxicology Program has invested considerable resources in upgrading current NTP on-line data offerings for their historical rodent bioassay studies. At present, records are indexed by chemical name and CASRN, with structures of tested chemicals provided as separate data files; however, the NTP website offers no structure searchability and very limited search-across, or relational, search capability for assisting modeling studies. The creation of DSSTox structure-index files (see Work-in-Progress) for NTP study areas and publication of DSSTox data files for summary toxicity results (rodent carcinogenicity, genetic toxicity, etc.) will complement the NTP on-line offerings and facilitate and encourage creation of an on-line structure-searchable component for the NTP databases. Additional collaborations with the NTP High-Throughput Screening program are currently underway, involving structure-annotation and publication of structure-index files for NTP chemical data sets being submitted for bioassay screening at the NIH Molecular Libraries Roadmap - Chemical Genomics Center. These efforts are being coordinated with similar EPA efforts, and the DSSTox project is assisting and enabling the chemoinformatic component of these projects.
** Update (Sept 2007): The DSSTox Structure-Browser is being made available from the NTP Bioassay On-line Database Search Page to allow structure-searching of the NTP website chemical inventory, with URL links to chemical substance test data pages (est. Sept 2007).
SRC (Syracuse Research Corporation) PBT-Profiler and Analog Search Tools
(contact: Jay Tunkel, email: email@example.com)
Syracuse Research Corporation, working with the US EPA, has developed and currently hosts an on-line, public chemical property prediction system, termed the PBT Profiler (Persistent, Bioaccumulative, and Toxic Profiler). This Profiler was developed for use by industry submitters of Premanufacture-Notification Review chemicals, but is available to all. Efforts are currently underway to expand these on-line capabilities to include intelligent chemical analog searches through public toxicity databases. Coordination of this effort with DSSTox will broaden the reach of these analog searches across an expanding set of standardized public datasets. The challenge will be to set this up so as to maintain linkages to the DSSTox central website, field definitions, and documentation files. In addition, SRC is providing assistance to the DSSTox project to implement a public online structure browser for the EPA DSSTox website.
WOMBAT - Database of Biological Activities of Pharmaceuticals
WOMBAT is a database of biologically active compounds, consisting primarily of pharmaceuticals. Sunset Molecular Discovery maintains this database of over 70,000 entries (over 68,000 unique SMILES) drawn from over 3000 papers and covering over 140,000 activities on 630 targets. WOMBAT contains information published from 1975 to 2003 in the following journals: Biochem. Pharmacol., Bioorg. Med. Chem. Lett., ChemBioChem, Eur. J. Med. Chem., J. Am. Chem. Soc., J. Med. Chem., Quant. Struct.-Act. Relat., and J. Health Sci. Discussions are underway to make publicly available those portions of the WOMBAT database pertaining to subsets of biological targets potentially relevant to toxicity mechanisms for DSSTox chemicals. Additional advice has been provided on problems with SMILES conversions and with SDF representations of 2D molecules with stereochemistry and chirality.
In addition, collaboration with T. Oprea and collaborators at the Univ. of New Mexico has provided chemoinformatics analysis for initial comparisons of the DSSTox Master Structure-Index File (over 5000 defined organics as of 2/2006) to the NIH MLR Small Molecule Repository of 66K molecules available through PubChem. A summary of these results is provided in DSSTox_vs_DPISMR presentation.
Future Planned Collaborations
CambridgeSoft's ChemFinder.Com Chemical Search Website
(contact: Robert Joseph, CEO, CambridgeSoft, email: firstname.lastname@example.org)
CambridgeSoft's ChemFinder.Com website is a widely used on-line public utility for locating chemical structures by chemical names or CAS numbers. It also provides structure-indexing to a variety of public databases, offering linkage to the home website (e.g., Berkeley Carcinogenic Potency Database) or to a text data record for a particular chemical (e.g., a particular NTP Technical report). Plans are to offer DSSTox SDF files, pre-indexed by chemical structure, for listing on the ChemFinder website, with linkage back to the Main Database page on the DSSTox website for the chemical of interest. As we encounter errors or inconsistencies within the ChemFinder.com on-line database, we routinely report these to CambridgeSoft for correction.
eMolecules: Chemical Structure Searching on the WWW
(contact: Klaus Gubernator, CEO, eMolecules Inc, email: email@example.com)
eMolecules (formerly Chmoogle, launched in 2005) is an on-line public utility for locating chemical information on the WWW by structure, chemical name, or CAS number. DSSTox published SDF files are indirectly accessible from eMolecules through the incorporation of the entire PubChem inventory into eMolecules. A structure search provides links to individual PubChem chemical pages, which also contain information from DSSTox data files if the chemical is in a DSSTox published database. Plans are to offer DSSTox SDF files, pre-indexed by chemical structure, for listing on the eMolecules website, with linkage back either to the Main Database page on the DSSTox website or to the URL of a chemical data page on a public website provided in the DSSTox structure-index file.
MIAME-Tox
(contact: Weida Tong, FDA NCTR, email: firstname.lastname@example.org)
MIAME-Tox is an extension of the MGED MIAME effort (Microarray Gene Expression Data Society: Minimum Information About a Microarray Experiment) into the realm of toxicogenomics. Such experiments generally pertain to single-chemical toxicity exposures and, as such, will benefit from standard chemical structure-field annotation. General chemical information fields have been proposed in MIAME-Tox, and coordinated discussions have been initiated with the LIST ToxML Focus Group. Alternatively, or in coordination with those proposals, DSSTox standard chemical fields could be included and expanded to serve as minimal annotation for toxicogenomics experiments dealing with chemical exposures. This would enable linkage of DSSTox SDF data files of historical toxicity data with newer data being generated in toxicogenomics experiments. Efforts are also underway within the DSSTox project to chemically index the on-line content of the ArrayExpress public microarray database, and to create a DSSTox companion structure-index file that could bring structure-searchability to that website.
NLM (National Library of Medicine) TOXNET
(contact: Phillip Wexler, SIS, NLM, email: firstname.lastname@example.org)
NLM's TOXNET currently provides on-line structure searchability, using the ChemID system, across a number of publicly available toxicity databases, including CCRIS, RTECS, GeneTox, IRIS, Toxics Release Inventory, and a few others. The structure-search results typically link a user to a text report page for a given chemical structure. Files can also be searched by CAS No., chemical name, and text string. Few of the TOXNET databases can be accessed or downloaded in full, however, and none can be downloaded with chemical structures. [For more information, see Publications: CRC QSAR Chapter]. Currently, the entire ChemID inventory, with linkage to TOXNET data pages, is indexed by chemical structure within PubChem. Hence, current plans to link the DSSTox Structure-Browser Substance Results Page to the corresponding PubChem CID page will provide indirect access to the corresponding TOXNET data pages. For announcements of updates and more information on accessing DSSTox files in PubChem, consult Searching DSSTox Files in PubChem.