Quantitative Structure Activity Relationship

Introduction

Quantitative Structure Activity Relationships (QSARs) are mathematical models that are used to predict measures of toxicity from physical characteristics of the structure of chemicals (known as molecular descriptors). Acute toxicities (such as the concentration that causes half of a fish population to die) are one example of the toxicity measures that may be predicted from QSARs. Simple QSAR models calculate the toxicity of chemicals using a simple linear function of molecular descriptors:

Toxicity = a·x1 + b·x2 + c

where x1 and x2 are the independent descriptor variables and a, b, and c are fitted parameters. Examples of molecular descriptors include the molecular weight and the octanol-water partition coefficient. Additional examples are provided in our Molecular Descriptors Guide Version 1.0.2 (PDF) (47 pp, 291 KB).
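As a rough illustration of how the fitted parameters a, b, and c in the linear form above could be obtained, the following sketch performs an ordinary least-squares fit in plain Python. The training values are made up for illustration (they are not T.E.S.T. data), the helper names are hypothetical, and the choice of x1 and x2 as the octanol-water partition coefficient and the molecular weight is only an assumption based on the examples mentioned above.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix so row operations apply to both sides.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_qsar(descriptors, toxicities):
    """Least-squares fit of Toxicity = a*x1 + b*x2 + c; returns (a, b, c)."""
    # Append a constant 1 to each row so the intercept c is fitted as well.
    rows = [[x1, x2, 1.0] for x1, x2 in descriptors]
    # Normal equations: (X^T X) p = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, toxicities)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


def predict_toxicity(params, x1, x2):
    """Evaluate the fitted linear model for one chemical."""
    a, b, c = params
    return a * x1 + b * x2 + c


# Hypothetical training set: (x1, x2) pairs with toxicities generated from
# a = 0.5, b = -0.01, c = 2.0, so the fit should recover those values.
training_descriptors = [(1.0, 100.0), (2.0, 150.0), (3.0, 120.0),
                        (4.0, 200.0), (0.5, 80.0)]
training_toxicities = [0.5 * x1 - 0.01 * x2 + 2.0
                       for x1, x2 in training_descriptors]

params = fit_linear_qsar(training_descriptors, training_toxicities)
print("a, b, c =", params)
print("prediction:", predict_toxicity(params, 2.5, 130.0))
```

Because the example data are generated without noise, the fit reproduces the generating coefficients up to floating-point error. Real toxicity data would leave a residual, and the quality of the fit would need to be checked on chemicals that were not used for fitting.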

Dragon (version 6.0) was used as the benchmark software for the descriptors in T.E.S.T.

Uses of QSAR Toxicity Models

  • QSAR toxicity predictions may be used to screen untested compounds in order to establish priorities for traditional bioassays, which are expensive and time consuming.
  • QSAR models are useful for estimating toxicities needed for green process design algorithms such as the Waste Reduction Algorithm.

Objectives

    • Develop quantitative structure activity relationship (QSAR) methodologies to estimate toxicity from molecular structure
    • Develop software, such as the Toxicity Estimation Software Tool (TEST), that will enable users to easily estimate toxicity from molecular structure

QSAR Methodologies

Several QSAR methodologies have been developed:

    • Hierarchical method – The toxicity for a given query compound is estimated using the weighted average of the predictions from several different models. The different models are obtained by using Ward’s method to divide the training set into a series of structurally similar clusters. A genetic algorithm-based technique is used to generate models for each cluster. The models are generated prior to runtime.
    • FDA method – The prediction for each test chemical is made using a new model that is fit to the chemicals that are most similar to the test compound. Each model is generated at runtime.
    • Single-model method – Predictions are made using a multilinear regression model that is fit to the training set (using molecular descriptors as independent variables) using a genetic algorithm-based approach. The regression model is generated prior to runtime.
    • Group contribution method – Predictions are made using a multilinear regression model that is fit to the training set (using molecular fragment counts as independent variables). The regression model is generated prior to runtime.
    • Nearest neighbor method – The predicted toxicity is estimated by taking an average of the toxicities of the three chemicals in the training set that are most similar to the test chemical.
    • Consensus method – The predicted toxicity is estimated by taking an average of the predicted toxicities from each of the above QSAR methodologies.
    • Random forest method – The predicted toxicity is estimated using an ensemble of decision trees, each of which bins a chemical into a certain toxicity score (i.e., positive or negative developmental toxicity) using a set of molecular descriptors as decision variables. The random forest method is currently only available for the developmental toxicity endpoint. The random forest model for the developmental toxicity endpoint was developed by researchers at Mario Negri Institute for Pharmacological Research as part of the CAESAR project.

These methodologies are explained in detail in the publications below.
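The nearest neighbor and consensus methods listed above are simple enough to sketch directly. The example below is a minimal illustration, not the T.E.S.T. implementation: the similarity measure (cosine similarity between descriptor vectors) is an assumption, since this page does not say how similarity is computed, and all descriptor values, toxicities, and function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two descriptor vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor_prediction(query, training_set, k=3):
    """Average the toxicities of the k training chemicals most similar to query.

    training_set is a list of (descriptor_vector, toxicity) pairs.
    """
    ranked = sorted(training_set,
                    key=lambda item: cosine_similarity(query, item[0]),
                    reverse=True)
    neighbors = ranked[:k]
    return sum(toxicity for _, toxicity in neighbors) / len(neighbors)


def consensus_prediction(predictions):
    """Average the predicted toxicities from the individual methods."""
    values = list(predictions.values())
    return sum(values) / len(values)


# Hypothetical training chemicals: (descriptor vector, measured toxicity).
training_set = [
    ((1.0, 0.0), 2.0),
    ((0.9, 0.1), 3.0),
    ((0.8, 0.3), 4.0),
    ((0.0, 1.0), 10.0),
    ((0.1, 0.9), 9.0),
]
query = (1.0, 0.05)

nn_estimate = nearest_neighbor_prediction(query, training_set)

# Hypothetical predictions from the other methods for the same chemical.
method_predictions = {
    "hierarchical": 3.2,
    "fda": 2.9,
    "single_model": 2.8,
    "group_contribution": 3.1,
    "nearest_neighbor": nn_estimate,
}
consensus_estimate = consensus_prediction(method_predictions)
print(nn_estimate, consensus_estimate)
```

In this toy data set the three chemicals closest to the query are the first three entries, so the nearest neighbor estimate is the mean of their toxicities. A production implementation would also need to decide what to do when a method cannot make a prediction for a chemical, which this sketch does not address.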

Toxicity Estimation Software Tool (TEST)

TEST enables users to easily estimate acute toxicity using the above QSAR methodologies. The software is now available for download. The software is described in further detail in the User's Guide for TEST (version 4.1) (PDF) (66 pp, 540 KB). The software is based on the Chemistry Development Kit, an open-source Java library for computational chemistry.

The software includes models for the following endpoints:

    • 96-hour fathead minnow 50 percent lethal concentration (LC50)
    • 48-hour Daphnia magna 50 percent lethal concentration (LC50)
    • Tetrahymena pyriformis 50 percent growth inhibition concentration (IGC50)
    • Oral rat 50 percent lethal dose (LD50)
    • Bioconcentration factor (BCF). The bioconcentration factor data set was compiled by researchers at the Mario Negri Istituto di Ricerche Farmacologiche.
    • Developmental toxicity (DevTox)
    • Ames mutagenicity (Mutagenicity)

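The software now contains models for the following physical properties:

    • Boiling point
    • Density
    • Flash point
    • Thermal conductivity
    • Viscosity
    • Surface tension
    • Water solubility
    • Vapor pressure
    • Melting point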

Models for additional endpoints will be added as they are completed.

Software disclaimer

T.E.S.T. estimates the toxicity values and physical properties of organic chemicals based on the molecular structure of the organic chemical entered by the user. The United States Environmental Protection Agency (US EPA) makes no warranty, express or implied, as to the merchantability of T.E.S.T. or its fitness for a particular purpose. Furthermore, the US EPA makes no claims concerning the accuracy of the data provided by T.E.S.T. or its reliability for any purpose.

Get email alerts when new versions of the TEST software are posted.

Download TEST (version 4.1)

Note: The Windows Zip manual installer should be used by users who do not have administrative rights on their machines. Extract the files to a folder on your hard drive and open "TEST.exe".

The training and prediction sets used in T.E.S.T. (in SDF format).

Sample structure data files (such as an MDL SD file).

 

What's new in Version 4.1?

    • Results are now displayed for the most similar chemicals in the training and test sets (enables users to assess confidence in the predicted value)
    • The results pages now list which fragment is missing if the fragment constraint is violated
    • Fixed bug that occurred when saving results files to network drives
    • Fixed bug that occurred when editing chemicals in the batch list
    • Fixed bug where the single-model method was not included for batch mode predictions
    • Added the ability to copy the SMILES of the current structure to the clipboard
    • Added the ability to load recently analyzed structures from the File menu
    • Added the ability to load recently generated batch results files from the File menu
    • Improved the speed of loading large aromatic compounds from MDL SD files
    • Updated/added endpoints as follows:

      Endpoint                        Number of chemicals in overall set
                                      Version 4.0    Version 4.1
      Fathead minnow LC50             816            823
      Daphnia magna LC50              337            353
      Tetrahymena pyriformis IGC50    1085           1792
      Oral rat LD50                   7420           7413
      Bioconcentration factor         600            676
      Boiling point                   3754           5759
      Density                         8607           8909
      Flash point                     8100           8362
      Thermal conductivity            352            442
      Viscosity                       433            557
      Surface tension                 1421           1416
      Water solubility                5079           5020
      Vapor pressure*                 0              2511
      Melting point*                  0              9385
      * New endpoint

Prior Version History

    • 4.0 (6/7/11)
      • Physical properties are now estimated
      • Batch mode is improved:
        • Loading can now be interrupted
        • Chemicals with loading errors are displayed at the top of the batch table
        • Can now load SMILES files with no identifier field (chemicals are assigned arbitrary IDs)
      • Aromaticity detection is improved:
        • Can handle aromatic bond orders (bond order = 4) in mol or sd files
        • The SMILES parser has been improved to better handle complicated aromatic ring systems
      • Added Options screen:
        • Added ability to change the output directory after it has been set
        • The program now remembers the previously selected output folder
        • The "Relax fragment constraint" checkbox was moved to Options screen
    • 3.3 (7/8/10)
      • Daphnia magna LC50 endpoint was added
      • Ames mutagenicity endpoint was added
      • The following changes were made for binary endpoints such as developmental toxicity and Ames mutagenicity:
        • QSAR models now have stricter statistical standards (leave-one-out concordance of at least 0.8, sensitivity of at least 0.5, and specificity of at least 0.5)
        • Model statistics such as concordance, sensitivity, and specificity are now displayed in the results web pages
    • 3.2 (12/18/09)
      • Reproductive toxicity endpoint was added
      • Random forest QSAR method was added (for reproductive toxicity endpoint only)
    • 3.1 (6/23/09)
      • Fixed issue with running TEST in non-English-speaking countries
    • 3.0 (4/14/09)
      • Random selection is used to divide the data sets into training and test sets
      • Added BCF endpoint
      • Added consensus prediction method
    • 2.0 (2/24/09)
      • Each toxicity data set is now split into a training and test set.
      • The toxicity models included in the software are now fit to the training sets (previously they were fit to the overall sets)
      • The batch mode was improved (chemicals can be added and the list can now be saved as an SDF)
    • 1.0.3 (10/24/08)
      • Fixed calculation of "ieadje" molecular descriptor
      • Fixed definitions of chi descriptors in numbered list in molecular descriptors guide
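The version 3.3 notes above mention concordance, sensitivity, and specificity for binary endpoints. The sketch below shows the standard definitions of these statistics and a check against the quoted values of 0.8, 0.5, and 0.5, read here as minimum thresholds (an assumption). The labels and predictions are invented for illustration, the function names are hypothetical, and in a leave-one-out setting each prediction would come from a model fit without that chemical, which this sketch does not perform.

```python
def classification_statistics(actual, predicted):
    """Return concordance, sensitivity, and specificity for binary labels.

    actual and predicted are equal-length sequences of 0 (negative) or
    1 (positive). Both classes must be present in actual.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    true_neg = sum(1 for a, p in pairs if not a and not p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    return {
        # Fraction of all chemicals classified correctly.
        "concordance": (true_pos + true_neg) / len(pairs),
        # Fraction of true positives that were predicted positive.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Fraction of true negatives that were predicted negative.
        "specificity": true_neg / (true_neg + false_pos),
    }


def meets_standards(stats, min_concordance=0.8,
                    min_sensitivity=0.5, min_specificity=0.5):
    """Check the statistics against minimum thresholds (assumed reading)."""
    return (stats["concordance"] >= min_concordance
            and stats["sensitivity"] >= min_sensitivity
            and stats["specificity"] >= min_specificity)


# Hypothetical experimental labels and model predictions for ten chemicals.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

stats = classification_statistics(actual, predicted)
print(stats, meets_standards(stats))
```

Here one positive chemical is missed, giving a concordance of 0.9, a sensitivity of 0.75, and a specificity of 1.0, so this hypothetical model would pass the check.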

System requirements

    • Java version 1.6 or higher
    • Memory
      • For Windows XP®, 1 GB of RAM is recommended.
      • For Windows Vista®, 2 GB of RAM is recommended.

Installation Instructions

    1. Save the appropriate installation file to your hard drive. Due to the large size of the file, the download may take 15 minutes or longer depending on the speed of the connection.
    2. Double-click the installation file (for Linux users: open a shell, cd to the directory where you downloaded the installer and at the prompt type: sh ./install.bin).

Silent Installation Instructions for Network Administrators (for Windows users)

    1. The software can be installed silently by issuing the following command at the command prompt: install -i silent

Publications

Sushko, I.; Novotarskyi, S.; Körner, R.; Pandey, A. K.; Cherkasov, A.; Li, J.; Gramatica, P.; Hansen, K.; Schroeter, T.; Müller, K.-R.; Xi, L.; Liu, H.; Yao, X.; Öberg, T.; Hormozdiari, F.; Dao, F.; Sahinalp, C.; Todeschini, R.; Polishchuk, P.; Artemenko, A.; Kuz’min, V.; Martin, T.M.; Young, D. M.; Fourches, D.; Muratov, E.; Tropsha, A.; Baskin, I.; Horvath, D.; Marcou, G.; Varnek, A.; Prokopenko, V. V.; Tetko, I.V. (2010). “Applicability domains for classification problems: benchmarking of distance to models for AMES mutagenicity set.” J. Chem. Inf. Model., 50, 2094-2111.

Cassano, A.; Manganaro, A.; Martin, T.; Young, D.; Piclin, N.; Pintore, M.; Bigoni, D.; Benfenati, E. (2010). “The CAESAR models for developmental toxicity.” Chemistry Central Journal, 4(Suppl 1):S4.

Zhu, H.; Martin, T.M.; Young, D. M.; Tropsha, A. (2009). “Combinatorial QSAR Modeling of Rat Acute Toxicity by Oral Exposure.” Chemical Research in Toxicology, 22 (12), pp 1913-1921.

Benfenati, E., Benigni, R., Demarini, D.M., Helma, C., Kirkland, D., Martin, T.M., Mazzatorta, G., Ouedraogo-Arras, G., Richard, A.M., Schilter, B., Schoonen, W.G.E.J., Snyder, R.D., and C. Yang. (2009). “Predictive Models for Carcinogenicity and Mutagenicity: Frameworks, State-of-the-Art, and Perspectives.” Journal of Environmental Science and Health Part C, 27, 2: 57-90.

Young, D.M.; Martin, T.M.; Venkatapathy, R.; Harten, P. (2008). “Are the Chemical Structures in your QSAR Correct?” QSAR & Combinatorial Science, 27 (11-12), 1337-1345.

Martin, T.M., P. Harten, R. Venkatapathy, S. Das and D.M. Young. (2008). “A Hierarchical Clustering Methodology for the Estimation of Toxicity.” Toxicology Mechanisms and Methods, 18, 2: 251–266.

Martin, T.M., and D.M. Young. (2001). “Prediction of the Acute Toxicity (96-h LC50) of Organic Compounds in the Fathead Minnow (Pimephales Promelas) Using a Group Contribution Method.” Chemical Research in Toxicology, 14, 10: 1378–1385.

Contact

Todd Martin, PhD.
Research Chemical Engineer
