Computational Toxicology Research Program
More on CRDs
The term "Chemical Relational Database", or CRD, refers to public or commercial applications designed primarily for storage, structure-searching, data mining, and retrieval of chemical information. CRDs are one component of the broader category of SDF Viewers, Structure Browser & CRD Applications, and provide full structure/data relational searching capability from an on-line or corporate server or from a user's desktop PC. Such applications range from stand-alone, PC-based (generally Windows) systems that are relatively low-cost and designed for use by individuals, to high-end server-based CRDs that provide a platform for shared database access and use throughout an agency or corporation, to on-line public CRD services that offer free and flexible searching and data retrieval through publicly available chemical database files. The topic areas below provide further information on CRDs. For lists and descriptions of public and commercial resources, see also SDF Viewers, Structure Browser & CRD Applications.
What is a Chemical Relational Database?
General features of available CRDs
Sample views of two CRD record displays
Examples of structure-search functions
Searching across multiple databases
Search value-range of property field
Search multiple fields simultaneously
NOTE: Two particular CRD applications were used extensively in early DSSTox data file development: ACD Labs/ChemFolder and CambridgeSoft's ChemOffice:ChemFinder. As a result, the examples below, which use earlier published versions of DSSTox SDF files, illustrate general features of these two applications exclusively. This does not constitute endorsement of these products by us or EPA (Disclaimers). Other CRD applications, both commercial and public, are available that have similar features to these applications and may have enhanced or different features (see SDF Viewers, Structure Browser & CRD Applications).
A relational database is made up of a number of records, each corresponding to an informational unit and consisting of a list of standardized, searchable text and numerical data fields associated with that unit. An analogy might be an address card file, with each card (or record) associated with a particular person's name, and with each line entry on the card (i.e., each data field) holding a piece of information belonging to that name, such as company, address, phone number, age, birthday, etc. In this case, the name is the main informational unit for the record, and the data fields are attributes of the name.
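The card-file analogy can be sketched in a few lines of Python. This is only an illustration: the names, field names, and values are invented, and a real relational database would use a database engine rather than an in-memory list.

```python
# Each dictionary is one "card": the name is the main informational unit,
# and the remaining keys are its data fields. All values are invented.
cards = [
    {"name": "A. Smith", "company": "Acme", "phone": "555-0100", "age": 41},
    {"name": "B. Jones", "company": "Globex", "phone": "555-0101", "age": 29},
    {"name": "C. Smith", "company": "Acme", "phone": "555-0102", "age": 35},
]


def find(records, field, value):
    """Return every record whose given field equals the given value."""
    return [r for r in records if r.get(field) == value]


# Because the fields are standardized, any of them can serve as a search key.
acme_names = [r["name"] for r in find(cards, "company", "Acme")]
print(acme_names)
```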
A Chemical Relational Database is a special type of relational database whose main informational unit is a chemical structure and whose fields are attributes or data associated with that chemical structure. Hence, a typical CRD consists of many chemical records and each record generally corresponds to a single chemical structure and its associated data fields. CRDs are enhanced in two ways compared to more general relational database applications:
addition of a structure field that can store a 2D or 3D graphical representation of a chemical structure; and
chemical intelligence embedded into the record-search functions to support searching by chemical structure, reactions, substructures, and generalized atom and bond features that can be included in a structure-query.
NOTE: Some CRDs can also support chemical reactions in the structure field, where multiple structures and their relation to one another are represented and are searchable.
Most or all CRD applications provide full or substructure search features, some generalized structure search features such as by similarity, text and data field search functions, use of wild-card characters in searches, and the ability to save the results of a search to a new file. CRD applications vary widely, however, in the particulars of these features, in the ability to search across multiple separate files, in the display of search results, and in the ability to perform advanced multiple-field constraint searches, or Boolean searches.
Since all CRDs have structure-search capabilities, they generally provide or are linked to a structure drawing application for creating the structure search query. For example, ACD/ChemFolder uses the companion ACD Labs/ChemSketch application, whereas CambridgeSoft's ChemFinder uses the companion ChemOffice/ChemDraw application. Structure drawing programs are bundled with the main CRD application purchase. In addition, some structure drawing applications are freely available off the web. ACD offers a free, downloadable version of the ACD Labs/ChemSketch application from its website, and MDL offers free, downloadable versions of ISIS/Draw. See also SDF Viewers, Structure Browser & CRD Applications.
Within a corporate environment, users may be able to access high-end server-based CRD applications, such as MDL/ISIS or Accelrys/Accord for Oracle. These applications will import clean flat SDF files, such as those provided on this website (More on SDF), but are also capable of creating hierarchical data files with many nested tables or levels of field-data storage. Alternatively, users could choose a PC-based application to suit their particular needs and budget.
At the time of this writing, most CRDs available for individual user licenses are for Windows desktop PCs. We know of no commercial CRD applications available for the Macintosh, although there may be public CRD solutions available. [If a user is aware of a Mac-based CRD, please Contact Us with details.] A few publicly available structure file viewers, with limited or no structure/text/data relational search capabilities, are available for free download and use on desktop PCs. In addition, public on-line structure browsers can share some or most capabilities of CRDs, including structure/text/data relational searching across public databases and the ability to download search results. See SDF Viewers, Structure Browser & CRD Applications.
Shown below at left is a portion of an SDF file (More on SDF) corresponding to a single chemical record for benzyl cyanide, which when imported separately into two commercially available CRD applications produces the two CRD views shown at right (upper display produced by ACD/ChemFolder, ver. 5.7; lower display produced by CambridgeSoft's ChemFinder, ver. 7.0). The information is the same, but the formatted display of the data is CRD application-dependent. Notice, in particular, that both CRDs automatically interpret the SDF connection table structure as the same 2D graphical structure representation.
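To make the SDF-to-record step concrete, here is a minimal sketch of how a program might split an SDF file into records, each holding a connection table and its data fields. It is not how either CRD application works internally. The sample record is hand-written for illustration: the coordinates are zeroed placeholders, the atom lines are abbreviated, and the field names `ChemName` and `Formula` are assumptions rather than the fields of the DSSTox file.

```python
SAMPLE = """benzyl cyanide
  hand-written example

  9  9  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0
    0.0000    0.0000    0.0000 C   0  0
    0.0000    0.0000    0.0000 C   0  0
    0.0000    0.0000    0.0000 C   0  0
    0.0000    0.0000    0.0000 C   0  0
    0.0000    0.0000    0.0000 C   0  0
    0.0000    0.0000    0.0000 C   0  0
    0.0000    0.0000    0.0000 C   0  0
    0.0000    0.0000    0.0000 N   0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  1  1  0
  1  7  1  0
  7  8  1  0
  8  9  3  0
M  END
> <ChemName>
benzyl cyanide

> <Formula>
C8H7N

$$$$
"""


def _parse_record(lines):
    """Split one record into its connection table and its data fields."""
    molblock, fields = [], {}
    i = 0
    # The connection table runs from the first line through "M  END".
    while i < len(lines):
        molblock.append(lines[i])
        i += 1
        if molblock[-1].startswith("M  END"):
            break
    # Each data item is a "> <Name>" header followed by value lines
    # and terminated by a blank line.
    while i < len(lines):
        line = lines[i]
        i += 1
        if line.startswith(">") and "<" in line:
            name = line[line.index("<") + 1 : line.rindex(">")]
            value = []
            while i < len(lines) and lines[i].strip():
                value.append(lines[i])
                i += 1
            fields[name] = "\n".join(value)
    return {"molblock": "\n".join(molblock), "fields": fields}


def parse_sdf(text):
    """Return one dict per record; records are terminated by a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    return records


records = parse_sdf(SAMPLE)
print(records[0]["fields"])
```

A CRD then renders the `molblock` part as a 2D drawing and lays out the `fields` part according to its own display conventions, which is why the two views differ in format but not in content.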
Structure-search functions of CRDs are generally of four types:
full structure search
substructure search
generalized structure search
similarity search
A full structure search uses the entire molecular structure as the search query and is successful if a record containing that exact structure is found in the database.
A substructure search looks for any molecule containing the substructure query; hence, multiple records with different overall structures sharing the common substructure would satisfy this query.
A generalized structure search uses generalized atom and bond symbols to broaden a search query; for example, a search query could specify a structure containing any non-hydrogen atom attached to the ortho position of the benzyl cyanide structure shown in the sample views above.
Finally, some CRD applications offer one or more similarity search algorithms based on adjustable criteria that consider the overall descriptor similarity of two structures. This search option is generally the broadest of the four, is "fuzzy" by nature, and is intended to identify compounds with similar overall features to the search query structure.
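The text does not say which similarity algorithms particular CRDs use. As one widely used example of the idea, the sketch below scores similarity with the Tanimoto coefficient (shared features divided by total distinct features) and treats the cutoff as the adjustable criterion. The feature sets are hand-assigned and purely illustrative; a real application would derive descriptors or fingerprints from the structures themselves.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical, hand-assigned structural features for a toy library.
library = {
    "benzyl cyanide": {"aromatic_ring", "nitrile", "benzylic_CH2"},
    "acetonitrile": {"nitrile", "methyl"},
    "toluene": {"aromatic_ring", "methyl"},
    "phenol": {"aromatic_ring", "hydroxyl"},
}


def similarity_search(query, library, threshold):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


query = {"aromatic_ring", "nitrile"}
strict = similarity_search(query, library, threshold=0.5)
loose = similarity_search(query, library, threshold=0.3)
print([name for name, _ in strict])
print(len(loose))
```

Lowering the threshold widens the hit list, which is the "fuzzy" behavior described above: with the stricter cutoff only benzyl cyanide is returned, while the looser cutoff returns every record that shares at least one feature with the query.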
Examples of options for generalized atom and bond specifications that can be used to construct structure-search queries in the ACD Labs/ChemSketch/ChemFolder CRD are shown below.
The power and generality of structure-searching through chemical records can be illustrated simply with an example contrasting a search-by-chemical-name with a search-by-generalized-structure-fragment using the ACD Labs/ChemFolder application.
The top-right dialog box shows the results of searching for a particular text term, "nitrile", in the chemical name field. The search produced 4 "hits" out of 1354 total records in the CPDBRM file (NOTE: current version is CPDBAS) and 12 hits out of 617 total records in the EPAFHM file, with one hit shown in the record view of acetonitrile at left. The bottom-right dialog box shows the result of a generalized substructure search, through the same databases, for the chemical entity corresponding to the term "nitrile". This search produced 14 hits in the CPDBRM file and 22 hits in the EPAFHM file; i.e., many more chemicals were found to contain the nitrile entity than have the term "nitrile" in the chemical name. [Note: searching by the synonym "cyano" produced only 5 hits in EPAFHM and none in the remaining databases.]
The above example also illustrates a case of a CRD able to perform searches across multiple distinct databases, displaying the search result summary within each database separately. Other CRDs, including Accelrys/Accord for Excel and CambridgeSoft's ChemOffice:ChemFinder, allow searches to be performed only within a single merged database file. If searches are desired across multiple databases, these CRDs require a user to merge the multiple databases into a single file prior to performing the search.
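The merge-then-search workaround can be sketched as follows. The database names `DB_A` and `DB_B`, the `source_db` tag, and the records are all hypothetical; the point is only that tagging each record with its origin before merging lets a single-file search still report hits per database.

```python
from collections import Counter


def merge_databases(databases):
    """Combine several record lists into one, tagging each record's origin."""
    merged = []
    for db_name, records in databases.items():
        for record in records:
            merged.append({**record, "source_db": db_name})
    return merged


def search_name(records, term):
    """Case-insensitive text search in the chemical name field."""
    term = term.lower()
    return [r for r in records if term in r["name"].lower()]


def hits_per_database(hits):
    """Summarize a hit list as a count of hits per source database."""
    return dict(Counter(r["source_db"] for r in hits))


# Two toy databases (hypothetical contents).
databases = {
    "DB_A": [{"name": "acetonitrile"}, {"name": "benzene"}],
    "DB_B": [{"name": "acrylonitrile"}, {"name": "phenol"}, {"name": "benzonitrile"}],
}

merged = merge_databases(databases)
summary = hits_per_database(search_name(merged, "nitrile"))
print(summary)
```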
Another valuable feature of many CRDs is the ability to search for all records satisfying a value range specified for a property field. The example shown below illustrates a search for a particular range of values in the property field "LogP". In the sample data file considered, 15 records satisfy the condition 2.5 < LogP < 3.5.
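A value-range search amounts to a numeric filter on one field. The sketch below assumes the field values arrive as text (as they do in an SDF file) and skips records whose value is missing or not numeric; the record identifiers and LogP values are invented for illustration.

```python
def search_range(records, field, low, high):
    """Return records whose numeric `field` lies strictly between low and high."""
    hits = []
    for record in records:
        try:
            value = float(record[field])
        except (KeyError, TypeError, ValueError):
            continue  # field missing, empty, or not a number
        if low < value < high:
            hits.append(record)
    return hits


# Invented records; values are strings because SDF data fields are text.
records = [
    {"id": "r1", "LogP": "1.9"},
    {"id": "r2", "LogP": "2.5"},   # excluded: the bounds are strict
    {"id": "r3", "LogP": "2.9"},
    {"id": "r4", "LogP": "3.49"},
    {"id": "r5", "LogP": "3.5"},   # excluded: the bounds are strict
    {"id": "r6", "LogP": ""},      # skipped: no value
    {"id": "r7"},                  # skipped: no LogP field
]

hit_ids = [r["id"] for r in search_range(records, "LogP", 2.5, 3.5)]
print(hit_ids)
```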
Many CRDs allow Boolean searches, i.e., simultaneous searches across multiple fields using "and", "or", and "not" conditions. For CRDs that allow only single-field searches, multiple search criteria can be simulated by sequential searching, i.e., subjecting the records resulting from the first search to a second search criterion. An example of simultaneous searches across multiple fields using the ChemOffice/ChemFinder CRD is shown below. In this example, simultaneous search queries are entered in the following fields: Structure (phenol), Activity Category (not *inactive), Formula (O2), ChemClass ERB (DES). Out of a total of 232 records in NCTRER, 11 records are retrieved that satisfy all 4 search criteria simultaneously. Note the use of "not" and a wildcard character "*" in the Activity Category field, and the color highlighting of the query substructure in the found structure.
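The equivalence between a simultaneous "and" search and sequential single-field searches can be sketched as below. The query syntax (a leading "not " and the "*" wildcard) is modeled loosely on the example above and is an assumption, not ChemFinder's actual query language; the structure criterion is left out because it would need a cheminformatics toolkit, and the records are invented.

```python
from fnmatch import fnmatchcase


def matches(value, pattern):
    """Case-insensitive wildcard match; a leading 'not ' negates the pattern."""
    negate = pattern.lower().startswith("not ")
    if negate:
        pattern = pattern[4:]
    found = fnmatchcase(value.lower(), pattern.lower())
    return not found if negate else found


def search_all(records, criteria):
    """Simultaneous search: a record must satisfy every field criterion."""
    return [
        r for r in records
        if all(matches(r.get(field, ""), pat) for field, pat in criteria.items())
    ]


def search_sequential(records, criteria):
    """Same result via repeated single-field searches on the previous hits."""
    hits = records
    for field, pat in criteria.items():
        hits = [r for r in hits if matches(r.get(field, ""), pat)]
    return hits


# Invented records with two text fields.
records = [
    {"id": 1, "ActivityCategory": "active strong", "ChemClass": "DES"},
    {"id": 2, "ActivityCategory": "inactive", "ChemClass": "DES"},
    {"id": 3, "ActivityCategory": "active weak", "ChemClass": "steroid"},
    {"id": 4, "ActivityCategory": "active medium", "ChemClass": "DES"},
]

criteria = {"ActivityCategory": "not *inactive", "ChemClass": "DES"}
simultaneous = [r["id"] for r in search_all(records, criteria)]
sequential = [r["id"] for r in search_sequential(records, criteria)]
print(simultaneous, sequential)
```

Sequential filtering reproduces only "and" combinations; an "or" across fields would instead require merging the hit lists of separate searches.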