Computational Toxicology Research Program
More on SDF
Structure Data Format (SDF) files, also known as SD Files, are simple ASCII text files that adhere to a strict format for representing multiple chemical structure records and associated data fields. The format was originally developed and published by Molecular Design Limited (MDL) and has come to serve as the most widely used public standard for exchange of structure/data information on chemicals. Virtually all Chemical Relational Database (CRD) applications used for structure-searching of chemical information are capable of importing and exporting SDF files (More on CRDs). The topic areas below provide further information on SDF files.
MDL technical documentation on SDF and other MDL file formats can be downloaded from the MDL website:
In addition, users are referred to the main literature citation for MDL file formats:
Dalby, A., J.G. Nourse, W.D. Hounshell, A.K.I. Gushurst, D.L. Grier, B.A. Leland, J. Laufer (1992) Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited, J. Chem. Inf. Comput. Sci. 32:244-255.
The general format of an SDF file consists of blocks of information, with a single compound record format represented below (Dalby et al., 1992, Fig. 11, Section 5):
*c = Compound record format is repeated for the length of the SDF file.
*d = Data item format is repeated for each data item associated with a compound record.
*l = A separate line is used for each data value.
MOLfile format is the MDL format for storage of chemical structure information.
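The record layout described above can be read with a short script. The Python below is a minimal sketch of a parser, not an MDL or DSSTox tool, and the name `parse_sdf` is hypothetical. It assumes V2000-style records in which the molfile ends with an `M  END` line, each data item opens with a header line starting with `>` that holds the field name in angle brackets, each data value sits on its own line, a blank line closes the item, and `$$$$` closes the record.

```python
import re

# A data header line starts with '>' and names the field in angle brackets,
# e.g. ">  <FieldName>" or "> 12  <FieldName>".
HEADER_RE = re.compile(r"^>.*?<([^>]+)>")


def parse_sdf(text):
    """Return a list of (molfile_text, {field_name: value}) tuples.

    Multi-line values are joined with newlines. A final record that is
    not terminated by '$$$$' is ignored in this sketch.
    """
    records = []
    mol_lines, fields, current, in_molfile = [], {}, None, True
    for line in text.splitlines():
        if line.startswith("$$$$"):
            # End of record: store it and reset state for the next one.
            joined = {name: "\n".join(vals) for name, vals in fields.items()}
            records.append(("\n".join(mol_lines), joined))
            mol_lines, fields, current, in_molfile = [], {}, None, True
        elif in_molfile:
            mol_lines.append(line)  # header lines may legitimately be blank
            if line.startswith("M  END"):
                in_molfile = False  # data items follow the molfile
        else:
            m = HEADER_RE.match(line)
            if m:
                current = m.group(1)
                fields[current] = []
            elif not line.strip():
                current = None  # a blank line closes the current data item
            elif current is not None:
                fields[current].append(line)  # one line per data value
    return records
```

For example, `parse_sdf(open("example.sdf").read())` (file name hypothetical) returns one `(molfile, fields)` pair per record. Production work would more typically rely on an established toolkit, for instance RDKit's `Chem.SDMolSupplier`.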
In addition to their widespread use, the many consistent formatting features of SDF files and the ease of viewing and editing these files have made them ideal for DSSTox development purposes. An SDF file is plain ASCII text; hence, it can be viewed in any conventional text editor or word processor. A sample SDF file is shown below for 2 simple compound records (1,2-trans-dichloroethene and bromochloroacetonitrile) containing 4 data fields each.
Note that if a field entry is blank or null for any particular record in the CRD, the field will not be listed in that record of the SDF file.
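To illustrate this behaviour, the hypothetical helper below writes the data-item part of one record and simply skips any field whose value is empty or `None`, so that field never appears in the record. It is a sketch assuming a plain dictionary of field names to string values; the field names in the example are illustrative only.

```python
def format_data_block(fields):
    """Return the data items of one SDF record, ending with '$$$$'.

    Blank or null values are skipped, so the field is simply absent
    from this record rather than written with an empty value.
    """
    out = []
    for name, value in fields.items():
        if value is None or str(value).strip() == "":
            continue  # blank/null entry: omit the field entirely
        out.append(f">  <{name}>")            # data header line
        out.extend(str(value).splitlines())   # one line per data value
        out.append("")                        # blank line ends the item
    out.append("$$$$")                        # record delimiter
    return "\n".join(out) + "\n"
```

For example, `format_data_block({"ChemicalName": "bromochloroacetonitrile", "Note": None})` returns `">  <ChemicalName>\nbromochloroacetonitrile\n\n$$$$\n"`; the `Note` field does not appear at all.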
DSSTox SDF files adhere both to MDL SDF standards and to some additional restrictions on SDF content. DSSTox data files are characterized as "clean SDF" in the sense that they have been purged of CRD application-dependent information that is automatically inserted upon SDF export from CRD applications used in DSSTox file development (More on CRDs). In particular, DSSTox SDF files contain only DSSTox Standard Chemical Fields and Source-Specific Fields, such as those listed in the Central DSSTox Field Definition Table, and no extraneous field or file information. In addition, to ensure the proper ordering of fields upon import of DSSTox SDF files, we insert the text entry "blank" in any field in the first SDF record that has a blank or null entry in the CRD. We list below a few specific features of SDF files that are either restricted or included for use in DSSTox SDF files.
Header note is a text string inserted upon SDF export of data from a CRD application. Since header notes are generally application-specific, this note is deleted from DSSTox SDF files and replaced with a blank line.
Data fields (text): strictly according to MDL SDF format requirements, a hard carriage return must be inserted in any text field exceeding 200 characters in length. We do not adhere to this specification in DSSTox files, since InChI and SMILES fields frequently exceed 200 characters in length for larger molecules and are considered essential chemical information fields for DSSTox data files. We have contacted MDL with a request to consider modifying this limiting standard specification for SDF format.
x,y,z coordinates can support 2D or 3D structure representations. Main DSSTox SDF files generally contain 2D structure representations that can be easily printed and visualized in 2D; such representations have the z coordinates set to zero.
Stereochemistry can be represented in a limited way by special atom and bond labels, even when 2D (x,y) coordinate representations are used.
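The features listed above can be inspected or adjusted programmatically. The Python below is a sketch assuming V2000 molfiles with fixed-column blocks: counts on the fourth line, x/y/z in columns 1 to 30 of each atom line, and bond type and stereo flag in columns 7 to 12 of each bond line. All function names are hypothetical. `blank_header_note` assumes the export note sits on the second header line (the program/timestamp line); that location is an assumption, so pass a different index if your application writes the note elsewhere. The stereo codes follow the V2000 convention for single bonds (1 = up, 4 = either, 6 = down).

```python
def blank_header_note(molfile, line_index=1):
    """Replace one molfile header line with an empty line.

    Assumption: the exporter's note is on header line `line_index`
    (default 1, i.e. the second line); adjust for your application.
    """
    lines = molfile.split("\n")
    lines[line_index] = ""
    return "\n".join(lines)


def long_data_lines(fields, limit=200):
    """Return names of fields with any value line longer than `limit`."""
    return [name for name, value in fields.items()
            if any(len(line) > limit for line in value.splitlines())]


def _counts(lines):
    """Atom and bond counts from the V2000 counts line (fourth line)."""
    return int(lines[3][0:3]), int(lines[3][3:6])


def is_2d(molfile):
    """True if every atom has a zero z coordinate (columns 21-30)."""
    lines = molfile.split("\n")
    n_atoms, _ = _counts(lines)
    return all(float(line[20:30]) == 0.0 for line in lines[4:4 + n_atoms])


# V2000 stereo flags for single bonds; 0 means no stereo label.
SINGLE_BOND_STEREO = {1: "up (wedge)", 4: "either", 6: "down (hash)"}


def stereo_bonds(molfile):
    """List (atom1, atom2, label) for single bonds carrying a stereo flag."""
    lines = molfile.split("\n")
    n_atoms, n_bonds = _counts(lines)
    found = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bond_type = int(line[6:9])
        stereo = int(line[9:12].strip() or "0")  # field may be absent
        if bond_type == 1 and stereo in SINGLE_BOND_STEREO:
            found.append((int(line[0:3]), int(line[3:6]),
                          SINGLE_BOND_STEREO[stereo]))
    return found
```

`long_data_lines` only reports over-length lines; consistent with the DSSTox practice described above, it does not insert carriage returns into InChI or SMILES values.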
During the course of the DSSTox project and database development, we have encountered some instances where CRD applications were not fully compliant with SDF standards upon SDF file export. These problems are typically specific to a particular CRD version and to the export of particular types of chemical information. Since we have not exhaustively evaluated all currently available CRD applications, and some problems have been corrected in subsequent CRD version releases, we do not list these specific problems here. Difficulties are not generally encountered when importing "clean SDF" files into these applications, but arise more frequently upon export to SDF from them. Problems have included truncated field lengths, bond types represented in non-standard ways, and application-specific fields automatically added to the SDF. When we have encountered such problems, we have reported them to the product vendors and developed procedures to compensate for them. See also Known Problems & Fixes.
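Because the specific export problems are not listed here, the Python below is only a hedged sketch of checks and a clean-up step one might apply to an exported file; it is not the DSSTox compensation procedure. It assumes V2000 bond lines (counts on the fourth molfile line, bond type in columns 7 to 9) and records already parsed into dictionaries. All function names are hypothetical, and `allowed_fields` merely stands in for a field definition table. The bond-type check flags types 4 to 8, which the CTfile specification describes as intended for substructure-search queries, for review rather than treating them as errors. `clean_records` also applies the two "clean SDF" rules described earlier: it keeps only allowed fields and writes "blank" into empty fields of the first record.

```python
# V2000 bond types 4-8; the CTfile spec describes these as query types.
QUERY_BOND_TYPES = {4: "aromatic", 5: "single or double",
                    6: "single or aromatic", 7: "double or aromatic", 8: "any"}


def query_bond_types(molfile):
    """List (atom1, atom2, type name) for bonds whose type is 4-8."""
    lines = molfile.split("\n")
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    found = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bond_type = int(line[6:9])
        if bond_type in QUERY_BOND_TYPES:
            found.append((int(line[0:3]), int(line[3:6]),
                          QUERY_BOND_TYPES[bond_type]))
    return found


def truncated_fields(original, exported):
    """Fields whose exported value is a strictly shorter prefix of the original."""
    return [name for name, value in original.items()
            if name in exported
            and len(exported[name]) < len(value)
            and value.startswith(exported[name])]


def clean_records(records, allowed_fields):
    """Keep only allowed fields; pad empty fields of the first record with "blank".

    `records` is a list of dicts (field name -> value or None) in file order;
    output fields follow the order of `allowed_fields`.
    """
    cleaned = []
    for i, rec in enumerate(records):
        out = {}
        for name in allowed_fields:
            value = rec.get(name)
            empty = value is None or str(value).strip() == ""
            if i == 0 and empty:
                out[name] = "blank"  # placeholder fixes field order on import
            elif not empty:
                out[name] = value    # later records simply omit empty fields
        cleaned.append(out)
    return cleaned
```

For example, `clean_records([{"ChemicalName": "x", "CASRN": None, "VendorTag": "y"}, {"CASRN": "z"}], ["ChemicalName", "CASRN"])` (field names hypothetical) returns `[{"ChemicalName": "x", "CASRN": "blank"}, {"CASRN": "z"}]`.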
Many features of SDF files, such as the strict ASCII text and content formatting, and labeling of fields and records, make these files relatively easy to modify and manipulate with automated procedures. See Tools & Scripts for a listing of downloadable program scripts, mainly open-source code developed by us and others, that can be applied to editing and modifying SDF files.
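As an example of the kind of automated editing meant here, the sketch below renames one data field throughout an SDF text. It is a minimal illustration, not one of the listed Tools & Scripts, and `rename_field` is a hypothetical name. It assumes that only data header lines begin with `>`; a data value or molecule name line that itself starts with `>` and contains the old field name in angle brackets would also be changed.

```python
import re


def rename_field(sdf_text, old, new):
    """Rename data field `old` to `new` in every record of an SDF text.

    Only lines beginning with '>' (data header lines) are edited, so atom,
    bond, and value lines are left untouched; line endings are preserved.
    """
    header = re.compile(r"^(>.*?<)" + re.escape(old) + r"(>)")
    out = []
    for line in sdf_text.splitlines(keepends=True):
        if line.startswith(">"):
            # A function replacement inserts `new` literally (no backslash escapes).
            line = header.sub(lambda m: m.group(1) + new + m.group(2), line, count=1)
        out.append(line)
    return "".join(out)
```

Because the field name must be followed directly by `>`, renaming `Name` leaves a field such as `NameExtra` unchanged.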