Computational Toxicology Research Program

Known Problems/Fixes

In the course of the DSSTox project, we have encountered problems and issues with the use of various public and commercial applications used in DSSTox file creation. In most cases, we have found solutions, work-arounds, or fixes. We list our experiences here to assist the user community and to encourage further reporting of problems and fixes.

SDF Export & Import Problems

SDF field length truncation upon application export
SDF field length truncation upon application import
Application-specific header lines inserted upon export to SDF
Application-specific fields inserted upon export to SDF
SDF import and missing structure display

InChI Codes in DSSTox Files

InChI distinction of chirality in 2D structures
InChI codes exceeding 200 or 255 characters
InChI codes with special characters

MS Excel & CASRN Problems

Conversion of CASRN to dates on SDF import

SMILES

Non-unique and non-standard format

 

SDF Export & Import Problems

Several applications have been used in the course of DSSTox database development to construct SDF files or to modify them to final form. The MDL SDF file specifications (see CTFiles documentation) indicate a field length limit of 200 characters. This limit is a problem because common chemical "identifiers", such as IUPAC names, SMILES, and InChI codes, frequently exceed 200 characters.

The following are a few common problems encountered upon export to, or import from, SDF:

Problem: SDF field length truncation upon application export

ACD's ChemFolder application imports SDF fields longer than 200 characters without truncation. However, when the file is subsequently exported to SDF from within the application, it inserts a "HardCarriageReturn" [HCRt] at the 201st character. This is ACD's specific implementation of the MDL SDF standard field length.

Fix: Post-processing of ACD SDF to eliminate [HCRt]
ACD provided us with a ChemBasic script for use in ACD ChemFolder that will post-process the ACD SDF and eliminate any HCRt, saving to a new SDF file.
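
For readers without access to the ChemBasic script, the same kind of clean-up can be sketched in Python. This is a minimal illustration, not the ACD script: the function name `rejoin_split_fields` is hypothetical, and it assumes that the inserted break shows up as an ordinary line break after exactly 200 characters and that genuine field values occupy a single line.

```python
def rejoin_split_fields(sdf_text, limit=200):
    """Rejoin data-field values that were broken after `limit` characters.

    Assumption: a value line that is exactly `limit` characters long and is
    followed by another non-blank line was split by the exporting application.
    """
    out = []
    in_value = False   # True while reading the value of a "> <FIELD>" item
    last_full = False  # previous value line filled all `limit` characters
    for line in sdf_text.splitlines():
        if line.startswith(">"):
            # Data-item header: a new field value starts on the next line.
            in_value, last_full = True, False
            out.append(line)
        elif line.strip() == "" or line.startswith("$$$$"):
            # A blank line ends the data item; "$$$$" ends the record.
            in_value, last_full = False, False
            out.append(line)
        elif in_value and last_full:
            # Continuation of a split value: glue it to the previous line.
            out[-1] += line
            last_full = len(line) == limit
        else:
            out.append(line)
            last_full = in_value and len(line) == limit
    return "\n".join(out) + "\n"
```

A value whose true length happens to be an exact multiple of 200 characters would be merged with a following value line, so the output should be checked for fields that legitimately span several lines.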

Problem: SDF field length truncation upon application import

CambridgeSoft's ChemDraw for Excel application exports SDF fields longer than 200 characters without truncation. However, when the same SDF is reimported into this application, the field is truncated at character 252. This is a function of the Excel import.

Problem: Application-specific header lines inserted upon export to SDF

All commercial chemical relational database (CRD) applications used in DSSTox file construction insert an application-specific header line upon export to SDF. Upon import of the SDF into a different commercial CRD application, this original header information is discarded, and a new application-specific header is inserted upon SDF export.

Fix: Eliminate header line in DSSTox SDF
Since DSSTox SDF files are intended to be generic files, independent of, and not endorsing, any particular commercial CRD application, we eliminate the SDF header line in DSSTox SDF files by post-processing with a custom Python script.
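
The DSSTox script itself is not reproduced here. As a rough sketch of the idea, the following assumes that the application-specific header is the second line of each record (the program and timestamp line of the molfile header block) and that records end with a `$$$$` line. The function name `blank_program_line` is hypothetical.

```python
def blank_program_line(sdf_text):
    """Blank the second line of every SDF record.

    Assumption: the application-specific header is line 2 of each record,
    and every record is terminated by a line starting with "$$$$".
    """
    out = []
    line_in_record = 0
    for line in sdf_text.splitlines():
        line_in_record += 1
        if line_in_record == 2:
            line = ""  # drop the program name / timestamp text
        out.append(line)
        if line.startswith("$$$$"):
            line_in_record = 0  # the next line starts a new record
    return "\n".join(out) + "\n"
```

The line is blanked rather than deleted so that the three-line header block, and therefore the position of the counts line, stays intact.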

Problem: Application-specific fields inserted upon export to SDF

CRD applications used in DSSTox file construction carry application-specific fields (e.g., Formula, Mol_Comment, etc.) upon export to SDF that cannot be eliminated by the user from within the application.

Fix: Eliminate extra fields in DSSTox SDF
For final DSSTox SDF files, we eliminate extra fields inserted by CRD applications by post-processing with a custom Python script.
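
One possible shape for such a field filter is sketched below. It is an illustration under stated assumptions rather than the DSSTox script: the function name `drop_fields` is hypothetical, and the code assumes that each data item begins with a header line of the form `> <FieldName>` and ends with a blank line.

```python
import re

# Header line of an SDF data item, e.g. "> <Formula>" or ">  <Formula>  (1)".
FIELD_HEADER = re.compile(r"^>.*?<([^>]+)>")

def drop_fields(sdf_text, unwanted):
    """Remove every data item whose field name is in `unwanted`."""
    out = []
    skipping = False
    for line in sdf_text.splitlines():
        match = FIELD_HEADER.match(line)
        if match:
            # Decide once per data item whether to keep it.
            skipping = match.group(1) in unwanted
        elif line.startswith("$$$$"):
            skipping = False  # always keep the record terminator
        if not skipping:
            out.append(line)
    return "\n".join(out) + "\n"
```

For example, `drop_fields(text, {"Formula", "Mol_Comment"})` would strip the two fields named in the problem description above and leave all other fields untouched.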

Problem: SDF import and missing structure display

Hyleos ChemFileBrowser (freeware) is a PC-based Windows desktop application that can be freely downloaded and used to import, view, edit, print, and export SDF files. However, we have found that the smallest molecules, such as "bromomethane", with only two heavy atoms, are not displayed in this application.

InChI Codes in DSSTox Files

InChI codes are generated directly from the structure in DSSTox SDF files using the public InChI generation program (see More on InChI).

Problem: InChI distinction of chirality in 2D structures

DSSTox SDF files store a 2D representation of the molecular structure, with solid and hashed bonds used to denote 3D stereochemistry. These distinctions are usually interpreted correctly upon InChI conversion. However, without absolute assignment of wedged bonds (in or out of plane), the absolute chirality is not assigned and the InChIs generated for the different stereoisomers are identical.

Fix: Select correct InChI Structure Option
Users of the public "wInChI" generation software must make sure that, under "InChI Options" --> "Structure Options", the "Include Stereochemistry" and "Absolute" selections are checked in order to generate InChI codes that distinguish chirality differences in 2D structure representations.

Problem: InChI codes exceeding 200 or 255 characters

For larger molecules, InChIs can exceed 200 or 255 characters, which can cause problems with either SDF import or export. This is known to occur particularly with applications based on MS Access (e.g., CambridgeSoft ChemFinder).

Fix: Define Access "memo" field for STRUCTURE_InChI import
MS Access has a default field limit of 255 characters, which can be changed if the field is defined as a "memo" field prior to SDF import. Alternatively, the InChI field can be deleted from the SDF using an SDF processing script or import options.
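
Before choosing between these options, it can help to know which records are affected. The sketch below is only an illustration: the function name `find_long_fields` is hypothetical, and the code assumes data items of the form `> <FieldName>` followed by value lines and a blank line, with records ending in `$$$$`. It lists every field value line longer than a chosen limit.

```python
import re

# Header line of an SDF data item, e.g. "> <STRUCTURE_InChI>".
FIELD_HEADER = re.compile(r"^>.*?<([^>]+)>")

def find_long_fields(sdf_text, limit=255):
    """Return (record_number, field_name, length) for over-long value lines."""
    hits = []
    record = 1    # 1-based index of the current record
    field = None  # name of the data item being read, if any
    for line in sdf_text.splitlines():
        match = FIELD_HEADER.match(line)
        if match:
            field = match.group(1)
        elif line.startswith("$$$$"):
            record += 1
            field = None
        elif line.strip() == "":
            field = None  # a blank line ends the current data item
        elif field is not None and len(line) > limit:
            hits.append((record, field, len(line)))
    return hits
```

Calling the function with `limit=200` and again with `limit=255` separates values that only exceed the SDF field length from those that would also exceed the Access default.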

Problem: InChI codes with special characters

InChI codes include numeric, text, and punctuation characters that may not be recognized, or may be misinterpreted, in some applications. If a HardCarriageReturn has been inserted upon SDF import, it will need to be deleted for the InChI to be recognized as such.

Fix: Delete InChI field
Users can delete the InChI field, either prior to import using an SDF processing script or within the application.

MS Excel & CASRN Problems

Some CRD applications are MS Excel based (Accelrys Accord for Excel, CambridgeSoft ChemDraw for Excel) and experience problems from default import settings.

Problem: Conversion of CASRN to dates on SDF import

When SDF files are imported by some CRD applications for viewing and editing in Excel, Excel may interpret and reformat CASRN as dates if they satisfy the date criteria. These errors are typically infrequent and therefore not readily apparent.

Fix: Post-processing of CASRN field contents
If a column is preformatted as "text" within Excel, the CASRN will not be converted to dates. The original CASRN column, stored prior to SDF import, can be pasted correctly in this way.
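
Because these conversions are infrequent, an automated check of the CASRN column after a round trip through Excel can help locate them. The sketch below is an illustration only and the function name `is_valid_casrn` is hypothetical. It tests the standard CASRN pattern (two to seven digits, two digits, one check digit) and the CASRN check digit, so that any value rewritten as a date fails the test.

```python
import re

# CASRN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CASRN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

def is_valid_casrn(value):
    """Return True if `value` has CASRN format and a correct check digit."""
    match = CASRN_PATTERN.match(value.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))
```

Applied to each cell of the exported column, the function flags entries that no longer look like CASRNs. The exact text Excel writes for a converted value depends on its locale and cell format, so the check tests for a valid CASRN rather than trying to recognize dates.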

 

SMILES

Various implementations of SMILES codes are generated by different CRD applications and multiple non-standard SMILES codes can be generated for a particular molecule (see More on SMILES).

Problem: Non-unique and non-standard format

In our experience, valid SMILES generated in one CRD application are not always correctly interpreted by or translated to STRUCTURE by another CRD application.

Fix: Regenerate problematic SMILES
If a SMILES is not translated correctly by one CRD application, or produces an error, we regenerate the SMILES until it is correctly translated by multiple CRD applications.
