Computational Toxicology Research Program
More on InChI
IUPAC International Chemical Identifier (InChI™) ** Updated February 2008
InChI is a unique character string representation of a chemical structure, generated by a publicly available open-source application, that is capable of many levels of precision with regard to chemical description (e.g., charge states, tautomeric form, chirality) – see additional text and links below. InChI is currently being incorporated into a variety of public and commercial chemistry databases and applications, including NLM's PubChem, NCI's Structure-Browser data collection, and NIST's Chemistry WebBook. Additional interfaces between these and other systems are under development. Unlike CAS Registry Numbers, SMILES, and common or systematic chemical names, an InChI is a unique and invariant representation of a chemical structure, and the generation code is in the public domain.
For answers to a wide range of user questions, see "Unofficial" InChI FAQ: What is an InChI?
** Note: In November 2004, the INChI identifier was renamed InChI (IUPAC International Chemical Identifier) to allow trademark registration.
The description below was provided by the InChI (IUPAC-NIST) developers, S. Stein, S. Heller, D. Tchekhovskoi, and A. McNaught:
IUPAC, the International Union of Pure and Applied Chemistry (http://www.iupac.org/index_to.html), has long been involved in the development of systematic and standard procedures for naming chemical substances on the basis of their structure. The resulting rules of nomenclature, while covering almost all compounds, were designed for text-based media. IUPAC is supporting development of a means for representing chemical substances in a format more suitable for digital processing, involving the computer processing of chemical structural information (connection tables). This is being implemented in the IUPAC-NIST (National Institute of Standards and Technology) Chemical Identifier. Details of the IUPAC project can be found at: http://www.iupac.org/inchi/. The project aim is to create a method for generating a freely available, non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations and unambiguous identification of chemical substances.
InChI is not a registry system. It does not depend on the existence of a database of unique substance records to establish the next available sequence number for any new chemical substance being assigned an InChI. Instead, it is simply the transformation of the chemical structure itself to a string of characters by algorithms. The conversion of structural information (in the form of a 'connection table') to the Identifier is based on a set of IUPAC structure conventions, and rules for normalization and canonicalization (conversion to a single, predictable sequence) of an input structure representation to establish the unique label. This label is simply a series of characters that serves to uniquely identify the compound from whose structure it was derived. It will thus enable an automatic conversion of a graphical representation of a chemical substance into the unique InChI label, which can be created independently of any organization anywhere in the world and which could be built into any chemical structure drawing program, transferred between any two entities, and created from any existing collection of chemical structures.
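As a toy illustration of what canonicalization means here, the following Python sketch derives one label from a small connection table regardless of how the atoms were numbered on input. It is not the InChI algorithm: it simply tries every atom numbering and keeps the smallest serialization, which is only practical for a handful of atoms. The function name `canonical_label` and the label format are invented for this example.

```python
from itertools import permutations


def canonical_label(elements, bonds):
    """Return a label that is identical for every atom numbering of one graph.

    elements: list of element symbols, indexed by atom number (0-based).
    bonds: list of (atom_a, atom_b) index pairs; bond orders are ignored here.
    """
    n = len(elements)
    best = None
    for perm in permutations(range(n)):
        # perm[old] is the new number given to the atom originally numbered old.
        renumbered = [None] * n
        for old, new in enumerate(perm):
            renumbered[new] = elements[old]
        new_bonds = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds)
        candidate = (tuple(renumbered), tuple(new_bonds))
        # Keep the lexicographically smallest description of the graph.
        if best is None or candidate < best:
            best = candidate
    atoms, bond_list = best
    return ".".join(atoms) + "/" + ",".join(f"{a}-{b}" for a, b in bond_list)


# Ethanol heavy atoms drawn in two different atom orders (C-C-O and O-C-C).
ethanol_a = canonical_label(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_label(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether (C-O-C) has the same atoms but different connectivity.
ether = canonical_label(["C", "O", "C"], [(0, 1), (1, 2)])
print(ethanol_a == ethanol_b, ethanol_a == ether)
```

The two ethanol inputs yield the same label while dimethyl ether yields a different one, which is the property that lets an identifier be computed from the structure alone, without consulting a registry.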
InChIKey was introduced in September 2007 as part of the InChI 1.02beta software release. It is a fixed-length (25-character) condensed digital representation of the InChI Identifier that can be used to facilitate structure look-up; the full InChI is required for structure regeneration.
"InChIKey" is described by the InChI developers as follows:
"A fixed-length (25-character) condensed digital representation of the Identifier to be known as InChIKey. In particular, this will:
facilitate web searching, previously complicated by unpredictable breaking of InChI character strings by search engines
allow development of a web-based InChI lookup service
permit an InChI representation to be stored in fixed length fields
make chemical structure database indexing easier
allow verification of InChI strings after network transmission."
Example of InChI with its InChIKey equivalent:
First block (14 letters), encodes molecular skeleton (connectivity): RYYVLZVUVIJVGH
Second block (8 letters), encodes proton positions (tautomers), stereochemistry, isotopes, reconnected layer: UHFFFAOY
Flag character, indicates InChI version, presence/absence of fixed H layer, isotopes, and stereochemistry: A
Check character: W
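The block layout above can be sketched in Python as a small splitter for the 25-character key. This is a minimal illustration, not part of the InChI software: the function name `split_inchikey` is invented, and the sketch assumes that the two blocks are separated by a single hyphen, which accounts for the one character not covered by the 14 + 8 + 1 + 1 letters listed above.

```python
def split_inchikey(key):
    """Split a 25-character InChIKey into the four parts described above.

    Assumes the layout: 14 letters, a hyphen, 8 letters, flag, check character.
    """
    if len(key) != 25 or key[14] != "-":
        raise ValueError("expected 25 characters with a hyphen after the first block")
    letters = key[:14] + key[15:]
    if not all("A" <= ch <= "Z" for ch in letters):
        raise ValueError("expected only uppercase letters A-Z outside the hyphen")
    return {
        "connectivity_block": key[:14],   # molecular skeleton
        "second_block": key[15:23],       # proton positions, stereo, isotopes, ...
        "flag": key[23],                  # version and layer-presence flag
        "check": key[24],                 # check character
    }


# Key assembled from the example blocks listed above.
parts = split_inchikey("RYYVLZVUVIJVGH-UHFFFAOYAW")
for name, value in parts.items():
    print(name, value)
```

Because the first block depends only on connectivity, comparing `connectivity_block` values is one way to group records that share a skeleton but differ in the layers encoded by the second block.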
The DSSTox project incorporates InChI codes into all DSSTox data files to promote the use of InChIs and to provide an alternate text-based representation of chemical structure that can serve multiple uses in database construction, maintenance, data integration, and chemoinformatics.
InChI codes are typically generated with a text string of additional information (AuxInfo) that can be used to regenerate the full molfile structure from the original SDF. Because they are redundant with the SDF molfile information and are long (frequently exceeding 255 characters), AuxInfo strings are not included in DSSTox data files. This additional information can easily be generated by processing the DSSTox SDF file with the publicly available InChI software (see Freely Available InChI Software below). InChI codes also frequently exceed 200 characters in length. Strictly speaking, the MDL SDF format specifications, published in 1992 (Dalby et al., J. Chem. Inf. Comput. Sci., 1992, 32:244-255), require that a hard carriage return be inserted in any text field exceeding 200 characters in length. DSSTox SDF files do not adhere to this specification because the DSSTox Standard Chemical Fields STRUCTURE_InChI and STRUCTURE_SMILES, which can both exceed 200 characters in length for larger molecules, are considered essential chemical information fields.
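Software that enforces the 200-character limit may reject or truncate such fields, so it can be useful to know in advance which data lines are affected. The following Python sketch scans SDF text and reports data-field lines longer than a given limit. It is a simplified reader written for this illustration (the function name `long_data_lines` is invented); it assumes data headers of the form `> <FIELD_NAME>` and values terminated by a blank line, and it does not validate the molfile block.

```python
def long_data_lines(sdf_text, limit=200):
    """Return (field_name, line_number, length) for data lines longer than limit."""
    findings = []
    field = None  # name of the data field currently being read, if any
    for line_number, line in enumerate(sdf_text.splitlines(), start=1):
        if line.startswith(">"):
            # Data header, e.g. "> <STRUCTURE_InChI>"
            start = line.find("<")
            end = line.find(">", start + 1)
            field = line[start + 1:end] if start != -1 and end != -1 else None
        elif line.strip() == "" or line.startswith("$$$$"):
            # A blank line ends a data item; "$$$$" ends the record.
            field = None
        elif field is not None and len(line) > limit:
            findings.append((field, line_number, len(line)))
    return findings


# Small made-up data section with one over-long value (molfile block omitted).
example = "\n".join([
    "> <STRUCTURE_InChI>",
    "X" * 250,
    "",
    "> <STRUCTURE_SMILES>",
    "CCO",
    "",
    "$$$$",
])
for name, number, length in long_data_lines(example):
    print(f"{name}: line {number} has {length} characters")
```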
** Update (Feb 2008): InChIKeys have been incorporated as a DSSTox Standard Chemical Field, STRUCTURE_InChIKey, in all DSSTox files. Generating either an InChI or an InChIKey requires the user to select standard or non-standard options, and different options produce different codes. All DSSTox InChI and InChIKey generation uses the NIST-recommended standard options, as follows:
/FixedH /RecMet /SPXYZ /SAsXYZ /Newps /Fb /Fnud
/FixedH - Turn off Mobile H perception
/RecMet - Include bonds to metal
/SPXYZ - Include Phosphines Stereochemistry
/SAsXYZ - Include Arsines Stereochemistry
/Newps - Narrow end of wedge points to stereocenter
/Fb - Fix bug leading to missing or undefined sp3 parity
/Fnud - Fix non-uniform drawing issues
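For scripted runs, it may help to keep this option set in one place so that every InChI and InChIKey is generated with identical settings. The Python sketch below only assembles the option strings listed above; the names `DSSTOX_OPTIONS` and `option_args` are invented for this illustration, and the use of a `-` prefix instead of `/` on non-Windows systems is an assumption that should be checked against the documentation for the InChI release in use.

```python
# Option names and descriptions exactly as listed above.
DSSTOX_OPTIONS = [
    ("FixedH", "Turn off Mobile H perception"),
    ("RecMet", "Include bonds to metal"),
    ("SPXYZ", "Include Phosphines Stereochemistry"),
    ("SAsXYZ", "Include Arsines Stereochemistry"),
    ("Newps", "Narrow end of wedge points to stereocenter"),
    ("Fb", "Fix bug leading to missing or undefined sp3 parity"),
    ("Fnud", "Fix non-uniform drawing issues"),
]


def option_args(prefix="/"):
    """Return the option list as command-line arguments with the given prefix."""
    return [prefix + name for name, _description in DSSTOX_OPTIONS]


# Reproduce the option line shown above, then print a short legend.
print(" ".join(option_args()))
for name, description in DSSTOX_OPTIONS:
    print(f"/{name} - {description}")
```

Since different options produce different codes, recording the exact argument list alongside generated identifiers makes it easier to tell later whether two InChIs are comparable.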
IUPAC/InChI Project Description page with links and periodic information and release information:
"Unofficial" InChI FAQ page (developed by Nick Day of the Murray-Rust Research Group, Cambridge University), acknowledged and endorsed by InChI developers, provides a veritable mother-lode of information:
NIST documentation, examples, and executables to create InChI:
"Googling for InChIs; A remarkable method for chemical searching", by P. Murray-Rust and coworkers, Oct 2004;
online pdf publication
SourceForge.net InChI Facilities and Applications listings:
The first public release v1.0 InChI open source code was made available in March 2005. InChI v1.01 was released in August 2006. All current DSSTox files have been updated to include v1.0 InChIs. The third-party list below is by no means exhaustive and many new uses and applications for InChI are coming on-line.
From the main InChI page, http://www.iupac.org/inchi/, after registering for a standard open source license at http://www.iupac.org/inchi/license.html, users are presented with options to download the following versions of the InChI software and documentation at http://www.iupac.org/inchi/download/index.html:
InChI™ version 1 (software version 1.01) documentation, and Windows and Linux (i386) executable programs
[InChI-1.zip - 6.81MB]
InChI™ version 1 source code and Application Program Interface (API)
[InChI-1-API.zip - 3.82MB]
InChI™ validation protocol
[InChI_TechMan.pdf - 3.55MB]
What's new in InChI software version 1.01
[Whats New.pdf - 28KB]
Third party software (by Bedrich Kosata) capable of converting an InChI back to a chemical structure:
Third party software (ACD/ChemSketch+ACD/ChemBasic, v9.0) integration with InChI (v1.0):
The following websites provide the facility to generate InChIs:
Services provide methods to manipulate InChI strings and InChIKeys, including conversion to and from the MOLfile format, checking the validity of InChI identifiers, and searching ChemSpider using InChI inputs.
ACD/Labs' freely available structure-drawing program ChemSketch includes the facility to generate InChIs from drawn structures.
PubChem Server Side Structure Editor v1.8 includes a facility for generating InChIs as you draw the structure.
Copyright © The International Union of Pure and Applied Chemistry 2005: IUPAC International Chemical Identifier (InChI) (contact: email@example.com)