Computational Toxicology Research Program

More on InChI

IUPAC International Chemical Identifier (InChI)™ ** Updated February 2008

- InChI Overview
- InChI Key
- InChI Codes in DSSTox Files
- InChI NIST Contacts
- Documentation and Information Links
- Freely Available InChI Software
- Websites that Generate InChIs

InChI™

InChI is a unique character-string representation of a chemical structure, generated by a publicly available open-source application. It can describe a structure at many levels of precision (e.g., charge state, tautomeric form, chirality); see the additional text and links below. InChI is currently being incorporated into a variety of public and commercial chemistry databases and applications, including NLM's PubChem, NCI's Structure-Browser data collection, and NIST's Chemistry WebBook. Additional interfaces between these and other systems are under development. Unlike CAS Registry Numbers, SMILES, and common or systematic chemical names, an InChI is a unique and invariant representation of a chemical structure, and the code that generates it is in the public domain.

For answers to a wide range of user questions, see the "Unofficial" InChI FAQ: What is an InChI?

** Note: In November 2004, the INChI identifier was renamed InChI (IUPAC International Chemical Identifier) so that the name could be trademarked.

The following description was provided by the InChI (IUPAC-NIST) developers S. Stein, S. Heller, D. Tchekhovskoi, and A. McNaught:
IUPAC, the International Union of Pure and Applied Chemistry (http://www.iupac.org/index_to.html), has long been involved in the development of systematic and standard procedures for naming chemical substances on the basis of their structure. The resulting rules of nomenclature, while covering almost all compounds, were designed for text-based media. IUPAC is supporting development of a means of representing chemical substances in a format more suitable for digital processing, that is, the computer processing of chemical structural information (connection tables). This is being implemented in the IUPAC-NIST (National Institute of Standards and Technology) Chemical Identifier. Details of the IUPAC project can be found at http://www.iupac.org/inchi/. The project aim is to create a method for generating a freely available, non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations and unambiguous identification of chemical substances.

InChI is not a registry system. It does not depend on a database of unique substance records to establish the next available sequence number for each new chemical substance being assigned an InChI. Instead, it is simply the transformation of the chemical structure itself into a string of characters by algorithms. The conversion of structural information (in the form of a 'connection table') to the Identifier is based on a set of IUPAC structure conventions, together with rules for normalization and canonicalization (conversion to a single, predictable sequence) of an input structure representation to establish the unique label. This label is simply a series of characters that uniquely identifies the compound from whose structure it was derived. It thus enables automatic conversion of a graphical representation of a chemical substance into the unique InChI label. The label can be created independently of any organization, anywhere in the world; the conversion could be built into any chemical structure drawing program; and the label can be transferred between any two entities and created from any existing collection of chemical structures.
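The idea of canonicalization, where every drawing of the same connection table yields one predictable label, can be sketched with a toy example. This is not the InChI algorithm: it is a brute-force illustration that tries every atom ordering of a very small structure and keeps the smallest resulting description. The function name `canonical_form` and the output layout are hypothetical, chosen only for this sketch.

```python
from itertools import permutations


def canonical_form(atoms, bonds):
    """Return one predictable label for a tiny connection table.

    atoms: list of element symbols, in whatever order they were drawn
    bonds: list of (i, j) pairs of atom indices
    Toy brute force (factorial cost); NOT the InChI algorithm.
    """
    best = None
    for order in permutations(range(len(atoms))):
        # order[new_index] is the original index placed at new_index
        new_index = {old: new for new, old in enumerate(order)}
        symbols = tuple(atoms[old] for old in order)
        edges = tuple(sorted(
            tuple(sorted((new_index[i], new_index[j]))) for i, j in bonds
        ))
        candidate = (symbols, edges)
        # Keep the lexicographically smallest description seen so far
        if best is None or candidate < best:
            best = candidate
    symbols, edges = best
    return "".join(symbols) + "/" + ",".join(f"{a}-{b}" for a, b in edges)


# The same heavy-atom skeleton (C-C-O) entered in two different atom orders
drawing_a = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])
drawing_b = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])
# A different skeleton (C-O-C) built from the same atoms
other = canonical_form(["C", "O", "C"], [(0, 1), (1, 2)])
print(drawing_a == drawing_b, drawing_a == other)
```

Because the label depends only on the structure and not on the input order, no registry lookup is needed, which is the property the paragraph above describes. Real canonicalization avoids the factorial search used here.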

InChIKey™

InChIKey was introduced in September 2007 as part of the InChI 1.02beta software release. It is a fixed-length (25-character) condensed digital representation of the InChI Identifier that can be used to facilitate structure look-up; the full InChI is required to regenerate the structure.

"InChIKey" is described by the InChI developers as follows:

"A fixed-length (25-character) condensed digital representation of the Identifier to be known as InChIKey. In particular, this will:

- facilitate web searching, previously complicated by unpredictable breaking of InChI character strings by search engines
- allow development of a web-based InChI lookup service
- permit an InChI representation to be stored in fixed length fields
- make chemical structure database indexing easier
- allow verification of InChI strings after network transmission."

Example of InChI with its InChIKey equivalent:

Caffeine:
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW
First block (14 letters) encodes the molecular skeleton (connectivity): RYYVLZVUVIJVGH
Second block (8 letters) encodes proton positions (tautomers), stereochemistry, isotopes, and the reconnected layer: UHFFFAOY
Flag character indicates the InChI version and the presence or absence of the fixed-H layer, isotopes, and stereochemistry: A
Check character: W
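The layout above can be checked mechanically. The following is a minimal sketch that splits a 25-character InChIKey of the form shown here into its four parts, and splits an InChI string into its slash-separated layers. The function names and the returned field names are hypothetical, and the check character is only extracted, not verified.

```python
import re

# 14 letters, a hyphen, 8 letters, one flag letter, one check letter:
# 25 characters in total, matching the block layout described above.
_KEY = re.compile(r"^(?:InChIKey=)?([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into the blocks described in the text."""
    match = _KEY.match(key.strip())
    if match is None:
        raise ValueError(f"not a 25-character InChIKey: {key!r}")
    connectivity, remainder, flag, check = match.groups()
    return {
        "connectivity": connectivity,  # first block: molecular skeleton
        "remainder": remainder,        # second block: protons, stereo, isotopes
        "flag": flag,                  # version / layer-presence flag
        "check": check,                # check character (not verified here)
    }


def split_inchi_layers(inchi):
    """Return (version, formula, [(prefix, content), ...]) for an InChI."""
    body = inchi.strip()
    if body.startswith("InChI="):
        body = body[len("InChI="):]
    version, formula, *layers = body.split("/")
    # Each remaining layer starts with a one-letter prefix such as c or h
    return version, formula, [(layer[:1], layer[1:]) for layer in layers]


parts = split_inchikey("InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW")
version, formula, layers = split_inchi_layers(
    "InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
)
print(parts["connectivity"], formula, [prefix for prefix, _ in layers])
```

Layers are returned as a list of pairs rather than a dictionary because a prefix letter may appear more than once in a longer InChI.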

InChI™ Codes in DSSTox Files:

The DSSTox project incorporates InChI codes into all DSSTox data files to promote the use of InChIs and to provide an alternate text-based representation of chemical structure that can serve multiple uses in database construction, maintenance, data integration, and chemoinformatics.

InChI codes are typically generated with a text string of additional information (AuxInfo) that can be used to regenerate the full molfile structure from the original SDF. Because AuxInfo strings are redundant with the SDF molfile information and are long (frequently exceeding 255 characters), they are not included in DSSTox data files. This additional information can easily be generated by processing the DSSTox SDF file with the publicly available InChI software (see Freely Available InChI Software below).

InChI codes also frequently exceed 200 characters in length. Strictly speaking, the MDL SDF format specifications, published in 1992 (Dalby et al., J. Chem. Inf. Comput. Sci., 1992, 32:244-255), require that a hard carriage return be inserted in any text field exceeding 200 characters in length. DSSTox SDF files do not adhere to this specification for the DSSTox Standard Chemical Fields STRUCTURE_InChI and STRUCTURE_SMILES, both of which can exceed 200 characters for larger molecules, because these are considered essential chemical information fields.
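Software that enforces the 200-character limit may need to know which fields in a DSSTox SDF file exceed it. The following is a minimal sketch, assuming the usual SDF layout of `> <FIELD_NAME>` header lines followed by data lines, with `$$$$` ending each record. The function name `long_data_fields` and the sample record are hypothetical.

```python
import re

MAX_LINE = 200  # line-length limit discussed in the text above
_HEADER = re.compile(r"^>.*<([^>]+)>")


def long_data_fields(sdf_text, limit=MAX_LINE):
    """Return (record_index, field_name, length) for over-long data lines."""
    found = []
    record = 0
    field = None
    for line in sdf_text.splitlines():
        if line.startswith("$$$$"):
            # End of one record; the next line starts a new molfile block
            record += 1
            field = None
        elif line.startswith(">"):
            header = _HEADER.match(line)
            field = header.group(1) if header else None
        elif field is not None:
            if not line.strip():
                field = None  # a blank line closes the data item
            elif len(line) > limit:
                found.append((record, field, len(line)))
    return found


# Hypothetical one-record file with a 250-character STRUCTURE_InChI value
sample = "\n".join([
    "example", "", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <STRUCTURE_InChI>", "InChI=1/" + "C" * 242, "",
    "> <STRUCTURE_SMILES>", "CCO", "",
    "$$$$",
])
print(long_data_fields(sample))
```

A reader that rejects long lines could use such a report to decide which fields to wrap or skip before loading the file.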

** Update (Feb 2008): InChIKeys have been incorporated as the DSSTox Standard Chemical Field STRUCTURE_InChIKey in all DSSTox files. Generating either an InChI or an InChIKey requires the user to select standard or non-standard options, and different options produce different codes. All DSSTox InChI and InChIKey generation uses the NIST-recommended standard options, as follows:

/FixedH /RecMet /SPXYZ /SAsXYZ /Newps /Fb /Fnud

/FixedH - Turn off Mobile H perception
/RecMet - Include bonds to metal
/SPXYZ - Include Phosphines Stereochemistry
/SAsXYZ - Include Arsines Stereochemistry
/Newps - Narrow end of wedge points to stereocenter
/Fb - Fix bug leading to missing or undefined sp3 parity
/Fnud - Fix non-uniform drawing issues
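Because different options produce different codes, it can help to keep the option set in one place and reuse it whenever InChIs are regenerated. The sketch below only records the options listed above and formats them as a single string; it does not call the InChI software. The names `DSSTOX_INCHI_OPTIONS` and `format_options` are hypothetical, and the `prefix` parameter is an assumption for builds that expect a different option marker than `/`.

```python
# Option names and descriptions copied from the list above
DSSTOX_INCHI_OPTIONS = {
    "FixedH": "Turn off Mobile H perception",
    "RecMet": "Include bonds to metal",
    "SPXYZ": "Include Phosphines Stereochemistry",
    "SAsXYZ": "Include Arsines Stereochemistry",
    "Newps": "Narrow end of wedge points to stereocenter",
    "Fb": "Fix bug leading to missing or undefined sp3 parity",
    "Fnud": "Fix non-uniform drawing issues",
}


def format_options(options=DSSTOX_INCHI_OPTIONS, prefix="/"):
    """Join option names into one string, in the order they were listed."""
    return " ".join(prefix + name for name in options)


print(format_options())
for name, meaning in DSSTOX_INCHI_OPTIONS.items():
    print(f"/{name:<7} {meaning}")
```

Storing the option string alongside generated codes makes it easier to tell later whether two InChIs were produced under the same settings.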

InChI™ NIST Contacts: Stephen Stein, email: steve.stein@nist.gov; Stephen Heller, email: srtheller@nist.gov


InChI™ Documentation and Information Links:

- IUPAC/InChI Project Description page, with links, periodic updates, and release information:
http://www.iupac.org/inchi/

- "Unofficial" InChI FAQ page (developed by Nick Day of the Murray-Rust Research Group, Cambridge University), acknowledged and endorsed by the InChI developers, provides a veritable mother lode of information:
http://wwmm.ch.cam.ac.uk/inchifaq/

- NIST documentation, examples, and executables to create InChI:
http://www.hellers.com/steve/pub-talks/columbus-702/frame.htm

- "Googling for InChIs; A remarkable method for chemical searching", by P. Murray-Rust and coworkers, Oct 2004;
online pdf publication

- SourceForge.net InChI Facilities and Applications listings:
http://sourceforge.net/projects/inchi/


Freely available InChI™ software:

The first public release of the InChI open-source code, v1.0, was made available in March 2005. InChI v1.01 was released in August 2006. All current DSSTox files have been updated to include v1.0 InChIs. The third-party list below is by no means exhaustive, and many new uses and applications for InChI are coming online.

- From the main InChI page, http://www.iupac.org/inchi/, after registering for a standard open source license at http://www.iupac.org/inchi/license.html, users are presented with options to download the following versions of the InChI software and documentation at http://www.iupac.org/inchi/download/index.html:

InChI™ version 1 (software version 1.01) documentation, and Windows and Linux (i386) executable programs
[InChI-1.zip - 6.81MB]

InChI™ version 1 source code and Application Program Interface (API)
[InChI-1-API.zip - 3.82MB]

InChI™ validation protocol
[InChI_TechMan.pdf - 3.55MB]

What's new in InChI software version 1.01
[Whats New.pdf - 28KB]

- Third-party software (by Bedrich Kosata) capable of converting an InChI back to a chemical structure:

http://bkchem.zirael.org/inchi_en.html
http://www.zirael.org/bkchem/download_en.html
current release version: bkchem-0.11.6

- Third-party software (ACD/ChemSketch+ACD/ChemBasic, v9.0) integration with InChI (v1.0):
http://www.acdlabs.com/download/technotes/80/draw_db/inchi.pdf


Websites that Generate InChIs™:

The following websites provide the facility to generate InChIs:

- www.chemspider.com/inchi.asmx
Services provide methods to manipulate InChI strings and InChIKeys, including conversion to and from the MOLfile format, checking the validity of InChI identifiers, and searching ChemSpider using InChI inputs.

- www.acdlabs.com/download/chemsk.html
ACD/Labs' freely available structure-drawing program ChemSketch includes the facility to generate InChIs from drawn structures.

- pubchem.ncbi.nlm.nih.gov/edit/
PubChem Server Side Structure Editor v1.8 includes a facility for generating InChIs as you draw the structure.

Copyright © The International Union of Pure and Applied Chemistry 2005: IUPAC International Chemical Identifier (InChI) (contact: secretariat@iupac.org)
