Simplified molecular input line entry specification
The simplified molecular input line entry specification, or SMILES, is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. Most molecule editors can import SMILES strings and convert them back into two-dimensional drawings or three-dimensional models of the molecules.
The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc). More recently, IUPAC has introduced InChI as a standard for formula representation. SMILES is generally considered to have the advantage of being slightly more human-readable than InChI; it also has a wide base of software support with extensive theoretical (e.g., graph theory) backing.
The term Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation. A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.
The term Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
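The traversal described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the graph is already hydrogen-suppressed, every bond is treated as a single bond, and the output is not canonical. The function name `write_smiles` and its input layout (a list of element symbols plus a list of index pairs) are hypothetical choices made for this example.

```python
def write_smiles(atoms, bonds, start=0):
    """Write a non-canonical SMILES string for a hydrogen-suppressed graph.

    atoms: list of element symbols, e.g. ["C", "F", "F", "F"]
    bonds: list of (i, j) index pairs; all bonds are assumed to be single.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    visited = set()
    tree_children = {i: [] for i in neighbors}  # spanning-tree edges
    closures = {i: [] for i in neighbors}       # ring labels attached to each atom
    ring_edges = set()                          # edges broken to open the cycles

    def explore(atom, parent):
        # Depth-first search: unvisited neighbours become tree children,
        # edges to already visited atoms become ring closures.
        visited.add(atom)
        for nbr in neighbors[atom]:
            if nbr == parent:
                continue
            edge = frozenset((atom, nbr))
            if nbr in visited:
                if edge not in ring_edges:
                    ring_edges.add(edge)
                    n = len(ring_edges)
                    # Two-digit labels take a % prefix in SMILES.
                    label = str(n) if n < 10 else "%" + str(n)
                    closures[atom].append(label)
                    closures[nbr].append(label)
            else:
                tree_children[atom].append(nbr)
                explore(nbr, atom)

    def emit(atom):
        # Print the atom, its ring labels, then its subtrees; every
        # subtree except the last is wrapped in parentheses as a branch.
        text = atoms[atom] + "".join(closures[atom])
        children = tree_children[atom]
        for child in children[:-1]:
            text += "(" + emit(child) + ")"
        if children:
            text += emit(children[-1])
        return text

    explore(start, None)
    return emit(start)


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(write_smiles(["C"] * 6, ring))
print(write_smiles(["C", "F", "F", "F"], [(0, 1), (0, 2), (0, 3)]))
```

Starting the traversal from a different atom yields a different but equivalent string, which is why a separate canonicalization step is needed when a unique representation is required.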
Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. The hydroxide anion is [OH-]. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance, the SMILES for water is simply O and that for ethanol is CCO.
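How the "proper number" of implicit hydrogens might be computed can be sketched as follows. The sketch assumes the usual convention that an unbracketed atom is filled up to its lowest normal valence that accommodates the bonds already written; the valence table and the helper name `implicit_hydrogens` are assumptions of this example, not part of the text above.

```python
# Normal valences assumed for the "organic subset", lowest first.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, bond_order_sum):
    """Hydrogens implied for an unbracketed atom with the given total bond order."""
    for valence in NORMAL_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # more bonds than any normal valence: no hydrogens added


# Water, written as O: the oxygen has no explicit bonds.
print(implicit_hydrogens("O", 0))

# Ethanol, written as CCO: CH3, CH2 and OH.
print([implicit_hydrogens("C", 1), implicit_hydrogens("C", 2), implicit_hydrogens("O", 1)])
```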
Double and triple bonds are written as = and # respectively: the double-bonded carbon dioxide is represented as O=C=O and the triple-bonded hydrogen cyanide as C#N.
Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform, which could also be described by the non-canonical formula FC(F)F.
Cyclohexane is represented as C1CCCCC1, the idea being that the two 'number ones' label the same position in the molecule, thus forming a ring with six carbons. Note that the label is the numeral (in this case the 1) rather than the combination of 'C1'.
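The bond, branch and ring-label rules above can be illustrated with a small reader that turns a SMILES string back into a list of atoms and bonds. This is a minimal sketch, not a complete parser: it assumes unbracketed organic-subset atoms, single-digit ring labels, and no aromatic, charged or stereo notation. The name `parse_smiles` and the tuple layout `(i, j, order)` are hypothetical.

```python
import re

# Two-letter symbols must be tried before single letters.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Return (atoms, bonds) where bonds are (i, j, order) tuples."""
    atoms, bonds = [], []
    previous = None      # atom that the next atom will be bonded to
    pending_order = 1    # bond order for the next bond (single by default)
    branch_stack = []    # atoms to return to when a branch closes
    open_rings = {}      # ring label -> (atom index, bond order)
    position = 0
    while position < len(smiles):
        match = TOKEN.match(smiles, position)
        if match is None:
            raise ValueError("unsupported character at position %d" % position)
        token = match.group()
        position = match.end()
        if token in BOND_ORDER:
            pending_order = BOND_ORDER[token]
        elif token == "(":
            branch_stack.append(previous)
        elif token == ")":
            previous = branch_stack.pop()
        elif token.isdigit():
            if token in open_rings:
                # Second occurrence of the label: close the ring.
                partner, order = open_rings.pop(token)
                bonds.append((partner, previous, max(order, pending_order)))
            else:
                # First occurrence: remember which atom carries the label.
                open_rings[token] = (previous, pending_order)
            pending_order = 1
        else:
            atoms.append(token)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, pending_order))
            previous = index
            pending_order = 1
    return atoms, bonds


print(parse_smiles("C1CCCCC1"))
print(parse_smiles("CCC(=O)O"))
```

Note that the ring label is stored on its own, keyed by the digit, which mirrors the point made above that the label is the numeral rather than the combination 'C1'.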
Aromatic C, O, S and N atoms are written in lower case as 'c', 'o', 's' and 'n' respectively. Bonds in an aromatic cycle are rarely marked explicitly except in SMARTS search patterns. Thus benzene is c1ccccc1.
Isomeric SMILES
Figure: Representation of cis-difluoroethene.
Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F is one representation of trans-difluoroethene, in which the Fs are on opposite sides of the double bond, whereas F/C=C\F is one possible representation of cis-difluoroethene, in which the Fs are on the same side of the double bond, as shown in the figure.
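For the simple pattern used in these examples, the geometry can be read off by comparing the two direction characters: if they are the same the substituents are trans, and if they differ the substituents are cis. The sketch below assumes strings of exactly the form X/C=C/Y, with one substituent written before the first carbon and one after the second; the helper name `double_bond_geometry` is hypothetical, and general SMILES needs a full parser instead.

```python
import re

# One substituent symbol, a direction mark, C=C, a direction mark, one substituent.
PATTERN = re.compile(r"^([A-Z][a-z]?)([/\\])C=C([/\\])([A-Z][a-z]?)$")


def double_bond_geometry(smiles):
    """Classify a string of the form X/C=C/Y or X/C=C\\Y as cis or trans."""
    match = PATTERN.match(smiles)
    if match is None:
        raise ValueError("expected a string of the form X/C=C/Y")
    first, second = match.group(2), match.group(3)
    # Same direction marks: opposite sides (trans); different marks: same side (cis).
    return "trans" if first == second else "cis"


print(double_bond_geometry("F/C=C/F"))
print(double_bond_geometry("F/C=C\\F"))
```

This also shows why each isomer has more than one valid representation: flipping both direction characters at once leaves the classification unchanged.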
SMARTS is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. This is used in specifying search structures and is widely used in chemical database search applications. This practice has led to a common misconception that chemical substructure search is achieved computationally by matching SMILES/SMARTS strings, when, in fact, it is achieved by the computationally more intensive search for subgraph isomorphism in the graphs reconstructed from the SMILES representations.
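The difference between string matching and graph matching can be made concrete with a small backtracking search. The sketch below is a naive illustration of subgraph matching on already-parsed graphs, not the algorithm used by any particular toolkit. It assumes atoms are given as lists of symbols, bonds as index pairs, that bond orders are ignored, and that "*" in the query stands for a wildcard atom; the name `find_substructure` is hypothetical.

```python
def find_substructure(query_atoms, query_bonds, target_atoms, target_bonds):
    """Return one mapping {query index: target index}, or None if there is no match."""

    def adjacency(count, bonds):
        adj = {i: set() for i in range(count)}
        for i, j in bonds:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    q_adj = adjacency(len(query_atoms), query_bonds)
    t_adj = adjacency(len(target_atoms), target_bonds)

    def atom_matches(q, t):
        # "*" is treated as a wildcard that matches any target atom.
        return query_atoms[q] == "*" or query_atoms[q] == target_atoms[t]

    def extend(mapping):
        if len(mapping) == len(query_atoms):
            return dict(mapping)
        q = len(mapping)  # query atoms are assigned in index order
        for t in range(len(target_atoms)):
            if t in mapping.values() or not atom_matches(q, t):
                continue
            # Every query bond to an already mapped atom must exist in the target.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]  # backtrack and try the next candidate
        return None

    return extend({})


# Target: propionic acid, CCC(=O)O, as a hydrogen-suppressed graph.
target_atoms = ["C", "C", "C", "O", "O"]
target_bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]

# Query: a carbon bonded to an oxygen.
print(find_substructure(["C", "O"], [(0, 1)], target_atoms, target_bonds))
```

The query string "CO" does not occur as a substring of "CCC(=O)O", yet the graph search finds the carbon-oxygen bond, which is the distinction the paragraph above draws.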
SMILES can be converted back to 2-dimensional representations using Structure Diagram Generation algorithms (Helson, 1999). This conversion is not always unambiguous. Conversion to 3-dimensional representation is achieved by energy minimization approaches.
* Chemistry Development Kit (2D layout and conversion)
* International Chemical Identifier (InChI), the free and open alternative to SMILES by the IUPAC.
* OpenBabel, JOELib, OELib (conversion)
* Helson, Harold E. (1999) Structure Diagram Generation: in Reviews in Computational Chemistry 13, 313–98, Eds. Lipkowitz, K.B, Boyd, D.B., Wiley-VCH Press.
* "SMILES - A Simplified Chemical Language"
* "SMARTS - SMILES Extention"
* Daylight SMILES tutorial
* Web-based applications capable of converting SMILES strings to 2D structure images
** Daylight Depict
** CACTVS at NCI GIF/PNG converter with more controls
** PubChem online molecule editor that supports SMILES/SMARTS, InChI and all common chemical file formats
* JME molecule editor applet that can create SMILES
* Parsing SMILES
* ACD/ChemSketch freeware
* Jmol molecule viewer for SMILES
* ChemAxon SMILES aware Java based molecule editor and 2D/3D viewer (Marvin), database and complete cheminformatics toolkit (JChem) with API, free for teaching, academic research and for free public access web sites
* Smormo-Ed Molecule editor for Linux which can read and write SMILES
* E-BABEL Interactive conversion of molecules on the web using OpenBabel
* InChI.info - an unofficial InChI website featuring on-line converter from InChI and SMILES to molecular drawings