2.4. Substructure and superstructure search

        When a chemical structure occurs as a part of a bigger chemical structure, the former is called a substructure and the latter is referred to as a superstructure.  For example, ethanol is a substructure of acetic acid, and acetic acid is a superstructure of ethanol.

In substructure search, one provides an input substructure as a query to find molecules that contain the query substructure (that is, superstructures that contain the query substructure).  On the contrary, superstructure search returns molecules that comprise or make up the provided chemical structure query (that is, substructures that is contained in the query superstructure).  It should be noted that substructure search does not give you substructures of the query and that superstructure search does not return superstructures of the query.

        It is possible to include explicit hydrogen atoms as part of the pattern being searched.  For example, if you choose to do so, the SMILES queries [CH2][CH2][OH] and [CH3][CH][OH] will return molecules whose formula are R-CH2-CH2-OH and CH3-CH(R)-OH, respectively.  Substructure/superstructure searches implemented in many databases remove by default explicit hydrogens from the query molecule prior to search, the two SMILES queries [CH2][CH2][OH] and [CH3][CH][OH] may give you the same result as what the SMILES query CCO does, unless you specify that explicit hydrogens should be included in pattern matching.

        In addition to explicit hydrogen atoms, there are additional factors that may affect results of substructure/superstructure searches, for example, whether to ignore stereochemistry, isotopism, tautomerism, formal charge, and so on.




PubChem Substructure Video Tutorials The following tutorials were created by a student in the Fall 2015 Cheminformatics class as part of his project.

No votes yet