A generic structure represents a group of structurally similar compounds by using a placeholder symbol such as “R” (as in R-CH2-OH, where R = H, CH3, CH2CH3, CH(CH3)2, C(CH3)3, and so on). Generic structures are commonly used in chemistry texts as well as in chemical patents, in which the inventor claims a whole class of related compounds. Generic structures are more often called “Markush” structures after Dr. Eugene A. Markush, who was involved in a legal case that set a precedent in the USA for generic chemical structure patent filing.
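To make the idea concrete, the R-CH2-OH example can be expanded into its specific members by attaching each R group to the shared core. The following is only a minimal sketch: the names `R_GROUPS`, `CORE`, and `enumerate_generic` are hypothetical, and the R groups are hand-written SMILES fragments chosen so that plain string concatenation gives a valid SMILES string. Real Markush systems handle far more general cases (variable attachment points, nested definitions, and so on).

```python
# Hypothetical illustration: enumerate the members of the generic structure
# R-CH2-OH by pasting a SMILES fragment for each R group onto a fixed core.

# Each fragment is written so that the next atom appended to it bonds to the
# attachment atom of the R group (an assumption that makes concatenation work).
R_GROUPS = {
    "H": "[H]",
    "CH3": "C",
    "CH2CH3": "CC",
    "CH(CH3)2": "CC(C)",
    "C(CH3)3": "CC(C)(C)",
}

CORE = "CO"  # the -CH2-OH part shared by every member of the class


def enumerate_generic(r_groups, core):
    """Return {R label: SMILES} for every specific compound in the class."""
    return {label: fragment + core for label, fragment in r_groups.items()}


if __name__ == "__main__":
    for label, smiles in enumerate_generic(R_GROUPS, CORE).items():
        print(f"R = {label:<9} -> {smiles}")
```

With these five R groups, the sketch yields the SMILES strings for methanol, ethanol, 1-propanol, 2-methyl-1-propanol, and 2,2-dimethyl-1-propanol.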
An early research project on Markush structure storage and retrieval was the Sheffield Generic Structures Project, which led to a text-based language for generic structure description called GENSAL (GENeric Structure LAnguage)34 as well as an extended connection table representation for generic structures35. The Sheffield generic structures system was never implemented commercially, but it influenced two commercial systems: MARPAT36 (developed by CAS) and Markush DARC (currently Thomson Reuters’ Merged Markush Service37).
Some public databases, such as PubChem, allow one to search for generic structures using SMARTS (SMiles ARbitrary Target Specification), a language for describing molecular patterns. SMARTS is useful for substructure searching, which finds a particular pattern (subgraph) in a molecule. SMARTS is a straightforward extension of SMILES: all SMILES symbols and properties are legal in SMARTS, and SMARTS adds logical operators and additional molecular descriptors. Detailed information on SMARTS is given in the SMARTS specification document38 in the Daylight theory manual and in the SMARTS tutorial39.
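The notion of finding a pattern (subgraph) in a molecule can be sketched with a small backtracking search. This is not a SMARTS parser and not how any particular database implements its search. It is only an illustration under simplifying assumptions: molecules and patterns are hand-built graphs, hydrogens are left out, bond orders are ignored, and a `"*"` label in the pattern stands in for the kind of wildcard that SMARTS provides. The function names `neighbors` and `find_match` are hypothetical.

```python
# Minimal sketch of substructure search as subgraph matching.
# Hypothetical representation:
#   atoms: list of element symbols; "*" in a pattern matches any element
#   bonds: set of frozenset({i, j}) atom-index pairs (bond orders ignored)


def neighbors(bonds, i):
    """Indices of atoms bonded to atom i."""
    return {j for bond in bonds if i in bond for j in bond if j != i}


def find_match(pattern_atoms, pattern_bonds, mol_atoms, mol_bonds):
    """Return a mapping {pattern index: molecule index}, or None if absent."""

    def extend(mapping):
        k = len(mapping)  # index of the next pattern atom to place
        if k == len(pattern_atoms):
            return dict(mapping)  # every pattern atom has been placed
        for candidate in range(len(mol_atoms)):
            if candidate in mapping.values():
                continue  # each molecule atom may be used only once
            if pattern_atoms[k] not in ("*", mol_atoms[candidate]):
                continue  # element label does not match
            # Every pattern bond from atom k to an already-placed pattern
            # atom must also exist between the mapped molecule atoms.
            bonds_ok = all(
                frozenset({mapping[p], candidate}) in mol_bonds
                for p in neighbors(pattern_bonds, k)
                if p in mapping
            )
            if bonds_ok:
                mapping[k] = candidate
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[k]  # backtrack and try the next candidate
        return None

    return extend({})


# Ethanol (heavy atoms only): C0-C1-O2
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = {frozenset({0, 1}), frozenset({1, 2})}

# Pattern: a carbon bonded to an oxygen
c_o_atoms = ["C", "O"]
c_o_bonds = {frozenset({0, 1})}

if __name__ == "__main__":
    print(find_match(c_o_atoms, c_o_bonds, ethanol_atoms, ethanol_bonds))
```

In this example the first carbon is tried and rejected because it has no bond to the oxygen, so the search backtracks and maps the pattern onto the second carbon and the oxygen.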
Another extension of SMILES is SMIRKS40,41, which is a line notation for generic reactions. A generic reaction represents a group of reactions that undergo the same set of atom and bond changes. Note that SMILES and SMARTS can also be used to represent reactions, using the “>” symbol to separate the reactants, agents, and products (written in that order), as described in the SMILES and SMARTS specification documents. (SMILES and SMARTS strings that describe reactions are therefore often called reaction SMILES and reaction SMARTS, respectively.) SMIRKS, on the other hand, is used to represent types of reactions (e.g., the SN2 reaction). More detailed information on SMIRKS is given in the SMIRKS specification document40 and SMIRKS tutorial41.
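The role of the “>” symbol in a reaction SMILES can be illustrated with a short sketch that splits such a string into its three parts (reactants, agents, products) and then splits each part into components on the “.” separator. The function name `split_reaction_smiles` is hypothetical, the example reactions are written by hand for illustration, and the code does no chemical validation. In particular, it treats every “.” as a boundary between components, which is a simplification for multi-fragment species such as salts.

```python
def split_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into lists of component SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction SMILES needs exactly two '>' symbols")
    # Within each part, '.' separates the individual components;
    # an empty part (e.g. no agents) gives an empty list.
    reactants, agents, products = (
        [component for component in part.split(".") if component]
        for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}


if __name__ == "__main__":
    # Acid-catalyzed esterification: acetic acid + ethanol -> ethyl acetate + water
    print(split_reaction_smiles("CC(=O)O.OCC>[H+]>CC(=O)OCC.O"))
    # No agents: the middle part is simply left empty
    print(split_reaction_smiles("C=C.BrBr>>BrCCBr"))
```

A reaction SMILES of this kind describes one specific reaction. Describing a whole reaction type, as SMIRKS does, additionally requires patterns and atom maps that say which atoms and bonds change.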