

Learning Objectives:
- Recognize the various kinds of chemical names, formulas, and other identifiers.
- Determine what you do and do not know about a chemical compound based on one of these names, formulas, or identifiers.
- Understand how one kind of chemical name, formula, or other identifier can be translated into another, and what sorts of information can be inadvertently lost or added in translation.
- Understand how chemists interpret various kinds of chemical names, formulas, and other identifiers in chemically meaningful ways.
Table of Contents
1.0 Communicating chemical structure with formulas and names
Overview
Definitions
1.1. Formulas
1.2. Names
1.3. Further Reading
1.4. Chemical structure drawing programs
1.5. Exercises
1.0 Communicating chemical structure with formulas and names
Overview
Chemistry involves a lot of communication. In the classroom, in the laboratory, or at the computer screen, as a chemist, you are constantly referring to all sorts of different chemical substances and molecular entities. You do so using chemical names, formulas, and notation. You’re probably already so accustomed to chemical names, formulas, and notation that you barely need to think about them when you use them, and can instead focus on the molecules that you’re drawing, writing, or talking about. In this module, we’re going to turn things around and think about chemical names, formulas, and notation themselves.
Why would we want to do that?
Where there’s communication, there’s always a danger of misunderstanding. Experienced human chemists are generally able to figure out when they’ve misunderstood each other over the identity of a particular compound. However, work in cheminformatics almost always involves communicating not just with other chemists, but with computer systems. Often, it also involves different computer systems communicating with each other. In these cases, it’s often easier for miscommunication to go undetected. When it is detected, it’s often difficult to figure out what went wrong.
You can minimize the impact of this kind of miscommunication by keeping in mind what various sorts of chemical names and formulas DO and DO NOT tell you about a particular compound, and by documenting the sources of the names and formulas that you use.
In Part 1 of this module, we will dig into the most common kinds of chemical names, formulas, and notation to figure out a) how they work, b) why they work like they do, c) where they are most often used, and d) what they do and do not tell you about a chemical structure.
In Part 2, we’ll introduce several chemical identifiers and representations developed specifically for use on computers.
Later modules of this course will focus on how these various sorts of identifiers are used in cheminformatics applications. In this module, we’ll focus on the communications tasks that almost all chemists engage in. A convenient mnemonic for these tasks is “RSVP”: Register, Search, View, Publish. Most forms of chemical representation were developed with these uses in mind.
(A quick note to reassure you before we dive in: we’re not going to be memorizing any nomenclature rules. Systematic chemical nomenclature has become so complicated that even experts in the field use computer systems to review their work and catch their mistakes. In Part 2, we’ll talk a little bit about how this has happened, since it will help you understand how to deal with some of the challenges that might come up when you have to use systematic chemical names in your own work.)
The ability to communicate effectively using chemical names, formulas, and notation is a kind of literacy. As with regular literacy, this chemical literacy is something that you will get better at with practice. The better you understand what’s going on “under the hood” of various forms of chemical representation and the computer systems that make use of them, the better a chemical communicator you will become.
1.0.1. Definitions
Chemical identifiers and representations
There are lots of different kinds of chemical names and formulas. Confusingly, many of the terms that refer to them can be used in different ways.
Instead of trying to specify a single, unambiguous meaning for each term, we’re going to lay out the various different things that people might mean when they’re talking about, for example, an “empirical formula.”
Formulas
A structural formula is any formula that indicates the connectivity of a compound – that is, which of its atoms are linked to each other by covalent bonds. There are various different kinds of structural formulas:
A line formula depicts connectivity but no three-dimensional structural information.
A condensed formula expresses the same information as a line formula using atomic symbols only.
A Lewis formula explicitly shows valence lone pairs in addition to bonds.
A skeletal formula is a simplified line formula in which carbon atoms are depicted as unlabeled vertices and hydrogen atoms bonded to carbon are suppressed. Skeletal formulas are the most common structural formulas.
Dash-wedge formulas use dashes and wedges to represent stereochemistry at sp3 stereocenters.
Projection formulas indicate conformation.
These different ways of drawing structural formulas are often combined or used alongside one another, sometimes in different parts of the same formula. For this reason, it’s not especially important or useful to memorize these terms and their definitions. Rather, you need to be able to interpret the kind of information that each of these formulas expresses. We’ll discuss this in more detail below.
Empirical and molecular formulas indicate the composition of a compound only:
An empirical formula expresses the ratio of the elements (or sometimes polyatomic ions) that make up a compound, in lowest integer terms.
A molecular formula indicates the total number of atoms of each element in one molecule of a compound.
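To see what is lost when you move from a molecular formula to an empirical formula, here is a minimal Python sketch. The function names `parse_formula` and `empirical_formula` are made up for this illustration, and the parser assumes a simple formula with no parentheses, charges, or isotope labels.

```python
import re
from functools import reduce
from math import gcd


def parse_formula(formula):
    """Parse a simple molecular formula such as 'C6H12O6' into element counts.

    Assumption: no parentheses, charges, hydrates, or isotope labels.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def empirical_formula(formula):
    """Reduce the element counts to their lowest integer ratio."""
    counts = parse_formula(formula)
    divisor = reduce(gcd, counts.values())
    parts = []
    for symbol, n in counts.items():
        reduced = n // divisor
        parts.append(symbol + (str(reduced) if reduced > 1 else ""))
    return "".join(parts)


print(empirical_formula("C6H12O6"))  # glucose
print(empirical_formula("C2H4O2"))   # acetic acid
print(empirical_formula("C6H8O6"))   # L-ascorbic acid
```

Glucose and acetic acid reduce to the same empirical formula, so the empirical formula alone cannot tell them apart. The translation only works in one direction: you cannot recover a molecular formula from an empirical one without extra information such as the molar mass.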
Names
A systematic name is a chemical name based on the structural formula of a compound. If you know the rules and vocabulary of the system in question, you should be able to write a name based on a structural formula and vice-versa. Chemists have developed various ways of translating formulas into names, so it is nearly always possible to write more than one systematic name for a given compound.
Locants and stereochemical descriptors are numbers, letters (such as R, S, E, and Z), and prefixes (cis, trans) that indicate how the molecular fragments indicated by different parts of a systematic name fit together in the named compound.
A trivial name is a relatively short, memorable name that identifies a chemical entity without describing its structure.
IUPAC nomenclature is a well-known international system of chemical names. In general, IUPAC nomenclature is systematic but flexible, offering several ways of writing a systematic name for any given compound. IUPAC nomenclature rules also allow the use of certain well-established trivial names as IUPAC names.
A preferred IUPAC name (PIN) is one of the possible IUPAC names for a compound, singled out as the name to be used in official contexts such as regulation.
Notation
Line notation expresses the structure of a compound using a string of characters. Line notation is designed to be easy for computers to process rapidly and reliably (and is usually not particularly legible to people). Currently, the most commonly used forms of line notation are SMILES/SMARTS and InChI.
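As a rough illustration of how a program can pull information out of line notation, the sketch below splits the standard InChI for ethanol into its slash-separated layers. The helper `inchi_layers` is hypothetical, not part of any InChI library, and it assumes a simple single-component InChI whose first layer after the version is the molecular formula and in which no layer prefix repeats.

```python
def inchi_layers(inchi):
    """Split a simple InChI string into its layers.

    Assumptions: the string starts with 'InChI=', the first layer after the
    version is the molecular formula, and no layer prefix occurs twice.
    """
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI":
        raise ValueError("not an InChI string")
    version, formula, *rest = body.split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        # Each remaining layer begins with a one-letter prefix, e.g. 'c' or 'h'.
        layers[layer[0]] = layer[1:]
    return layers


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
for key, value in inchi_layers(ethanol).items():
    print(key, value)
```

Here the `c` layer lists the connections between the numbered heavy atoms and the `h` layer lists the hydrogens attached to them. The same compound can be written in SMILES as `CCO`, which is shorter but, unlike InChI, is not guaranteed to be written the same way by every program.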
Registry numbers are unique identifiers for chemical substances. They are designed not to give you any information whatsoever about a compound’s structure or its relationships to other compounds.
CAS Registry Numbers (CAS RNs) are the registry numbers used in the Chemical Abstracts Service Chemical Substance Registry, a major chemical database that can be searched with CAS applications including SciFinder and STN. They have often been used as official identifiers for chemical substances, especially in the US.
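Although a CAS RN says nothing about structure, its final digit is a check digit that lets a program catch many typing errors. The sketch below is a minimal validity check; the function name `is_valid_cas_rn` is our own, and passing the check only means the number is well formed, not that it has been assigned to a substance.

```python
def is_valid_cas_rn(cas_rn):
    """Check the format and check digit of a CAS Registry Number.

    A CAS RN has the form NNNNNNN-NN-N: two to seven digits, two digits,
    and a single check digit.
    """
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    digits = first + second
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(check)


print(is_valid_cas_rn("7732-18-5"))  # water
print(is_valid_cas_rn("50-81-7"))    # L-ascorbic acid
print(is_valid_cas_rn("7732-18-4"))  # last digit mistyped
```

A check like this is a cheap first step when you register or import records. It flags a mistyped number but cannot tell you which compound the number was supposed to refer to.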
A connection table is a table listing all of the atoms and bonds in a molecule. It is the most common format used by computer programs to store, search, compare, and sort chemical structures. Connection tables are even harder for humans to read than line notation.
The MDL Molfile (.mol file) is a widely-used file format for connection tables.
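To give a feel for what a connection table holds, here is a small Python sketch for ethanol. This is an illustrative in-memory structure of our own, not the Molfile format itself, and the names `atoms`, `bonds`, `neighbors`, and `composition` are assumptions made for the example.

```python
from collections import Counter

# Atom list for ethanol (CH3-CH2-OH), hydrogens included.
# An atom's index is its position in the list.
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]

# Bond list: (atom index, atom index, bond order).
bonds = [
    (0, 1, 1), (1, 2, 1),             # C-C and C-O
    (0, 3, 1), (0, 4, 1), (0, 5, 1),  # CH3 hydrogens
    (1, 6, 1), (1, 7, 1),             # CH2 hydrogens
    (2, 8, 1),                        # O-H
]


def neighbors(atom_index):
    """Return the element symbols of the atoms bonded to one atom."""
    found = []
    for a, b, _order in bonds:
        if a == atom_index:
            found.append(atoms[b])
        elif b == atom_index:
            found.append(atoms[a])
    return sorted(found)


def composition():
    """Count the atoms of each element, as a molecular formula would."""
    return dict(Counter(atoms))


print(neighbors(1))
print(composition())
```

The composition can always be derived from the connection table, but not the other way around. This particular table stores no coordinates or stereochemistry, so it carries the same kind of information as a line formula.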
1.2. Unit 2: Names
1.2.1. How do they work?
There are two kinds of chemical name: trivial names and systematic names. Trivial names identify a compound (or sometimes a few closely related compounds), but provide little or no information about its structure and its relationships to other compounds. A trivial name may be a technical chemical term, or it may be a common name taken from regular, nonscientific language. You can think of acronyms for systematic names (THF, DMSO, and so forth) as a kind of trivial name.
Systematic names indicate the complete constitution of the compound. Systematic names are based on structural formulas. Writing a systematic name involves taking apart a structural formula into subunits, finding the appropriate term for each subunit, and putting those terms together to form the name. You should therefore be able to draw a structural formula for a compound based on its systematic name, by taking apart the name into its subunits and writing down the structural formulas for each of these subunits, connecting them as specified in the name.
Trivial:
Systematic:
Semi-systematic names take a trivial name of a related compound as a root and name the compound systematically as a derivative of that compound.
The root of a systematic name indicates the compound’s primary chain or parent compound, and prefixes and suffixes indicate the atoms or groups that are attached to that parent compound.
Locant numbers (and occasionally letters) indicate where these substituent groups are attached to the parent compound (or “substituted” for hydrogen atoms of this parent, hence the term “substituent.”) Stereochemical prefixes – cis and trans, E and Z, R and S – are used to indicate stereochemistry. (If you need a refresher on assigning E/Z and R/S, here’s a primer.)
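The way a root, a suffix, and a locant combine can be sketched in a few lines of Python. This toy example covers only unbranched alcohols with one OH group and up to six carbons; the `ROOTS` table and the function `name_alkanol` are made up for illustration and fall far short of a real nomenclature engine.

```python
# Roots for unbranched chains of one to six carbon atoms.
ROOTS = {1: "meth", 2: "eth", 3: "prop", 4: "but", 5: "pent", 6: "hex"}


def name_alkanol(n_carbons, oh_position):
    """Name an unbranched, saturated alcohol with a single OH group.

    oh_position is the carbon bearing the OH, counted from either end.
    """
    if n_carbons not in ROOTS:
        raise ValueError("only chains of 1 to 6 carbons are supported")
    if not 1 <= oh_position <= n_carbons:
        raise ValueError("OH position must lie on the chain")
    # Number the chain from the end that gives the OH the lowest locant.
    locant = min(oh_position, n_carbons + 1 - oh_position)
    parent = ROOTS[n_carbons] + "an"
    if n_carbons <= 2:
        # Only one position is possible, so no locant is needed.
        return parent + "ol"
    return f"{parent}-{locant}-ol"


print(name_alkanol(2, 1))
print(name_alkanol(3, 1))
print(name_alkanol(4, 3))
```

Counting from the other end, carbon 3 of a four-carbon chain is carbon 2, so the last call gives butan-2-ol. Even this tiny case needs a rule for choosing among equivalent numberings, which hints at why full systematic nomenclature becomes so complicated.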
Several different forms of systematic nomenclature have been used both in the past and in the present. Furthermore, the best-known nomenclature system, that of the International Union of Pure and Applied Chemistry (IUPAC), provides various options for how to name a compound. Therefore, most compounds have more than one systematic name. Fortunately, most systems of nomenclature in wide use are based on more or less the same principles and the same vocabulary.
L-threo-Hex-2-enonic acid, γ-lactone
L-3-Keto-threo-hexuronic acid lactone
2-oxo-L-threo-hexono-1,4-lactone-2,3-enediol
(R)-3,4-dihydroxy-5-((S)-1,2-dihydroxyethyl)furan-2(5H)-one
(R)-5-((S)-1,2-dihydroxyethyl)-3,4-dihydroxyfuran-2(5H)-one
Five systematic names for vitamin C (L-ascorbic acid)
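Because one compound can carry several systematic names, a program that searches by name usually needs a synonym index that maps every known name to a single record. Here is a minimal sketch using the five names above. The `normalize` and `lookup` helpers are hypothetical, and folding case is a simplifying assumption, since capitalization can carry meaning in chemical names.

```python
def normalize(name):
    """Remove all whitespace and fold case so trivial variants match.

    Assumption: case folding is acceptable here, although in general
    upper and lower case can be significant in chemical names.
    """
    return "".join(name.split()).lower()


SYNONYMS = [
    "L-threo-Hex-2-enonic acid, γ-lactone",
    "L-3-Keto-threo-hexuronic acid lactone",
    "2-oxo-L-threo-hexono-1,4-lactone-2,3-enediol",
    "(R)-3,4-dihydroxy-5-((S)-1,2-dihydroxyethyl)furan-2(5H)-one",
    "(R)-5-((S)-1,2-dihydroxyethyl)-3,4-dihydroxyfuran-2(5H)-one",
]

# Every normalized synonym points at the same record key.
index = {normalize(name): "L-ascorbic acid" for name in SYNONYMS}


def lookup(name):
    """Return the record key for a name, or None if the name is unknown."""
    return index.get(normalize(name))


print(lookup("(R)-5-((S)- 1,2-dihydroxyethyl)-3,4-dihydroxyfuran-2(5H)-one"))
print(lookup("vitamin C"))
```

The first lookup succeeds despite a stray space in the query. The second fails because the trivial name was never added to the index, and a name that is missing from the list simply cannot be found, however correct it may be.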
IUPAC recently published rules for determining one Preferred IUPAC Name (PIN) for each compound. The rules for determining these names are rather complicated; however, as we will see shortly, other forms of notation are often used when you need a unique identifier for a compound.
1.2.2. Why do they work that way?
Systematic names were originally designed primarily for use in alphabetical indexes of chemical substances. However, the effort to make these names both unambiguous and canonical (see Unit 3.b in this module) for this purpose made many of these names extraordinarily difficult to read, let alone say out loud. Chemists came up with different approaches to systematic nomenclature tailored for different sorts of compounds and different ways of organizing a chemical index; that’s how we ended up with so many different systematic names for the same compound.
Though some chemists initially predicted that systematic nomenclature would completely replace trivial names, this never happened. Trivial names convey little or no chemical information, but they have the advantage over systematic names in many of the qualities that we usually associate with good names: they are short, memorable, pronounceable, and easy to distinguish from other names.
1.2.3. Where are they most often used?
Trivial names are used constantly in informal chemical communication. Chemists working together on specific complex compounds will typically develop their own trivial “nicknames” for their compounds of interest.
Systematic names are often required if you want to register a new compound and for compounds discussed in publications. They are typically listed in database records accessible through search applications like PubChem, SciFinder, Reaxys, and ChemSpider, as well as on the Wikipedia pages for chemical substances. However, because of the various different systems of nomenclature in use, because IUPAC names are not unique, because names formed according to now-defunct rules often stick around, and because of human error (a particular issue on a crowd-curated site like Wikipedia), the systematic names that you find in these locations can sometimes vary.
Sections of systematic and semi-systematic names corresponding to a substructure of interest can be useful in searching for compounds containing that substructure, particularly in non-chemical settings like Google. However, this approach is generally less reliable than substructure searches that accept structural formulas as input.
1.2.4. What questions should I ask?
Are there structural ambiguities that the structural formula would clearly indicate but that the systematic name obscures?
When you’re dealing with systematic names rather than structural formulas, it’s much harder to recognize when you need to pay attention to delocalization, stereochemistry, and tautomerism. You may wish to sketch a structural formula based on the name (or make use of a computer program that does so) to determine whether any of these factors – particularly stereochemistry – apply.
What system of nomenclature does the name fit within?
Are you dealing with an IUPAC name? A Preferred IUPAC Name (PIN)? A CAS index name? A name that describes a structural formula without quite following any specific set of nomenclature rules?
Why am I using a systematic name, anyway?
Systematic names are difficult to read and to write. Before you decide to use them, make sure that there isn’t a different chemical identifier that serves your purposes better. (See Unit 3.c below.)
1.3. Further reading & references
Formulas
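Jonathan Brecher, “Graphical Representation Standards for Chemical Structure Diagrams (IUPAC Recommendations 2008),” Pure and Applied Chemistry 80, no. 2 (January 1, 2008), 277–410. URL: http://pac.iupac.org/publications/pac/pdf/2008/pdf/8002x0277.pdf (accessed Sept. 2015).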
Antony Williams, “Chemical Structures,” in The ACS Style Guide (American Chemical Society, 2006), 375–83. URL: http://dx.doi.org/10.1021/bk-2006-STYG.ch017 (accessed Sept. 2015).
Neil G. Connelly and Ture Damhus, eds., IUPAC Nomenclature of Inorganic Chemistry (Cambridge: Royal Society of Chemistry, 2005), 53–67. (The “Red Book”). URL: http://old.iupac.org/publications/books/rbook/Red_Book_2005.pdf (accessed Sept. 2015).
Wikipedia entry on the Red Book. URL: https://en.wikipedia.org/wiki/IUPAC_nomenclature_of_inorganic_chemistry_2005 (accessed Sept. 2015).
Compound Interest, http://www.compoundchem.com/ (accessed Sept. 2015).
(good examples of effective communication using formulas)
Names
ACS/CAS
“Names and Numbers for Chemical Compounds,” in The ACS Style Guide (American Chemical Society, 2006), 233–54. URL: http://dx.doi.org/10.1021/bk-2006-STYG.ch012 (accessed Sept. 2015).
American Chemical Society, Naming and Indexing of Chemical Substances for Chemical Abstracts, 2007 Edition (Columbus, OH: American Chemical Society, 2008). URL: http://www.cas.org/File%20Library/Training/STN/User%20Docs/indexguideapp.pdf (accessed Sept. 2015).
IUPAC
Henri A. Favre and Warren H. Powell, eds., Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013 (Cambridge: Royal Society of Chemistry, 2014). (The “Blue Book”). URL: http://pubs.rsc.org/en/content/ebook/9780854041824 (accessed Sept. 2015).
Wikipedia entry on the Blue Book. URL: https://en.wikipedia.org/wiki/IUPAC_nomenclature_of_organic_chemistry (accessed Sept. 2015).
Neil G. Connelly and Ture Damhus, eds., IUPAC Nomenclature of Inorganic Chemistry (Cambridge: Royal Society of Chemistry, 2005), 53–67. (The “Red Book”). URL: http://old.iupac.org/publications/books/rbook/Red_Book_2005.pdf (accessed Sept. 2015).
Wikipedia entry on the Red Book. URL: https://en.wikipedia.org/wiki/IUPAC_nomenclature_of_inorganic_chemistry_2005 (accessed Sept. 2015).
[1] Jonathan Brecher, “Graphical Representation Standards for Chemical Structure Diagrams (IUPAC Recommendations 2008),” Pure and Applied Chemistry 80, no. 2 (January 1, 2008), 278. URL: http://pac.iupac.org/publications/pac/pdf/2008/pdf/8002x0277.pdf (accessed Sept. 2015).
[2] Ibid., 280.