Learning Objectives:
- Recognize the different kinds of chemical names, formulas, and other identifiers.
- Determine what you do and do not know about a chemical compound based on one of these names, formulas, or identifiers.
- Explain how one kind of chemical name, formula, or other identifier can be translated into another, and what sorts of information can be inadvertently lost or added in translation.
- Describe how chemists interpret various kinds of chemical names, formulas, and other identifiers in chemically meaningful ways.
Table of Contents
1.0 Communicating chemical structure with formulas and names
Overview
Definitions
1.1. Formulas
1.2. Names
1.3. Further Reading
1.4 Chemical structure drawing programs
1.5. Exercises
1.0 Communicating chemical structure with formulas and names
Overview
Chemistry involves a lot of communication. In the classroom, in the laboratory, or at the computer screen, as a chemist, you are constantly referring to all sorts of different chemical substances and molecular entities. You do so using chemical names, formulas, and notation. You’re probably already so accustomed to chemical names, formulas, and notation that you barely need to think about them when you use them, and can instead focus on the molecules that you’re drawing, writing, or talking about. In this module, we’re going to turn things around and think about chemical names, formulas, and notation themselves.
Why would we want to do that?
Where there’s communication, there’s always a danger of misunderstanding. Experienced human chemists are generally able to figure out when they’ve misunderstood each other over the identity of a particular compound. However, work in cheminformatics almost always involves communicating not just with other chemists, but with computer systems. Often, it also involves different computer systems communicating with each other. In these cases, it’s often easier for miscommunication to go undetected. When it is detected, it’s often difficult to figure out what went wrong.
You can minimize the impact of this kind of miscommunication by keeping in mind what various sorts of chemical names and formulas DO and DO NOT tell you about a particular compound, and by documenting the sources of the names and formulas that you use.
In Part 1 of this module, we will dig into the most common kinds of chemical names, formulas, and notation to figure out a) how they work, b) why they work like they do, c) where they are most often used, and d) what they do and do not tell you about a chemical structure.
In Part 2, we’ll introduce several chemical identifiers and representations developed specifically for use on computers.
Later modules of this course will focus on how these various sorts of identifiers are used in cheminformatics applications. In this module, we’ll focus on the communications tasks that almost all chemists engage in. A convenient mnemonic for these tasks is “RSVP”: Register, Search, View, Publish. Most forms of chemical representation were developed with these uses in mind.
(A quick note to reassure you before we dive in: we’re not going to be memorizing any nomenclature rules. Systematic chemical nomenclature has become so complicated that even experts in the field use computer systems to review their work and catch their mistakes. In Part 2, we’ll talk a little bit about how this has happened, since it will help you understand how to handle some of the challenges that might come up when you have to work with systematic chemical names yourself.)
The ability to communicate effectively using chemical names, formulas, and notation is a kind of literacy. As with regular literacy, this chemical literacy is something that you will get better at with practice. The better you understand what’s going on “under the hood” of various forms of chemical representation and the computer systems that make use of them, the better a chemical communicator you will become.
1.0.1. Definitions
Chemical identifiers and representations
There are lots of different kinds of chemical names and formulas. Confusingly, many of the terms that refer to them can be used in different ways.
Instead of trying to specify a single, unambiguous meaning for each term, we’re going to lay out the various different things that people might mean when they’re talking about, for example, an “empirical formula.”
Formulas
A structural formula is any formula that indicates the connectivity of a compound – that is, which of its atoms are linked to each other by covalent bonds. There are various different kinds of structural formulas:
A line formula depicts connectivity but no three-dimensional structural information.
A condensed formula expresses the same information as a line formula using atomic symbols only.
A Lewis formula explicitly shows valence lone pairs in addition to bonds.
A skeletal formula is a simplified line formula in which carbon atoms are depicted as unlabeled vertices and hydrogen atoms bonded to carbon are suppressed. Skeletal formulas are the most common structural formulas.
Dash-wedge formulas use dashes and wedges to represent stereochemistry at sp3 stereocenters.
Projection formulas indicate conformation.
These different ways of drawing structural formulas are often combined or used alongside one another, sometimes in different parts of the same formula. For this reason, it’s not especially important or useful to memorize these terms and their definitions. Rather, you need to be able to interpret the kind of information that each of these formulas expresses. We’ll discuss this in more detail below.
Empirical and molecular formulas indicate the composition of a compound only:
An empirical formula expresses the ratio of the elements (or sometimes polyatomic ions) that make up a compound, in lowest integer terms.
A molecular formula indicates the total number of atoms of each element in one molecule of a compound.
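To make the difference between these formulas concrete, here is a minimal Python sketch. It assumes simple formulas with no parentheses, charges, or isotope labels, and the function names (count_atoms, hill_string, empirical) are our own, not part of any chemistry library. It writes its output in Hill order (carbon, then hydrogen, then the other elements alphabetically), which is one common convention for molecular formulas.

```python
import re
from collections import Counter
from functools import reduce
from math import gcd


def count_atoms(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'CH3CH2OH'.

    Assumption: no parentheses, charges, or isotope labels.
    """
    counts = Counter()
    # Each match is an element symbol followed by an optional count.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def empirical(counts: Counter) -> Counter:
    """Reduce atom counts to the lowest whole-number ratio."""
    divisor = reduce(gcd, counts.values())
    return Counter({element: n // divisor for element, n in counts.items()})


def hill_string(counts: Counter) -> str:
    """Write atom counts as a formula string in Hill order."""
    if "C" in counts:
        others = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


# Two different condensed formulas (ethanol and dimethyl ether)...
ethanol = hill_string(count_atoms("CH3CH2OH"))
dimethyl_ether = hill_string(count_atoms("CH3OCH3"))
# ...and the molecular formula of glucose reduced to its empirical formula.
glucose_empirical = hill_string(empirical(count_atoms("C6H12O6")))

print(ethanol, dimethyl_ether, glucose_empirical)
```

Ethanol and dimethyl ether collapse to the same molecular formula, and glucose collapses further to an empirical formula that it shares with many other compounds. Each step discards information, and you cannot run the translation in reverse to recover the connectivity.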
Names
A systematic name is a chemical name based on the structural formula of a compound. If you know the rules and vocabulary of the system in question, you should be able to write a name based on a structural formula and vice-versa. Chemists have developed various ways of translating formulas into names, so it is nearly always possible to write more than one systematic name for a given compound.
Locants and stereochemical descriptors are numbers, letters (such as R, S, E, and Z), and prefixes (cis, trans) that indicate how the molecular fragments indicated by different parts of a systematic name fit together in the named compound.
A trivial name is a relatively short, memorable name that identifies a chemical entity without describing its structure.
IUPAC nomenclature is a well-known international system of chemical names. In general, IUPAC nomenclature is systematic but flexible, offering several ways of writing a systematic name for any given compound. IUPAC nomenclature rules also allow the use of certain well-established trivial names as IUPAC names.
A preferred IUPAC name (PIN) is one of the possible IUPAC names for a compound, singled out as the name to be used in official contexts such as regulation.
Notation
Line notation expresses the structure of a compound using a string of characters. Line notation is designed to be easy for computers to process rapidly and reliably (and is usually not particularly legible to people). Currently, the most commonly used forms of line notation are SMILES/SMARTS and InChI.
Registry numbers are unique identifiers for chemical substances. They are designed not to give you any information whatsoever about a compound’s structure or its relationships to other compounds.
CAS Registry Numbers (CAS RNs) are the registry numbers used in the Chemical Abstracts Service Chemical Substance Registry, a major chemical database that can be searched with CAS applications including SciFinder and STN. They have often been used as official identifiers for chemical substances, especially in the US.
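Although a CAS RN says nothing about structure, its final digit is a check digit that helps catch transcription errors. The sketch below is a minimal Python illustration of that check. The function name is_valid_cas_rn is our own, and the code only tests whether a string is well formed (two to seven digits, two digits, one check digit). It cannot tell you whether the number has actually been assigned to a substance.

```python
import re


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Return True if the string has the CAS RN layout and a matching check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit
    # before the check digit, then take the sum modulo 10.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas_rn("7732-18-5"))  # water
print(is_valid_cas_rn("64-17-5"))    # ethanol
print(is_valid_cas_rn("64-17-6"))    # ethanol with a mistyped check digit
```

A check like this is a cheap first line of defense when registry numbers are typed by hand or passed between systems, which is exactly where undetected miscommunication tends to creep in.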
A connection table is a table listing all of the atoms and bonds in a molecule. It is the most common format used by computer programs to store, search, compare, and sort chemical structures. Connection tables are even harder for humans to read than line notation.
The MDL Molfile (.mol file) is a widely used file format for connection tables.
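The idea behind a connection table can be sketched with ordinary Python data structures. The example below is an illustration only: the variable and function names are our own, the atom numbering is arbitrary, and the layout is not the Molfile format. It stores ethanol as a list of atoms and a list of bonds, with every hydrogen listed explicitly.

```python
from collections import Counter

# Atom block: atom index -> element symbol.
atoms = {
    1: "C", 2: "C", 3: "O",
    4: "H", 5: "H", 6: "H",  # hydrogens on carbon 1
    7: "H", 8: "H",          # hydrogens on carbon 2
    9: "H",                  # hydrogen on the oxygen
}

# Bond block: (first atom index, second atom index, bond order).
bonds = [
    (1, 2, 1), (2, 3, 1),
    (1, 4, 1), (1, 5, 1), (1, 6, 1),
    (2, 7, 1), (2, 8, 1),
    (3, 9, 1),
]


def neighbors(atom_index: int) -> list:
    """Return the indices of all atoms bonded to the given atom, sorted."""
    bonded = []
    for first, second, _order in bonds:
        if first == atom_index:
            bonded.append(second)
        elif second == atom_index:
            bonded.append(first)
    return sorted(bonded)


# Composition can be derived from the table; the reverse is not possible.
composition = Counter(atoms.values())

print(neighbors(2))
print(dict(composition))
```

Going from the table to a molecular formula is a simple count, but nothing in the formula would let you rebuild the bond list. This is why connection tables, rather than formulas or names, are what programs typically store and compare.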
1.2. Unit 2: Names
1.3. Further reading & references
Formulas
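Jonathan Brecher, “Graphical Representation Standards for Chemical Structure Diagrams (IUPAC Recommendations 2008),” Pure and Applied Chemistry 80, no. 2 (January 1, 2008), 277–410. URL: http://pac.iupac.org/publications/pac/pdf/2008/pdf/8002x0277.pdf (accessed Sept. 2015).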
Antony Williams, “Chemical Structures,” in The ACS Style Guide (American Chemical Society, 2006), 375–83. URL: http://dx.doi.org/10.1021/bk-2006-STYG.ch017 (accessed Sept. 2015).
Neil G. Connelly and Ture Damhus, eds., IUPAC Nomenclature of Inorganic Chemistry (Cambridge: Royal Society of Chemistry, 2005), 53–67. (The “Red Book”). URL: http://old.iupac.org/publications/books/rbook/Red_Book_2005.pdf (accessed Sept. 2015).
Wikipedia entry on the Red Book. URL: https://en.wikipedia.org/wiki/IUPAC_nomenclature_of_inorganic_chemistry_2005 (accessed Sept. 2015).
Compound Interest, http://www.compoundchem.com/ (accessed Sept. 2015).
(good examples of effective communication using formulas)
Names
ACS/CAS
“Names and Numbers for Chemical Compounds,” in The ACS Style Guide (American Chemical Society, 2006), 233–54. URL: http://dx.doi.org/10.1021/bk-2006-STYG.ch012 (accessed Sept. 2015).
American Chemical Society, Naming and Indexing of Chemical Substances for Chemical Abstracts, 2007 Edition (Columbus, OH: American Chemical Society, 2008). URL: http://www.cas.org/File%20Library/Training/STN/User%20Docs/indexguideapp.pdf (accessed Sept. 2015).
IUPAC
Henri A. Favre and Warren H. Powell, eds., Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred Names 2013 (Cambridge: Royal Society of Chemistry, 2014). (The “Blue Book”). URL: http://pubs.rsc.org/en/content/ebook/9780854041824 (accessed Sept. 2015).
Wikipedia entry on the Blue Book. URL: https://en.wikipedia.org/wiki/IUPAC_nomenclature_of_organic_chemistry (accessed Sept. 2015).
Neil G. Connelly and Ture Damhus, eds., IUPAC Nomenclature of Inorganic Chemistry (Cambridge: Royal Society of Chemistry, 2005), 53–67. (The “Red Book”). URL: http://old.iupac.org/publications/books/rbook/Red_Book_2005.pdf (accessed Sept. 2015).
Wikipedia entry on the Red Book. URL: https://en.wikipedia.org/wiki/IUPAC_nomenclature_of_inorganic_chemistry_2005 (accessed Sept. 2015).
[1] Jonathan Brecher, “Graphical Representation Standards for Chemical Structure Diagrams (IUPAC Recommendations 2008),” Pure and Applied Chemistry 80, no. 2 (January 1, 2008), 278. URL: http://pac.iupac.org/publications/pac/pdf/2008/pdf/8002x0277.pdf (accessed Sept. 2015).
[2] Ibid., 280.