1.1. Formulas

1.1.1. Structural formulas

“The purpose of a chemical structure diagram,” begins an article on how to draw these diagrams, “is to convey information—typically the identity of a molecule—to another human reader or as input to a computer program. Any form of communication, however, requires that all participants understand each other.”[1]

Below, we’ll go over the various ways in which structural formulas are most often drawn. Once again, our goal is to get you thinking about the kinds of structural formulas that you’ve gotten used to using without having to think too much about them. What could you possibly be misunderstanding in someone else’s structural formula? How could somebody misunderstand your structural formula? Is there a chemical feature in your head that didn’t make it into the formula that you drew? Is there more in the formula that you drew than you meant to express?

We’ll be going over the ways in which formulas can be drawn. If you are interested in learning more about how formulas should be drawn, and in sharpening your own formula-drawing, we highly recommend checking out this detailed guide. Here’s what you’ll find:

Production of good chemical structure depictions will likely always remain something of an art form. There are few cases where it can be said that a specific representation is “right” and that all others are “wrong”. These guidelines do not try to do that. Rather, they try to codify the sorts of general rules that most chemists understand intuitively but that have never been collected in a single printed document. Adherence to these guidelines should help produce drawings that are likely to be interpreted the same way by most chemists and, as importantly, that most chemists feel are “good-looking” diagrams.[2]

When chemists talk about “structure,” what do they mean? Chemical structure can mean several different things:

Connectivity (also known as constitution): which atoms are linked to which by covalent bonds?
Stereochemistry: what is the relative arrangement of these atoms and bonds in three-dimensional space? Are two groups across a double bond or ring cis or trans to each other? Is a stereocenter R or S?
Conformation: in which of the many arrangements permitted by rotation around single bonds are the atoms of a compound positioned in space?
Crystal structure: what is the precise position of each atom in the compound, in three-dimensional coordinates?

Structural formulas always express connectivity and often express stereochemistry. Both of these aspects of structure can usually be translated in a relatively straightforward way between different chemical formulas and names.

While structural formulas may also contain information about conformation, it is often more difficult to translate conformation from one formula to another or to a name. And while structural formulas may be drawn to suggest the shape of a molecule, they almost never contain reliable information about crystal structure.

1.1.1.1. How do they work?

To get us started, here are some structural formulas:

V11.A V11.B V11.C

Compressing structural formulas: skeletal formulas, condensed formulas, and abbreviations.

In order to draw structural formulas more quickly and clearly, chemists typically draw carbon atoms as unlabeled vertices. We also typically leave out lone pairs and hydrogen atoms bonded to carbon. Formulas drawn in this way are sometimes called skeletal formulas.

Even skeletal formulas can take up a lot of space, and sometimes, you’re only really interested in the structure of one part of a molecule.

Structural formulas often make use of abbreviations for common molecular subunits: i-Pr (isopropyl), Ph (phenyl), Me (methyl), Et (ethyl), Bu (butyl), t-Bu (tert-butyl), Ac (acetyl), among others. (Here’s a list.) (I.C, IV.B, IV.C)

In order to abbreviate structural formulas even more, condensed formulas express structure without using any lines. A condensed formula can be written in place of an entire structural formula (II.C, III.C) or in place of a portion of a structural formula (I.B, VI.A-D).

These condensed formulas-within-a-structural formula are sometimes called “contracted atom labels.” IUPAC guidelines for graphical representation provide the following specifications for how to write and interpret contracted labels:

Contracted atom labels attached to only one bond should be read outwards from that bond, usually from left to right if the bond is on the left of the label. If the bond is instead attached to the right of the label, the label will normally be read from right to left (313-14).

Parentheses are used when more than two non-hydrogen atoms are bonded to the same atom (e.g., branching; III.C).

The advantage of condensed formulas is that they can be written in normal type. However, it is often more difficult to perceive structural features in a condensed formula than in one of the graphical alternatives.

Stereochemistry

Structural formulas typically indicate cis-trans isomerism across double-bonds (II.D, II.E).

Structural formulas are sometimes drawn in a way that keeps this ambiguous. A crossed double bond and/or a double bond aligned linearly with its neighboring single bonds indicates unknown cis-trans configuration (II.A, II.B).

The configuration of chiral centers is shown using dashes and wedges. (III.D, III.E)

A wavy line explicitly indicates unspecified stereochemistry or a mixture of stereoisomers (III.B). A chiral carbon with only regular bond lines, on the other hand, could indicate that the chemist who drew the formula just didn’t notice the stereocenter (III.A).

Condensed formulas usually do not show stereochemistry (II.C, III.C).

Here’s a little more on stereochemistry in structural formulas.

Delocalization

Delocalization may be drawn via resonance structures (I.D) or circles within aromatic rings (also I.D). Dashed or dotted double bonds are also sometimes used to show delocalization. In some contexts, these can be confusing, since dotted and dashed bonds are also used to depict transition states, coordination relationships, hydrogen bonds, and other bonds that behave differently than covalent sigma and pi bonds.

Many chemistry databases index organic small molecules by structural formulas based on explicit connectivity. However, molecules such as coordination compounds and other delocalized systems do not fit easily into these conventions for representing bonds.

VII.A is the standard IUPAC graphical representation for publication; however, it is difficult for software to interpret all of the specialized notation, such as the wedged bonds and the bond into the middle of the ring. A human chemist would understand that this indicates a general relationship between the metal and the delocalized system. A computer program, however, might interpret this as a bond to a methyl group, unaffiliated with the ring.

VII.B is a more common representation for searching coordination compounds in chemistry indexes, with explicit bonding between the metal and all ring atoms. However, this is incorrect notation for publication, as the associations between the metal and the ring atoms are not covalent bonds.

VII.C is considered an acceptable alternative in the IUPAC standards. However, not all databases will have provisions to interpret the circular bond notation or the dotted bond as conveying a non-standard covalent system.

Delocalization is difficult to program, and almost all software applications handle it differently. It is important to keep in mind the purpose of the formula: whether it is meant for human or computer readers.

1.1.1.2. Why do they work that way?

Structural formulas tell you a lot more about the atoms that make up a compound than about its valence electrons. Bonds represent electrons, of course, and you can draw in lone pairs. You can also add curved arrows to show electron movement, draw resonance structures or dotted bonds to show delocalization, and sketch in orbitals. But in structural formulas themselves, as usually drawn, the electrons are mostly implicit – you know they’re there, but they aren’t actually what the drawing depicts.

This is peculiar, since the majority of chemical phenomena depend on interactions involving valence electrons.

There’s a reason for this. Chemists began using structural formulas a hundred and fifty years ago, after they’d figured out the basic features of organic chemical structure (carbon atoms form chains, carbon forms four bonds, etc.) but before things like cis-trans isomerism, tetrahedral carbon, and even the electron itself had been hypothesized.

By the time electrons and stereochemistry came along, structural formulas had come into general use, and chemists were quite familiar with them and fond of them. So they kept on using the same formulas, even though they’d been developed without electrons or stereochemistry in mind.

Eventually, chemists developed additional bits of notation – electron dots, dashes and wedges, and the like – to incorporate the electronic theory of bonding and stereochemistry into these familiar formulas. But even though electrons and stereochemical relationships became absolutely central features of how chemists think, it has always been a little bit difficult to represent them in structural formulas. It’s just not what structural formulas were built for.

Of course, structural formulas continued and continue to be enormously productive ways of representing compounds. Chemists have learned to think of these formulas as expressions of contemporary chemical ideas. However, in some situations – cheminformatics among them – we sometimes run into an awkward disconnection between, on the one hand, the historical origin of structural formulas as maps of connections between atoms, and on the other hand, our present-day scientific understanding of the nature of chemical substances.

One more point: structural formulas were originally developed within the context of organic chemistry, and then applied in other fields such as coordination chemistry. Be aware that both people and computer programs will tend to assume, as a default, that structural formulas represent covalently bonded organic compounds. If you are working with structural formulas for complexes involving coordination or hydrogen bonding, make sure that these bonds aren’t accidentally mistaken for the covalent bonds of organic compounds. (Some suggestions on how to avoid this pitfall are available on pages 292–295 here.)

1.1.1.3. Where are they most often used?

Anywhere you’re able to draw diagrams. Unfortunately, this excludes a lot of places, such as word processing programs, free-text search boxes, databases, and any time you find yourself talking chemistry without a notepad in your pocket.

There’s an easy solution for the last of these cases (keep a notepad in your pocket!); for the others, the solution is systematic nomenclature and notation, which we will discuss in the next two units.

One particularly useful feature of structural formulas is that you can easily draw a structural formula for a section of a molecule or identify one molecule as a section of another. Many applications for searching chemical databases (such as SciFinder and PubChem) allow you to perform substructure searches (for all molecules containing a certain structural formula subunit) and superstructure searches (for all molecules whose structural formulas can be found within a certain structural formula).
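To make the idea of a substructure search concrete, here is a minimal sketch in Python. It treats each formula as a small graph of non-hydrogen atoms and tries, by simple backtracking, to map every atom and bond of a query onto a target. The `mol` helper, the data layout, and the function name are all hypothetical choices for illustration; the sketch ignores implicit hydrogens, aromaticity, and stereochemistry, and real database software uses far more sophisticated (and faster) methods.

```python
def mol(atoms, bonds):
    """Hypothetical helper: build (atoms, bond lookup) from (i, j, order) tuples."""
    return atoms, {frozenset((i, j)): order for i, j, order in bonds}


def is_substructure(query, target):
    """Return True if every atom and bond of query can be mapped into target."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target

    def consistent(mapping, q_idx, t_idx):
        # Elements must match, and each query bond to an already-mapped atom
        # must be present in the target with the same bond order.
        if q_atoms[q_idx] != t_atoms[t_idx]:
            return False
        for other_q, other_t in mapping.items():
            order = q_bonds.get(frozenset((q_idx, other_q)))
            if order is not None and t_bonds.get(frozenset((t_idx, other_t))) != order:
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return True  # every query atom has found a partner
        q_idx = len(mapping)  # query atoms are mapped in index order
        for t_idx in range(len(t_atoms)):
            if t_idx not in mapping.values() and consistent(mapping, q_idx, t_idx):
                mapping[q_idx] = t_idx
                if extend(mapping):
                    return True
                del mapping[q_idx]  # backtrack and try another target atom
        return False

    return extend({})


carbonyl = mol(["C", "O"], [(0, 1, 2)])
acetic_acid = mol(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
ethanol = mol(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])

print(is_substructure(carbonyl, acetic_acid))  # True
print(is_substructure(carbonyl, ethanol))      # False
```

A superstructure search is the same test with the arguments swapped: `is_substructure(candidate, query)` asks whether a database entry can be found within the formula you drew.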

1.1.1.4. What questions should I ask?

Are we showing any implicit H’s or lone pairs? Are we worried about the ones we aren’t showing?

One or more H’s can be drawn in when there’s chemistry happening at an H, or if you want to indicate the configuration of a stereocenter (V.B). The same goes for lone pairs, when you have reason to call attention to them (I.D).

When you look at a skeletal formula, you know that all of the hydrogen atoms and valence lone pairs that you would expect to be there are in fact present, even though they aren’t drawn in. Keep this in mind if you find yourself communicating with a human or a computer that you can’t count on to fill in those missing H’s and electrons.
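If you do need to fill in the missing H’s for a program, the bookkeeping can be sketched roughly as follows. This is only an illustration: the valence table is a simplifying assumption (neutral atoms with their most common valences), and the function name and input layout are made up for this example. Charged atoms, radicals, and hypervalent atoms would need more care.

```python
# Typical valences of a few neutral atoms (an illustrative assumption,
# not a complete valence model).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1}


def implicit_hydrogens(atoms, bonds):
    """Return the number of undrawn H atoms on each atom of a skeletal formula.

    atoms: list of element symbols, one per drawn (non-hydrogen) atom
    bonds: list of (i, j, order) tuples indexing into atoms
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    # Whatever valence is not used by drawn bonds is assumed to be filled by H.
    return [DEFAULT_VALENCE[el] - u for el, u in zip(atoms, used)]


# Acetic acid drawn as a skeleton: C-C(=O)-O
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
h_counts = implicit_hydrogens(atoms, bonds)
print(h_counts)  # [3, 0, 0, 1]
```

The four implied hydrogens match the molecular formula of acetic acid, C₂H₄O₂.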

How are we dealing with stereochemistry?

Structural formulas can specify stereochemistry (II.D-E, III.D-E, V.A-B) or leave it unspecified (II.A-C, III.A-C). In the latter case, it is typically impossible to tell from the structural formula alone whether you’re dealing with a mixture of stereoisomers or unknown stereochemistry.

If you’re concerned about stereochemistry – and in most cases, you probably are – be alert for stereocenters (including rings with multiple substituents) with unspecified stereochemistry.

Watch out for double-bonds just drawn on top of single bonds without considering cis-trans isomerism (and don’t make this mistake yourself!).

Note that there is a chemical difference between substances of unspecified stereochemistry and mixtures of stereoisomers. (“What’s the stereochemistry? I don’t know!” vs. “What’s the stereochemistry? We’ve got both isomers!,” respectively.) However, when you’re dealing with a structural formula drawn without stereochemistry specified, it can be difficult to know which of these cases you’re dealing with.

How are we dealing with delocalization?

If there’s a delocalized π system in your molecule, think about whether you’ve chosen the appropriate resonance form, or whether it’s worth drawing multiple forms or indicating delocalization with a dotted bond.

How are we dealing with tautomers?

If your compound can tautomerize, think about whether you’ve chosen the appropriate tautomer for your purposes, or whether it’s worth drawing both.

Keep in mind that both tautomerism and delocalization are much easier to recognize when you’re working with structural formulas than when you’re working with systematic names or other sorts of notation. (Delocalization is very difficult even to represent using any other form of name or notation.) When you translate structural formulas into another form, make sure delocalization and tautomerism don’t get lost in the shuffle.

1.1.2. Empirical and molecular formulas

You don’t always know, or need to express, or want to express the structure of a compound that you’re working with. In the case of inorganic salts, there’s little or no molecular structure (connectivity, that is) to represent.

In these cases, empirical and molecular formulas give you a way to identify the compound by its composition alone. And if you’re interested in a compound’s composition for its own sake, better to write down a molecular formula than keep counting each atom in a structural formula.

1.1.2.1. How do they work?

Empirical and molecular formulas are pretty straightforward: you just count the atoms or the ions.

Empirical formulas are most often used to identify salts. Empirical formulas typically express the relative amount of each element that the compound contains, in lowest integer terms.

NaCl AlCl₃ Fe₂O₃

Salts containing polyatomic ions are frequently represented with a formula expressing the relative amount of each ion that the compound contains, in lowest integer terms. Such formulas are sometimes just referred to as “chemical formulas” and sometimes as empirical formulas.

NH₄NO₂ (NH₄)₂SO₄ Mg₃(PO₄)₂

To write a molecular formula, just count the atoms in one molecule of the compound.

C₂H₆O (ethanol) C₇H₆O₂ (benzoic acid) C₂H₂ (acetylene) C₆H₆ (benzene)

Salts containing polyatomic ions are sometimes represented by a “molecular formula” expressing the total number of atoms of each element that are present when the ions combine in lowest integer terms.

C₂H₇NO₂ (NH₄C₂H₃O₂, ammonium acetate) H₄N₂O₂ (NH₄NO₂, ammonium nitrite)
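The step from an ion-grouped formula to element totals is just counting, and a short Python sketch can do it. The function name is hypothetical, and the sketch assumes well-formed formulas typed with plain digits (for example `Mg3(PO4)2` rather than subscripts) and without charges or hydrate dots.

```python
import re
from collections import Counter


def count_atoms(formula):
    """Count the atoms of each element in a formula such as 'Mg3(PO4)2'."""
    tokens = re.findall(r"[A-Z][a-z]?|\(|\)|\d+", formula)
    stack = [Counter()]  # one Counter per open parenthesis level
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            stack.append(Counter())
            i += 1
            continue
        # A unit is either a single element or a just-closed group.
        unit = stack.pop() if tok == ")" else Counter({tok: 1})
        i += 1
        # An optional number right after the unit multiplies it.
        mult = 1
        if i < len(tokens) and tokens[i].isdigit():
            mult = int(tokens[i])
            i += 1
        for element, n in unit.items():
            stack[-1][element] += n * mult
    return dict(stack[0])


print(count_atoms("NH4C2H3O2"))  # {'N': 1, 'H': 7, 'C': 2, 'O': 2}
print(count_atoms("Mg3(PO4)2"))  # {'Mg': 3, 'P': 2, 'O': 8}
```

The first result corresponds to C₂H₇NO₂, the element-by-element formula for ammonium acetate shown above.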

1.1.2.2. Why do they work that way?

Empirical and molecular formulas predate structural formulas, but they actually became more important, not less, once structural formulas appeared on the scene. This was because:

Looking up chemical compounds was hard.

If you knew the structural formula for a compound and wanted to look it up in a big chemical dictionary, it was usually pretty easy to find it if the dictionary was organized by molecular formula. That way, you only had to look through the names for the couple dozen isomers that shared a molecular formula.

Sometimes chemists were wrong about their structure determinations.

It was helpful to have a formula that you could go on using unchanged if it turned out that the double bond wasn’t where you thought it was, or for multiple tautomeric forms of a compound.

It’s useful to think of a molecular formula as a general label for a compound rather than a specific one, especially when you’re dealing with an organic small molecule that almost certainly has a bunch of isomers.

That is: don’t think to yourself “diethyl ether is C₄H₁₀O” but rather “diethyl ether is one of the things in the C₄H₁₀O box, along with 1-butanol, 2-butanol, etc.”

C₄H₁₀O
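One way to picture the “box” is as an index keyed by molecular formula, much like the old formula-ordered dictionaries. The sketch below uses a tiny hand-made list of compounds, with formulas typed in plain digits; the variable names are just for illustration.

```python
from collections import defaultdict

# A tiny hand-made "dictionary" of compounds: name -> molecular formula.
compounds = {
    "diethyl ether": "C4H10O",
    "1-butanol": "C4H10O",
    "2-butanol": "C4H10O",
    "ethanol": "C2H6O",
    "dimethyl ether": "C2H6O",
    "benzene": "C6H6",
}

# Group the names into one "box" per molecular formula.
index = defaultdict(list)
for name, formula in compounds.items():
    index[formula].append(name)

print(index["C4H10O"])  # ['diethyl ether', '1-butanol', '2-butanol']
```

Looking up a formula returns every isomer in the box, which is why a molecular formula alone rarely pins down a single compound.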

1.1.2.3. Where are they most often used?

Empirical formulas are often used to represent the empirically-determined composition of an unknown sample.

Molecular formulas are most often used to identify molecular entities (organic compounds, covalently bonded inorganic compounds, coordination complexes) and their salts.

They usually show up in database entries for compounds, so you can use them to search for compounds (particularly useful if you suspect that tautomers might be throwing off your search).

1.1.2.4. What questions should I ask?

Are we grouping atoms into ions or just listing them element by element?

How are we ordering the atoms?

The two most common ways in which empirical and molecular formulas are ordered are:

From electropositive to electronegative / cation to anion

NaCl CaCO₃ AlCl₃ Fe₂O₃

SO₂ H₂O

NH₄C₂H₃O₂ NH₄NO₂

Exceptions:

NH₃

In the order C, then H, then everything else in alphabetical order; when a formula contains no carbon, all of its elements (including H) are listed alphabetically. (This is sometimes called Hill system order.)

ClNa CCaO₃ AlCl₃ Fe₂O₃

O₂S H₂O

C₂H₇NO₂ H₄N₂O₂

H₃N
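Hill system ordering is mechanical enough to sketch in a few lines of Python. The function name and the input format (a dictionary of element counts) are assumptions made for this example. The sketch follows the usual statement of the convention: carbon first, then hydrogen, then the rest alphabetically, and purely alphabetical order when no carbon is present.

```python
def hill_formula(counts):
    """Format a dict of element counts as a formula string in Hill order."""
    if "C" in counts:
        # Carbon first, then hydrogen (if any), then the rest alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        # No carbon: every element, including H, goes in alphabetical order.
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


print(hill_formula({"Na": 1, "Cl": 1}))                  # ClNa
print(hill_formula({"Ca": 1, "C": 1, "O": 3}))           # CCaO3
print(hill_formula({"N": 1, "H": 7, "C": 2, "O": 2}))    # C2H7NO2
print(hill_formula({"N": 1, "H": 3}))                    # H3N
```

The outputs reproduce the Hill-order examples listed above, written with plain digits instead of subscripts.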

Is this an empirical formula (a ratio of lowest terms) or a molecular formula (the total count of atoms in a particular chemical structure)?

C₂H₂ (acetylene) C₆H₆ (benzene)

…or

CH (acetylene and benzene)
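Reducing a molecular formula to an empirical one just means dividing every count by their greatest common divisor. Here is a minimal sketch, with a hypothetical function name and element counts supplied as a dictionary:

```python
from functools import reduce
from math import gcd


def empirical(counts):
    """Reduce element counts to lowest integer terms."""
    divisor = reduce(gcd, counts.values())
    return {element: n // divisor for element, n in counts.items()}


print(empirical({"C": 2, "H": 2}))          # {'C': 1, 'H': 1}
print(empirical({"C": 6, "H": 6}))          # {'C': 1, 'H': 1}
print(empirical({"C": 2, "H": 6, "O": 1}))  # {'C': 2, 'H': 6, 'O': 1}
```

C₂H₂ and C₆H₆ collapse to the same empirical formula, CH, while a formula that is already in lowest terms (such as C₂H₆O) comes back unchanged. Going the other way is not possible without extra information such as the molar mass.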

Are we referring to a specific isomer, and how do we know which one?

This almost goes without saying: for organic compounds and many inorganic ones, there are almost always a bunch of isomers that share the same molecular formula.

1.1.3. Other kinds of compounds and formulas

As we mentioned above, most of the general principles behind chemical names and formulas were originally worked out for organic compounds and were later adapted to other sorts of chemical entities.

Molecular formulas for coordination complexes are often written in brackets, in the order [central atom (usually a metal), then negative ligands, then neutral ligands]. They may also be written in Hill system order, which for these carbon-free examples means listing every element alphabetically.

[CoCl₃(NH₃)₃] [CoCl(NH₃)₅]²⁺ [CoCl(NH₃)₅]Cl₂

= = =

Cl₃CoH₉N₃ ClCoH₁₅N₅²⁺ Cl₃CoH₁₅N₅

Projection formulas indicate stereochemistry or relative conformation.

The sequence of a fragment of biological polymer (a polypeptide or nucleic acid) is similar to a condensed formula, since it represents a linear chain of chemical units.

You may come across formulas in which one of these units is expanded.
