Discussion | DivCHED CCCE: Cheminformatics OLCC

Re: Human vs. connection table to MOL file

Thanks for asking - we've gotten a few questions about chirality in MOL files.

Yes, chirality can be expressed within the atom and bond tables of a MOL file, and a parsing algorithm or a visual structure editor program can render that chiral center.

Specifically, chirality can be represented in two ways. First, parity (stereocenter orientation) can be indicated by a 1 or 2 in the third field of the atom properties block (that big block of mostly zeros at the right side of the atoms table of a MOL file). This is analogous to the familiar R/S (Cahn-Ingold-Prelog) convention for indicating atom parity, although it uses a different technique that does not precisely align with R/S assignments.

Second, bond stereo orientation (wedge/dash) can be indicated by a 1 (wedge) or 6 (dash) in the fourth field of a bond table entry. (The figures MOL XI and MOL XII above show this technique.)
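To make the two locations concrete, here is a minimal Python sketch that reads the parity field from an atom line and the stereo field from a bond line. It assumes the fixed-width column layout of MOL V2000; the function names and the sample lines are made up for illustration and do not come from any particular toolkit.

```python
# Meanings of the two stereo fields in a MOL V2000 file (single bonds only).
PARITY = {0: "not stereo", 1: "odd", 2: "even", 3: "either or unmarked"}
BOND_STEREO = {0: "not stereo", 1: "up (wedge)", 4: "either", 6: "down (dash)"}


def atom_parity(atom_line: str) -> int:
    """Return the stereo parity value from one fixed-width atom line."""
    # Columns: x (0-9), y (10-19), z (20-29), blank (30), symbol (31-33),
    # mass difference (34-35), charge (36-38), stereo parity (39-41).
    field = atom_line[39:42].strip()
    return int(field) if field else 0


def bond_stereo(bond_line: str) -> int:
    """Return the stereo value from one fixed-width bond line."""
    # Columns: first atom (0-2), second atom (3-5), bond type (6-8),
    # bond stereo (9-11).
    field = bond_line[9:12].strip()
    return int(field) if field else 0


# Hypothetical sample lines, built with format strings so the columns line up.
sample_atom = (
    f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {'C':<3}{0:2d}{0:3d}{1:3d}" + "  0" * 9
)
sample_bond = f"{1:3d}{2:3d}{1:3d}{6:3d}"

print(PARITY[atom_parity(sample_atom)])      # odd
print(BOND_STEREO[bond_stereo(sample_bond)])  # down (dash)
```

Note that nothing in this sketch checks whether the parity value and the wedge/dash marks agree with each other. That consistency check is exactly the part the format leaves to the software.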

However, the fact that these two options are available can create problems that make it difficult to handle MOL file stereochemistry without bringing human judgement into play. As friend of the course Dr. Alex Clark has written, "The format has two ways to specify atom-centred tetrahedral chirality, and they are not harmonious....The relationship between wedges and parity is not well defined, and must be handled based on the circumstances. The format provides no assistance, which necessitates guesswork, and forces cheminformatics software to choose between generality and robustness, but not both."

(See for a detailed discussion of this and other pitfalls of MOL v2000 by Dr. Clark.)

Best,
Evan

Hello!

My name is Pranay, and I am a senior at the University of Illinois, Springfield, majoring in Chemistry. I decided to take this class because I wish to go into the field of computer science, and I think that this course can help bridge my chemistry background into that field.

Howdy

This is Andrew Cornell, a graduate student in chemistry at University of Arkansas at Little Rock (UALR). My future research will involve the combination of cheminformatics with public safety.

A. Cornell

HELLO EVERYONE

MY NAME IS CHINENYE. I AM EXCITED TO BE IN THIS CLASS, AND I LOOK FORWARD TO LEARNING MORE ABOUT CHEMINFORMATICS.

Aromatic bonds - MOL file format

I was just wondering why the MOL file format uses the number 4 to indicate aromatic bonds?

Human vs. connection table to MOL file

Does a human have to indicate a point of chirality or other stereospecific characteristic at some point in this process, or is the MOL file able to identify these things from the information conveyed in the connection table(s)?

SCTs

Dear all,

A couple of comments on the "SCT," in response to student questions that have come up.

First, as noted above, the SCT is **not** an actual chemical structure file format. Rather, it is a simplified, bare-bones version designed to help bring out some of the general features of connection tables.

Second, SCTs are unambiguous but not canonical. That means that, while each SCT corresponds to *one* chemical structure, each chemical structure may be represented by *several* SCTs. This is because atoms may be numbered in any way you wish, and bonds may be written in any order you wish in the bonds table. There are algorithms for “canonicalizing” connection tables so that each chemical structure corresponds to only one connection table. (The best known algorithm of this sort is called the “Morgan algorithm,” named for the CAS engineer who came up with it during the 1960s.) But this is an additional procedure applied after the connection table is formed.
The point here is that you should not assume that atoms are numbered or bonds are listed in any particular order when a connection table is created, unless you know for sure that a canonicalization algorithm has been applied.
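
As a rough illustration of "unambiguous but not canonical," the Python sketch below writes the heavy atoms of ethanol as two differently numbered connection tables. The table layout (a list of element symbols plus a list of `(atom, atom, bond order)` tuples) is my own simplification, not the SCT layout itself. The `extended_connectivity` function is only a simplified version of the first stage of a Morgan-style procedure, not a full canonicalization: matching signatures are a necessary sign that two tables describe the same structure, not proof of it.

```python
def extended_connectivity(atoms, bonds):
    """Return a Morgan-style extended connectivity value for each atom number."""
    neighbors = {i: set() for i in range(1, len(atoms) + 1)}
    for a, b, _order in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Start from each atom's number of neighbors.
    ec = {i: len(neighbors[i]) for i in neighbors}
    n_classes = len(set(ec.values()))
    while True:
        # Replace each value by the sum of the neighbors' values.
        new_ec = {i: sum(ec[j] for j in neighbors[i]) for i in neighbors}
        new_classes = len(set(new_ec.values()))
        if new_classes <= n_classes:
            return ec  # no further discrimination between atoms; stop
        ec, n_classes = new_ec, new_classes


def signature(atoms, bonds):
    """Numbering-independent summary: sorted (element, connectivity) pairs."""
    ec = extended_connectivity(atoms, bonds)
    return sorted((atoms[i - 1], ec[i]) for i in ec)


# Ethanol heavy atoms, numbered two different ways.
table_a = (["C", "C", "O"], [(1, 2, 1), (2, 3, 1)])  # CH3=1, CH2=2, O=3
table_b = (["C", "O", "C"], [(1, 2, 1), (3, 1, 1)])  # CH2=1, O=2, CH3=3

print(table_a == table_b)                          # False
print(signature(*table_a) == signature(*table_b))  # True
```

The two tables are different as written, yet they summarize to the same signature. That is why comparing connection tables entry by entry is not a safe way to decide whether two files describe the same molecule.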

Evan

Re:

Thank you, Evan. I realized that it was based on a comparison with the Kekule structures, not just anything "conjugated". I will try it with the benzene ring to see a comparison.

Daniel

Re: Question on Anatomy of an MOL file

The short answer: no.

The long answer:

For each of MOL I, IV, and V, the bond table contains a collection of entries that make a ring (atom 1 to atom 2, 2 to 3, 3 to 4, 4 to 5, 5 to 6, and 6 back to 1, closing the circuit). Without bond table entries that form a closed chain in this way, you won't end up with a ring.
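
If it helps to see the "closed chain" idea in code, here is a small Python sketch that checks whether a list of bond table entries contains a ring. The function name is hypothetical, and the sketch assumes that atoms are numbered from 1 and that no bond is listed twice.

```python
def has_ring(n_atoms, bonds):
    """Return True if the (atom, atom) pairs in bonds close at least one ring."""
    # Union-find: each atom starts in its own group.
    parent = list(range(n_atoms + 1))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # shorten the path as we go
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # The two atoms were already connected, so this bond closes a ring.
            return True
        parent[root_a] = root_b
    return False


six_ring = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
chain = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]

print(has_ring(6, six_ring))  # True
print(has_ring(6, chain))     # False
```

The only difference between the two lists is the final "6 back to 1" entry. The bond type field plays no part in the check.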

So what happens if you enter "4" (indicating an aromatic bond) in the bond type field for a structure that can't possibly be aromatic, such as a straight chain? Exercise 1 at the end of this section asks you to investigate how Hack-a-mol deals with such situations.

It's important to remember that there is not an absolute "right" answer to this question. There are just the answers that happen to have been programmed into a particular parsing algorithm or software for rendering connection tables as (graphical) structural formulas. You can imagine a few possibilities: a program might return an error (the thinking: "That's not aromatic! Bad file!"). Alternatively, a program might display conjugated double bonds (the thinking: "Okay, that's not aromatic, but whatever program created this file must have just meant 'conjugated' instead of aromatic. And aromaticity is not an especially precise concept, anyway."). You can probably imagine other choices that a cheminformatics programmer might make.
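
To show how such a choice might look in code, here is a Python sketch of the two policies just described. Everything in it is hypothetical: the function names, the `policy` argument, and the "lenient" rule of alternating double and single bonds in listing order (which only makes sense for a straight chain listed end to end). It is not a description of what Hack-a-mol or any other program actually does.

```python
def bond_in_ring(index, bonds):
    """Return True if bond number `index` lies on a ring in the bond table."""
    start, goal, _type = bonds[index]
    seen, stack = {start}, [start]
    # Try to get from one end of the bond to the other without using it.
    while stack:
        current = stack.pop()
        if current == goal:
            return True
        for i, (a, b, _t) in enumerate(bonds):
            if i == index:
                continue
            if a == current and b not in seen:
                seen.add(b)
                stack.append(b)
            elif b == current and a not in seen:
                seen.add(a)
                stack.append(a)
    return False


def handle_aromatic(bonds, policy="strict"):
    """Apply a made-up policy to type-4 bonds that are not part of any ring."""
    bad = [i for i, (_a, _b, t) in enumerate(bonds)
           if t == 4 and not bond_in_ring(i, bonds)]
    if not bad:
        return list(bonds)
    if policy == "strict":
        raise ValueError(f"aromatic bond type on non-ring bond(s): {bad}")
    # "Lenient" policy (an assumption): treat the bonds as conjugated by
    # alternating double and single bonds in the order they are listed.
    fixed = list(bonds)
    for k, i in enumerate(bad):
        a, b, _t = fixed[i]
        fixed[i] = (a, b, 2 if k % 2 == 0 else 1)
    return fixed


chain = [(1, 2, 4), (2, 3, 4), (3, 4, 4)]
print(handle_aromatic(chain, policy="lenient"))
# [(1, 2, 2), (2, 3, 1), (3, 4, 2)]
```

With `policy="strict"` the same chain raises a `ValueError` instead. Neither outcome is "right"; each is simply a decision that a programmer made.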

(These sorts of ambiguities are among the reasons why some databases default to Kekule structures. Even though they can be confusing, they avoid possible ambiguities in how to correlate connection table data with specific chemical structures.)

Anyway - try messing around with Hack-a-mol and see what you can discover about how it treats those aromatic "4s" when applied to bonds that can't possibly be aromatic.

Evan

Question on Anatomy of an MOL file

When using tricky features such as aromaticity, is it a given that MOL V will always represent a ring structure, no matter what appears at the atom and bond level, as opposed to the Kekule structures represented as MOL I and MOL IV?