Discussion

Robert Belford | Tue, 02/14/2017 - 13:46

Hi All,

I'd like to add a few comments that may help, but the bottom line is to play around.

First, you can go into the 2D editor and simply X out the fluorine and carbons that you do not want, then hit the right arrow, and the Jmol image and mol file become updated.

Second, if you edit the mol file, it is probably easiest to cut it, paste it into a text editor, edit it there, and then paste it back. If you need to add an extra line, I think this is the only way, since hitting Enter to create a new line updates the 2D and 3D editors. And remember, you need to change the total number of atoms and bonds, as Evan indicated above.

Third, in the 3D visualization there is a "labels" tab, which labels each atom with its number as used in the atom and bond tables.

Evan Hepler-Smith | Tue, 02/14/2017 - 13:33

Hi Amita,

This is a very tricky problem.

First, the most important thing to keep in mind is that you should never have to do conversions like this manually in your cheminformatics work (thank goodness!). The point of this exercise is to get you thinking about the kind of issues that you may have to deal with when designing or dealing with scripts that read and/or manipulate connection tables.

Second, you will have to make sure that all three parts of the MOL file are updated to reflect acetic acid:

- Counts line (see diagram of counts line above)
- Atoms table (all and only the atoms contained in acetic acid)
- Bonds table (all and only the bonds contained in acetic acid)

In doing so, you will need to **make sure that atom numbering reflects the current rows in the updated atoms table**. For instance, if you want to delete rows 6 through 10 in an atoms table but preserve the bonding pattern among the other atoms in a structure, you will need to BOTH delete the bond table entries for the atoms you've deleted AND update the numbers for other atoms in the bond table (old atom 11 becomes new atom 6, old atom 12 becomes new atom 7, etc.). The bond table cannot contain any numbers greater than the number of rows in the atoms table.
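As a rough illustration of the bookkeeping described above, here is a minimal Python sketch of deleting atoms and renumbering the bond table. It assumes a V2000 MOL file with strictly fixed-width columns (three-character atom and bond counts at the start of the counts line, three-character atom numbers at the start of each bond line). The function name `delete_atoms` is hypothetical, and the sketch does not renumber atom references in any property lines that follow the bond table.

```python
def delete_atoms(molblock, doomed):
    """Remove the 1-based atom numbers in `doomed` from a V2000 MOL block.

    Assumption: fixed-width V2000 layout. Property lines after the bond
    table (for example charge lines) are copied through unchanged.
    """
    lines = molblock.splitlines()
    header, counts = lines[:3], lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = lines[4:4 + n_atoms]
    bonds = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    tail = lines[4 + n_atoms + n_bonds:]

    # Keep surviving atom rows and record old number -> new number.
    new_number = {}
    kept_atoms = []
    for old, line in enumerate(atoms, start=1):
        if old not in doomed:
            kept_atoms.append(line)
            new_number[old] = len(kept_atoms)

    # Keep only bonds whose two atoms both survive, then renumber them.
    kept_bonds = []
    for line in bonds:
        a, b = int(line[0:3]), int(line[3:6])
        if a in new_number and b in new_number:
            kept_bonds.append(f"{new_number[a]:>3}{new_number[b]:>3}{line[6:]}")

    # Rewrite the counts line so it matches the new tables.
    new_counts = f"{len(kept_atoms):>3}{len(kept_bonds):>3}{counts[6:]}"
    return "\n".join(header + [new_counts] + kept_atoms + kept_bonds + tail)
```

For the example in the text, `delete_atoms(molblock, set(range(6, 11)))` would drop rows 6 through 10, so old atom 11 becomes new atom 6 in every remaining bond entry.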

Third, take a look at the structural formula and think about what atoms you would keep, delete, and change from one element to another in order to carve an acetic acid out of the structure of octanoic acid without moving the 2D/3D position of any of the atoms. (You could also try adjusting 3D coordinates, but that makes it even tougher to create a decent-looking structure - which is about as much as you can expect from a 3D structure, when you aren't using measured or calculated 3D coordinates.)

I hope that this helps!

All best,
Evan

Evan Hepler-Smith | Tue, 02/14/2017 - 12:58

Mea culpa. As Sunghwan will no doubt describe, PubChem periodically reviews and curates the data pulled into the system from outside sources. Since I last test-drove this question, apparently the CAS number that made its way into PubChem from some source or another was eliminated from this record.

This underscores an important point. When you are working with database record IDs, you are almost always going to be safest working with the identifier that is actually **used** as a database record ID in the system that you're working with. In this case, that would be the PubChem CID. From the perspective of PubChem, a CAS RN is just another synonym, which is only as trustworthy as the source from which the synonym was derived. (Apparently not trustworthy enough in this instance, in the judgment of PubChem curation algorithms!)

Put another way, if your primary concern is using a chemical identifier that's going to be stable, you're best off using one that's stable within systems to which you and your intended audience have access. InChIKey is an excellent option, since it is designed to be stable, openly accessible, and used across platforms.
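For anyone who wants to script this, a minimal Python sketch of looking up an InChIKey from a PubChem CID might look like the following. It assumes PubChem's PUG REST property endpoint and its usual JSON layout (`PropertyTable` containing a `Properties` list); the helper names `inchikey_url`, `parse_inchikey`, and `fetch_inchikey` are hypothetical, and the fetch step needs network access.

```python
import json
from urllib.request import urlopen

# Assumed base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def inchikey_url(cid):
    """Build the property request URL for one CID."""
    return f"{PUG_REST}/compound/cid/{cid}/property/InChIKey/JSON"


def parse_inchikey(payload):
    """Pull the InChIKey out of the JSON text returned by the request."""
    table = json.loads(payload)
    return table["PropertyTable"]["Properties"][0]["InChIKey"]


def fetch_inchikey(cid):
    """Request the record over the network and return its InChIKey."""
    with urlopen(inchikey_url(cid)) as response:
        return parse_inchikey(response.read().decode("utf-8"))
```

Keying the lookup on the CID rather than on a CAS RN synonym follows the point above: the CID is the identifier PubChem itself uses for the record.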

My apologies for the confusion. It's a productive lesson for all of us to keep in mind!

Evan

Olcc S14 | Tue, 02/14/2017 - 12:42

I am trying to convert octanoic acid to acetic acid in the mol file. I got acetic acid in the 2D structure, but in 3D it looks really bad. I am also unable to remove all the other carbon and hydrogen atoms present in the octanoic acid. Can anybody please help me convert octanoic acid to acetic acid?
Thank you

Amita

Robert Belford | Tue, 02/14/2017 - 11:41

I am going through [read: grading] student work on exercise 1 problem 1, and none of my students have been able to figure out the CAS number for 1-aminoethane-1,2-diol, and none asked for help. I understand that there are enantiomers, that the correct number for the R enantiomer is 13053-46-8, and that it was abstracted from a 2014 patent, but I do not see how one would be able to find that information within PubChem or the other sources we have been working with in this assignment.

Could you expound a bit on this?
Thanks,
Bob

Shyleen Frost | Mon, 02/13/2017 - 13:25

Thank you so much for this clarification. You addressed some of the exact questions I had.

Sunghwan Kim | Sun, 02/12/2017 - 16:25

I have recently written a paper about how to use PubChem to identify potential multi-target ligands (small molecules that simultaneously bind multiple protein targets) for subsequent in silico or in vitro experiments. (The paper hasn't been published yet, so I will share it with those who show some interest in this project.) While the protocol described in this paper uses only PubChem's web-based tools and interfaces, it can be implemented in a computer program using PubChem's programmatic access. So the proposed project is to write a program that identifies potential multi-target ligands from PubChem and downloads them to the user's computer. Ideally, we would also develop a web tool associated with this program. Please let me know if you are interested in this project.

Evan Hepler-Smith | Fri, 02/10/2017 - 13:32

Yep! You're correct that the "1" in the counts line at the top of the file just indicates "chiral," not the number of chiral centers. However, each specific chiral center should be indicated in the atoms and/or bonds table. Check the third column of the atom properties block (that big mess of zeros at the right side of the atoms table) for a 1 or 2 indicating the orientation of an atom that is a chiral center. Also check the fourth column of the bonds table for a 1 or 6 indicating that a particular bond is wedged or dashed.

(However, note that this stereo information may be captured in different ways, or not at all, depending on the program that created the connection table. For more details, take a look at my answer to the question "Human vs. connection table to MOL file" below.)
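To make those column positions concrete, here is a minimal Python sketch that reads the three places mentioned above. It assumes a V2000 MOL file with strictly fixed-width columns: the chiral flag as the fifth three-character field of the counts line, the atom parity in characters 40 to 42 of each atom line, and the bond stereo value as the fourth three-character field of each bond line. The names `_int_field` and `stereo_report` are hypothetical.

```python
def _int_field(line, start, end):
    """Read a fixed-width integer field, treating blank or missing as 0."""
    text = line[start:end].strip()
    return int(text) if text else 0


def stereo_report(molblock):
    """Collect the stereo hints in a V2000 MOL block (fixed-width assumed)."""
    lines = molblock.splitlines()
    counts = lines[3]
    n_atoms = _int_field(counts, 0, 3)
    n_bonds = _int_field(counts, 3, 6)
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # Third property column after the element symbol: 1 or 2 marks parity.
    parity = {}
    for number, line in enumerate(atom_lines, start=1):
        value = _int_field(line, 39, 42)
        if value in (1, 2):
            parity[number] = value

    # Fourth bond column: 1 marks a wedged bond, 6 a dashed bond.
    bond_stereo = {}
    for line in bond_lines:
        value = _int_field(line, 9, 12)
        if value in (1, 6):
            pair = (_int_field(line, 0, 3), _int_field(line, 3, 6))
            bond_stereo[pair] = "wedged" if value == 1 else "dashed"

    return {
        "chiral_flag": _int_field(counts, 12, 15),
        "atom_parity": parity,
        "bond_stereo": bond_stereo,
    }
```

Counting the entries in `atom_parity` or `bond_stereo` gives a hint about how many stereocenters the file records, subject to the caveat above that some programs write this information differently or omit it.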

Best,
Evan

olcc s16 | Fri, 02/10/2017 - 13:21

I am wondering whether there is any clue in the MOL file that the structure has more than one chirality center. Thanks

Evan Hepler-Smith | Thu, 02/09/2017 - 09:57

Good question - I guess they figured that it was available, assuming that there was no need to document quadruple bonds.

This brings up a good point about the thinking underlying connection tables. While it is possible to use connection tables to represent inorganic and organometallic compounds, like most other general conventions for chemical representation, they were designed with organic compounds in mind. For 150 years or so, chemical naming, representation, and classification have been especially pressing questions in organic chemistry, since such a massive number of organic compounds are known and can be predicted, and because keeping track of all of these organics has been such an important challenge for the chemical and pharmaceutical industries.

If the MOL file had been developed with, say, transition metals in mind, perhaps the designers would have reserved the 4 in the bond order field to represent quadruple bonds. But it was developed primarily for keeping track of organic compounds. And while an organic quadruple valence bond **may** be possible, such species are not likely to be of much concern for cheminformatics.

Evan