Discussion

OLCC S69 | Wed, 02/08/2017 - 09:48

Hello. My name is Taylor. I am a junior biochemistry and chemistry major at Campbell University in North Carolina.

OLCC S71 | Wed, 02/08/2017 - 09:43

My name is Angela. I'm a chemistry major at Campbell University. I am taking this course as a seminar credit for the curriculum.

OLCC S61 | Wed, 02/08/2017 - 09:38

Hi,

I'm Victoria, a chemistry major at Campbell University. I'm originally from Florida, but moved to Raleigh, NC. I'm taking this cheminformatics course as part of my undergraduate degree. I hope you all have a great semester!

Victoria

OLCC S63 | Wed, 02/08/2017 - 09:33

My name is Nathan. I am a senior at Campbell University and I am looking to expand my knowledge in all areas of chemistry. I'm looking forward to learning and growing in the area of cheminformatics.

Haley Greiner | Wed, 02/08/2017 - 09:31

Hey, my name is Haley and I am a junior at Campbell University with a major in Chemistry.

Damon Ridley | Tue, 02/07/2017 - 01:56

Phuc

I think Leah and Sunghwan have answered your question very well. You may also be interested in the comments Anja and I made about CAS RNs in the document Reaxys_SubstancesSearch_270117 in the Exploring Reaxys module. Specifically:

A note about CAS Registry Numbers
CAS Registry Numbers were developed by CAS to use as a single identifier for the systematic indexing of (precise) substances in CAS databases. Because many substances can be identified easily in this way, CAS Registry Numbers are quite commonly used worldwide.
However, nearly all database producers have alternative ways to (systematically) index substances. Many of these index terms are text-based and are often the most commonly used name for the substance. Some database producers also index substances such as salts and mixtures under the name of the parent substance, so, unlike with the CAS registry system, a single index name can be used to include many closely related substances.
Algorithm-based correlations between CAS Registry Numbers and the systematic substance terms used by other databases are quite difficult to establish; commonly, the correlations have to be made intellectually (that is, by people rather than by algorithm). In practice, this means that CAS Registry Numbers should be used only as an initial step to find the corresponding Substance Records in Reaxys. They should not be considered precise or comprehensive search terms in Reaxys, since an alternative search may find answers that a CAS RN search misses.

Over the years I have often heard chemists say, "The CAS Registry System is crazy," and one example they refer to is the registration of sodium acetate in Registry, specifically its molecular formula: C2H4O2.Na. In taking this course you will realise that such a registration (and the various other registrations used by other databases) has logic and use (e.g., it allows related salts to be retrieved easily). You will also come to understand Sunghwan's comment that the way we talk about substances in lectures is not necessarily the way computers think about them, and for good reason.
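The retrieval benefit of a formula like C2H4O2.Na can be sketched in a few lines of Python. This is a minimal illustration, not how CAS or Reaxys actually index substances: it assumes each entry carries a dot-separated component formula with the parent component listed first. Only the sodium acetate formula comes from the discussion above; the other entries and the helper names (`parent_component`, `group_by_parent`) are hypothetical, written in the same style for illustration.

```python
from collections import defaultdict

# Hypothetical registry entries in the dot-separated style of C2H4O2.Na.
# Assumption: the parent component comes first, the counterion after it.
# Only the sodium acetate formula is taken from the discussion.
REGISTRY = {
    "acetic acid": "C2H4O2",
    "sodium acetate": "C2H4O2.Na",
    "potassium acetate": "C2H4O2.K",
    "sodium benzoate": "C7H6O2.Na",
}


def parent_component(formula: str) -> str:
    """Return the first dot-separated component (assumed to be the parent)."""
    return formula.split(".")[0]


def group_by_parent(registry: dict[str, str]) -> dict[str, list[str]]:
    """Group substance names by the parent component of their formula."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, formula in registry.items():
        groups[parent_component(formula)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for parent, names in group_by_parent(REGISTRY).items():
        print(parent, "->", sorted(names))
```

Grouping on the parent component puts acetic acid and its salts under one key. Written in the usual Hill form, C2H3NaO2, the sodium salt would no longer share that searchable component with acetic acid (C2H4O2).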

Damon

OLCC S21 | Mon, 02/06/2017 - 15:31

Hi! My name is Jen. I'm an electronic resources librarian at Centre College in Kentucky, and also serve as the science department liaison librarian. I'm taking this course to improve my work with our chemistry students.

Sunghwan Kim | Mon, 02/06/2017 - 15:06

Hi, Phuc.

Yes, a compound can have multiple CAS numbers for *many* reasons. To help you understand this topic, please visit the PubChem Compound Summary page for "benzene" (CID 241).

https://pubchem.ncbi.nlm.nih.gov/compound/241#section=CAS

You will see four different CAS numbers for benzene:

71-43-2 (for benzene; from 8 sources)
8030-30-6 (for naphtha, which is a primary source of benzene; from 3 sources)
26181-88-4 (for benzene with carbon-14 and tritium (hydrogen-3); from 1 source)
27271-55-2 (for benzene with Carbon-14; from 1 source)
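The four benzene entries form a one-to-many mapping: one PubChem CID, several CAS RNs, each with its own record context. The minimal Python sketch below models that, assuming the contexts and source counts are transcribed by hand from the list above (nothing is fetched from PubChem); `CasEntry`, `unlabeled`, and `matching_context` are hypothetical names. It shows why reading the "Record Name" matters: filtering out isotope-labeled entries still leaves naphtha alongside benzene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CasEntry:
    cas_rn: str
    context: str           # what the depositing source says the number refers to
    sources: int           # number of sources reporting it, per the list above
    isotope_labeled: bool


# One CID, four CAS RNs (hand-transcribed from the benzene list).
BENZENE_CID = 241
BENZENE_CAS = [
    CasEntry("71-43-2", "benzene", 8, False),
    CasEntry("8030-30-6", "naphtha", 3, False),
    CasEntry("26181-88-4", "benzene, C-14 and tritium labeled", 1, True),
    CasEntry("27271-55-2", "benzene, C-14 labeled", 1, True),
]


def unlabeled(entries: list[CasEntry]) -> list[CasEntry]:
    """Drop isotope-labeled entries; other contexts (e.g. naphtha) remain."""
    return [e for e in entries if not e.isotope_labeled]


def matching_context(entries: list[CasEntry], name: str) -> list[CasEntry]:
    """Keep only entries whose record context is exactly `name`."""
    return [e for e in entries if e.context == name]


if __name__ == "__main__":
    print([e.cas_rn for e in unlabeled(BENZENE_CAS)])
    print([e.cas_rn for e in matching_context(BENZENE_CAS, "benzene")])
```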

You can check who provided the CAS numbers to PubChem by *clicking* the "from XXX, YYY, ZZZ, ......." string below each CAS number. After you expand the source information, look carefully at the "Record Name" part for each CAS number. You will see that a different CAS number was assigned to each of the four cases: benzene, naphtha (the primary natural source of benzene), benzene with C-14, and benzene with C-14 and tritium.

Similarly, you can check which CAS numbers phenol has, *and in what context*, on the PubChem Compound Summary page for phenol (CID 996):

https://pubchem.ncbi.nlm.nih.gov/compound/996#section=CAS

108-95-2 (phenol)
65996-83-0 (Extracts, coal tar oil alk.)
61788-41-8 (Phenol, sulfurated)
84650-60-2 (Tea, extracts)
27073-41-2 (Phenol, homopolymer)
63496-48-0 (phenosmolin)
73607-76-8 (phenol, labeled with C-14 isotope)

While chemists tend to assume a one-to-one correspondence between structures and identifiers (or names), here you can see that "phenol" appears in many contexts: isotopic labels, polymeric states, "natural" sources (e.g., whether it was extracted from coal tar or tea), or whether it is mixed or modified (e.g., sulfurated).

As a side note, in cheminformatics (or at least in this course), it is critical to unlearn some of what you've learned in traditional chemistry classes. Those classes are designed to teach knowledge that is relevant to a particular context, which can lead you to apply the wrong assumptions when looking at data generated under different assumptions or contexts.

Leah McEwen | Mon, 02/06/2017 - 14:41

Dear Phuc,

Excellent question. CAS numbers are assigned to chemical substances that are registered with Chemical Abstracts. These may include many forms of a compound (or compounds), such as isotopically labeled forms, polymers, natural products, mixtures, etc. The only authoritative way to positively confirm any given CAS number is to look it up in the CAS Registry, via SciFinder, or in the small free CAS database Common Chemistry (http://www.commonchemistry.org/).
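A lookup is the only way to confirm a CAS number, but a format check can catch typing errors before you look one up. A CAS RN has the layout digits-two digits-one check digit; the check digit is the sum of the other digits, each multiplied by its position counted from the right (1, 2, 3, ...), taken mod 10. The Python sketch below is a minimal illustration with hypothetical function names. Passing it only means a number is well-formed: it says nothing about whether the number is registered or which form of a substance it refers to.

```python
import re

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"([0-9]{2,7})-([0-9]{2})-([0-9])")


def cas_check_digit(body: str) -> int:
    """Compute the check digit for the digits before the final hyphen."""
    # Weight digits 1, 2, 3, ... starting from the rightmost digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10


def is_well_formed_cas(cas_rn: str) -> bool:
    """True if cas_rn has CAS RN layout and a correct check digit.

    This does NOT confirm the number is registered or what it refers to;
    only a lookup in the CAS Registry does that.
    """
    match = CAS_PATTERN.fullmatch(cas_rn)
    if match is None:
        return False
    part1, part2, check = match.groups()
    return cas_check_digit(part1 + part2) == int(check)


if __name__ == "__main__":
    for rn in ["108-95-2", "73607-76-8", "108-95-3"]:
        print(rn, is_well_formed_cas(rn))
```

All seven CAS numbers listed for phenol pass this check, which is the point: each is a valid number, but they identify seven different substances.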

PubChem compiles data from many open sources, which often report CAS numbers, and arranges this data by compound. In the case of phenol (CID 996) for example, CAS numbers are included from several sources that are reporting on many different forms of phenol (https://pubchem.ncbi.nlm.nih.gov/compound/phenol#section=CAS). If you click on the name of the source(s) listed below each CAS number, PubChem displays more information about the original source record. There you will see that, in addition to pure phenol, some sources are covering coal tar oil extract, sulfurated phenol, tea extract, the phenol homopolymer, phenosmolin, or carbon-14 labeled phenol.

This example illustrates that database record numbers, such as CAS RN or PubChem CID, are not necessarily one-to-one matches with chemical compounds. This is an important consideration for using this data, to make sure the chemical information and source are clear. It is also a concern for automating informatics functions that need direct information about chemical identity without having to take extra steps in the data source to confirm. There are other chemical identifiers that are more suitable for informatics, such as InChI (the IUPAC International Chemical Identifier).
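The automation concern can be made concrete. If code assumes one CAS RN per compound and builds a plain CID-to-CAS dictionary, each new pair silently overwrites the previous one. The minimal Python sketch below uses the seven phenol CAS numbers for CID 996 as sample data, typed in by hand rather than retrieved from PubChem; keeping a one-to-many mapping instead preserves every number so each can be checked against its source record.

```python
from collections import defaultdict

# (CID, CAS RN) pairs as they might arrive from a compiled source,
# hand-transcribed from the phenol list for CID 996.
PAIRS = [
    (996, "108-95-2"),
    (996, "65996-83-0"),
    (996, "61788-41-8"),
    (996, "84650-60-2"),
    (996, "27073-41-2"),
    (996, "63496-48-0"),
    (996, "73607-76-8"),
]

# Naive: assumes one CAS RN per compound; each assignment overwrites the last,
# so only the final pair in the input survives.
naive = {cid: cas for cid, cas in PAIRS}

# Safer: keep every CAS RN reported for a CID, in input order.
one_to_many: dict[int, list[str]] = defaultdict(list)
for cid, cas in PAIRS:
    one_to_many[cid].append(cas)

print(naive[996])             # the carbon-14 labeled entry, not plain phenol
print(len(one_to_many[996]))  # all seven numbers retained
```

With this input order, the naive dictionary returns 73607-76-8 (carbon-14 labeled phenol) for CID 996, so a downstream step that trusts it would quietly use the wrong substance.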

olcc s16 | Mon, 02/06/2017 - 13:38

While I was doing the compound search exercise, I got confused by the CAS numbers. I thought one substance would be represented by one CAS number, but for several compounds the results gave me more than two CAS numbers. Some of them might make sense because of enantiomers (3-aminopropane-1,2-diol), but some make no sense, like phenol, which has no enantiomers at all. So, is it right for one compound to have more than one CAS number?

Thanks,
Phuc