Discussion | DivCHED CCCE: Cheminformatics OLCC

Hello

Hi all,

Let me introduce myself, too. I'm Evan Hepler-Smith, a historian of science and technology specializing in the history of chemical information. Right now, I'm a postdoc at the Harvard University Center for the Environment, looking at the role of chemical information systems in the history of environmental toxicology and chemical regulation.

I am also quite interested in the present and future of chemical information, of course - that's why I'm here! But my research has shown me that understanding where our information systems, file formats, and notation came from unlocks a much richer understanding of the features (and bugs) of our present-day cheminformatics tools. I try to pass along a little bit of that perspective in Module 2, which I developed. Looking forward to working with you all!

All best,
Evan

Re: Question on Data Structure

Hi Vince,

Thanks for your question. Yes, we'll encounter plenty of examples of data structures that are systematically defined and consistently (or sometimes not so consistently!) applied throughout the rest of the course. For example, take a look at the "Anatomy of a MOL file" in Part 2 of this module, and the discussions of InChI and SMILES in Part 3. Later on in this course, you'll have plenty of opportunity to become familiar with the ins and outs of various standards for representing chemical structure and substructure in your work using public compound databases and in some of the student projects and special topics modules.
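To make the point about "systematically defined" data structures concrete, here is a minimal sketch of reading a V2000 MOL file purely by column position. The water coordinates, the header text, and the function name `parse_molfile` are made up for illustration, and the reader assumes a well-formed V2000 file; it is not a validating parser and ignores most fields.

```python
# Minimal sketch: a V2000 MOL file can be read by fixed column positions
# because the layout is systematically defined. Illustrative only.

MOL_TEXT = "\n".join([
    "water",                                    # header line 1: molecule name
    "  hypothetical-example",                   # header line 2: program info
    "",                                         # header line 3: comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",  # counts line
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",                    # bond: atom 1 - atom 2, single
    "  1  3  1  0  0  0  0",                    # bond: atom 1 - atom 3, single
    "M  END",
])


def parse_molfile(text):
    """Return (atoms, bonds) from a V2000 MOL file given as one string."""
    lines = text.splitlines()
    counts = lines[3]                   # the counts line follows 3 header lines
    n_atoms = int(counts[0:3])          # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])          # columns 4-6: number of bonds
    version = counts[33:39].strip()     # columns 34-39: version tag
    if version != "V2000":
        raise ValueError("this sketch only handles V2000, got %r" % version)

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x = float(line[0:10])           # three 10-character coordinate fields
        y = float(line[10:20])
        z = float(line[20:30])
        symbol = line[31:34].strip()    # element symbol after one space
        atoms.append((symbol, x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        first = int(line[0:3])          # index of first atom (1-based)
        second = int(line[3:6])         # index of second atom (1-based)
        order = int(line[6:9])          # bond type (1 = single, 2 = double...)
        bonds.append((first, second, order))

    return atoms, bonds


atoms, bonds = parse_molfile(MOL_TEXT)
print(atoms)
print(bonds)
```

Because every field sits at a known position, the slicing works without any guessing. Shift a single column in the file, though, and the same code misreads or rejects it, which is why consistent application matters as much as the definition.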

One thing to keep in mind: standards are supposed to be stable, but they are never totally permanent, nor should they be. Often (though not always!), changes to well-designed standards will maintain the validity of data structured according to the previous version of the standard. (You can think of this as the backwards-compatibility of standards.)

For example, during the 1950s and 1960s, chemists introduced a new element into the (print-based) data structure of systematic organic nomenclature: the R/S notation for indicating the configuration of stereocenters. This didn't make older systematic chemical names invalid, but it did mean that new chemical names could contain more information. It's helpful to keep tabs on standards development work going on related to a data structure that you use - you might be able to do more with your chemical data than you thought!
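The backwards-compatibility idea can be sketched in a few lines of code. This is a toy illustration only: the function name `split_stereodescriptor` is hypothetical, and it handles just a single leading (R) or (S) descriptor, which is far short of real nomenclature parsing.

```python
import re

# Toy assumption: a descriptor appears only as a leading "(R)-" or "(S)-".
_LEADING_DESCRIPTOR = re.compile(r"^\(([RS])\)-(.+)$")


def split_stereodescriptor(name):
    """Return (descriptor, base_name); descriptor is None for older-style names."""
    match = _LEADING_DESCRIPTOR.match(name)
    if match is None:
        # A name written without a descriptor is still accepted as valid.
        return None, name
    return match.group(1), match.group(2)


print(split_stereodescriptor("butan-2-ol"))      # older-style name
print(split_stereodescriptor("(R)-butan-2-ol"))  # name carrying configuration
```

A reader written this way accepts names from before and after the change; the newer names simply carry one more piece of information.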

Thanks,
Evan

Question on Data Structure

In your module you state, "To automate functions on chemical data, the data structure needs to be systematically defined and consistently applied" (https://goo.gl/5crSQ7). Can you expound on this? Are there contemporary issues that we should be aware of? Will the upcoming modules be helping us to understand this, and if so, what should we be looking for?

Thanks,
Vince

Hello everyone.

I am Esther, originally from Nigeria, and I just got into the Department of Chemistry at the University of Arkansas at Little Rock. I just joined your class and I look forward to asking about, finding, discussing, and learning chemical information. I never knew I could work on Reaxys until I joined the class. I am happy, and I look forward to learning and advancing my career.

Hello from Jordi Cuadros (Barcelona, Spain)

Hello everyone. I am Jordi Cuadros, an Associate Professor at IQS Univ. Ramon Llull in Barcelona, Spain. I will be leading (with Roger) the class that will be taking the course from Spain. I am a chemist, but I spend most of my time in front of a computer. I teach a first-year introduction to computer programming class for chemists and engineers and devote my research to the use of computers in discipline-specific education (chemistry, statistics, physics). I was involved in the first edition of this course in Fall 2015 (on using databases programmatically) and I am back to keep on learning. My personal view of cheminformatics is probably closer to chemical information than to medicinal chemistry or biology. Given today's accessibility of chemical information, I feel any chemist should have the skills to understand, manage, and use this wealth of data. This is to me the point of this course. I am looking forward to jumping into it!

Greetings from the Netherlands

Hi all, I'm Herman Bergwerf. I'm currently a BSc Nanobiology student at the Technical University of Delft (small city in the Netherlands). I am also a professional software engineer and I do a lot of coding in my free time (more than anything else). Among other things, I have built http://molview.org, a website to draw, view and search for molecular structures. I'm involved in this course as a potential project mentor. Additionally, I wrote a small 'Special Topic' module about molecular visualization on the web (which might be interesting for your final project).

Heynow

Hello nice people,
I'm Jo, and I'm a chemistry major here at U of A Little Rock. I want to teach high school chemistry and math. This course is giving me tools and ideas which I can pass on to my future students. Whether those kids go to college, stay on the farm, or spin wrenches, I believe informatics will help smooth their way to a better future. We can make possible real research at the secondary level, and inspire students to be even more creative and inquisitive if we give them the means to explore whatever rabbit hole they fancy diving into. Now, that's some fun.

Greetings from Ehren

Hi All,

I am Ehren Bucholtz, associate professor of organic chemistry at St. Louis College of Pharmacy. My background is medicinal chemistry, so I teach my introductory and organic chemistry courses with lots of pharmaceutical and biological examples. I am on sabbatical this semester, but when I found out about this opportunity, I had to be a part of it. My sabbatical proposal was based on cheminformatics, so this is a perfect fit. One of the projects that I am working on during sabbatical is using Optical Structure Recognition to grade student work. I think this could be an interesting project for students to participate in.

Another project idea that I had for this course was based on some emails I exchanged with Sunghwan Kim (whom you met in a previous post) before the semester started. We had discussed searches in PubChem related to pharmaceuticals. It got me thinking about finding new lead molecules that might be interesting to test against Methicillin-resistant Staphylococcus aureus (MRSA). I have since talked with one of my colleagues who is a microbiologist, and she is willing to test any molecules we find that could have activity and are commercially available. (We would have to stick with lead molecules that are purchasable, as this course isn't about synthesis.)

Looking forward to working with everyone!

Thanks. That's why I chose

Thanks. That's why I chose this reaction, so that I could find familiar points. I'll go back later - much later, as it is 0200 here - and try drawing out the reaction like you said and searching that.

Diels Alder rxn

Many points here, including:

- Quick search: "diels alder mechanism" gives choices with 20K reactions (actual reactions indexed with Diels Alder - this is normally great, but not in your case because you want the mechanism) and 1136 documents with the terms in titles, abstracts and indexing. The way forward from the latter may be View Results => Filters => Document Type => Reviews, which gives 69 documents that would be helpful.

- Create structure template from name is great if you have simple names, but matching even half-way complex names is challenging (the system would have to do complicated part name searches).

- Draw the product alone is a sound approach but you would need to sort through the answers (as you have done).

- We have given you access to Reaxys, but not to full text documents through ScienceDirect (sorry); maybe your Uni has a ScienceDirect subscription and you could find the article through that.

- For a specific reaction, it is best to draw the full reaction and search As drawn. When you do this you get 5 documents (although a Similarity search gives 12). Reaction Records are searchable in Reaxys by structure diagrams but also by text terms. It so happens, though, that 'mechanism' is not systematically indexed in association with Reaction Records. So you basically have to rely on a text search in Document Records (which is what you did when you got the 1136 documents above). However, remember that you are searching text in titles, abstracts, and index terms here - and also remember that most Diels Alder reactions are done for synthetic reasons (so mechanisms would not be discussed, since the general mechanism is known (see below)).

- However, the reaction you want is a classic Diels Alder reaction, which would follow the normal [4+2] cycloaddition mechanism (HOMO/LUMO) that you would find in most organic textbooks (or, say, in Wikipedia: Diels Alder).

Summary

- what you did was great and sound as far as the mechanics of the search goes (what you click and where you click)
- the issue really relates to understanding the database content (what is indexed, and where); this ALWAYS is the difficult part (the mechanics of searching is trivial, right?)
- you get to understand database content by trying searches (as you did).

Many other comments could be made, but I hope this explains most of the issues. Please don't hesitate to get back to me if you need further help (on this or on other searches).

Damon