This page is being used to discuss the cheminformatics OLCC. If you log in, you will see a block on the right with files that have been uploaded containing various syllabi. If you go to the "Calendar Forum Topic", you will see the dates of the schools involved, and the syllabus must conform to these dates. If you comment, an email will be sent to everyone following comments (you need to log in to comment, and to see who is following comments, or updates). You can also send emails from the site to anyone following this page that you want, and include files in those emails.
Introducing Cheminformatics: David Wild (Fall 2013 course)
Generic Course Description:
Cheminformatics OLCC: 3 credit-hour hybrid intercollegiate introductory course in cheminformatics hosted by the ACS DivCHED Committee on Computers in Chemical Education. This is a "team" taught course where students interact online with cheminformatics content expert lecturers while a local faculty member runs the remainder of the course. Course topics include chemical information management; digital representation of chemicals and chemical reactions; structure/substructure searching; scientific databases and data discovery, including chemical structural, physical and spectral data; predictive technologies; an introduction to the semantic web, ontologies and scientific mark-up languages; electronic laboratory notebook technologies; and the provenance of experimental data and metadata representation, generation and standards. Due to the intercollegiate nature of the course, emphasis will be given to open access resources and open source technologies that are available to all institutions.
Undergraduate chemistry students (probably not freshman students)
Expected previous experience
Some use of common desktop applications at a standard user level.
A semester of organic chemistry or similar and some chemistry lab experience.
Objectives of the course
At the end of the course, the students:
- will know and be able to use the most common formats which are used to store, transform and manage chemical information in digital environments;
- will have a basic knowledge of the most common software tools and web services (at least, those easily available) that current chemists and chemistry researchers use.
The course will not require any computational skill (no programming), nor will it make use of any computational chemistry tools. It is intended to address what any chemist or chemistry student should know to use current digital chemistry information resources.
Course Content at a Glance
1. 8/17 Module 1: WVU, UALR, Preliminary Information Skills -McEwen
2. 8/24 Module 1: OSU, UNF join -McEwen
3. 8/31 Module 1: Centre joins -McEwen
4. 9/7 Module 1: Note Labor Day is 9/7 -McEwen
5. 9/14 Module 2: Information Science for Chemists - Chalk
6. 9/21 Module 3: Cheminformatics: What it is, and why we should care - Clark
7. 9/28 Module 4: 2D Structures Part 1: Composing - Clark
8. 10/4 Module 4: 2D Structures Part 2: Manipulating - Clark
9. 10/12 Module 5: Chemical Identifiers (UALR/WVU have 12/13 off, OSU has 15,16 off) - Kim
10. 10/19 Module 5/6: Chemical Identifiers/Comparing and Searching Chemical Entities - Kim
11. 10/26 Module 6: Comparing and Searching Chemical Entities -Kim
12. 11/2 Module 7: Representing and Managing Digital Spectra - Chalk
13. 11/9 Module 8: Interacting with Databases: Desktop and Web Based Applications (UNF, OSU have 11/11 off) - Lange and Cuadros
14. 11/16 Module 8: Interacting with Databases: Desktop and Web Based Applications - Lange and Cuadros
15. 11/23 Module 9: (Projects/Special Topics) Thanksgiving (WVU has whole week off)
16. 11/30 Module 9: (Projects/Special Topics)
Tentative Course Content (by module)
Module 1: Preliminary Information Skills (Leah McEwen)
Module 2: Information Science for Chemists (Stuart Chalk)
- Basics of computer systems
- Information Data types
- Strings, text, enum and set
- Numeric, integer, decimal, float and double
- Dates and timestamps
- Understanding common files and formats
- Saving information in files
- Text files versus binary files
- Open versus proprietary format
- Computer languages
- For desktop applications
- For websites
- Information in databases
- Types of database
- Websites as databases
- Data websites and Application Programming Interfaces (APIs)
Module 3: Cheminformatics: What is it and Why should we care? (Alex Clark)
Module 4. Presenting chemistry in 2D (Alex Clark)
1. Using molecular editors
2. Common file formats for 2D representation of chemical entities
Module 5. Identifying chemical entities (Sunghwan Kim)
- Names – IUPAC, CAS, Beilstein, Variability of systematic names based on settings, PIN names
- Markush structures
- Line notations
- SMILES and related notation
- International Chemical Identifier (InChI) and InChIKey
- Other: ROSDAL, WLN…
- Non-readable identifiers
- Registry numbers
- CAS RN - check digit
- Beilstein IDs
- Other - ChemSpiderID, PubChemID
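As a possible illustration for the "CAS RN - check digit" item above, here is a minimal Python sketch of the check-digit rule: the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the final digit. The function names are hypothetical, chosen only for this example, and the sketch checks format and checksum only; it cannot tell whether a number has actually been assigned to a substance.

```python
def cas_check_digit(body_digits):
    """Return the check digit for the digits that precede it (hyphens removed)."""
    # Weight the rightmost digit by 1, the next by 2, and so on.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body_digits), start=1))
    return total % 10


def is_valid_cas_rn(cas_rn):
    """Check the NNNNNNN-NN-N layout and the check digit of a CAS RN string."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    first, second, check = parts
    # First block has 2 to 7 digits, second block 2 digits, check block 1 digit.
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    return cas_check_digit(first + second) == int(check)


print(is_valid_cas_rn("7732-18-5"))  # water: True
print(is_valid_cas_rn("7732-18-4"))  # wrong check digit: False
```

For water, the weighted sum is 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, and 105 mod 10 = 5, which matches the last digit.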
Module 6. Comparing and searching chemical entities (Sunghwan Kim)
Using chemical entities databases
NIST Chemistry WebBook
Understanding chemical searches
Using the databases programmatically
Web scraping
Module 7. Representing and managing digital spectra (Stuart Chalk)
Applications to process and review spectral data – ACD/NMR Processor, Mestrelab Mnova
Using applications to visualize and analyze spectral data
ChemDoodle Spectral viewer
ACD/Labs spectral viewer
Learn Chemistry Spectraschool
NIST Chemistry WebBook
Common file formats for chemical spectra – binary vs standard file formats – loss of information – phasing, referencing, analysis etc.
Module 8. Interacting with Databases: Desktop and web based applications (Andrew Lange and Jordi Cuadros)
Module 9: Projects/Special Topics
This module could be where students present their own projects, or work with lecturers on special topics that are of interest to them. These could actually be started during module 1 and be developed throughout the course. We could set up a virtual poster session where students share and discuss their work across campuses.
Possible Special Topics:
1. Green Chemistry Resources
1.1 Green Chemistry Databases
1.2 Green Chemistry Assistant
2. Mobile Devices
Markup languages, electronic laboratory notebooks (ELNs), provenance of experimental data and metadata representation, organization and generation.
Item VIII seems like a real can of worms. Every web page with chemical information on it is a subset of this, despite the fact that almost all the markup is information for formatting and display. Another extensible data format that isn't included (and I don't think it should be for this) is HDF/HDF5, which is for quantitative data sets. Anyway, I'm not sure how to include this in a succinct and useful way in a one-semester course.
I just added a UNF course description.
Stuart,
By Chemical Hygienic Factors I mean chemical health and safety. That material (item VI.d) probably does not belong there, but it belongs somewhere.
Also items VII and IX are really there to cause discussion, and I am hoping you can figure where/how to embed XML data representations into the course. At the end of the day, I think we need one basic syllabus for all schools, although each school can customize the content.
The following SlideShare presentation is from a librarian's perspective (the great Diane Hillman is more a metadata architect), but I think it is useful in terms of the concepts of metadata, linked data, etc. that need to be taught in this course (in my opinion). If folks agree, a chemistry version of this would be valuable, and I would be willing to put it together. Leah, I hope you agree :)
http://www.slideshare.net/smartbroad/whats-goin-on-42614446
Thoughts?
I agree with Stuart that this module needs to be included and would appreciate his effort to put one together.
I just added a feature to upload draft files, and changed the terminology of the flags for subscribing. These are the files that were in a Dropbox, along with one Jordi sent. I suggest we all look at Jordi's. It has some thought in it. We need to start collecting Fall 2015 schedules, and figure out what we are going to do for the first several weeks, when not all schools will be in session. I was thinking of using this as a more traditional library information science module, sort of "get to know your librarian".
We are still trying to get the bulk email feature to work. I will send an email to the entire site later today, even if that feature is not figured out by then.
Cheers,
Bob
All,
I have worked on the calendar page. We have 4 schools on board, and I am trying to get at least one more, as 5 seems like a good number. We all start the week of the 17th or the 24th. I think we need to pow-wow about what to do then. I am thinking library resources/information management, and maybe running it an extra week, so that "module" is either 2 or 3 weeks, depending on where you are.
We are in lock step through September. I suggest this be David Wild's Cheminformatics type stuff.
The week of October 12 is a bit messy. 3 schools have a 2-day break, but they are at different times. I suggest that week be merged with the previous. This should probably be the final project week of the molecule representations part of the class, or the first of the next phase, which I am going to call the chemical/big data phase, for lack of a better word.
In the second week of November, 2 schools have Veterans Day off.
WVU has the entire Thanksgiving week off. I suggest all modules be done by then, and classes work on projects.
UNF finishes the next week, Dec. 4.
In summary, we have from Aug 25 to Dec. 7. As soon as we have a bit more thought/agreement on this I will contact the CINF people, but I want an idea of how much time we have, and it is you folks that are the ones responsible for the grades and giving the students their money's worth. I have also been in contact with the OpenChrom people and they are interested in running a module, http://openchrom.net/. Maybe we should all have a conference Skype call.
Cheers,
Bob
Here is the link to the calendar; I meant to include it in the above post:
http://olcc.ccce.divched.org/content/academic-calendars
Cheers,
I know some of the interested facilitators are involved with computational chemistry, and it seems like Avogadro may be an open source package that merges computational chemistry with cheminformatics. Are any of you familiar with it? Here is a pdf from JCI http://www.jcheminf.com/content/pdf/1758-2946-4-17.pdf
I have briefly tested Avogadro a few times, and I cannot see any improvement over what Jmol offers. I even prefer the editing capabilities of Jmol (and its limited drawing interface) over Avogadro.
Anyway, whether or not we include Avogadro, a more general question to me would be: Is there anyone who edits molecules in 3D? Should the course include it (or just mention it)?
I have minimal experience with Avogadro. But I believe it is capable of working as a front-end for submitting calculations to GAMESS. In my book, that makes it much more powerful than Jmol, which is much more familiar to me.
Jennifer
Looking at Jordi Cuadros' syllabus, it seems to capture most of the content that I think is relevant to this course, although I still think that from a chemical informatics perspective:
1) the 3D module should be added to the end of the 2D, as I don't think it is enough to stand alone;
2) the ontology/semantic web portion of module 9 should be broken out into a separate module, because it is fundamentally different information than the other portions of module 9. It is also the more recent/cutting edge work/perspective that goes at the end of the course?
3) I think section 3 of module 6 should be bigger and go into detail about REST, which is becoming the standard for APIs. I would include how the backend database (e.g. MySQL) and the scripting language (e.g. PHP) are used to generate dynamic pages. This demystifies such sites and students understand the whole process of searching for chemicals: request -> process -> response. (I have also taught the Model-View-Controller (MVC) style of programming, which is great for REST websites, but this may be too much :) )
Other notes
- I have messed with Avogadro a bit and I'm not jumping up and down about it. However, it does read and write Chemical JSON: http://wiki.openchemistry.org/Chemical_JSON
Stuart
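To make the request -> process -> response cycle mentioned above a little more concrete, here is a rough Python sketch (Python is used only for illustration; the comment itself suggests PHP and MySQL). Everything in it is an assumption made for the example: the `/compound/<name>` URL pattern, the `handle_request` function, and the in-memory `COMPOUNDS` dictionary that stands in for the backend database. It does not describe any existing site's API.

```python
import json
from urllib.parse import urlparse, unquote

# Hypothetical stand-in for the backend database (e.g. a MySQL table).
COMPOUNDS = {
    "water": {"formula": "H2O", "cas_rn": "7732-18-5"},
    "ethanol": {"formula": "C2H6O", "cas_rn": "64-17-5"},
}


def handle_request(method, url):
    """Process one request and return (status_code, json_body)."""
    # Request: split the URL path into its segments.
    segments = [unquote(s) for s in urlparse(url).path.split("/") if s]
    # Only GET /compound/<name> is understood in this sketch.
    if method != "GET" or len(segments) != 2 or segments[0] != "compound":
        return 400, json.dumps({"error": "unsupported request"})
    # Process: look the compound up in the "database".
    name = segments[1].lower()
    record = COMPOUNDS.get(name)
    if record is None:
        return 404, json.dumps({"error": "compound not found"})
    # Response: serialize the record as JSON.
    return 200, json.dumps({"name": name, **record})


status, body = handle_request("GET", "http://example.org/compound/Water")
print(status, body)
```

A real service would sit behind a web server and query a real database, but the three stages (parse the URL, look up the data, serialize the answer) would stay the same, which is probably the point worth showing to students.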
I am about to go through Jordi and Tony's modified syllabus and add it to the front page here. If you look at the calendar, you will see that we have 11 modules possible, if we start 8/31 and end before Thanksgiving. Please note, this is the start of the third week for 2 schools, and second week for the other 2. In accordance with my initial vision, I would like to propose the first module be a "meet your library" type of module. So some schools would have three weeks on this, others two.
During this time I suggest each class go to their library, become familiar with their librarian and chemistry resources. At UALR our library has a special training room we could use, and our librarian could give a low-level training session on SciFinder and RefWorks. But I would also like Justin to offer something on Zotero as I know he is good at training, and more than that, allow librarians from other schools to interact and make this the most useful possible experience (people like Leah could interact with Bret, our librarian, and ....). This section could blow up into the whole course, and that is not our objective. But a survey of your home school's resources, and a bit on information management, is a logical thing to do during the startup, and is something that different schools could easily spend different amounts of time on. For my students at UALR, I was thinking of seeing if they could generate some kind of guide to a resource that UALR has, which could be uploaded to XCITR, and be a project-based graded module. (If it passes muster with me, we would actually submit it to XCITR for peer review; the flip side is that students and local librarians get exposed to resources like XCITR.)
Then we have 10 modules to move forward where all classes are in lock-step. But this first module must be flexible, in that some schools are having classes while others are not.
I would really like the other facilitators' input on this. Shall we make week one (which is really the second or third week of the semester, depending on your school) a flexible module on local resources and information management?
I vote +1 on this.
(1) Fundamentals of chemical informatics: two molecules, are they the same?
It seems like the module layout at the moment is rather tied to products & services, so I'm not sure where this would fit. Maybe it could replace module 5, and what was there before could be tacked onto another module.
The basic idea is to explain the issues that go into answering what most people think ought to be a simple question, but in actual fact gets to the heart of why cheminformatics is actually a hard subject:
- orientation: 2 molecules with slightly different coordinates (e.g. rotated); they are "the same" in most ways, and easy enough to compare
- numbering: 2 identical structures with permuted atom order look the same to a person, but for software, establishing this requires a graph isomorphism test, which is a non-trivial algorithm that can be a performance problem with certain edge cases
- hydrogens: especially explicit vs. implicit, which makes direct mapping complicated, and also introduces issues to implicit counting formulae and format deficiencies for specifying these
- stereocentres: unspecified vs. specified R/S and E/Z stereocentres introduce issues with "sameness", especially if one side is unknown; this has to fit in with isomorphism tests and be able to work with/without explicit hydrogens; there are also multiple ways to indicate stereocentres, e.g. wedges, 3D geometry, parity and CIP
- resonance: particularly common with aromatic molecules, multiple ways to draw double bonds that are equally valid, and must be equivalent; divergence between actual aromaticity and resonance equivalence becomes an issue in many edge cases
- tautomers: two molecules that are not literally the same structures may become the same in aqueous solution due to tautomer transformations
- bonding types: styles of functional groups, e.g. nitro with charge separated vs. hypervalent, introduce equivalence problems; also many non-organic molecules are hard or impossible to represent well with only single, double & triple bonds
- abbreviations: chemists often use shortcuts like "Et" and "Ph", which need to be expanded out to structure definitions in order to verify equality
- undefined graphics: common habits like using free-text (e.g. "R/S" or "+") or graphical objects (e.g. circle for aromaticity) do not map to cheminformatics concepts, and will not get the results expected
Each of these subsections could have a set of yes/no questions for evaluation purposes.
(2) Green chemistry: reaction metrics like process mass intensity, E-factor and atom economy should be used as a way to evaluate reactions, similar to how yield is used now. There should be some discussion about how the reaction has to be represented completely, with structures, roles and stoichiometry for all components, in order for the algorithm to be able to calculate these automatically. Calculating the metrics manually could be used for evaluation.
(3) Special topic/assignment: Bayesian models
The latest version of CDK allows building & applying Bayesian models to predict likelihood of a particular property (e.g. activity). It has no user interface, so it is necessary to compile a small control program to invoke the functionality. The assignment would include a training set (as an SDfile), the control program (.java) and instructions on how to compile, link & run it. The user would be responsible for gathering these together and following the instructions, in order to turn the training set into a Bayesian model. The next step would be to use any software of choice to create several molfiles for test molecules, and run the control program to apply the Bayesian model, and observe the predictions. No programming would be required, but the user would be exposed to the step by step process of creating new software using an existing toolkit, so they would come away with some idea of what cheminformaticians do for a living.
For bonus points the user can propose a new molecular structure that scores better than the indicated test compounds; and for more bonus points, prove that it is not in the training set.
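The "numbering" issue in item (1) above might be easier to discuss with a toy example. The Python sketch below is a deliberately naive brute-force graph isomorphism test, not how any real toolkit does it: it ignores bond orders, hydrogens, charges and stereochemistry, and it tries every atom permutation, so it is only usable for a handful of atoms. The function name and the (atoms, bonds) representation are assumptions made up for this illustration.

```python
from itertools import permutations


def same_molecule_graph(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return True if some renumbering of A's atoms reproduces B exactly.

    atoms_*: list of element symbols, indexed by atom number.
    bonds_*: list of (i, j) atom-index pairs; bond orders are ignored here.
    """
    # Quick rejections: different composition or different bond count.
    if sorted(atoms_a) != sorted(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    target = {frozenset(bond) for bond in bonds_b}
    n = len(atoms_a)
    # Brute force: try every mapping of A's atom numbers onto B's.
    for perm in permutations(range(n)):
        # The mapping must send each atom to an atom of the same element.
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset((perm[i], perm[j])) for i, j in bonds_a}
        if mapped == target:
            return True
    return False


# Heavy atoms only; hydrogens are left implicit in this toy example.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])             # C-C-O
ethanol_renumbered = (["O", "C", "C"], [(0, 1), (1, 2)])  # O-C-C
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])      # C-O-C

print(same_molecule_graph(*ethanol, *ethanol_renumbered))  # True
print(same_molecule_graph(*ethanol, *dimethyl_ether))      # False
```

The number of permutations grows factorially with the atom count, which is one way to show students why this "simple question" can become a performance problem and why production software relies on smarter algorithms.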
Hi All,
It was brought to my attention that our syllabus is rather heavy on the product orientation, and I agree. Now much of cheminformatics is accessed through products, but I think we need this to be theoretical/underlying concepts at the introductory level, and that in turn can support understanding of products.
Any input on this?
I am also creating a pedagogy forum, and will shortly ask the facilitators to subscribe to it. I had a topic going on POGIL, on which Jennifer made some valuable comments, but I want to move that to a new forum, and will try to do it tomorrow. I think it is prudent for us to try to make activities in our classrooms that can span campuses.
Leah had also come up with the idea of a special symposium at the ACS National meeting where we and our students could present. This aligns with the theme of the meeting, and I am seeking funds to assist with this. We also need to get the lecturers lined up this week, so they can create their papers in June, we can discuss them in July, and start posting and teaching in August. I think this is coming together, although I am having problems too. But that is life.
Cheers,
Bob
All,
I have been trying to finalize the syllabus, and there are still issues. We are covering too many topics, and many of these really need to be two weeks.
Do you think we can move module 7 (comparing and searching chemical entities) in front of modules 5 and 6 (presenting chemistry in 2D and 3D)? In fact, can we put "presenting chemistry in 2D and 3D" at the very end? And omit the searching of chemical reactions?
Cheers,
Bob
I'm OK with omitting the chemical reactions part, if people feel that will make the course easier and more doable.
On the other hand, I would keep Module 7 (searching in databases) after modules 5 and 6 (representations in 2D and 3D), since drawing structures is a way to query the search engines, and their output may include 3D representations (mol files).
- I would move "Application Programming Interfaces (APIs)" from 3.6 and "Using the databases programmatically" (7.2.4) to module 9