Discussion Syllabus

Robert Belford
Discussion Syllabus

This page is being used to discuss the cheminformatics OLCC.  If you log in, you will see a block on the right with files that have been uploaded containing various syllabi.  If you go to the "Calendar Forum Topic", you will see the dates of the schools involved, and the syllabus must conform to these dates. If you comment, an email will be sent to everyone following comments (you need to log in to comment, and to see who is following comments, or updates).  You can also send emails from the site to anyone following this page that you want, and include files in those emails.

Relevant Sites

Introducing Cheminformatics: David Wild (Fall 2013 course)

Generic Course Description:

Cheminformatics OLCC: 3 credit-hour hybrid intercollegiate introductory course in cheminformatics hosted by the ACS DivCHED Committee on Computers in Chemical Education. This is a "team" taught course where students interact online with cheminformatics content experts who serve as lecturers, while a local faculty member runs the remainder of the course. Course topics include chemical information management; digital representation of chemicals and chemical reactions; structure/substructure searching; scientific databases and data discovery, including chemical structural, physical and spectral data; predictive technologies; an introduction to the semantic web, ontologies and scientific mark-up languages; electronic laboratory notebook technologies; and the provenance of experimental data and meta-data representation, generation and standards. Due to the intercollegiate nature of the course, emphasis will be given to open access resources and open source technologies that are available to all institutions.

Intended audience
Undergraduate chemistry students (probably not freshman students)

Expected previous experience
Some use of common desktop applications at a standard user level.
A semester of organic chemistry or similar and some chemistry lab experience.

Objectives of the course

At the end of the course, the students:

  • will know and be able to use the most common formats which are used to store, transform and manage chemical information in digital environments;
  • will have a basic knowledge of the most common software tools and web services (at least, those easily available) that current chemists and chemistry researchers use.

The course will not require any computational skill (no programming), nor will it make use of any computational chemistry tools. It is intended to address what any chemist or chemistry student should know to use current digital chemistry information resources.

Course Content at a Glance

Week/Module of:
1.   8/17    Module 1:  WVU, UALR, Preliminary Information Skills -McEwen
2.   8/24    Module 1:  OSU, UNF join 
3.   8/31    Module 1:  Centre joins -McEwen
4.   9/7      Module 1:  Note Labor Day is 9/7 -McEwen
5.   9/14    Module 2:  Information Science for Chemists - Chalk
6.   9/21    Module 3:  Cheminformatics: What it is, and why we should care - Clark
7.   9/28    Module 4:  2D Structures Part 1: Composing - Clark
8.   10/5    Module 4:  2D Structures Part 2: Manipulating - Clark
9.   10/12  Module 5:  Chemical Identifiers (UALR/WVU have 12/13 off, OSU has 15,16 off) - Kim
10. 10/19  Module 5/6: Chemical Identifiers/Comparing and Searching Chemical Entities - Kim
11. 10/26  Module 6:  Comparing and Searching Chemical Entities - Kim
12. 11/2    Module 7:  Representing and Managing Digital Spectra - Chalk
13. 11/9    Module 8:  Interacting with Databases: Desktop and Web Based Applications (UNF, OSU have 11/11 off) - Lange and Cuadros
14. 11/16  Module 8:  Interacting with Databases: Desktop and Web Based Applications - Lange and Cuadros
15. 11/23  Module 9: (Projects/Special Topics)  Thanksgiving (WVU has whole week off)
16. 11/30  Module 9: (Projects/Special Topics)

Tentative Course Content (by module)

Module 1: Preliminary Information Skills   (Leah McEwen)

1- library resources  
content: local holdings and services
exercise: case study (ex. chemical toxicology/safety profiles), develop roadmap (could be multi-media)
2- citation management
content: Zotero as example; searching, full text retrieval (PMC); reference formats
exercise: comparison chart, transfer matrix
3- sharing docs & data
content: rights & responsibilities, identity & linking, provenance  
exercise: data manipulation case study 
4- data management 101
content: notebooks, metadata, file formats, data carpentry (spreadsheet hygiene), discovery & analysis overview
exercise: retro-description of notebook entry or supplemental information file into spreadsheet form using dc, SI and simplified s88 analytical data description 
5- chemical representation
content: visual structure representations in standard forms (names, line formulas, graphical/connection tables), limitations with translation among these and between humans and machines
exercise: notation jumble

Module 2: Information Science for Chemists  (Stuart Chalk)

  1. Basics of computer systems
  2. Information Data types
    1. Strings, text, enum and set
    2. Numeric, integer, decimal, float and double
    3. Dates and timestamps
    4. Boolean
  3. Understanding common files and formats
    1. Saving information in files
    2. Text files versus binary files
    3. Open versus proprietary format
  4. Computer languages
    1. For desktop applications
    2. For websites
  5. Information in databases
    1. Types of database
    2. Websites as databases
  6. Data websites and Application Programming Interfaces (APIs)
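
The data types in item 2 and the text-versus-binary distinction in item 3 can be illustrated with a short sketch. Python is an assumption here (the syllabus does not name a language), and the record and its field names are hypothetical.

```python
import struct
from datetime import date

# One hypothetical record using the data types listed in item 2.
record = {
    "name": "water",                   # string
    "phase": "liquid",                 # enum-like: one of a fixed set of values
    "atom_count": 3,                   # integer
    "molar_mass": 18.015,              # float (g/mol)
    "measured_on": date(2015, 8, 17),  # date
    "is_organic": False,               # Boolean
}

# Text storage: human readable, size depends on the number of characters.
as_text = str(record["molar_mass"]).encode("ascii")

# Binary storage: a fixed 8 bytes for an IEEE 754 double, not human readable.
as_binary = struct.pack("<d", record["molar_mass"])

# Reading the binary form back requires knowing the format ("<d") in advance.
round_trip = struct.unpack("<d", as_binary)[0]

print(as_text, len(as_text))
print(len(as_binary))
print(round_trip == record["molar_mass"])
```

The point for students is that a text file can be opened and read in any editor, while a binary file can only be interpreted by software that knows its layout, which is one reason open, documented formats matter.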

Module 3: Cheminformatics: What is it and Why should we care?  (Alex Clark) 

Module 4. Presenting chemistry in 2D   (Alex Clark)   

 1.   Using molecular editors

 2.  Common file formats for 2D representation of chemical entities


Module 5. Identifying chemical entities   (Sunghwan Kim) 

  1. Names – IUPAC, CAS, Beilstein, Variability of systematic names based on settings, PIN names
  2. Formulas
    1. Markush structures
  3. Line notations
    1. SMILES and related notation
    2. International Chemical Identifier (InChI) and InChIKey
    3. Other: ROSDAL, WLN…
  4. Non-readable identifiers
    1. Registry numbers
    2. CAS RN -  check digit
    3. Beilstein IDs
    4. Other - ChemSpiderID, PubChemID
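
The CAS RN check digit in item 4.2 lends itself to a small worked example. The sketch below, in Python (an assumed language; the function names are illustrative), applies the published rule: working from right to left over the digits before the check digit, multiply each digit by 1, 2, 3, ..., sum the products, and take the result modulo 10.

```python
def cas_check_digit(body_digits: str) -> int:
    """Check digit for the digits of a CAS RN, excluding the final digit."""
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Validate the NNNNNNN-NN-N layout and the check digit of a CAS RN."""
    parts = cas_rn.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    return cas_check_digit(first + second) == int(check)


# Water: 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, and 105 mod 10 = 5
print(is_valid_cas_rn("7732-18-5"))
# Same number with a wrong check digit
print(is_valid_cas_rn("7732-18-4"))
```

A check digit only catches transcription errors; it says nothing about whether the number has actually been assigned to a substance.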


Module 6. Comparing and searching chemical entities  (Sunghwan Kim) 

  1. Using chemical entities databases

    1. ChemSpider

    2. PubChem

    3. NIST Chemistry WebBook

    4. ZINC

    5. ChemExper

    6. ChemIDPlus

    7. ChEBI

  2. Understanding chemical searches

    1. Exact search

    2. Substructure search

    3. Similarity search

      1. Fingerprints

      2. Distance measurements

      3. Virtual screening

    4. Using the databases programmatically

      1. APIs

      2. Web scraping
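
The fingerprint and distance-measurement items under similarity search can be sketched in a few lines. The example below assumes Python and uses the Tanimoto coefficient, which is one common choice and is not named in the outline above; the fingerprints are hypothetical, hand-made sets of "on" bit positions rather than output from a real toolkit.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: each integer is the position of a bit set to 1.
query = {1, 4, 7, 9}
library = {
    "mol_A": {1, 4, 7, 9},
    "mol_B": {1, 4, 8},
    "mol_C": {2, 3},
}

# A toy "virtual screen": rank library entries by similarity to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)

for name in ranked:
    similarity = tanimoto(query, library[name])
    print(name, round(similarity, 2), "distance:", round(1 - similarity, 2))
```

Treating 1 minus the similarity as a distance links the fingerprint and distance items: ranking by highest similarity is the same as ranking by smallest distance.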

Module 7. Representing and managing digital spectra   (Stuart Chalk) 

  1. Applications to process and review spectral data – ACD/NMR Processor, Mestrelab Mnova

  2. Using applications to visualize and analyze spectral data

    1. JSpecView as Java Applet plus the Javascript version

    2. ChemDoodle Spectral viewer

    3. JDXView

    4. ACD/Labs spectral viewer

  3. Spectra databases

    1. SDBS

    2. Learn Chemistry Spectraschool

    3. ChemSpider

    4. NMRDB

    5. NIST Chemistry WebBook

  4. Common file formats for chemical spectra – binary vs standard file formats – loss of information – phasing, referencing, analysis etc.

    1. JCAMP-DX

    2. AnIML

    3. CML
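
To make the contrast between binary and standard text formats concrete, here is a minimal sketch (Python assumed) that reads the labeled data records of a JCAMP-DX file, where each record starts with "##" followed by a label and "=". The sample text is a hypothetical, hand-written fragment, and the parser is deliberately simplified: it does not decode compressed data forms or apply the full label normalization rules of the specification.

```python
def parse_jcamp_records(text: str) -> dict:
    """Collect '##LABEL=value' records; following lines are appended to the value."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].rstrip()  # "$$" starts a comment
        if not line.strip():
            continue
        if line.startswith("##") and "=" in line:
            label, value = line[2:].split("=", 1)
            current = label.strip().upper()
            records[current] = value.strip()
        elif current is not None:
            # Continuation line, e.g. rows of the data table after ##XYDATA=
            records[current] += "\n" + line.strip()
    return records


# Hypothetical fragment for illustration only.
sample = """##TITLE=hypothetical spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XYDATA=(X++(Y..Y))
4000 0.98 0.97
3998 0.95 0.91
##END=
"""

records = parse_jcamp_records(sample)
print(records["TITLE"])
print(records["XUNITS"], records["YUNITS"])
print(records["XYDATA"].splitlines()[1])
```

Because the metadata (units, data type, title) travels with the data in readable form, a student can inspect the file in a text editor, which is not possible with a vendor's binary format.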

Module 8. Interacting with Databases: Desktop and web based applications   (Andrew Lange and Jordi Cuadros) 



Module 9: Projects/Special Topics

This module could be where students present their own projects, or work with lecturers on special topics that are of interest to them.  These could actually be started during module 1 and be developed throughout the course. We could set up a virtual poster session where students share and discuss their work across campuses.

Possible Special Topics:
1. Green Chemistry Resources
  1.1. Green Chemistry Databases
  1.2. Green Chemistry Assistant
2. Mobile Devices
  2.1. iOS
  2.2. Android

Markup languages, electronic laboratory notebooks (ELNs), provenance of experimental data and metadata representation, organization and generation.

Robert Belford
Comment by Jonathan Gutow

Item VIII seems like a real can of worms.  Every web page with chemical information on it is a subset of this, despite the fact that almost all the markup is information for formatting and display.  Another extensible data format that isn't included (and I don't think it should be for this) is hdf/hdf5, which is for quantitative data sets. Anyway, I'm not sure how to include this in a succinct and useful way in a one semester course.

Stuart Chalk

I just added a UNF course description.

Robert Belford
Chemical Hygienic Factors

Stuart,

By Chemical Hygienic Factors I mean chemical health and safety.  That material (item VI.d) probably does not belong there, but it belongs somewhere.

Also, items VII and IX are really there to cause discussion, and I am hoping you can figure out where/how to embed XML data representations into the course. At the end of the day, I think we need one basic syllabus for all schools, although each school can customize the content.

Stuart Chalk
Metadata Explained

The following SlideShare presentation is from a librarian's perspective (the great Diane Hillman is more of a metadata architect), but I think it is useful in terms of the concepts of metadata, linked data, etc. that need to be taught in this course (in my opinion).  If folks agree that a Chemistry version of this would be valuable, I would be willing to put it together.  Leah, I hope you agree :)

http://www.slideshare.net/smartbroad/whats-goin-on-42614446

Thoughts?

Rick Spinney

I agree with Stuart that this module needs to be included and would appreciate his effort to put one together.

Robert Belford
Updates to OLCC Site

I just added a feature to upload draft files, and changed the terminology of the flags for subscribing. These are the files that were in a Dropbox, along with one Jordi sent.  I suggest we all look at Jordi's.  It has some thought in it.  We need to start collecting Fall 2015 schedules, and figure out what we are going to do for the first several weeks, when not all schools will be in session.  I was thinking of using this as a more traditional library information science module, sort of "get to know your librarian".

We are still trying to get the bulk email feature to work. I will send an email to the entire site later today, even if that feature is not figured out by then.

Cheers,
Bob

Robert Belford
Academic Calendar

All,

I have worked on the calendar page. We have 4 schools on board, and I am trying to get at least one more, as 5 seems like a good number.

We all start the week of the 17th or the 24th.  I think we need to pow-wow as to what to do then. I am thinking library resources/information management, and maybe running it an extra week, so that "module" is either 2 or 3 weeks, depending on where you are.

We are in lock step through September.  I suggest this be David Wild's Cheminformatics type stuff.

The week of October 12 is a bit messy. 3 schools have a 2-day break, but they are at different times.  I suggest that week be merged with the previous one.  This should probably be the final project week of the molecule representations part of the class, or the first of the next phase, which I am going to call the chemical/big data phase, for lack of a better word.

The second week of Nov., 2 schools have Veterans Day off.

WVU has the entire Thanksgiving week off.  I suggest all modules be done by then, and classes work on projects.

UNF finishes the next week, Dec. 4.

In summary, we have from Aug 25 to Dec. 7. As soon as we have a bit more thought/agreement on this I will contact the CINF people, but I want an idea of how much time we have, and it is you folks that are the ones responsible for the grades and giving the students their money's worth.  I have also been in contact with the Openchrom people and they are interested in running a module, http://openchrom.net/.   Maybe we should all have a conference Skype call.

Cheers,
Bob

Robert Belford
eGhad-forgot link

Here is the link to the calendar; I meant to include it in the above post:

http://olcc.ccce.divched.org/content/academic-calendars

Cheers,

Leah McEwen
Leah the Librarian finally weighing in...
I took a look at Jordi Cuadros' version and this resonates with me as a comprehensive orientation on the whole topic. Here are my thoughts below on what a librarian can offer within that approach.  These ideas could be part of the introduction, scattered among the other modules, and/or follow the outline above with 2-3 weeks first in a more 'traditional library' scenario.
1- chemical entity representation literacy, with some history; this seems part of the introduction already, but introducing such concepts as these which percolate through the rest of the course:
a) Semantics. Can you interpret any given form of representation in terms of [a small number of fundamental Gold Book terms, e.g. atoms, bonds, electron configurations, ligands, crystal structures, etc.]  Can you write any given form of representation given a sufficiently detailed description of a compound in these fundamental terms?
b) Translation. Can you take one form of representation for a structure and rewrite it as another form?
c) Assumptions. What kind of assumptions/approximations/simplifications go into writing a given representation?  Example: standard structural formulas approximate a "bond" as a discrete object; so do some connection tables; you need to be aware of this when you're working with aromatic compounds. This applies to all representations; it's not that the Kekulé structure of benzene is "wrong" and the resonance structure w/ delocalized pi bond system is "right" or even "less wrong," but rather that they are based on different assumptions/simplifications/approximations; where did these come from and why were they introduced in the first place?
d) Selection. Can you evaluate the relative merits of different forms of representation for expressing particular properties of a compound?
e) Juxtaposition. Can you recognize significant similarities and differences among representations of different compounds in order to make chemical comparisons or interpret a series of representations as a series of chemical changes?
f) Manipulation. Can you take representations apart, put them together, and otherwise manipulate them in order to creatively explore possible chemical relationships, properties, and changes, to perform first-order comparative evaluations in terms of criteria like yield, side products, interfering functionality, electronic properties, other physical and chemical properties, suitability for a particular chemical purpose, etc? In other words, can you use representations to model chemical substances and transformations in order to select fruitful avenues for experimental work?
g) Search. Can you efficiently locate a compound or group of compounds within a large reference database using an appropriate representation? Can you predict how your method of search (type of query + particular repository searched) might have returned insufficient or inappropriate results and cross-check your results using another method?
2- data management and documentation (including ELNs)
covering file naming, backing up/archiving data, metadata, data sharing, notebooks, and the qualitative aspects of information gathering and metrology that feed into all the rest... 
3- My colleagues and I could cover/collaborate on information sources and rights, suggested by Jordi in modules 9 & 10
Robert Belford

I know some of the interested facilitators are involved with computational chemistry, and it seems like Avogadro may be an open source package that merges computational chemistry with cheminformatics.  Are any of you familiar with it?  Here is a pdf from JCI http://www.jcheminf.com/content/pdf/1758-2946-4-17.pdf  

Jordi Cuadros
Avogadro and/or editing molecules in 3D

I have tested Avogadro briefly a few times, and I cannot see any improvement over what Jmol offers. I even prefer the editing capabilities of Jmol (and its limited drawing interface) over Avogadro's.

Anyway, and regardless of whether we include Avogadro or not, a more general question to me would be: Is there anyone who edits molecules in 3D? Should the course include it (or just mention it)?

Jennifer Muzyka

I have minimal experience with Avogadro.  But I believe it is capable of working as a front-end for submitting calculations to GAMESS.  In my book, that makes it much more powerful than Jmol, which is much more familiar to me.

Jennifer

Stuart Chalk
Syllabus thoughts

Looking at Jordi Cuadros' syllabus, it seems to capture most of the content that I think is relevant to this course, although I still think that, from a chemical informatics perspective:

1) the 3D module should be added to the end of the 2D module, as I don't think it is enough to stand alone;
2) the ontology/semantic web portion of module 9 should be broken out into a separate module because it is fundamentally different information than the other portions of module 9.  It is also the more recent/cutting edge work/perspective, so perhaps it goes at the end of the course?
3) I think section 3 of module 6 should be bigger and go into detail about REST, which is becoming the standard for APIs.  I would include how the backend database (e.g. MySQL) and the scripting language (e.g. PHP) are used to generate dynamic pages.  This demystifies such sites, and students understand the whole process of searching for chemicals: request -> process -> response.  (I have also taught the Model-View-Controller (MVC) style of programming, which is great for REST websites, but this may be too much :) )

Other notes:
- I have messed with Avogadro a bit and I'm not jumping up and down about it.  However, it does read and write...
- ...Chemical JSON http://wiki.openchemistry.org/Chemical_JSON

Stuart
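
The request -> process -> response cycle in point 3 can be sketched from the client side. The example below assumes Python and uses PubChem's PUG REST URL pattern as one example of a REST API for a database listed in the syllabus; the helper names are illustrative, and the response text is a hand-written illustration of the JSON shape rather than captured output. No network call is made, so the sketch runs offline.

```python
import json
from urllib.parse import quote

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(name: str, properties: list) -> str:
    """Request: encode what we want (input, operation, output format) in the URL."""
    return f"{BASE}/compound/name/{quote(name)}/property/{','.join(properties)}/JSON"


def extract_properties(response_text: str) -> dict:
    """Response: pull the first property record out of the returned JSON."""
    payload = json.loads(response_text)
    return payload["PropertyTable"]["Properties"][0]


url = build_property_url("aspirin", ["MolecularFormula", "InChIKey"])
print(url)

# Process happens on the server (database lookup plus scripting layer).
# Illustrative response body, written by hand for this sketch:
sample_response = (
    '{"PropertyTable": {"Properties": [{"CID": 2244, '
    '"MolecularFormula": "C9H8O4", '
    '"InChIKey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"}]}}'
)

props = extract_properties(sample_response)
print(props["MolecularFormula"], props["InChIKey"])
```

Pasting the printed URL into a browser is a no-programming way for students to see the same cycle: the URL is the request, and the page that comes back is the response.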

Robert Belford
First Module


I am about to go through Jordi and Tony's modified syllabus and add it to the front page here. If you look at the calendar, you will see that we have 11 modules possible, if we start 8/31 and end before Thanksgiving.  Please note, this is the start of the third week for 2 schools, and second week for the other 2. In accordance with my initial vision, I would like to propose the first module be a "meet your library" type of module.  So some schools would have three weeks on this, others two.

During this time I suggest each class go to their library and become familiar with their librarian and chemistry resources.  At UALR our library has a special training room we could use, and our librarian could give a low-level training session on Scifinder and Refworks.  But I would also like Justin to offer something on Zotero as I know he is good at training, and more than that, allow librarians from other schools to interact and make this the most useful possible experience (people like Leah could interact with Bret, our librarian, and ....). This section could blow up into the whole course, and that is not our objective.  But a survey of your home school's resources, and a bit on information management, is a logical thing to do during the startup, and is something that different schools could easily spend different amounts of time on.  For my students at UALR, I was thinking of seeing if they could generate some kind of guide to a resource that UALR has, which could be uploaded to XCITR, and be a project based graded module.  (If it passes my muster, we actually submit it to XCITR for peer review; the flip side is that students and local librarians get exposed to resources like XCITR.)

Then we have 10 modules to move forward where all classes are in lock-step.  But this first module must be flexible, in that some schools are having classes while others are not.

I would really like the other facilitators' input on this.  Shall we make week one (which is really the second or third week of the semester, depending on your school) a flexible module on local resources and information management?


Stuart Chalk
My vote on Module 1

I vote +1 on this.

Alex Clark
Modules I could do: fundamentals, green chemistry, special topic

(1) Fundamentals of chemical informatics: two molecules, are they the same?

... it seems like the module layout at the moment is rather tied to products & services, so I'm not sure where this would fit. Maybe it could replace module 5, and what was there before could be tacked onto another module.

The basic idea is to explain the issues that go into answering what most people think ought to be a simple question, but in actual fact gets to the heart of why cheminformatics is actually a hard subject:

- orientation: 2 molecules with slightly different coordinates (e.g. rotated); they are "the same" in most ways, and easy enough to compare
- numbering: 2 identical structures with permuted atom order look the same to a person, but for software, establishing this requires a graph isomorphism test, which is a non-trivial algorithm that can be a performance problem with certain edge cases
- hydrogens: especially explicit vs. implicit, which makes direct mapping complicated, and also introduces issues with implicit counting formulae and format deficiencies for specifying these
- stereocentres: unspecified vs. specified R/S and E/Z stereocentres introduce issues with "sameness", especially if one side is unknown; this has to fit in with isomorphism tests and be able to work with/without explicit hydrogens; there are also multiple ways to indicate stereocentres, e.g. wedges, 3D geometry, parity and CIP
- resonance: particularly common with aromatic molecules, multiple ways to draw double bonds that are equally valid, and must be equivalent; divergence between actual aromaticity and resonance equivalence becomes an issue in many edge cases
- tautomers: two molecules that are not literally the same structures may become the same in aqueous solution due to tautomer transformations
- bonding types: styles of functional groups, e.g. nitro with charge separated vs. hypervalent, introduce equivalence problems; also many non-organic molecules are hard or impossible to represent well with only single, double & triple bonds
- abbreviations: chemists often use shortcuts like "Et" and "Ph", which need to be expanded out to structure definitions in order to verify equality
- undefined graphics: common habits like using free-text (e.g. "R/S" or "+") or graphical objects (e.g. circle for aromaticity) do not map to cheminformatics concepts, and will not get the results expected

Each of these subsections could have a set of yes/no questions for evaluation purposes.

(2) Green chemistry: how reaction metrics like process mass intensity, E-factor and atom economy should be used as a way to evaluate reactions, similar to how yield is used now. Some discussion about how the reaction has to be represented completely, with structures, role and stoichiometry for all components, in order for the algorithm to be able to calculate these automatically.

Calculating the metrics manually could be used for evaluation.

(3) Special topic/assignment: Bayesian models

The latest version of CDK allows building & applying Bayesian models to predict likelihood of a particular property (e.g. activity). It has no user interface, so it is necessary to compile a small control program to invoke the functionality. The assignment would include a training set (as an SDfile), the control program (.java) and instructions on how to compile, link & run it. The user would be responsible for gathering these together and following the instructions, in order to turn the training set into a Bayesian model. The next step would be to use any software of choice to create several molfiles for test molecules, and run the control program to apply the Bayesian model, and observe the predictions.

No programming would be required, but the user would be exposed to the step by step process of creating new software using an existing toolkit, so they would come away with some idea of what cheminformaticians do for a living.

For bonus points the user can propose a new molecular structure that scores better than the indicated test compounds; and for more bonus points, prove that it is not in the training set.
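
The "numbering" issue in topic (1) can be shown with a deliberately naive sketch, assuming Python and a hypothetical representation of a molecule as a list of element symbols plus a list of bonded index pairs (hydrogens implicit, bond orders ignored). It tries every renumbering of one molecule to see whether any reproduces the other. Real toolkits use far more efficient algorithms; the factorial cost of this brute-force version is exactly why the test is non-trivial.

```python
from itertools import permutations


def same_graph(atoms_a, bonds_a, atoms_b, bonds_b) -> bool:
    """Brute force: is there a renumbering of A's atoms that reproduces B?"""
    if sorted(atoms_a) != sorted(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    target = {frozenset(bond) for bond in bonds_b}
    n = len(atoms_a)
    for perm in permutations(range(n)):
        # perm[i] is the index in B that atom i of A is mapped to.
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue  # elements must match under the mapping
        mapped = {frozenset((perm[i], perm[j])) for i, j in bonds_a}
        if mapped == target:
            return True
    return False


# Ethanol heavy atoms written in two different atom orders.
ethanol_1 = (["C", "C", "O"], [(0, 1), (1, 2)])   # C-C-O
ethanol_2 = (["O", "C", "C"], [(0, 1), (1, 2)])   # O-C-C
# Dimethyl ether: same atoms, different connectivity.
ether = (["C", "O", "C"], [(0, 1), (1, 2)])       # C-O-C

print(same_graph(*ethanol_1, *ethanol_2))
print(same_graph(*ethanol_1, *ether))
```

The two ethanol records share no matching index-to-element assignment, yet they describe the same molecule, while ethanol and dimethyl ether have identical atom counts and bond counts but are different compounds. That pair makes a natural yes/no evaluation question of the kind proposed above.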

Robert Belford
product oriented and facilitator (Pedagogy) forum

Hi All,

It was brought to my attention that our syllabus is rather heavy on the product orientation, and I agree.  Now much of cheminformatics is accessed through products, but I think we need this to be about theoretical/underlying concepts at the introductory level, and that in turn can support understanding of products.

Any input on this?

I am also creating a pedagogy forum, and will shortly ask the facilitators to subscribe to it.  I had a topic going on POGIL on which Jennifer made some valuable comments, but I want to move that to a new forum, and will try to do it tomorrow.  I think it is prudent for us to try to make activities in our classrooms that can span campuses.

Leah had also come up with the idea of a special symposium at the ACS National meeting where we and our students could present.  This aligns with the theme of the meeting, and I am seeking funds to assist with this. We also need to get the lecturers lined up this week, so they can create their papers in June, we can discuss them in July, and start posting and teaching in August.  I think this is coming together, although I am having problems too.  But that is life.

Cheers,
Bob

Robert Belford

All,

I have been trying to finalize the syllabus, and there are still issues.  We are covering too many topics, and many of these really need to be two weeks.

Do you think we can move module 7, Comparing and searching chemical entities, to in front of 5 and 6 (presenting chemistry in 2D and 3D)?  In fact, can we put the "presenting chemistry in 2D and 3D" at the very end?  And omit the searching chemical reactions?

Cheers,
Bob

Jordi Cuadros
Re: Syllabus


I'm OK with omitting the chemical reactions part, if people feel that will make the course easier and more doable.

On the other hand, I would keep Module 7 (searching in databases) after modules 5 and 6 (representations in 2D and 3D), since drawing structures is a way to query the search engines and their output may include 3D representations (mol files).

Other ideas:
- I would move "Application Programming Interfaces (APIs)" from 3.6 and "Using the databases programmatically" (7.2.4) to module 9

