Spring 2017 Cheminformatics OLCC

Course Date: 
Monday, January 9, 2017 - 11 to Tuesday, May 9, 2017 - 11

Cheminformatics OLCC: An Introduction to Chemical Data and Public Compound Databases.


On this page you will find reading material and assignments for the Spring 2017 intercollegiate cheminformatics course. There are 12 schools participating in this course, and each school has its own syllabus. Parts 1 & 2 represent the core modules that students will discuss with authors, and the course list will be subscribed to these modules. Actual dates may vary between schools due to different academic calendars. Part 3 will provide links to collaborative project pages, and Part 4 provides access to special topic modules. The class list will not be subscribed to the content in Parts 3 & 4, so students and faculty interested in these should self-subscribe. Discussions on these pages will be ongoing throughout the duration of the course.

WEB SITE Tutorials - Go to this page to learn how to use the website and participate in course activities.

Part I: Cheminformatics and Data Representations  (1/30/17-2/27/17)

  1. Introduction
  2. Chemical Representations on Computer: Part I
    Chemical Representations on Computer: Part II
    Chemical Representations on Computer: Part III
  3. Data Representation on Computer for Chemists

Part II: Public Compound Databases (Focus on PubChem) (2/27/17-4/1/17)

  1. Understanding Public Chemical Databases
    This module will expose students to informatics associated with online chemical databases.
  2. Database Searching for Chemicals: Text Search
    This module will guide students through text-based search features.
  3. Database Searching for Chemicals: Structure Search
    This module will guide students through structure-based search features.
  4. Accessing PubChem Programmatically
    This module will build upon the Programmatic Access to Web-Based Chemical Information

Part III: Student Projects (4/1/17-5/1/17)
   Students will have a chance to team up with students from other campuses and work with experts in the field to develop original projects.

Part IV: Special Topics Modules

   These modules will be available for discussion throughout the semester, but unlike the core modules in Parts 1 & 2, the class list will not be subscribed to these modules. Instead, students and faculty need to subscribe themselves to the special topic modules that interest them.

Evan Hepler-Smith, Harvard University
Leah R. McEwen, Cornell University

Acknowledgements: Alex Clark, Sunghwan Kim

Learning Objectives:

  • Describe and be able to identify ambiguous, unambiguous, and canonical representations of chemical structure, as well as explicit and implicit information contained in these representations.
  • Describe each of the four major approaches to machine representation of chemical structure (connection tables, graphic visualizations, line notation, and descriptive representations), as well as the advantages and drawbacks of each of these forms.
  • Describe how database record IDs relate to representations of chemical structure.
  • Describe lookup and translation approaches to exchanging chemical identifiers, including what countertranslation is and why it can be useful.

Evan Hepler-Smith
Leah McEwen

Learning Objectives

  • Understand the principles behind connection table representation of chemical structures
  • Translate structural formulas into simplified connection tables and vice-versa
  • Recognize the parts of a MOL file, a common connection table file format
  • Map the correspondence between features of a structural formula and entries in a MOL file
  • Adjust connection tables to make simple modifications to chemical structures
  • Track how changes in a chemical sketch program and the underlying connection table data relate to each other. 
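
As a rough illustration of the MOL file objectives above, the sketch below reads the counts line, atom block, and bond block of a small hand-written V2000 connection table. The molecule (methanol, heavy atoms only), the coordinates, and the function name `parse_molfile` are assumptions made for this example and are not taken from the module itself. It handles only the simplest case and ignores charges, stereochemistry, and property lines.

```python
# A hand-written V2000 MOL file for methanol (heavy atoms only).
# Lines 1-3 are the header block; line 4 is the counts line.
MOLFILE = "\n".join([
    "methanol (heavy atoms only)",
    "  hand-written example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
])


def parse_molfile(text):
    """Return (atoms, bonds) from a simple V2000 MOL file.

    atoms is a list of element symbols in file order.
    bonds is a list of (first_atom, second_atom, bond_type) tuples,
    using the 1-based atom numbering found in the file.
    """
    lines = text.splitlines()
    counts = lines[3]
    # The counts line uses fixed-width fields of three characters each.
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    # Atom block: x, y, z coordinates followed by the element symbol.
    atoms = [lines[4 + i].split()[3] for i in range(n_atoms)]

    # Bond block: two atom indices and a bond type (1 = single, 2 = double).
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


atoms, bonds = parse_molfile(MOLFILE)
print(atoms)
print(bonds)
```

Changing the bond type in the bond line from 1 to 2 and re-running the parser is one way to see how an edit to the connection table maps onto a change in the structure.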

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

  • Explain what SMILES, SMARTS and SMIRKS are.
  • Explain what InChI and InChIKey are.
  • Review SMILES specification rules.
  • Compare and contrast SMILES and InChI.
  • Demonstrate how to interpret SMILES, SMARTS, and InChI strings into their corresponding chemical structures.
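
To make the contrast between a compact line notation and a layered identifier more concrete, the sketch below splits a standard InChI string into its layers and checks whether a string has the general shape of an InChIKey. The function names are hypothetical and chosen for this example. The layer splitting assumes a formula layer is present, which is not true of every InChI, and the InChIKey check looks only at the pattern of characters, not at whether the key belongs to a real compound.

```python
import re

# Shape of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def split_inchi_layers(inchi):
    """Split an InChI string into a dict of layers (simplified sketch)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        # Assumption: the second field is the chemical formula layer.
        layers["formula"] = parts[1]
    for part in parts[2:]:
        # Remaining layers start with a one-letter prefix such as c or h.
        layers[part[0]] = part[1:]
    return layers


def looks_like_inchikey(text):
    """Return True if text has the character pattern of an InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


# Ethanol: SMILES "CCO", InChI shown below.
print(split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
# Water's InChIKey has the expected shape; a SMILES string does not.
print(looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
print(looks_like_inchikey("CCO"))
```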

Learning Objectives


By the end of this module students will:

  • Understand how computers represent letters, numbers, and symbols
  • Be able to identify different information types
  • Appreciate the difference between binary and text file types
  • Be able to identify different computer languages used to develop applications and construct webpages
  • Understand what a relational database is and the difference between a SQL and a NoSQL database
  • Appreciate data websites and the concepts behind how an application programming interface (API) can be developed to access such sites
  • Representing & Managing Digital Spectra
    • Understand the formats for representing spectral data
      • JCAMP-DX, AnIML, ANDI, NetCDF, CSV, Tab delimited (XY format)
    • Where to obtain reliable spectral information
      • AIST Spectral Database for Organic Compounds (SDBS)
      • NIST Chemistry WebBook
      • ChemSpider
    • Simulated spectra
    • Spectral software

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

  • Explain what primary and secondary databases are.
  • Explain what data provenance is.
  • Review publicly available chemical databases in different domains.
  • Understand how PubChem data are organized.
  • Learn how to critically assess data in public databases.

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

  • Explain what Entrez indices, filters, and links are.
  • Explain what depositor-supplied and MeSH synonyms in PubChem are.
  • Retrieve compounds that have a particular type of information (e.g., boiling point, melting point, and so on).
  • Submit multiple text queries using the Identifier Exchange Service.
  • Retrieve annotated information contributed by a given data source.
  • Combine multiple queries using Entrez history.

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

  • Review identity search, substructure/superstructure search, and similarity search.
  • Review basic knowledge of molecular similarity methods.
  • Learn how to retrieve bioactivity data from PubChem.
  • Learn how to use PubChem’s Structure Clustering and Structure-Activity Relationship (SAR) Analysis tools.
  • Learn how to analyze bioactivity data using PubChem’s web-based interfaces.
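
One widely used molecular similarity measure is the Tanimoto coefficient, which compares two fingerprints by dividing the number of shared "on" bits by the number of bits that are on in either one. The sketch below is a minimal illustration that represents each fingerprint as a Python set of bit positions. The bit positions are made up for the example, and returning 0.0 for two empty fingerprints is a convention assumed here rather than a standard.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = bits_a | bits_b
    if not union:
        # Assumed convention: two empty fingerprints score 0.0.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints: positions of the bits that are set.
fingerprint_1 = {1, 2, 3}
fingerprint_2 = {2, 3, 4}
print(tanimoto(fingerprint_1, fingerprint_2))  # 2 shared bits / 4 total bits = 0.5
```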

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives


  • Know how to formulate a PUG-REST request URL.
  • Know how to access PubChem data from a spreadsheet (in Google Sheets).
  • Know how to access PubChem data from a Python script.
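
As a hedged sketch of the first and last objectives above, the function below assembles a PUG-REST request URL for compound properties from its parts: the input namespace, the identifier, the requested properties, and the output format. The helper name `build_pug_rest_url` and its parameters are assumptions made for this example. The code only builds the URL and does not contact PubChem.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(namespace, identifier, properties, output="JSON"):
    """Build a PUG-REST URL requesting compound properties.

    namespace  -- how the compound is specified, e.g. "name" or "cid"
    identifier -- the compound name or CID
    properties -- list of property names, e.g. ["MolecularFormula"]
    output     -- output format, e.g. "JSON", "CSV", or "TXT"
    """
    # Percent-encode the identifier so names with spaces stay valid in a URL.
    encoded_id = quote(str(identifier), safe="")
    property_list = ",".join(properties)
    return (
        f"{PUG_REST_BASE}/compound/{namespace}/{encoded_id}"
        f"/property/{property_list}/{output}"
    )


url = build_pug_rest_url("name", "aspirin", ["MolecularFormula", "MolecularWeight"])
print(url)
```

A script could then retrieve the data by passing the URL to `urllib.request.urlopen` and parsing the response, keeping requests infrequent enough to respect PubChem's usage policies.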



This page will provide links to Special Topics Modules that students and faculty can participate in. Do not subscribe to this page, but to the modules this page links to.