Course Date:

Monday, January 9, 2017 - 11 to Tuesday, May 9, 2017 - 11

Cheminformatics OLCC: An Introduction to Chemical Data and Public Compound Databases.

On this page you will find reading material and assignments for the Spring 2017 intercollegiate cheminformatics course. There are 12 schools participating in this course, and each school has its own syllabus. Parts 1 & 2 represent the core modules that students will discuss with authors, and the course list will be subscribed to these modules. Actual dates may vary between schools due to different academic calendars. Part 3 will provide links to collaborative project pages, and Part 4 provides access to special topic modules. The class list will not be subscribed to the content in Parts 3 & 4, so students and faculty interested in these should self-subscribe. Discussions on these pages will be ongoing throughout the duration of the course.

WEB SITE Tutorials - Go to this page to learn how to use the website and participate in course activities.

Part I: Cheminformatics and Data Representations (1/30/17-2/27/17)

Part II: Public Compound Databases (Focus on PubChem) (2/27/17-4/1/17)

Understanding Public Chemical Databases
This module will expose students to informatics associated with online chemical databases.
Database Searching for Chemicals: Text Search
This module will guide students through text-based search features.
Database Searching for Chemicals: Structure Search
This module will guide students through structure-based search features.
Accessing PubChem Programmatically
This module will build upon the Programmatic Access to Web-Based Chemical Information

Part III: Student Projects (4/1/17-5/1/17)
Students will have a chance to team up with students from other campuses and work with experts in the field to develop original projects.

Part IV: Special Topics Modules

These modules will be available for discussion throughout the semester, but unlike the core modules in Parts 1 & 2, the class list will not be subscribed to these modules. Instead, students and faculty need to subscribe themselves to the special topic modules that interest them.

1. Introduction

Nathan Brown, London Institute of Cancer Research

2.1 Chemical Representations on Computer: Part I

Evan Hepler-Smith, Harvard University
Leah R. McEwen, Cornell University

Acknowledgements: Alex Clark, Sunghwan Kim

Learning Objectives:

Describe and be able to identify ambiguous, unambiguous, and canonical representations of chemical structure, as well as explicit and implicit information contained in these representations.
Describe each of the four major approaches to machine representation of chemical structure (connection tables, graphic visualizations, line notation, and descriptive representations), as well as the advantages and drawbacks of each of these forms.
Describe how database record IDs relate to representations of chemical structure.
Describe lookup and translation approaches to exchanging chemical identifiers, including what countertranslation is and why it can be useful.

2.2 Chemical Representations on Computer: Part II

Evan Hepler-Smith, Harvard University
Leah R. McEwen, Cornell University

Learning Objectives

Understand the principles behind connection table representation of chemical structures
Translate structural formulas into simplified connection tables and vice-versa
Recognize the parts of a MOL file, a common connection table file format
Map the correspondence between features of a structural formula and entries in a MOL file
Adjust connection tables to make simple modifications to chemical structures
Track how changes in a chemical sketch program and the underlying connection table data relate to each other.
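
As an illustration of the objectives above, a simplified connection table can be sketched in Python. This is a minimal sketch and not the MOL file format itself: the `atoms` and `bonds` structures and the helper names `neighbors` and `replace_atom` are hypothetical choices made for this example. It lists the heavy atoms of ethanol (hydrogens implicit) and shows how a single edit to the table changes the structure it represents.

```python
# Simplified connection table for ethanol, CH3-CH2-OH (hydrogens implicit).
# Atom indices start at 1, as they do in a MOL file atom block.
atoms = {1: "C", 2: "C", 3: "O"}

# Each bond is (first atom index, second atom index, bond order).
bonds = [(1, 2, 1), (2, 3, 1)]


def neighbors(atom_index, bonds):
    """Return the indices of the atoms bonded to atom_index."""
    found = []
    for first, second, _order in bonds:
        if first == atom_index:
            found.append(second)
        elif second == atom_index:
            found.append(first)
    return found


def replace_atom(atoms, atom_index, new_symbol):
    """Return a copy of the atom table with one element symbol changed."""
    updated = dict(atoms)
    updated[atom_index] = new_symbol
    return updated


# Changing atom 3 from O to S turns ethanol into ethanethiol.
# The bond list is unchanged, because the connectivity is the same.
thiol_atoms = replace_atom(atoms, 3, "S")

print(neighbors(2, bonds))
print(thiol_atoms)
```

Keeping the atom table and the bond table separate mirrors the way a connection table file lists atoms first and bonds second, so the same small edit could be made by hand in a text editor.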

2.3 Chemical Representations on Computer: Part III

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

Explain what SMILES, SMARTS and SMIRKS are.
Explain what InChI and InChIKey are.
Review SMILES specification rules.
Compare and contrast SMILES and InChI.
Demonstrate how to interpret SMILES, SMARTS, and InChI strings as their corresponding chemical structures.
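
To make the InChIKey objective concrete, the sketch below splits a standard InChIKey into its parts. It assumes the usual 27-character layout: a 14-letter hash of the connectivity, an 8-letter hash of the remaining layers, a flag letter (`S` for a standard InChI), a version letter, and a final protonation letter. The function name `split_inchikey` and the dictionary keys are hypothetical, chosen for this example only. The key used here is the one for aspirin.

```python
import re

# 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(
    r"([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])"
)


def split_inchikey(key):
    """Split an InChIKey into its blocks, or raise ValueError if malformed."""
    match = INCHIKEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, other_layers, flag, version, protonation = match.groups()
    return {
        "connectivity_hash": connectivity,   # encodes the molecular skeleton
        "other_layers_hash": other_layers,   # stereochemistry, isotopes, etc.
        "is_standard": flag == "S",          # "S" marks a standard InChI
        "version": version,
        "protonation": protonation,
    }


aspirin = split_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
print(aspirin)
```

Because the key is a hash, it can be checked for shape and compared for equality, but the structure cannot be read back out of it. That is one practical difference from SMILES and InChI strings.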

3. Data Representation on Computer for Chemists

Stuart Chalk, University of North Florida

Learning Objectives

By the end of this module students will:

Understand how computers represent letters, numbers, and symbols
Be able to identify different information types
Appreciate the difference between binary and text file types
Be able to identify different languages used to develop applications and construct webpages
Understand what a relational database is and the difference between SQL and NoSQL databases
Appreciate data websites and the concepts behind how an application programming interface (API) can be developed to access such sites
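
As a small illustration of the first objective in this list (how computers represent letters, numbers, and symbols), the sketch below takes the text "CO₂" and shows its Unicode code points, its UTF-8 bytes, and the bits of each byte. The variable names are arbitrary and chosen for this example.

```python
# "CO" followed by SUBSCRIPT TWO (U+2082), written as an escape so the
# example does not depend on the encoding of the source file.
text = "CO\u2082"

# Each character has a numeric Unicode code point.
code_points = [ord(character) for character in text]

# UTF-8 stores ASCII characters in one byte and other characters in more.
utf8_bytes = text.encode("utf-8")

# Each byte is ultimately eight binary digits.
bits = [format(byte, "08b") for byte in utf8_bytes]

print(code_points)
print(list(utf8_bytes))
print(bits)
```

The three characters occupy five bytes, because the subscript two lies outside ASCII and needs three bytes in UTF-8. This is one reason a text file's size in bytes is not always its length in characters.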
Representing & Managing Digital Spectra
- Understand the formats for representing spectral data
  - JCAMP-DX, AnIML, ANDI, NetCDF, CSV, Tab delimited (XY format)
- Where to obtain reliable spectral information
  - AIST Spectral Database for Organic Compounds (SDBS)
  - NIST Chemistry WebBook
  - ChemSpider
- Simulated spectra
- Spectral software

4. Understanding Public Chemical Databases

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

Explain what primary and secondary databases are.
Explain what data provenance is.
Review publicly available chemical databases in different domains.
Understand how PubChem data are organized.
Learn how to critically assess data in public databases.

5. How to Search PubChem for Chemical Information (Part 1)

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

Explain what Entrez indices, filters, and links are.
Explain what depositor-supplied and MeSH synonyms in PubChem are.
Retrieve compounds that have a particular type of information (e.g., boiling point, melting point, and so on).
Submit multiple text queries using the Identifier Exchange Service.
Retrieve annotated information contributed by a given data source.
Combine multiple queries using Entrez history.

6. How to Search PubChem for Chemical Information (Part 2)

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

Review identity search, substructure/superstructure search, and similarity search.
Review basic knowledge of molecular similarity methods.
Learn how to retrieve bioactivity data from PubChem.
Learn how to use PubChem’s Structure Clustering and Structure-Activity Relationship (SAR) Analysis tools.
Learn how to analyze bioactivity data using PubChem’s web-based interfaces.

7. Programmatic Access to Public Chemical Databases

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives

Know how to formulate a PUG-REST request URL.
Know how to access PubChem data from a spreadsheet (in Google Sheets).
Know how to access PubChem data from a Python script.
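
The first and last objectives above can be sketched together. A PUG-REST request URL is assembled from a base address followed by an input specification, an operation, and an output format. The example below builds a URL that asks for two properties of a compound identified by name. The helper names `build_property_url` and `fetch_json` are hypothetical, and the fetch function is defined but not called here, since calling it requires network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(name, properties, output="JSON"):
    """Build a PUG-REST URL requesting properties of a compound by name."""
    parts = [
        PUG_REST_BASE,
        "compound", "name", quote(name, safe=""),  # input: which compound
        "property", ",".join(properties),          # operation: what to return
        output,                                    # output format
    ]
    return "/".join(parts)


def fetch_json(url):
    """Download a URL and parse the response body as JSON (needs network)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


url = build_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
print(url)
```

Pasting the printed URL into a web browser should return the same data that `fetch_json(url)` would retrieve from a script, which is a convenient way to check a request before automating it.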

Cheminformatics OLCC Student Projects

Special Topics Modules

This page will provide links to Special Topics Modules that students and faculty can participate in. Do not subscribe to this page, but to the modules this page links to.

Spring 2017 Cheminformatics OLCC
