Fall 2015 Cheminformatics

Course Date: 
Tuesday, August 18, 2015 - 16 to Monday, December 7, 2015 - 16

Cheminformatics OLCC: An InterCollegiate Introductory Course to Some of the Topics of Cheminformatics

Welcome to the Cheminformatics OLCC course website. Cheminformatics is a vast and complex subject, as evidenced by a quick look at the symposia in any CINF (Division of Chemical Information) program at an American Chemical Society national meeting. The Fall 2015 program has topics ranging from library sciences and information skills to Chemogenomics, Drug Discovery, Big Data, Chemical Identifiers, Text Mining, and a plethora of others. This raises a tough question: what should an undergraduate-level introductory course in cheminformatics teach? Clearly, there need to be multiple introductory cheminformatics courses, each focusing on a specific area of this complex domain.

Classical cheminformatics evolved out of the pharmaceutical industry, in silico medicinal chemistry, the ability of computational software to predict chemical properties, and the need to manage large chemical data sets. As the amount of data grew, new forms of scientific discovery emerged in line with Microsoft Research's Fourth Paradigm of eScience, and today cognitive scientists could state that cheminformatics is changing the fundamental cognitive artifacts used to represent, manipulate, and communicate chemical information. The world of big data is here, and this particular cheminformatics course focuses on giving students an understanding of the nature of digital chemical data and of how to connect the workflow of practicing chemists to the infrastructure of online chemical data repositories.

This course is currently being offered at four universities in the fall of 2015.

 

Table of Contents

1a:  Finding Information for Research in Chemistry
1b: Collaborative Citation Management
2:   Information Science for Chemists
3:   Data Management Best Practices
4a: Communicating Chemical Structure with Formulas and Names
4b: Communicating About Chemical Structure on Computers
5:   Chemical Identifiers   
6:   Comparing and Searching Chemical Entities
7:   Representing and Managing Digital Spectra
8:   Interacting with Databases: Desktop and Web Based Applications
9:   OLCC Project Discussions

 

Course Schedule at a Glance:

Week/Module of:
1.   8/17    Module 1:          WVU, UALR, Finding Information for Research in Chemistry - Li & McEwen
2.   8/24    Module 1:          UNF joins - Li & McEwen
3.   8/31    Module 1:          Centre joins, add Collaborative Citation Management - Li, McEwen & Shorb
4.   9/7      Module 2:          Information Science for Chemists - Chalk
5.   9/14    Module 2:          Information Science for Chemists - Chalk
6.   9/21    Module 3:          Data Management Best Practices - Briney, Li & McEwen
7.   9/28    Module 4a/4b:   Communicating with Chemical Formulas and Names - Hepler-Smith & McEwen
8.   10/5    Module 4b:        Identifying Chemical Compounds on Computer - Hepler-Smith & McEwen
9.   10/12  Module 5:          Chemical Identifiers (UALR/WVU have 12/13 off) - Kim
10. 10/19  Module 5/6:       Chemical Identifiers/Comparing and Searching Chemical Entities - Kim
11. 10/26  Module 6:          Comparing and Searching Chemical Entities - Kim
12. 11/2    Module 7:          Representing and Managing Digital Spectra - Chalk
13. 11/9    Module 8:          Interacting with Databases: Desktop and Web Based Applications (UNF has 11/11 off) - Lang and Cuadros
14. 11/16  Module 8:          Interacting with Databases: Desktop and Web Based Applications - Lang and Cuadros
15. 11/23  Module 9:          (Projects/Special Topics)  Thanksgiving (WVU has whole week off)
16. 11/30  Module 9:          (Projects/Special Topics)

 

Authors:   Ye Li, University of Michigan-Ann Arbor
                Leah McEwen, Cornell University

Learning Objectives:

  • Choose an appropriate resource to search for needed data and information in Chemistry
  • Describe how to search for needed data and information in a specific resource
  • Evaluate the validity of resources and sources identified
  • Summarize the data and information discovered to answer the research question
  • Cite the sources and resources used
  • Make a tutorial of these steps in the format of text description (with screenshots), or video/audio, and other multimedia

Justin M. Shorb, Hope College

Learning Objectives:

  • Students will be able to add citations via a browser plugin to their Zotero Library and organize them using folders.
  • Students will be able to use drag-and-drop to add bibliography-style references to MS Word and Google Docs.
  • Students will be able to set Zotero default Style and export citations and bibliographic references.
  • Students will be able to use the Zotero plugin for MS Word to dynamically add references to their writing and install new Zotero Styles from Zotero Style Repository. 
  • Students will use the Zotero website to form groups and organize shared references in shared folders. 
  • Students will be able to collaboratively write a short paper with an extensive bibliography and references, and report their paper in multiple Styles. 

 

Stuart Chalk, University of North Florida

Learning Objectives:

  • Understand how computers represent letters, numbers, and symbols
  • Be able to identify different information types
  • Appreciate the difference between binary and text file types
  • Be able to identify different computer languages used on computers to develop applications and construct webpages
  • Understand what a relational database is and the difference between SQL and NoSQL databases
  • Appreciate data websites and the concepts behind how an application programming interface (API) can be developed to access such sites
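
As a small illustration of the first objective above, the sketch below shows one way a computer can represent a short piece of chemical text as numbers and bytes. The string "H₂O" and the choice of UTF-8 are assumptions made for the example, not part of the module materials.

```python
# Illustrative sketch: how text becomes numbers and bytes.
text = "H₂O"

# Each character has a Unicode code point (an integer).
code_points = [ord(ch) for ch in text]

# UTF-8 encodes each code point as one or more bytes;
# the subscript two needs three bytes, the letters one each.
utf8_bytes = text.encode("utf-8")

# The same bytes written out as 8-bit binary strings.
bits = [format(b, "08b") for b in utf8_bytes]

print(code_points)
print(list(utf8_bytes))
print(bits)
```

Decoding the bytes with the same encoding returns the original string, which is why text files must be read with the encoding they were written in.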

Kristin Briney, University of Wisconsin-Milwaukee
Ye Li, University of Michigan-Ann Arbor
Leah McEwen, Cornell University

Learning Objectives:

Develop awareness, through examples and lab exercises, of the challenges associated with managing scientific experimental data. Learn best practices for managing data across the entire research lifecycle, from conceiving of an experiment to sharing resulting data and analysis.  Students will develop skills in the following areas:

  • Know how to organize data to better find it later
  • Follow best practices for storage and backup to protect files from loss
  • Document data so that anyone can follow what you did
  • Properly cite and follow license conditions when using another researcher’s data

Evan Hepler-Smith, Princeton University
Leah R. McEwen, Cornell University

Learning Objectives:

  • Recognize the various kinds of chemical names, formulas, and other identifiers.
  • Understand what you do and do not know about a chemical compound based on one of these names, formulas, or identifiers.
  • Understand how one kind of chemical name, formula, or other identifier can be translated into another, and what sorts of information can be inadvertently lost or added in translation.
  • Understand how chemists interpret various kinds of chemical names, formulas, and other identifiers in chemically meaningful ways.

Evan Hepler-Smith, Princeton University
Leah R. McEwen, Cornell University
Alex M. Clark, Molecular Materials Informatics

Learning Objectives:

  • Recognize the various kinds of chemical identifiers used on computers.
  • Understand how computer systems interpret various kinds of chemical names, formulas, and other identifiers in chemically meaningful ways.
  • Understand what kind of information about chemical structure a computer program can and cannot derive from different representations and identifiers.
  • Understand how a connection table represents chemical structure.
  • Weigh the factors to consider in selecting an appropriate kind of chemical name, formula, or identifier, including the information you want to communicate, the kind of chemical entity you’re referring to, the audience with whom you’re communicating, and the medium in which you’re communicating.
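
To make the connection table idea concrete, here is a minimal sketch that stores ethanol as a list of atoms and a list of bonds, then infers implicit hydrogens from typical valences. The data layout, the names `atoms`, `bonds`, and `TYPICAL_VALENCE`, and the valence rule are simplifying assumptions for illustration; real formats such as molfiles carry more fields.

```python
# Hypothetical minimal connection table for ethanol (CH3-CH2-OH),
# listing heavy atoms only.
atoms = ["C", "C", "O"]                # atom index -> element symbol
bonds = [(0, 1, 1), (1, 2, 1)]         # (atom index, atom index, bond order)

# Typical valences used to infer implicit hydrogens. This is an assumption
# that holds for neutral C, N, and O in simple organic molecules.
TYPICAL_VALENCE = {"C": 4, "N": 3, "O": 2}


def neighbors(index):
    """Return the indices of atoms bonded to the given atom."""
    return [b if a == index else a for a, b, _ in bonds if index in (a, b)]


def implicit_hydrogens(index):
    """Typical valence minus the bond orders already used by the atom."""
    used = sum(order for a, b, order in bonds if index in (a, b))
    return TYPICAL_VALENCE[atoms[index]] - used


hydrogens = [implicit_hydrogens(i) for i in range(len(atoms))]
print(hydrogens)        # hydrogens attached to each heavy atom
print(neighbors(1))     # atoms bonded to the central carbon
```

Note what the table does not contain: there are no coordinates and no name, only which atoms are present and how they are connected.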

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives:

  • Review various chemical identifiers used for representing small molecules.
  • Explain what common names, systematic names, and International Nonproprietary Names (INNs) are.
  • Explain what SMILES, SMARTS and SMIRKS are.
  • Explain what InChI and InChIKey are.
  • Review SMILES specification rules.
  • Compare and contrast SMILES and InChI.
  • Demonstrate how to interpret SMILES, SMARTS, and InChI strings as their corresponding chemical structures.

Sunghwan Kim, National Center for Biotechnology Information

Learning Objectives:

  • Review publicly available chemical databases in different domains.
  • Review text search, identity search, substructure/superstructure search, and similarity search.
  • Review basic knowledge of molecular similarity methods.
  • Perform chemical searches using the PubChem homepage and Chemical Structure Search page.

Stuart Chalk, University of North Florida

By the end of this module, students will:

  • Understand the formats for representing spectral data
    • JCAMP-DX, AnIML, ANDI, NetCDF, CSV, Tab delimited (XY format)
  • Know where to obtain reliable spectral information
    • AIST Spectral Database for Organic Compounds (SDBS)
    • NIST Chemistry WebBook
    • ChemSpider
  • Be familiar with simulated spectra
  • Be familiar with spectral software

 

Jordi Cuadros, Universitat Ramon Llull
Andrew Lang, Oral Roberts University

Learning Objectives:

  • Have a basic understanding of the main interfaces and technologies involved in using Web APIs
  • Gain a knowledge of the main chemistry Web APIs
  • Be able to pull chemical information programmatically from some chemistry online databases and services
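
As a rough sketch of the last objective, the example below builds a request URL for PubChem's PUG REST service, which returns compound properties as JSON. The helper name `property_url`, the example compound, and the chosen properties are assumptions for illustration; the module itself may use different services or tools.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties):
    """Build a PUG REST URL requesting the given properties for a compound name."""
    return (
        f"{PUG_REST}/compound/name/{quote(name, safe='')}"
        f"/property/{','.join(properties)}/JSON"
    )


# The space in the name is percent-encoded so the URL stays valid.
url = property_url("acetic acid", ["MolecularFormula", "InChIKey"])
print(url)

# To retrieve the data (requires network access):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as response:
#       data = json.load(response)
```

Keeping URL construction separate from the network call makes the request easy to inspect in a browser before running it from a program.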

This page is where students and faculty can discuss projects.