1. Introduction

Nathan Brown, London Institute of Cancer Research

This introduction has two purposes: to introduce you to cheminformatics, and to introduce you to the course.

Part I: The Introduction to Cheminformatics.

On this page we are posting an introduction to cheminformatics from the perspective of an in silico medicinal chemist, Nathan Brown, who has also shared his recent text, "In Silico Medicinal Chemistry", which students who have logged in can access from the bottom of this page. Please note, you only need to click on the file name. If you click the radio button and "save", you will delete the file. Although the modules will follow the initial chapters of Dr. Brown's text, the course will focus on public chemical compound databases and on how chemicals and chemical data are represented on a computer. That is, there is a lot more to the field of cheminformatics than this course will attempt to cover, and this introduction is meant to help us see where the course material fits in the larger field.

Part II: The Introduction to this Course and the Participants.

This is an intercollegiate course where students, faculty and non-academic professionals have a chance to interact through the course website, and we thought a good way to learn how to use the website would be for everyone to introduce themselves in a short comment. We would appreciate it if students could indicate their school and tell us a little about themselves and why they are taking this course. We ask students to use first names only, not last names. We also encourage students and anyone who is new to the course to look at the videos on the WebTutorials page, especially the first video on Logging in and Discussing Modules.



Part I: Introduction to Cheminformatics*

By Nathan Brown

Please note, this is being developed like a blog or a wiki, and the text of this introduction is currently being written.

The advent of the widespread availability of electronic computers, primarily since the 1970s, has led to huge advances in many scientific disciplines. The field of chemistry itself has benefitted greatly from this availability, but the development of many new methods, algorithms, and data sources was necessary to realise the potential of the compute power now available to the chemist. The interface science of Cheminformatics has the objective of applying computer science approaches to the representation, analysis, design, and modelling of chemical structures and associated metadata, such as biological activity endpoints and physicochemical properties. The field of Cheminformatics draws not only on expertise in computer science and chemistry, but also on mathematics, statistics, biology, physics, and biochemistry. In this introduction to the course on Cheminformatics, we will introduce some of the overarching concepts in the field and some of the open access resources that may be applied in understanding the data types and methods that are widely available to the community.


Representing Chemical Structures in the Computer

The representation of chemical structures in the computer has a history going back some centuries, to the advent of atomistic theory in the mid-19th century, and even further to the development of the mathematical discipline of graph theory in the early-18th century. The famous mathematician, Leonhard Euler, used an abstraction of a real-world problem in the 1730s to understand whether it is possible to devise a walk around the town of Königsberg in Prussia (present day Kaliningrad, Russia) that crosses each and every one of the seven bridges connecting the two banks of the river Pregel and the two islands in the centre of town, once and only once. Analysing this problem led Euler to replace the real-world picture - one that could easily be drawn on a geographic map - with an abstraction that pared back the details to only those that were important. The salient details required to solve the problem were the land masses of Königsberg and the bridges, or connections, between them. It was not at all necessary to know the shape, topography, elevation, or any other details of the land masses, including the routes internal to each land mass, other than that they existed. Similarly, it was only necessary to know whether any two land masses were connected, and by how many bridges.


In devising this abstraction of a real-world problem, Euler was to make a significant impact on mathematics, essentially formalising a new sub-discipline called graph theory. Euler’s work here led to the development of a field of endeavour that is today applied widely, not only in chemistry, but also in social network analysis and biochemical pathways, amongst others. But what of Euler’s initial problem? Euler demonstrated that whether a walk is possible between the land masses (or nodes or vertices), across the multiple bridges (or edges or arcs), depends only on the number of connections to each of those nodes. The number of connections to each node is called the degree of the node. Euler showed with his abstraction that for such a walk to be possible, only zero or two nodes may have an odd degree. Therefore, given that all four nodes in the Königsberg representation have an odd degree (3, 3, 3, and 5, respectively), a walk fulfilling the defined restrictions is impossible. A walk that crosses every edge exactly once is called an Eulerian walk, or path, in Euler’s honour, and his analysis led to the formalisation of the fields of graph theory, topology, network analysis, and combinatorics.
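
Euler's degree argument can be checked in a few lines of code. The sketch below is illustrative only: the land-mass labels A to D, the constant KOENIGSBERG_BRIDGES and the function names are assumptions introduced here, and the edge list is simply one encoding of the seven bridges that reproduces the degrees quoted above. It also assumes the graph is connected, which the degree test alone does not verify.

```python
from collections import Counter

# One encoding of the seven bridges as the edge list of a multigraph.
# Labels are hypothetical: A is the land mass with five bridges,
# B, C and D are the three land masses with three bridges each.
KOENIGSBERG_BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def node_degrees(edges):
    """Count how many edge ends touch each node."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return dict(degrees)


def has_eulerian_walk(edges):
    """Euler's condition for a connected graph: zero or two odd-degree nodes."""
    odd_nodes = sum(1 for d in node_degrees(edges).values() if d % 2 == 1)
    return odd_nodes in (0, 2)


print(node_degrees(KOENIGSBERG_BRIDGES))
print(has_eulerian_walk(KOENIGSBERG_BRIDGES))
```

Because all four nodes have an odd degree, the check fails for Königsberg, whereas a simple chain such as A-B-C (two odd-degree end nodes) passes.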


Molecular Similarity

One of the key and enduring concepts in Cheminformatics is that of molecular similarity. Quantifying the similarity of molecules has a wide range of applications, many of which will be covered later in this introduction, but the fundamental aspect that underpins all of these applications is the similar property principle. The similar property principle suggests that often, if two chemical structures are similar, they will also exhibit a number of similarities in their properties. However, although this heuristic holds true in many examples, it is also observed that highly similar chemical structures can have significant differences in properties, particularly in biological activity, a phenomenon known as Activity Cliffs. In some instances this can give rise to terminology such as the Magic Methyl, where a single methyl group may bestow or remove activity, although the effect of this kind of alteration can typically be rationalised by some property change, such as a clash with a protein binding site, or a forced conformational change that is advantageous or detrimental.

Given the subjective nature of some aspects of molecular similarity, particularly when structures are simply compared visually, it is often important to generate objective measures of molecular similarity based on the actual chemical structures, similarity of molecular descriptors, or similarity in some measured or predicted property. Comparisons made according to structure alone often rely on graph theoretic algorithms to calculate molecular graph similarity, but can also be based on shape and electronic similarity, such as that generated by pharmacophoric descriptor generation tools like ROCS (Rapid Overlay of Chemical Structures) from OpenEye Scientific Software. There exist many molecular descriptors in the literature that are used to rapidly generate molecular similarities, which can be simply classified into property descriptors, topological descriptors (those generated from molecular connectivity alone), and topographical descriptors (those generated from the geometric shapes of molecular structures).


Molecular Property Descriptors

The first class of molecular descriptors to be covered here are the property descriptors, or modelled properties that give some reliable prediction of a physicochemical property, such as molecular weight or the calculated octanol-water partition coefficient (ClogP). These descriptors tend to convolute many different properties into simple scalar values, but can be highly effective in certain circumstances and are widely appreciated for their interpretability in interactive systems. One such set of property descriptors that has gained wide acceptance is the Lipinski rule-of-five, which has been suggested as a heuristic for indicating the oral absorption of a potential drug based on marketed orally-dosed drugs. The rule-of-five applies four calculated properties and defined cut-offs, each of which is a multiple of five. The four properties and their cut-offs are: Molecular Weight (MW or MWt) no more than 500 daltons; predicted octanol-water partition coefficient (ClogP) no greater than five; no more than five hydrogen bond donors (HBD = total number of nitrogen-hydrogen and oxygen-hydrogen bonds); and no more than ten hydrogen bond acceptors (HBA = total number of all nitrogen and oxygen atoms). As indicated, the Lipinski rule-of-five is a heuristic, albeit a useful one, and is often applied, somewhat crudely, as a drug-likeness descriptor in curating screening collections.
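
As a minimal sketch, the four rule-of-five checks can be written as a simple filter. The class and function names below are assumptions made for illustration, the property values are taken as already calculated by some other tool, and the cut-offs follow the common reading of the rule (a value equal to the cut-off passes). The two example property sets are made-up numbers, not real compounds.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Pre-calculated properties for one compound (hypothetical container)."""
    mol_weight: float  # molecular weight in daltons
    clogp: float       # calculated octanol-water partition coefficient
    hbd: int           # number of N-H and O-H bonds
    hba: int           # number of N and O atoms


def rule_of_five_violations(p):
    """Return the names of the rule-of-five cut-offs that the compound exceeds."""
    checks = {
        "MW <= 500": p.mol_weight <= 500,
        "ClogP <= 5": p.clogp <= 5,
        "HBD <= 5": p.hbd <= 5,
        "HBA <= 10": p.hba <= 10,
    }
    return [name for name, passed in checks.items() if not passed]


# Made-up property values, used only to exercise the function.
small_example = Properties(mol_weight=350.0, clogp=2.5, hbd=2, hba=5)
large_example = Properties(mol_weight=650.0, clogp=6.2, hbd=6, hba=12)

print(rule_of_five_violations(small_example))
print(rule_of_five_violations(large_example))
```

Returning the list of violated cut-offs, rather than a single pass or fail flag, keeps the result interpretable, which is the main attraction of property descriptors noted above.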


Topological Descriptors

The second class of molecular descriptor to be discussed in this introduction is the class of topological descriptors. Topological descriptors are those calculated from the molecular structure, typically using only the atomic connectivity data and eschewing any geometric data - although exceptions do exist. Two types of topological descriptor are often used: molecular indices and molecular fingerprints. Molecular indices are single real-valued descriptors that summarise some characteristic of the molecular structure under consideration. One of the older topological indices is the Wiener index, developed by Harry Wiener in 1947. The Wiener index is calculated as the sum of the shortest-path distances, counted in bonds, between all pairs of carbon atoms. Another popular index is the Randić index, developed in 1975 by Milan Randić, which focusses on the atom connectivities, or node degrees.
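
Both indices are straightforward to compute from a hydrogen-suppressed molecular graph. The following is a sketch under stated assumptions: the adjacency-list representation, the two example skeletons and the function names are introduced here for illustration, and the Randić index is taken in its usual form, the sum over bonds of 1/sqrt(d_i * d_j), where d_i and d_j are the degrees of the two bonded atoms.

```python
from collections import deque
from math import sqrt

# Hydrogen-suppressed carbon skeletons as adjacency lists
# (atom index -> list of neighbouring atom indices).
N_BUTANE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ISOBUTANE = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}


def bond_distances_from(graph, start):
    """Breadth-first search giving the number of bonds from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in graph[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def wiener_index(graph):
    """Sum of shortest-path distances over all unordered pairs of atoms."""
    total = sum(sum(bond_distances_from(graph, a).values()) for a in graph)
    return total // 2  # every pair was counted once from each end


def randic_index(graph):
    """Sum over bonds of 1 / sqrt(degree(a) * degree(b))."""
    total = 0.0
    for a, nbrs in graph.items():
        for b in nbrs:
            if a < b:  # visit each bond once
                total += 1.0 / sqrt(len(graph[a]) * len(graph[b]))
    return total


print(wiener_index(N_BUTANE), wiener_index(ISOBUTANE))
print(round(randic_index(N_BUTANE), 4), round(randic_index(ISOBUTANE), 4))
```

For these two isomers the Wiener index is 10 for the linear chain and 9 for the branched one, showing how a single number can respond to branching even though the atom counts are identical.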

The second type of topological descriptor to be considered here is the molecular fingerprint. A molecular fingerprint is often a long, contiguous array of bits, but sometimes also of integers or real-valued descriptors, and fingerprints can be compared to each other using a similarity coefficient. As with many molecular descriptors, a large number of molecular fingerprints have been defined. The fingerprint was originally designed as a rapid screen-out descriptor, applied before the more computationally intensive substructure search was performed in chemical information retrieval systems. The substructure of interest was encoded into a fingerprint to be compared against a database of pre-calculated fingerprints for chemical structures of interest. If, when using the substructure query fingerprint as a bit-mask, a given database fingerprint has all of the query's bits set, then there is a high probability, depending on the molecular fingerprint being used, that the substructure is contained within that database structure. If a match is identified, then the database structure is passed to the more computationally intensive substructure searching algorithm using graph theory. More recently, however, molecular fingerprints have been applied to a variety of pressing challenges in Cheminformatics, including cluster analysis, predictive modelling and similarity searching, more of which later in this introduction.
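
The bit-mask screen and a similarity coefficient can both be sketched with plain integers used as bit vectors. The function names and the toy five-bit fingerprints below are assumptions for illustration. The text does not name a particular coefficient, so the Tanimoto coefficient (bits in common divided by bits set in either fingerprint) is used here as one widely used choice.

```python
def passes_screen(query_fp, database_fp):
    """True if every bit set in the query is also set in the database fingerprint."""
    return (query_fp & database_fp) == query_fp


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    common = bin(fp_a & fp_b).count("1")
    either = bin(fp_a | fp_b).count("1")
    return common / either if either else 0.0


# Toy five-bit fingerprints written as binary literals (made-up values).
query = 0b00101
candidate_1 = 0b10111  # has both query bits set, so it survives the screen
candidate_2 = 0b10011  # is missing one query bit, so it is screened out

print(passes_screen(query, candidate_1))
print(passes_screen(query, candidate_2))
print(tanimoto(candidate_1, candidate_2))
```

A structure that survives the screen is only a candidate: the full graph-based substructure search is still needed to confirm the match, since different substructures can set the same bits.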

Molecular fingerprints can be subdivided into two different classes: knowledge-based fingerprints, and information-based fingerprints. Knowledge-based fingerprints use dictionaries of molecular substructures, with a corresponding bit in the fingerprint assigned unambiguously to each substructure and set if that substructure is present in the structure under consideration. Typically, even if the substructure appears multiple times, it will only be counted once in the fingerprint. Dictionaries of substructures tend to be relatively small, a few hundred entries, and can suffer from brittleness when considering new and unusual chemistry that may not have been considered when the dictionary was compiled - a little like Samuel Johnson not knowing what an aardvark is. Information-based fingerprints, which do not rely on a predefined dictionary, overcome this brittleness.

The information-based molecular fingerprint takes the chemical structure under investigation and transforms that structure into a fingerprint representation using one of a variety of algorithms. One of the most famous information-based molecular fingerprints is the Daylight Fingerprint designed and implemented by Daylight Chemical Information Systems. Here, the chemical structure is examined by iterating over each individual atom and enumerating all possible atom-bond-atom paths up to a specific length, typically paths of up to seven bonds in these fingerprints. Each path, of every length from zero (just the atom itself) up to and including seven, is then passed to a hashing algorithm that converts that string path into a number that is in a high range of something in the order of (-2^32 to 2^32). The resulting number is then ‘folded’ into the length of the fingerprint and the corresponding bit at that index is set to one. One challenge in this approach to encoding the fingerprint is the probability of bit collisions, where two different paths encode to the same fingerprint index. The effect of bit collisions can be reduced somewhat by passing the original hashed value as a seed to a pseudo-random number generator (RNG), taking the first few values it produces, folding each into the fingerprint, and setting the corresponding bits to one. Fingerprints are often quite long, 1024 or 2048 bits is not uncommon, and these lengths offer an effective balance between calculation speed of the molecular similarity and the information capacity to appropriately describe the molecular structures.
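
The path enumeration, hashing and folding steps can be sketched as follows. This is not the Daylight implementation: the molecule representation, the function names, the use of CRC32 as the hash and of modulo as the folding step are all assumptions chosen to keep the example short and reproducible. Each path is also enumerated in both directions here, whereas a production implementation would normally canonicalise the path string first.

```python
import random
import zlib

# Heavy atoms of ethanol (C-C-O): element symbols plus a bond table
# mapping an atom-index pair to a bond symbol. Hypothetical representation.
ETHANOL_ATOMS = ["C", "C", "O"]
ETHANOL_BONDS = {(0, 1): "-", (1, 2): "-"}


def enumerate_paths(atoms, bonds, max_bonds=7):
    """Yield a string for every simple path of 0..max_bonds bonds from every atom."""
    neighbours = {i: [] for i in range(len(atoms))}
    for (i, j), symbol in bonds.items():
        neighbours[i].append((j, symbol))
        neighbours[j].append((i, symbol))

    def walk(path_atoms, text):
        yield text
        if len(path_atoms) - 1 == max_bonds:
            return
        for nbr, symbol in neighbours[path_atoms[-1]]:
            if nbr not in path_atoms:  # never revisit an atom within one path
                yield from walk(path_atoms + [nbr], text + symbol + atoms[nbr])

    for start in range(len(atoms)):
        yield from walk([start], atoms[start])


def path_fingerprint(atoms, bonds, n_bits=1024, max_bonds=7, bits_per_path=1):
    """Hash each distinct path and fold the result into an n_bits-long bit list."""
    bits = [0] * n_bits
    for path in set(enumerate_paths(atoms, bonds, max_bonds)):
        hashed = zlib.crc32(path.encode("utf-8"))  # unsigned 32-bit value
        if bits_per_path == 1:
            bits[hashed % n_bits] = 1  # fold directly by modulo
        else:
            rng = random.Random(hashed)  # hash used as the RNG seed
            for _ in range(bits_per_path):
                bits[rng.randrange(n_bits)] = 1
    return bits


print(sorted(set(enumerate_paths(ETHANOL_ATOMS, ETHANOL_BONDS))))
fingerprint = path_fingerprint(ETHANOL_ATOMS, ETHANOL_BONDS)
print(len(fingerprint), sum(fingerprint))
```

The number of bits actually set can be smaller than the number of distinct paths whenever two paths fold to the same index, which is exactly the bit-collision effect described above.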


* Please note, this section is currently being written and is not complete.  More information is available in the book that can be downloaded from the bottom of this page.


Comments 56

OLCC S18 | Thu, 01/19/2017 - 05:30

Greetings to OLCC Spring 2017!

I'm very much looking forward to learning more about cheminformatics. The polish and ease of use of the Reaxys tool was impressive! Uploading my export for assignment one....

Damon Ridley | Sat, 01/28/2017 - 03:46


Thanks for your comments. Reaxys is indeed an impressive tool, but (like with so many information retrieval products) we need to understand a little about how it works. We are writing additional teaching materials (with practice problems) and I hope you look at them also. Don't hesitate to let me know if there are any special topics for which you would like me to prepare materials.


Sunghwan Kim | Mon, 01/30/2017 - 14:00

Dear all,

I am Sunghwan Kim, a Staff Scientist at the U.S. National Institutes of Health. I have been working in the PubChem project since I joined the team back in 2007. As an OLCC faculty member, I am responsible for writing some reading materials and developing class activities (primarily for the second half of this OLCC course), but I'm more than happy to help you learn cheminformatics from this course.

By the way, PubChem has a blog and social media accounts to keep our users updated, so please follow us for the most recent developments in PubChem.

PubChem Blog (https://pubchemblog.ncbi.nlm.nih.gov/)
Google+ (https://plus.google.com/115030503755312217027)
FaceBook (https://www.facebook.com/pubchem)
Twitter (https://twitter.com/pubchem)

Usually, there will be no more than one post per month, so it is not going to bother you much. The most recent post is about molecular weights, which are much harder to calculate than some people may think.


I hope you enjoy this post (and the OLCC, too). Thank you and have a good day.



Tanya Gupta | Mon, 01/30/2017 - 14:53

Hello All,

We are a bit behind but excited to join you all for Cheminformatics. Damon, I am sure my students will have a good number of questions and input for Reaxys.

Anja Brunner | Tue, 01/31/2017 - 04:18

Hello everyone!
Damon Ridley and I work for the Reaxys team at Elsevier. Bob kindly invited us to participate in the OLCC about a month ago so, like many of you, we are new at this and learning as we go along. In our case, though, this learning involves working out ways to help you!

Damon is a chemist with decades of experience in teaching and as a consultant to industry. He became interested in chemical information retrieval as a young academic at the University of Sydney (Organic Chemistry Department) in 1983 because he "could see electronic information retrieval was the way for the future, and could also see the intellectual challenges." In addition to his academic work at the University, Damon was a consultant with the Chemical Abstracts Service for almost 30 years. His work with CAS started in 1984 with CAS Online, then with the STN Network (command line-driven information retrieval) and finally with SciFinder. Damon is author of 4 books and over 50 publications on chemical information retrieval, and has given well over 1,000 workshops and lectures on the topic worldwide. Now he consults for Reaxys.

I had a somewhat unorthodox entry into the world of chemical information. Trained as a biologist, I taught lower and upper division biology courses at community colleges and small universities for 7 years before transitioning into industry for marketing and innovation management. My work as a science writer and content strategist brought me to Reaxys at Elsevier. For the last 3 years, I have supported their efforts to explore ways in which Reaxys can contribute to and enhance learning experiences in the chemistry classroom and lab. And therein lies my interest: finding ways to make scientific information accessible to everyone, not only as a technological solution but also as an intellectual skill fomented by education at any age and in any discipline.

Damon and I are based in Germany (although Damon gets to go "home" to Sydney for 4 months each year). Please understand that with the difference in time zones, we cannot promise immediate replies to your comments and questions. But we will try to stay on top of things as much as possible.

We are looking forward to this course!
Anja and Damon

Robert Belford | Tue, 01/31/2017 - 10:17

Greetings students and fellow faculty,

I know you have all been getting emails from me, and so thought I'd take a minute to introduce myself. I am an Associate Professor of Chemistry at the University of Arkansas at Little Rock, and I have run the ACS Division of Chemical Education Committee on Computers in Chemical Education ConfChem Website for the past decade. With Jon Holmes of UW-Madison I am responsible for this course website, which I think should be viewed as an interactive electronic textbook that is being discussed and shared by multiple campuses, and where the students even get to ask the authors questions! So please feel free to contact me directly if you have any problems with the website, rebelford@ualr.edu.

This is actually the third OLCC that I have been involved with, as I was part of the 2004 OLCC on Chemical Hygiene and the 2015 OLCC on Cheminformatics. My PhD is in physical chemistry and I have no formal training in cheminformatics, a term that I do not believe was even "coined" when I was a graduate student. But I do see in this age of pervasive digital technologies how cheminformatics and eScience are on the cutting edge of scientific discovery, and I am excited to have an opportunity to be part of this course and learn more.

I should also mention that I am excited to be part of a course that involves students and faculty from multiple campuses, and I hope my students at UALR will get to interact with and collaboratively learn with students from other campuses.

I would like to finally comment that the authors of the papers have spent a lot of time and effort developing this material, and it is here for you, the students. So please make comments, ask questions, and interact with the authors and other experts who are involved with this class. I know that at least for my students at UALR, this is a great chance to learn from experts, many of whom we would never have had a chance to interact with if it was not for this class. So do participate with the online communications!
Bob Belford, Ph.D.

Cody Ward | Tue, 01/31/2017 - 10:59


My name is Cody Ward, and I am a student at South Dakota State University. I am majoring in biology and biochemistry, and I will be graduating this spring (hopefully). I decided to take this course because I understand that computers are being used more and more every day in most areas of chemistry, and I thought that it would be to my advantage to learn about some of the applications of cheminformatics before I finish my undergrad studies.

OLCC S01 | Tue, 01/31/2017 - 12:48

I am Brian, a senior biochemistry student at South Dakota State University. I will be graduating this May. My future plans are to attend graduate school and obtain a Ph.D. in biochemistry. I am taking this course as I wanted to explore cheminformatics, which is becoming an increasingly valuable and needed skill.

OLCC S04 | Tue, 01/31/2017 - 12:49


Hello, my name is Casey. I am majoring in chemistry and minoring in biology at South Dakota State University. I am a senior this year and am hoping to graduate this spring. I chose to take this course because I want to improve my knowledge of how computers are used in the field of chemistry.

olcc s16 | Tue, 01/31/2017 - 17:17


My name is Phuc and I am from Vietnam. I am a student at the University of Arkansas at Little Rock, majoring in Chemistry. My interest is pharmaceutical and medicinal chemistry. My future plan is to obtain a PhD in pharmaceutical science. I am taking this class because cheminformatics has become a powerful field for aiding researchers, and this is what the future of research will look like. I would like to understand the principles of cheminformatics so that I can apply them in my future laboratory research.

OLCC S198 | Tue, 01/31/2017 - 17:07

My name is Lyndsie! I am an undergraduate at the University of Arkansas at Little Rock. I will be graduating this year and I plan on teaching high school chemistry. I believe cheminformatics will teach me how to use some tools that will be helpful in the classroom.

Olcc S10 | Tue, 01/31/2017 - 17:08

Hello Everyone,

My name is Emily. I am a master's student in Chemistry at the University of Arkansas at Little Rock. My interest is in teaching college-level students. I hope that in the future I may be one of the teachers of this course.


Olcc S15 | Tue, 01/31/2017 - 17:38


My name is Daniel, and I'm from Cameroon. I am attending the University of Arkansas at Little Rock, where I am a graduate student studying Chemistry. I decided to take this course because I have been working in the pharmaceutical field for the past two years and I realized how important cheminformatics is, both in research and in drug searches. I am looking forward to communicating with you all and sharing in this learning experience.

Olcc S11 | Tue, 01/31/2017 - 17:46

Hi everyone,
My name is Libby, and I'm from Arkansas. I'm a Ph.D. student at U of A Little Rock studying Applied Chemistry with a focus on Medicinal Chemistry. I'm taking this class to help me in my research and to help me be better able to navigate databases.

Bob Hanson | Tue, 01/31/2017 - 22:01

Welcome, all, to our course! I'm Bob Hanson, and I have had a lot of fun writing some interactive web pages I hope you find useful. I'm a professor at St. Olaf College where I teach general chemistry, organic chemistry, and medicinal chemistry. I'm also the principal developer of Jmol, which you will soon be working with if you haven't already. I hope you like it! One of the great aspects of Jmol is that it has a web-based mode that lets an interested student, instructor, or research scientist channel their creativity to design the interactive molecular page of their dreams. Maybe some of you will try that out as part of a project even. It's great fun! Google Jmol images and see what you can learn! Or follow this link for a simple example. https://chemapps.stolaf.edu/jmol/jmol.php. I hope you enjoy the course.

Bob Hanson | Thu, 02/02/2017 - 18:34

ps. As I am not able to read all messages to this list, I just ask that if you have a question specific to Jmol, please contact me directly at hansonr@stolaf.edu, not by leaving a message for me here. (I probably won't even see a reply to this message, actually...) Thanks! Looking forward to interacting! Looks like a great course.

Otis Rothenberger | Tue, 01/31/2017 - 22:55


I'm Otis Rothenberger, a retired professor of chemistry from Illinois State University. I'm an organic chemist, but I spent most of my years at Illinois State teaching non-major general chemistry.

I'm the author of the CheMagic Virtual Molecular Model Kit - a Jmol Web application. During the development of this application, I discovered cheminformatics for the first time, and I've been in learning mode ever since. I'm still in the student learning stage of cheminformatics.

I view myself as the official "old guy" of this course, and as such, I stick my two cents in every now and then. My current two cents is that I hope you focus on the fun aspect of this course.

During this term, I'm interested in mentoring a student project that develops a useful cheminformatics educational tool that might be of interest to ChemWiki - now Chemistry LibreTexts.

Salem the Cat is in the forefront of my id image. In cat years, he is as old as I am. He's better looking than I am, hence he's in the forefront.

Ehren Bucholtz | Wed, 02/01/2017 - 09:13

Hi All,

I am Ehren Bucholtz, associate professor of organic chemistry at St. Louis College of Pharmacy. My background is medicinal chemistry, so I teach my introductory and organic chemistry courses with lots of pharmaceutical and biological examples. I am on sabbatical this semester, but when I found out about this opportunity, I had to be a part of it. My sabbatical proposal was based on cheminformatics, so this is a perfect fit. One of the projects that I am working on during sabbatical is using Optical Structure Recognition to grade student work. I think this could be an interesting project for students to participate in.

Another project idea that I had for this course was based on some emails I exchanged with Sunghwan Kim (whom you met in a previous post) before the semester started. We had discussed searches in PubChem related to pharmaceuticals. It got me thinking about finding new lead molecules that might be interesting to test against Methicillin-resistant Staphylococcus aureus (MRSA). I have since talked with one of my colleagues who is a microbiologist, and she is willing to test any molecules we find that could have activity and are commercially available. (We would have to try to stick with lead molecules that are purchasable, as this course isn't about synthesis.)

Looking forward to working with everyone!

Sunghwan Kim | Wed, 02/01/2017 - 21:23

Dear Professor Bucholtz,

I am also looking forward to working with you and all the others during this course. By the way, I would like to ask you to post your ideas about potential student projects to the Student Projects section (http://olcc.ccce.divched.org/Spring2017OLCCStudentProjects), too. Because one semester is not very long, we may not have enough time to work on student projects (of course, depending on the nature of the projects and the students' backgrounds) if we start thinking about potential projects later in March or April. So, I think it is better to start discussing now which projects to work on. This will also help students to pay attention to what skill sets they would need for their projects as the course proceeds.



OLCC S19 | Wed, 02/01/2017 - 09:42

Hello nice people,
I'm Jo, and I'm a chemistry major here at U of A Little Rock. I want to teach high school chemistry and math. This course is giving me tools and ideas which I can pass on to my future students. Whether those kids go to college, stay on the farm, or spin wrenches, I believe informatics will help smooth their way to a better future. We can make possible real research at the secondary level, and inspire students to be even more creative and inquisitive if we give them the means to explore whatever rabbit hole they fancy diving into. Now, that's some fun.

Herman Bergwerf | Wed, 02/01/2017 - 10:23

Hi all, I'm Herman Bergwerf. I'm currently a BSc Nanobiology student at the Technical University of Delft (small city in the Netherlands). I am also a professional software engineer and I do a lot of coding in my free time (more than anything else). Among other things, I have built http://molview.org, a website to draw, view and search for molecular structures. I'm involved in this course as a potential project mentor. Additionally, I wrote a small 'Special Topic' module about molecular visualization on the web (which might be interesting for your final project).

Jordi Cuadros | Wed, 02/01/2017 - 10:49

Hello everyone. I am Jordi Cuadros, an Associate Professor at IQS Univ. Ramon Llull in Barcelona, Spain. I will be leading (with Roger) the class that will be taking the course from Spain. I am a chemist but I spend most of my time in front of a computer. I teach a first-year introduction to computer programming class for chemists and engineers and devote my research to the use of computers in discipline-specific education (chemistry, statistics, physics). I was involved in the first edition of this course in Fall 2015 (on using databases programmatically) and I am back to keep on learning. My personal view of the topic of cheminformatics is probably closer to chemistry information than to medicinal chemistry or biology. Given today's accessibility of chemical information, I feel any chemist should have the skills to understand, manage and use this wealth of data. This is to me the point of this course. I am looking forward to jumping into it!

Evan Hepler-Smith | Wed, 02/01/2017 - 15:20

Hi all,

Let me introduce myself, too. I'm Evan Hepler-Smith, a historian of science and technology specializing in the history of chemical information. Right now, I'm a postdoc at the Harvard University Center for the Environment, looking at the role of chemical information systems in the history of environmental toxicology and chemical regulation.

I am also quite interested in the present and future of chemical information, of course - that's why I'm here! But my research has shown me that understanding where our information systems, file formats, and notation came from unlocks a much richer understanding of the features (and bugs) of our present-day cheminformatics tools. I try to pass along a little bit of that perspective in Module 2, which I developed. Looking forward to working with you all!

All best,

Vincent Scalfani | Wed, 02/01/2017 - 16:08


My name is Vincent Scalfani. I am a Science and Engineering Librarian at The University of Alabama. After studying polymer chemistry in graduate school, I decided to become a Science librarian where I now teach information skills (and some basic informatics!) to chemistry and chemical engineering students.

I became interested in Cheminformatics a couple of years ago when using Jmol. Cheminformatics is so much fun and useful, particularly in libraries when we are thinking about advancing the discovery of information.

Now I mostly use Matlab for my cheminformatics projects. Matlab is a good beginner programming language and we have a lot of students at The University of Alabama using Matlab, so it gives me an opportunity to help students with their class projects too. I’m still very new to cheminformatics, but I have been able to accomplish a few neat projects like programmatically accessing chemical identifiers, creating music with InChIKeys, and finding names within InChIKeys.

I hope to learn more from the OLCC course and contribute with comments and perhaps some mentorship with a project if anyone is interested in Matlab.


Leah McEwen | Wed, 02/01/2017 - 16:20

Hello, everyone!

I am Leah McEwen, Chemistry Librarian at Cornell University. I manage several international projects related to chemical representation standards for many of the notations and file formats you will use in this course. Technology creation is exciting and standards help by improving machine interpretation of scientific content and enabling smooth and accurate data exchange between systems. I contributed to the OLCC Chemical Representation Module 2 with Evan Hepler-Smith and am happy to follow up with any questions on this topic.

I hope everyone enjoys tinkering under the hood of chemical information!


OLCC S45 | Wed, 02/01/2017 - 16:41

My name is Meagan Turner and I am a Biology major at the University of Illinois Springfield. I am a student and I look forward to learning more about cheminformatics. I hope this class will make me more well-rounded in using various databases. I plan to use the information covered in this course in my future professional career as well as in my current coursework.

Sunghwan Kim | Wed, 02/01/2017 - 21:49

Hello, everyone.

I'm Sunghwan at PubChem. I've already introduced myself through this comment section, so please see my previous post below from last Monday.

By the way, I would like to ask you all to subscribe to the updates and comments for the Student Projects section (http://olcc.ccce.divched.org/Spring2017OLCCStudentProjects). As you may know, students are required to work on projects at the end of this course (likely in April - May, depending on each school's schedule). However, a semester is not very long, and if students start looking for a potential project in late March or April, they may not have enough time to work on it.

Therefore, I strongly recommend that you start thinking about your projects as early as possible. So, I suggest that all of you subscribe to the updates and comments for the Student Projects section (http://olcc.ccce.divched.org/Spring2017OLCCStudentProjects) now. I (and hopefully other faculty members and students, too) will post several ideas to the section, so that we can start some discussion about them. Of course, I strongly encourage students to post their own ideas because we can provide some advice/insight/help about your projects.



Milind Khadilkar | Wed, 02/01/2017 - 22:43

Hi all,
Thanks, Prof. Belford for your continued leadership.
I am a software-mathematics professional who got re-introduced to Chemistry during the early part of my programming career in the 1980s through some excellent books (most notably, 107 Stories in Chemistry) and associating with my former collegemates who went on to major in Chemistry. I became professionally involved with cheminformatics at multiple points as a software consultant, and have a continuing interest in it. I have taught cheminformatics informally to both chemists and software developers, and have written software that utilized InChI while InChI was yet to be released. My drawback is that I know little chemistry and have no personal acquaintance with equipment and chemicals and cannot distinguish between, say, H2O and NaCl by their looks. I usually use Python for programming and am exploring the associated scientific packages that loosely make up Scientific Python.
I have been following these OLCC/DivCHED initiatives for quite a few years, usually passively. I hope to learn more through this edition. Thanks and best wishes.

OLCC S53 | Thu, 02/02/2017 - 10:39

My name is Chandler. I'm a student at St. Louis College of Pharmacy, going for a dual Bachelor's of Health Sciences and Pharm.D. I decided to take this class to further my knowledge of programming, as I was big on programming in high school.

Jennifer Muzyka | Thu, 02/02/2017 - 14:19

Hello Cheminformatics folk!
My name is Jennifer and I'm a faculty member at Centre College. I enjoyed participating in the fall 2015 course with a group of students. This time I have a few faculty/staff colleagues participating in these discussions. I look forward to learning more about retrieval of chemical information. I'm especially interested in seeing the projects that will develop as a result of this class.

OLCC S51 | Thu, 02/02/2017 - 20:16

Hello everyone! My name is Jeremy. I am currently in a 7-year dual degree program at St. Louis College of Pharmacy! I am finishing up my Bachelor's degree in Health Science and moving on to my Doctorate of Pharmacy. I was originally interested in taking this course due to my love of computers and technology. I am hoping to simply learn about anything I can, and maybe eventually be able to combine that with my knowledge of pharmacy in my future career in some manner. I know that is somewhat broad in terms of what I hope to learn; however, I am mostly excited about data analysis, especially when it involves drug design or designing specific molecules.

Nathan Brown | Fri, 02/03/2017 - 06:12

Hi All, I'm Nathan Brown, a Group Leader at The Institute of Cancer Research in London, UK. I've been involved in Chemoinformatics since 1999 when I started my PhD with Prof. Peter Willett in Sheffield. I currently run a research group at the ICR where we actively contribute to academic drug discovery programmes - a recent example is a drug I helped design using multiobjective de novo design methods I developed that has now entered two Phase I clinical trials.

I am a computer scientist by training, but switched to chemical information and chemoinformatics during my PhD. I have conducted research in academia (Sheffield, Erlangen-Nuremberg, ICR), biotech (Avantium Technologies, Amsterdam), and big pharma (Eli Lilly, and Novartis, Basel) during my career. In 2015 I published a new textbook on Chemoinformatics methods, which I have shared with the course members at the bottom of this page.

My research interests are wide and varied. While I am now actively involved in around ten ongoing drug discovery projects, my original interests in the field were in algorithm development and implementing new methods. My coding is a little rusty now, but I still like to think I am a programmer, in my head at least.

If you want to find out more about me and my research, please follow the links below. I am also quite active on Twitter, tweeting about our science and a number of other interests.

Personal Homepage: https://sites.google.com/site/nathanbroon/
ICR Homepage: http://www.icr.ac.uk/our-research/researchers-and-teams/dr-nathan-brown
Google Scholar: https://scholar.google.co.uk/citations?user=N0-4IgoAAAAJ&hl=en
Twitter: https://twitter.com/nathanbroon

Best wishes, Nath

OLCC S199 | Fri, 02/03/2017 - 07:55

Hello everyone. My name is Sooyah. I am an undergraduate student at the University of Arkansas at Little Rock, majoring in Biology and minoring in Chemistry. I plan on furthering my career in pharmaceutical science, and look forward to learning more about cheminformatics.

OLCC S52 | Fri, 02/03/2017 - 08:46

Hi everybody, my name is Matea. I am currently in a 7-year dual degree program at St. Louis College of Pharmacy. I will be finishing my Bachelor's degree in Health Science and continuing to work on my Doctorate of Pharmacy. I am hoping to get anything and everything out of this course. I hope one day to use what I learn now to have my own input in drug discovery, using technology to help prevent some of the undesirable effects that occur in drugs. A quote that stood out to me in the book "In Silico Medicinal Chemistry" was "without careful thought and experimental design with appropriate controls, we will only find the wrong answers faster and still waste a great deal of time in physical experiments based on inappropriate predictions made using computation methods". This quote explains exactly what I am trying to do for our future: improve the efficiency of drug discovery methods.

I look forward to connecting with everyone on here!

Ling Huang | Fri, 02/03/2017 - 08:50

Hello Everyone! I'm Ling Huang, a Chemistry faculty member at Hofstra University on Long Island, New York. I'm thrilled to learn from everyone in this course. I am personally interested in using more Chemistry apps in chemical education. I've published a review of Chem Apps (dx.doi.org/10.1021/ed300329e | J. Chem. Educ. 2013, 90, 320−325) and a book chapter on this topic, both of which cited the work done by several of the course contributors here. I'm looking forward to learning more from the experts.

Martin Walker | Fri, 02/03/2017 - 12:06

Hello from snowy northern New York! I teach organic chemistry at the State University of New York in Potsdam, and I have two students and a librarian taking the class as an independent study. I'm mainly a bench chemist, but I have a longstanding interest in chemical reaction searching and chemistry on Wikipedia.

Stuart Chalk | Sat, 02/04/2017 - 04:20

Hi, everyone. This is Stuart Chalk, a faculty member at the University of North Florida. I was trained as an analytical chemist but have morphed into a cheminformatician. I do research in the area of data standards, knowledge representation, data curation, and natural language processing. Currently, I have research projects on the extraction of chemical property data from PDF files, ontology development for the IUPAC color books, and scientific data integration. See you in Module 3!

Olcc S14 | Sat, 02/04/2017 - 17:48

Hello everyone,
I am Amita Nakarmi, a citizen of Nepal. I am a graduate student at the University of Arkansas at Little Rock, doing a PhD in Chemistry. My research is related to nano-materials and water treatment. Besides that, I am also interested in renewable energy, synthesis and bio-energy. I want to broaden my knowledge of information technology related to Chemistry, and I am hoping this course will be helpful in my research project and lead me to a bright future as a Research Scientist.

Shyleen Frost | Sun, 02/05/2017 - 12:44

I'm a little late to the party, but I still wanted to jump in and say hello. My name is Shyleen and I am a biology student at the University of Illinois Springfield. I am excited about all the different people this class has brought together and I'm looking forward to learning more about cheminformatics!

Kedan He | Sun, 02/05/2017 - 19:10

Dear All:

I'm Kedan He, a Chemistry faculty member at Centre College. I got my Ph.D. in Computational Chemistry at the University of Georgia, and I'm also interested in Data Analysis and Machine Learning. My current plan this time is to follow as much as I can and explore some interesting projects. I'm really excited to join this learning community!


OLCC S103 | Mon, 02/06/2017 - 07:07

Hello everyone!
My name is Laia and I am currently an adjunct professor at IQS - Universitat Ramon Llull in Barcelona (Spain), where I teach General Chemistry to first-year students. Although I did my PhD in experimental organic chemistry, I am currently a member of the Molecular Design Lab at IQS - Universitat Ramon Llull. This is why I am sure this course will be very useful for me! :-)

OLCC S107 | Mon, 02/06/2017 - 07:37

Hello from Barcelona!
My name is Roser, and I'm a librarian. I am the library manager of the IQS (Institut Químic de Sarrià), University of Ramon Llull. I hope to see what I can discover in this world of cheminformatics to apply to our services for users.

OLCC S102 | Mon, 02/06/2017 - 07:57

Hi everyone!
I am Elisabeth. I am a PhD student at IQS (Barcelona) focused on drug design and synthesis. I am taking this course in order to improve my knowledge of cheminformatics and apply it to my thesis.

OLCC S31 | Mon, 02/06/2017 - 10:07


My name is Aiden Farragher-Gnadt, a senior at SUNY Potsdam. I am taking this cheminformatics course in order to familiarize myself with the way chemical information is stored. I am particularly interested in the InChI system, and how it may be used in concert with programmatic techniques to search for structures with a conserved moiety.

OLCC S21 | Mon, 02/06/2017 - 15:31

Hi! My name is Jen. I'm an electronic resources librarian at Centre College in Kentucky, and also serve as the science department liaison librarian. I'm taking this course to improve my work with our chemistry students.

Haley Greiner | Wed, 02/08/2017 - 09:31

Hey, my name is Haley and I am a junior at Campbell University with a major in Chemistry.

OLCC S63 | Wed, 02/08/2017 - 09:33

My name is Nathan. I am a senior at Campbell University and I am looking to expand my knowledge in all areas of chemistry. I'm looking forward to learning and growing in the area of Cheminformatics.

OLCC S61 | Wed, 02/08/2017 - 09:38


I'm Victoria, a chemistry major at Campbell University. I'm originally from Florida, but moved to Raleigh, NC. I'm taking this cheminformatics course as part of my undergraduate degree. I hope you all have a great semester!


OLCC S71 | Wed, 02/08/2017 - 09:43

My name is Angela. I'm a chemistry major at Campbell University. I am taking this course as a seminar credit for the curriculum.