OLCC code libraries

Code Library (this is an abstract and title for a BCCE talk by Herman Bergwerf). The idea is to slowly introduce students, throughout the course, to small snippets of code that are really designed for non-programmers, so that by the time the course is over they can do something original.

BCCE 2016 Abstract:

Introducing students to programming [with Python and Google Apps Script] during the Cheminformatics OLCC

The cheminformatics OLCC is an intercollegiate hybrid online/face-to-face course to introduce undergraduate chemistry students to basic cheminformatics concepts; http://olcc.ccce.divched.org/. This course uses a Project-Based Blended Learning approach involving online guest lecturers and residential faculty facilitators (instructors of record). In today's emerging world of big data and interconnected science it is critical that students develop skills in the handling of data. This module will focus on the use of large chemical datasets and target students with no prior programming skills. Well known databases that are used include PubChem, ChEMBL and ChemSpider. In order to provide the students with the tools and knowledge to apply what they have learned during this course in later work, we want to introduce them to some basic programming with Python and Google Apps Script. To make this easier we are developing a library that provides all basic functions the students will need to develop a simple project. We will provide the students with bits of code to perform basic tasks during the course to gradually enable them to use the functions in this library. Afterwards the students will be asked to assemble what they have learned into a slightly larger project of their own. The lead developer of this library is himself an undergraduate student and will be able to assist the students with their projects.

Purpose

The purpose of the code libraries is to provide students with utilities for building code during the course. During each module we want to introduce students to small snippets of code that apply the knowledge they just learned in a small program (this will require a short guide on how to get things up and running). Assignments could refer to this code and ask students to modify it in order to achieve a slightly different task. We want students to use popular, professional Python libraries for this purpose as much as possible (PubChemPy, ChemSpiPy, OpenBabel, RDKit, PyMOL, ...). The reason for using these libraries instead of a minimal, custom-developed one is that the students are much more likely to use them in future coding projects (because they provide many other features we won't be using during the course). One of the objectives of the course is to provide the students with tools and knowledge they could use in the future to solve cheminformatics-related tasks.
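A minimal sketch of the kind of snippet a module could hand out and an assignment could ask students to modify. It is not taken from the actual course library: the function names are hypothetical, and it talks to PubChem's PUG REST service (the web service that PubChemPy wraps) using only the Python standard library, so that it runs without installing anything. The layout of the JSON response in `fetch_properties` is an assumption based on the PUG REST property table format.

```python
import json
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties):
    """Build a PUG REST URL that asks for properties of a compound by name."""
    quoted_name = urllib.parse.quote(name, safe="")  # "acetic acid" -> "acetic%20acid"
    return f"{PUG_REST}/compound/name/{quoted_name}/property/{','.join(properties)}/JSON"


def fetch_properties(name, properties):
    """Download the property records (hypothetical helper, needs internet access).

    Assumes the response has the shape {"PropertyTable": {"Properties": [...]}}.
    """
    with urllib.request.urlopen(property_url(name, properties)) as response:
        data = json.load(response)
    return data["PropertyTable"]["Properties"]


# Building the URL needs no network connection.
print(property_url("aspirin", ["MolecularFormula", "MolecularWeight"]))
```

A typical modification task would be to change the compound name or the list of requested properties and then call `fetch_properties` to see how the result changes.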

However, we do not want the students to give up because of the complexity of some of these libraries (like RDKit), and of programming in general. In my experience (I'm a bachelor student myself) it is not enough to provide the theory, even if it is very comprehensible, in order to teach students something. To get the students to really engage and actively learn, you have to motivate them (in fact I believe this is much more important than providing the knowledge). Today's students can easily find the theory they need on the internet*. To achieve this, the coding part of the course should provide quick results that would normally take more effort and programming experience (and if students are successfully motivated, they will find out for themselves how to expand the examples**). This is where the coding library comes in. It can provide utility functions for achieving basic tasks (for example: building a simple native GUI, opening a window with a certain 3D visualization, or generating a number of conformers for a given connection table).
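To illustrate what "quick results from one call" could look like, here is a rough sketch of a utility function that turns a connection table into plain Python lists a student can inspect right away. The function name and the hand-written water example are hypothetical and not part of the actual library. The sketch assumes the fixed-width V2000 molfile layout (counts on line 4, element symbol in columns 32-34 of each atom line) and ignores everything else in the file. Generating conformers from such a table would be delegated to a library like RDKit and is not shown here.

```python
def read_connection_table(molblock):
    """Return (atoms, bonds) from a V2000 molfile connection table.

    atoms is a list of element symbols; bonds is a list of
    (first_atom, second_atom, bond_order) tuples with 1-based atom numbers.
    """
    lines = molblock.splitlines()
    counts = lines[3]               # line 4 of the file is the counts line
    n_atoms = int(counts[0:3])      # fixed-width fields, 3 characters each
    n_bonds = int(counts[3:6])

    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    atoms = [line[31:34].strip() for line in atom_lines]
    bonds = [(int(line[0:3]), int(line[3:6]), int(line[6:9])) for line in bond_lines]
    return atoms, bonds


# Hand-written example with approximate coordinates (illustrative only).
WATER = "\n".join([
    "water",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
])

atoms, bonds = read_connection_table(WATER)
print(atoms)
print(bonds)
```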

 

* In fact, this is also the way I study for all my courses: I go through the slides and look up everything I don't understand on the internet, not in my book. The reason for this is that books often contain contextual information. I'm often not interested in this; I only want to know what is necessary to understand a certain theory. Today's internet is still not very good at this, and it might actually take some time to find your answer. But with the dawn of resource description frameworks I believe the internet of tomorrow will be much better at answering specific questions. Take the ChemWiki for example: during chemistry courses I found this a very good source of specific information, but I have never yet read an entire page of the ChemWiki. Instead I scroll to the table, or section, that answers my specific question. Tomorrow's internet will probably also do that for me.

** This is another reason why I believe it is a good idea to involve libraries like RDKit and OpenBabel already at this stage. There are tons of forums and code snippets online, and these libraries have extensive API documentation. If we can successfully integrate this into the course (e.g. looking up the documentation to learn what parameters do), students might be able to collect the knowledge they need to solve different programming tasks themselves. However, I'm not sure about this; it might be a step too far (but we're dealing with pretty smart people here, so...)

 

Google Apps Script

Although we want to use Python as the primary programming language (due to its popularity and the wide availability of libraries), the idea came up to develop a Google Apps Script library that provides functions similar to those the students will later use in Python, to get them to learn basic concepts. The great thing about this is that it can be used directly inside Google Sheets. The benefits of this are obvious:

  • It works on any computer with an internet browser without having to get anything set up. It can be a real enthusiasm killer if a programming course starts with 3 hours of getting things set up.
  • It works with Google Drive. This means we can easily create and share template code, students can easily collaborate, and it is quite easy to build and review assignments.
  • It works together with a spreadsheet. Not only does this mean students will get to learn some spreadsheet scripting skills; they can also directly observe the result of their code in the spreadsheet. It seems reasonable to expect that a lot of the code in this course will generate tabular data.
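Since much of the course code is expected to produce tabular data, the later Python part could keep the spreadsheet connection by writing results as CSV, which Google Sheets can import. The following is only a sketch under that assumption: the function name is hypothetical and the two rows are illustrative values.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Turn a header and a list of rows into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


table = rows_to_csv(
    ["name", "formula", "molar_mass_g_per_mol"],
    [["water", "H2O", 18.015], ["ethanol", "C2H6O", 46.069]],
)
print(table)
```

Saving `table` to a file ending in `.csv` gives something students can open in the same spreadsheet environment they used for the Apps Script exercises.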

 

Learning basic skills

During my own bachelor program (I'm currently doing a Nanobiology bachelor in the Netherlands, a kind of combination of biophysics, bioinformatics, and molecular biology subjects), we have a course called Biomolecular Programming. The goal of this course is to develop a program in Java that can simulate random motion in 2D in combination with membrane isolation and particle reactions. The first part of this course consists of learning the basic concepts of programming in Java (using variables, arrays, if/else statements, loops, etc.). We will most likely need such an introduction as well. I personally favour mixing this in with the code snippets provided with each course module, not only to save time but also because this seems to me a much more motivating way to learn these concepts. Many students in our programming course did not actually complete these trial assignments because of a lack of motivation.
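As a rough illustration of mixing the basic concepts into a chemistry snippet, the sketch below uses variables, a dictionary, a function, a loop, and an if/else statement to compute molar masses. Everything in it is an assumption made for illustration: the names are hypothetical, the atomic masses are rounded standard values, and the 100 g/mol cut-off is arbitrary.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molar_mass(element_counts):
    """Add up atomic masses for a formula given as {element: count}."""
    total = 0.0
    for element, count in element_counts.items():  # loop over the elements
        total += ATOMIC_MASS[element] * count
    return total


compounds = {
    "water": {"H": 2, "O": 1},
    "ethanol": {"C": 2, "H": 6, "O": 1},
    "caffeine": {"C": 8, "H": 10, "N": 4, "O": 2},
}

labels = {}
for name, counts in compounds.items():
    mass = molar_mass(counts)
    if mass > 100:  # arbitrary threshold, chosen only to show if/else
        labels[name] = "heavy"
    else:
        labels[name] = "light"
    print(f"{name}: {mass:.3f} g/mol ({labels[name]})")
```

An assignment could then ask students to add another element to `ATOMIC_MASS` or another compound to `compounds`, which practices the same concepts without a separate trial exercise.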
