14. Extending Chemical Structure Search Capability using R and Open Source Application Programming Interfaces (APIs)

R is an open source programming language for statistical computing, machine learning, and graphics with broad support in the scientific community. Developers extend R by creating "packages" that implement a desired functionality. An existing R package, webchem, retrieves chemical information from the web by interacting with major chemical data sources such as ChemSpider, OPSIN, and PubChem. Following the guidelines of open source community development, webchem can be extended with additional features, including further integration with and visualization of PubChem's chemical data resources.
A second useful extension of webchem would be the ability to run chemical searches from graphical images of molecular structures. This may be feasible by integrating an R package with the Optical Structure Recognition Application (OSRA), an open source application that generates SMILES or SDF representations from over 90 graphical formats (GIF, JPEG, PNG, etc.).
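As a rough illustration of the kind of PubChem integration described above, the sketch below builds a PubChem PUG REST request URL for a compound's computed properties in base R. The helper name `pug_property_url` and its arguments are hypothetical and are not part of webchem; the example only assembles the URL and does not contact PubChem. A SMILES string, such as one an image-recognition step might produce, can be passed in the same way as a name.

```r
# Hypothetical helper (not part of webchem): build a PubChem PUG REST URL
# that requests computed properties for a single compound.
pug_property_url <- function(identifier,
                             namespace = c("name", "smiles", "cid"),
                             properties = c("MolecularFormula", "MolecularWeight")) {
  namespace <- match.arg(namespace)
  # Percent-encode the identifier so characters such as '=' or '(' in a
  # SMILES string survive inside the URL path.
  encoded <- utils::URLencode(as.character(identifier), reserved = TRUE)
  paste0(
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/",
    namespace, "/", encoded,
    "/property/", paste(properties, collapse = ","),
    "/JSON"
  )
}

# Look up by common name.
name_url <- pug_property_url("aspirin")

# Look up by SMILES (acetic acid).
smiles_url <- pug_property_url("CC(=O)O", namespace = "smiles")

print(name_url)
print(smiles_url)
```

The resulting URL could then be fetched and parsed with a JSON reader of your choice. Keeping URL construction separate from the network call makes the helper easy to check without an internet connection.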

Comments 4

OLCC S31 | Thu, 04/27/2017 - 22:57
Hello! My name is Aiden Farragher-Gnadt, a student at SUNY Potsdam. I am interested in this project. I have had some luck using the webchem R package with the R console to search for chemical data. My use of it has been limited to the Chemical Identifier Resolver tool, as I do not have an API key for ChemSpider. I must admit that my expertise with R is very limited, and I would like to know what kind of materials I would be submitting for this project. I am currently enrolled in two credits of this course, so any help or information about this project would be greatly appreciated. Thank you very much!

Bob Belford | Wed, 05/03/2017 - 09:39
Hi Aiden, I apologize for not getting back quicker, but our school has a different schedule than yours, and my students gave their presentations on April 13, 20, 25 & 27. One student did do a data visualization project using R, and although he is still subscribed to this module, his semester is also over and I fear it is just too late to join in on a project (he gave his presentation April 13). Cheers, Bob

Bob Belford | Wed, 05/03/2017 - 13:04
I note Dr. Walker is now subscribed to this module, and I think you need to communicate with him about what you would like to do. The idea was to open up the option for students to work with R; the students need to identify the projects they want and get the green light from their facilitator, as that is the one who gives them their grades. You should feel free to comment here on ideas, and hopefully there will be someone subscribed who can help. My personal feeling is that since we have PubChem people on board, projects involving PubChem would be easiest to get support with. Have you tried doing anything with R Shiny, https://shiny.rstudio.com/ ? Also, one of my students made a spreadsheet in Excel to which she uploaded boiling and melting points from PubChem depositors, used the API to get molar masses, and then set it up to filter results based on the values of an "unknown". That is sort of the opposite of what we have been doing (you are not searching for the physical property of a molecule, but for the molecule that has the physical property). The issue we ran into is that many of them do not come up over the API; in fact, if I understand right, only computed values come through the API. But you can still get the actual data sets from the depositors, and then maybe do some task with that. The bottom line is you need to pick a project that interests you and that your facilitator approves, and hopefully we can find someone who can help. Cheers, Bob

OLCC S31 | Sun, 05/07/2017 - 16:23
Bob Belford, I apologize for how long it took me to get back to you. I am in the process of moving, and lost my internet. I am interested in a component of webchem that allows me to access the ChemSpider database. This is great, because it allows me to create lists using R to access multiple chemical properties with a single command, where the CIR identifier resolver does not. Dr. Walker is currently assisting me in acquiring the API key for the ChemSpider database. I will ask Dr. Walker what he feels is sufficient for the project. Am I correct in my understanding that the submission takes the form of a presentation of my usage of R to access chemical data? If so, I could be ready to give a presentation to Dr. Walker by week's end at the very latest. All the best, Aiden