The five of us went in on the abstract that Perry is going to present. I have created a discussion page and would like everyone to subscribe to it, so that we can use it, instead of email, to discuss the project. Here is the direct link:
There have been some discussions over email, and I am going to take the liberty of posting their content to the body of the above page. One of these deals with curating the solubility data that has been uploaded to figshare.
Since we included predicting solubilities in the abstract, I am having Brandon give a presentation to my lecture class on the Abraham equation, and we can share his PowerPoint presentation on the above site.
Now, with regard to the data curation, I have an idea, which I would like to discuss on the above web page, but I need you all to subscribe. With Jennifer and Perry's approval, this may be something that would be appropriate for Perry.
One of the things we did with the WikiHyperGlossary was take an IUPAC glossary and use NIH and ChemSpider services to generate an InChI for each word; if both services generated the same InChI, we took the word to be a chemical. (We then associated the InChI with the "word" and used that for various software tasks.) That is, we needed to know which words were chemicals and which were not.
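To make the "same InChI from both services" test concrete, here is a minimal sketch of the idea, not the actual WikiHyperGlossary code. The two lookup tables are hypothetical stand-ins for the NIH and ChemSpider services, and the names agreed_inchi and classify are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A resolver takes a word and returns an InChI string, or None if it finds nothing.
Resolver = Callable[[str], Optional[str]]


def agreed_inchi(word: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Return the InChI only if every resolver returns the same one."""
    results = [resolve(word) for resolve in resolvers]
    if not results or any(r is None for r in results):
        return None  # at least one service did not recognize the word
    first = results[0].strip()
    return first if all(r.strip() == first for r in results) else None


# Hypothetical stand-ins for the two web services (assumption: real code
# would query the services instead of these small dictionaries).
SERVICE_A: Dict[str, str] = {
    "water": "InChI=1S/H2O/h1H2",
    "methane": "InChI=1S/CH4/h1H4",
}
SERVICE_B: Dict[str, str] = {
    "water": "InChI=1S/H2O/h1H2",
}


def classify(words: List[str]) -> Dict[str, Optional[str]]:
    """Map each word to its agreed InChI, or None if it is not treated as a chemical."""
    resolvers = [SERVICE_A.get, SERVICE_B.get]
    return {word: agreed_inchi(word, resolvers) for word in words}
```

With these stand-in tables, "water" is kept as a chemical because both lookups agree, while "methane" (found by only one) and "equilibrium" (found by neither) are not.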
We published this work, and Additional file 4 of that publication describes this process, using Additional file 5 as a spreadsheet.
Now, I have not looked at figshare (I am very busy and still have to upload the last part of module 8), but I understand we need to make sure the structures and names match. So my question is: can we do something like we did with the WikiHyperGlossary, where we convert the names to InChI and the structures to InChI, and then match the InChIs (or, more accurately, flag when they do not match)?
OK, please subscribe to the project page. Also, please realize that there are two, potentially three, projects going on. That is, both Brandon and Perry are doing their own project for this class, which will be over within 4 weeks, and we are all doing a project that will be presented at the ACS meeting. Hopefully we can sync Brandon and Perry's class projects for their respective schools with the overall project.
So I thought Brandon could tackle the Abraham equation, and maybe Perry could tackle a script that uses web APIs to convert structures and names to InChI and compare them. He can possibly use the spreadsheet in file 5 of the supporting documents as a template.
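As a starting point for discussion, here is a rough sketch of what such a check might look like. It is only an outline under several assumptions: I have not seen the figshare file, so the "name" and "smiles" column names are guesses; the function names are made up; and it uses the NIH CACTUS Chemical Identifier Resolver for both conversions, which is just one possible choice of service.

```python
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List, Optional

# NIH CACTUS Chemical Identifier Resolver; accepts a name or a SMILES string.
CACTUS_URL = "https://cactus.nci.nih.gov/chemical/structure/{}/stdinchi"


def cactus_stdinchi(identifier: str) -> Optional[str]:
    """Ask CACTUS for a Standard InChI; return None if nothing usable comes back."""
    url = CACTUS_URL.format(urllib.parse.quote(identifier, safe=""))
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            text = response.read().decode("utf-8").strip()
    except OSError:  # covers HTTP errors (e.g. 404 for unknown names) and timeouts
        return None
    # A name may resolve to several structures; keep only the first line.
    first = text.splitlines()[0] if text else ""
    return first if first.startswith("InChI=") else None


def flag_mismatches(
    rows: Iterable[Dict[str, str]],
    resolve: Callable[[str], Optional[str]] = cactus_stdinchi,
) -> List[Dict[str, Optional[str]]]:
    """Return only the rows whose name and structure do not give the same InChI."""
    flagged = []
    for row in rows:
        name_inchi = resolve(row["name"])         # assumed column name
        structure_inchi = resolve(row["smiles"])  # assumed column name
        if name_inchi is None or structure_inchi is None:
            status = "unresolved"  # needs a human to look at it
        elif name_inchi == structure_inchi:
            continue  # name and structure agree; nothing to flag
        else:
            status = "mismatch"
        flagged.append({**row, "name_inchi": name_inchi,
                        "structure_inchi": structure_inchi, "status": status})
    return flagged
```

The resolver is passed in as an argument so the comparison logic can be tried offline with a small dictionary before hitting the web service; calling flag_mismatches(rows) with no second argument would query CACTUS once per name and once per structure. Rows that fail to resolve are flagged separately from true mismatches, since those probably need different follow-up.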
If so, we probably need to acknowledge Andrew Cornell, as he spent quite a bit of time on the project.
IUPAC-NIST Solubility Database
Response to clarification
Broken Excel File
Excel file and J Chem Inf
Interesting and maybe worth noting
The Excel file seems zipped (so the extension is wrong)
Solubility Dataset and Solubility Unit Conversion Scripts