Working by hand, I generated an Excel file that compares Wikipedia safety information with PubChem LCSS information for about 90 chemicals. The list of chemicals came from the original LCSS roster in Prudent Practices. I suspect that I made clerical errors in the process, so replicating this effort electronically would be a good step. I hope that the Excel file that Bob generated will help us do this.
These are the columns in the spreadsheet and what they indicate:
Wikipedia Entry: Is there a Wikipedia entry for this chemical?
Chembox: Does the Wikipedia entry have a Chembox?
Safety info: Does the Chembox contain any safety information?
GHS info: Does the Chembox contain GHS information?
PubChem LCSS?: Is there a PubChem LCSS for this chemical?
Pubchem sections: How many content sections are there in the PubChem LCSS?
Number of sets: How many different sets of GHS symbols does PubChem present?
Distinct sets: How many different sets of symbols are there?
The interesting result is that Wikipedia has more coverage of the chemicals listed (95% vs. 82% for PubChem), but less safety information (78% in Wikipedia, 82% in PubChem) and much less GHS info (only 33% of the chemicals listed have GHS info in Wikipedia).
Brian, is it possible to verify these numbers?
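One way to check these percentages electronically is to export the spreadsheet to CSV and count the "yes" answers in each column. The following is only a minimal sketch: it assumes the cells hold yes/no style values and that the CSV headers match the column names listed above, and the sample rows are made up for illustration, not taken from the real spreadsheet.

```python
import csv
import io

# Values treated as "yes"; this set is an assumption about how cells were filled in.
YES_VALUES = {"yes", "y", "1", "true"}


def coverage_percentages(rows, columns):
    """Return, for each column, the percentage of rows whose cell is a 'yes' value."""
    total = len(rows)
    result = {}
    for col in columns:
        yes = sum(
            1 for r in rows
            if (r.get(col) or "").strip().lower() in YES_VALUES
        )
        result[col] = round(100.0 * yes / total, 1) if total else 0.0
    return result


# Made-up sample standing in for the exported spreadsheet (NOT the real data).
SAMPLE_CSV = """Chemical,Wikipedia Entry,GHS info,PubChem LCSS?
A,yes,no,yes
B,yes,yes,no
C,no,no,yes
D,yes,no,yes
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
stats = coverage_percentages(rows, ["Wikipedia Entry", "GHS info", "PubChem LCSS?"])
print(stats)
```

To run it against the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the exported CSV.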
Google Docs doesn't allow me access to the Excel files you posted; do you need to share them with me? My Google account is <a href="mailto:keenestateehs@gmail.com">keenestateehs@gmail.com</a>
For some reason, several Zip programs would not unzip the LCSS dump. I was able to do so with 7-Zip, but the unzipped file is too big to upload to our site. Fortunately, UALR has unlimited Google Drive storage, so I put the original file here:
<a href="https://drive.google.com/open?id=0ByRWZ4TaLO_0NndIMHFfVF9PbVk">https://drive.google.com/open?id=0ByRWZ4TaLO_0NndIMHFfVF9PbVk</a>
I had to remove the following in order to load it into Excel:
<a href="https://drive.google.com/open?id=0ByRWZ4TaLO_0NndIMHFfVF9PbVk">https://drive.google.com/open?id=0ByRWZ4TaLO_0NndIMHFfVF9PbVk</a>
And here it is in Excel:
<a href="https://drive.google.com/open?id=0ByRWZ4TaLO_0NndIMHFfVF9PbVk">https://drive.google.com/open?id=0ByRWZ4TaLO_0NndIMHFfVF9PbVk</a>
Brian and I have now found another way to get the list of chemicals for which there is a PubChem LCSS. Brian, are you going to try using the NIH Resolver to convert the names to InChIKeys? (I think it does that.) It would be similar to what we did in the second module, except that we would get the InChIKey instead of the molar mass.
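As a rough sketch of the name-to-InChIKey lookup, assuming "the NIH Resolver" means the NCI/CADD Chemical Identifier Resolver at cactus.nci.nih.gov and that its URL pattern is `/chemical/structure/<name>/stdinchikey` (my understanding of that service, not something confirmed here), something like this could work. The function names are hypothetical, and the exact response format is an assumption.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the NCI/CADD Chemical Identifier Resolver.
RESOLVER_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def inchikey_url(name):
    """Build the resolver URL asking for the standard InChIKey of a chemical name."""
    return f"{RESOLVER_BASE}/{quote(name, safe='')}/stdinchikey"


def parse_inchikey(response_text):
    """Take the first line of the response and drop an 'InChIKey=' prefix if present."""
    text = response_text.strip()
    first_line = text.splitlines()[0] if text else ""
    return first_line.split("=", 1)[-1]


def fetch_inchikey(name, timeout=10):
    """Query the resolver over the network; raises an error if the name is not found."""
    with urlopen(inchikey_url(name), timeout=timeout) as resp:
        return parse_inchikey(resp.read().decode("utf-8"))


# Only the URL is built here; no network request is made.
print(inchikey_url("acetic acid"))
```

Calling `fetch_inchikey` in a loop over the chemical names would give a key for each one; a short pause between requests and a try/except around names that fail to resolve would be sensible additions.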
I found a free web site with apps that convert files into and out of XML. It's at
<a href="http://xmlgrid.net/xml2text.html">http://xmlgrid.net/xml2text.html</a>
and might be useful in exploring the PubChem data in Excel.
- Ralph
Safety information: PubChem vs Wikipedia