Google Sheet Connecting IUPAC name to PubChem

Google Sheet IUPAC name to PubChem

This video shows how to use Google Sheets to link chemical names to PubChem. If you click on the title you will also get the exact script used, and can discuss this video.

We will use two Google Sheet Functions, IMPORTDATA and HYPERLINK.

The strategy is to use the Centre for Molecular Informatics OPSIN (Open Parser for Systematic IUPAC nomenclature) web service to convert IUPAC names to standard InChI Keys and then upload the Keys to PubChem.  This allows us to identify mispelled names early, and takes advantage of the Web compatibility of the InChI Key.

Step 1: Paste your IUPAC names into a column of your spread sheet

Step 2: Convert IUPAC name to Standard InChI key
type the following script into the top cell of the column you want to place your keys into, and hit enter"

=IMPORTDATA("http://opsin.ch.cam.ac.uk/opsin/"&[SPREADSHEET CELL WITH IUPAC NAME]&".stdinchikey")

  • the ampersand(&)concatenates the cell content to the URL
  • the ampersand must be surrounded by quotation marks
  • the URL must be in quotation marks

Click on the black box in the bottom right corner of cell and drag down, converting the entire column of names to keys.

Step 3: Hyperlink the key to PubChem
Type the following script into the top cell of the column you want to place your links into, and hit enter"n

=HYPERLINK("https://pubchem.ncbi.nlm.nih.gov/compound/"&[SPREADSHEET CELL WITH INCHIKEY]&"")

  • the ampersand(&)concatenates the cell content to the URL
  • the ampersand must be surrounded by quotation marks
  • the URL must be in quotation marks

NOTE, these are dynamic cells - And will be recalculated everytime you open the page, or change the chemical name.  If you want them to be static, you can copy the block of cells, and paste to another location as text.

You can also download the sheet as an Excel Spreadsheet, but the downloaded sheet will not be dynamic.  It will be linked, but will not change if you change the IUPAC name. 

 

Rating: 
0
No votes yet
Join the conversation.

Comments 17

Ehren Bucholtz | Mon, 02/27/2017 - 11:13

I have been working with a Google spreadsheet this morning. Trying to access OPSIN and Cactus. There are a couple things that I thought I would share.

1) It looks like the OPSIN implementation of SMILES is non canonical, and appears to give same result as openbabel. Therefore, I created a new column that takes the OPSIN smiles, and uses that as a new input in cactus at the chemical identifier web tool. That appears to give the Daylight or canonical smiles.

2) With 4 columns doing web access of data ( one each for opsin name to smiles, opsin name to inchikey, open structure image, and the Canonical SMILES from Opsin smiles) I am getting lots of "loading" and "#NA". Google sheets then gives me an error "Error: Loading data may take a while because of the large number of requests. Try to reduce the amount of IMPORTHTML, IMPORTDATA, IMPORTFEED or IMPORTXML functions across spreadsheets you've created."

Is anyone else getting really slow response times, or the same error when importing multiple data columns? One of the issues that I think we will have in the future is that the spreadsheet is building this data dynamically every time. Is there a way to have the Google sheet store the data once collected so that it doesn't have to repopulate each time? As it stands now, if I close the sheet and come back in, it has to reload all the data. It is suggested in the above information that maybe we should copy and paste the data once collected into a new spread sheet- a sort of two database system where one is dynamic and one is static. I like how easy it is to do this with Google Sheets, but I wonder if there is a way to do this with Excel so that I can tell the spreadsheet to refresh only when I tell it to. Any ideas?

Sunghwan Kim | Thu, 03/02/2017 - 09:21

Well, I believe both OPSIN and OpenBabel give you canonical SMILES.  But these canonical SMILES may not be the same as canonical SMILES from Daylight or other programs.  (That is, all SMILES generation programs can generate what they call "canonical" SMILES, but  this canonicalization algorithm itself has been implemented differently by individual software developers, the resulting canonical SMILES are different, too.  But they are still called canonical SMILES.)

    My impression is that you are thinking the term "canonical" is sort of "universally accepted by different SMILES developers".  However, "canonical" here actually means "canonical only for a particular software developer".  Overall, there is *no* gaurantee that different SMILES generators would give you the same canonical SMILES for the same molecule.

Sunghwan Kim | Thu, 03/02/2017 - 09:35

I have a similar problem.  It seems that the number of webservice calls that you can make from one google sheet is limited to 50.  Please see this document:

 

https://support.google.com/docs/answer/3093335?hl=en

 

I found this is very discouraging because many informatics projects often require hundreds or thousands of calls to import data from the web.  

 

Microsoft Excel has a function called "webserivce", which is similar to IMPORTDATA in google sheet.

 

=WEBSERVICE("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/cids/txt")

 

OR if you need to use a cell identifier (e.g., A4) in the URL:

 

=WEBSERVICE(CONCATENATE("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/",A4,"/cids/txt"))

 

But I also found that excel has its own problem (e.g., not easy to share with other people).  So, which is better would heavily rely on the nature of your project.

 

Ehren Bucholtz | Thu, 03/02/2017 - 10:43

The Excel implementation seems to be much faster. THis also gets around the 50 link limit. I have an excel sheet with over 100 molecules now that I am trying to convert name to SMILES through opsin. Thank you for figuring this out.On thing that I cannot figure out is that I would like to have an image in the spreadsheet from the OPSIN or Pubchem websites. THis was really easy to do with the google sheet, but doesn't seem to be allowing me to do it in Excel. For example, I can put the following in my google sheet and it will display an image from OPSIN that correlates with the name. =IMAGE("http://opsin.ch.cam.ac.uk/opsin/"&B4&".png")My simple work around is to use the google sheet with the above image, and then copy and paste into the excel spreadsheet. I have tried to find if it is possible in excel, but so far it looks like this is only a function in google sheets.

Robert Lancashire | Thu, 03/02/2017 - 12:05

Alternatively you can store the SDF in a column.
I tried this and it works fine and then copy and paste into Jmol or into the JSmol console...

Otis Rothenberger's picture
Otis Rothenberger | Thu, 03/02/2017 - 12:23

JSME is also a thought here. JSME allows the pasting of a SMILES into the interface to produce a structure in the editor.

On a broader note, both JMSE and Jmol are cheminformatics powerhouses. Put them together into a single browser app, and you have a real power tool. Writing such a power tool for a specific course spreadsheet would make an excellent project.

Also note that Jmol can compare 2 SMILES from different sources and determing match/no-match.

Otis

Otis Rothenberger's picture
Otis Rothenberger | Thu, 03/02/2017 - 12:48

This is just a follow-up on my previous JSME note. This is a test page that puts a smiles query string ( | delimited ) quickly into JSME depict. Not the # must be URL encoded in the string to %23. This capability is now part of the JSME installation.

By the way, this test page generates a bunch of default structures if you do not include a query string. Feel free to play:

http://chemagic.org/molecules/JSMEdepict.htm?smis=CCCC|C/C=C\C|C%23C|C1=CC=CC(O)=C1

Otis

Bob Hanson's picture
Bob Hanson | Mon, 03/06/2017 - 11:08

I had the same problem recently. Our stockroom manager gave me an Excel sheet with all of our chemicals listed. I needed a set of unknowns and also was interested in proofing the list for spelling. I found the best way was to move that to a Google Sheet and use a links such as =IMAGE("https://cactus.nci.nih.gov/chemical/structure/" & substitute(A2," ","%20") & "/image") -- which I learned how to do here, by the way! That worked great. There were about 950 structures. They did finally come in, but it was not fast. Fortunately, I wasn't in a hurry, either. Didn't seem to be an issue with a limit of 50 for whatever reason. Maybe because different times I would open it I would get more. Or perhaps the 50-hit limit is not strictly enforced. Just now I opened it up, structures are flooding in, and way more than 50 structures are present. What I see is a series of calls that look like this:  
https://docs.google.com/a/stolaf.edu/spreadsheets/d/1Czpe-yyZla0t1wxDvrTiXEmz5bg-5MgxG6bCovQ/externaldata/fetchData

so perhaps it is a limit of 50 for each of those. Don't know....

Jordi Cuadros's picture
Jordi Cuadros | Thu, 03/02/2017 - 16:42

The limit of WEBSERVICE/IMPORTDATA calls can be overcome by using some programming.

It is not super-easy but I have an example in Google Sheets here: https://docs.google.com/spreadsheets/d/1KUV3kbNWZ-ED57-E-J-J1MMOim1ljmz_MyaFuae6QUg/copy

Best,

Jordi

OLCC S03 | Tue, 02/28/2017 - 11:05

yes, I experieced that it was very slow. I am still playing with it but would welcome any suggestions to address this.

OLCC S53 | Tue, 02/28/2017 - 22:17

Hello, from S53, S52, and S51,  We were working on a google sheet together that would import data from different sources and allow people to select which data output they wanted(IE smiles, inchi, inchikey, cas, so on). Well we were having a issue when dealing with InChI codes.  It appears as though whenever you import the data from any website with an inchi string output, whenever a "," is included in the string, such as the following, InChI=1S/C4H8Cl2/c1-2-4(6)3-5/h4H,2-3H2,1H3 , the program will read those ,'s as though they are skip functions in the doc so it moves to the next cell as shown in the image provided below.  If anyone has any suggestions or comments on how they might go about getting around this issue please let us know.

Thanks,  S51-53.

 

update* it wont let me upload an image so theres no image, Just know that the cell will say "inchi=xxxxxx" "xx" "xxx" and each set of quotes is a different cell and all of the x's come from the same inchi key import.

 

Bob Belford's picture
Bob Belford | Wed, 03/01/2017 - 11:01

Did you watch the YouTube video on the "Googlable InChI Key?
http://olcc.ccce.divched.org/2015OLCCModule5P1TLO5-3-2C

The jist is that InChI can be very long and contain characters that can be misinterpeted by web browsers and other software agents.  That is why we used the InChI Key in our sheet.

I am not a programmer, but you may be able to protect the string with a quotation mark, or something like that.

Cheers,

Bob

OLCC S03 | Wed, 03/01/2017 - 14:20

I was able to do it without any issues - but I stayed with the video and used Inchi key.

OLCC S53 | Wed, 03/01/2017 - 17:54

Yeah, we have our sheet pulling various naming protocols from smiles to inchikey, inchi itself is the only one that comes back with an error because for some reason google spreadsheet reads the , as a "tab" function. We were trying to see if anyone else knew anything about that or around it.  

OLCC S03 | Wed, 03/01/2017 - 15:20

How can I add compound name to a google sheet? Can we have google sheet pull off compound name from PubChem or from ChemSpider or NIST. Also can we connect the two for example name or property of a compound in two columns using one set of codes.

Annotations