Discussion

Sunghwan Kim | Sat, 04/08/2017 - 18:00

For Entrez Indices, try Questions 1 and 2 of the Module 5 homework.  These questions were designed to help you search PubChem using Entrez indices.

For Entrez filters, try the various filters shown in the second figure of this module (http://olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/2017OLCCModule5fig2.png).  The “has pharm” filter selected in this figure gives you all compounds that have pharmacological actions.  If you try Question 2, you will probably find that retrieving all compounds that satisfy Lipinski’s rule of 5 is somewhat tedious.  You can do this with the Entrez filter “Lipinski rule of 5”.  (The definition used in this filter is slightly different from the one used in the homework question.)

For Entrez links, try the dropdown menu called “Find Related Data” on the DocSum page returned from any PubChem search.  This dropdown menu is available at the bottom right of the DocSum page.  If you are looking for a more practical example, try Homework Question 3(a) - (d) in Module 6 (not Module 5).  You don’t need to read the Module 6 material to solve Question 3(a) - (d).  Question 3(d) in Module 6 uses an Entrez link, but you need to start from Question 3(a) to understand the context of the task in 3(d).
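
An index search like the ones in the homework can also be written as a query string and sent to NCBI's E-utilities instead of being typed into the search box.  The sketch below only builds the request URL and does not send it.  It assumes that the PubChem Compound database is addressed as "pccompound" and that the molecular weight index is named "MolecularWeight"; please check both against the index list on the PubChem advanced search page before relying on them.  The helper names range_term and esearch_url are made up for this illustration.

```python
from urllib.parse import urlencode

# E-utilities search endpoint (esearch).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def range_term(field, low, high):
    """Build an Entrez range query such as 100:200[MolecularWeight]."""
    return f"{low}:{high}[{field}]"


def esearch_url(term, db="pccompound", retmax=20):
    """Return the esearch URL for a query term (db name is an assumption)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Assumed index name; verify it in the PubChem search builder.
term = range_term("MolecularWeight", 100, 200)
url = esearch_url(term)
print(term)
print(url)
```

Range terms for other indices can be combined with AND in the same term string, which is one way to express a multi-criteria search such as the one in Question 2.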

 

Sunghwan Kim | Sat, 04/08/2017 - 16:54

>> for later work, what would be the smart and faster way to do clean up just like the last list?

Well, it depends on the case.  In my case, I would do the mapping first with the raw list.  If a query does not exist in PubChem (because it is not a small molecule or for whatever reason), the search will not return any CID.  Then you have a list of the queries that failed, so you only need to look into those failures during the subsequent clean-up.
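
As a rough illustration of the "map first, then look only at the failures" idea, here is a minimal Python sketch.  The function names map_names and pubchem_lookup are made up for this example.  pubchem_lookup assumes the PUG-REST name-to-CID request (compound/name/.../cids/TXT) and treats an HTTP 404 response as "no match"; it is not called below, and a small stand-in table is used instead so the example runs without network access.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_lookup(name):
    """Return the CIDs PUG-REST reports for a name ([] if none is found)."""
    url = f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/TXT"
    try:
        with urlopen(url, timeout=30) as resp:
            text = resp.read().decode("utf-8")
    except HTTPError as err:
        if err.code == 404:  # assumed "name not found" response
            return []
        raise
    return [int(token) for token in text.split()]


def map_names(names, lookup):
    """Split names into those that map to CIDs and those that fail."""
    mapped, failed = {}, []
    for name in names:
        cids = lookup(name)
        if cids:
            mapped[name] = cids
        else:
            failed.append(name)
    return mapped, failed


# Stand-in for the real lookup, for illustration only.
fake_table = {"aspirin": [2244]}
mapped, failed = map_names(
    ["aspirin", "adalimumab"], lambda n: fake_table.get(n, [])
)
print(mapped)
print(failed)
```

To run it against PubChem, pass pubchem_lookup instead of the stand-in, and add a short pause between requests for a long list.  Only the names in failed then need manual checking.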

 

OLCC S71 | Sat, 04/08/2017 - 12:50
Hi, I was wondering if you knew of an example search in PubChem that uses those three elements and demonstrates the functionality of these three components of Entrez? I'm also not quite sure what exactly the filters are or do. Could you maybe explain it in a different way than the module does? Thanks!

OLCC S52 | Fri, 04/07/2017 - 13:05
Are we able to output all molecules in the PubChem database within a molecular weight range, for example 100 to 200, through a PUG-REST search? We know we can do this through the advanced search builder on PubChem, but can it be done programmatically?

olcc s16 | Thu, 04/06/2017 - 21:31

Dr. Kim
>>Mapping names with CID- After cleaning up the original list, I sorted it alphabetically. Then I converted the list from Excel to a CSV file and cleaned up the CSV file, because there were some quotation marks that PubChem couldn't read (which gave me an error report). Then I uploaded it through the Identifier Exchange Service and chose synonyms as the ID input list.

>>Google Sheet- Yes, you mentioned it previously, but I don't know how to do it in Excel yet, and since the list had been cleaned up a bit I thought it would be OK. I'll figure out how to redo it in Excel.

>>I don't see the one-to-one mapping of the synonyms and the CIDs (which name was mapped to which CID).
I'm sorry. I forgot to upload that list

>> for later work, what would be the smart and faster way to do clean up just like the last list?
What I mean is removing the biologics from the list. I have been cleaning up the list manually, and I'm afraid that I might have skipped some or that I will encounter a larger list in the future.

Thanks

Sunghwan Kim | Thu, 04/06/2017 - 17:51
By the way, how did you map the names with the CIDs? Please explain it to me briefly. (one or two sentences should be fine).

Sunghwan Kim | Thu, 04/06/2017 - 17:48

>> I tried to upload the list into GoogleSheet and linked to CACTUS resolver to retrieve the Standard Inchi Key.

I've already mentioned that GoogleSheet cannot handle a large number of web service requests.  Excel is the one you should use.

>> I upload the list to Pubchem as synonyms to find and I got 673 results

I don't see the one-to-one mapping of the synonyms and the CIDs (which name was mapped to which CID).  I recommend keeping this information in one file.  Ideally, you would add an additional column for the mapped CID to the original Excel file downloaded from EMA.  (Of course, you need to keep a copy of the unmodified original file separately.)
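
To illustrate what I mean by keeping the mapping in one file, here is a minimal Python sketch that adds a CID column to the rows of a CSV export.  The column names "Active substance" and "Mapped CID", the function name add_cid_column, and the two sample rows are made up for this example; the real EMA file will have its own headers.

```python
import csv
import io


def add_cid_column(rows, name_field, mapping, cid_field="Mapped CID"):
    """Return copies of rows with an extra column holding the mapped CIDs.

    Names without a match get an empty cell, so failures stay visible.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        cids = mapping.get(row[name_field], [])
        new_row[cid_field] = ";".join(str(cid) for cid in cids)
        out.append(new_row)
    return out


# Hypothetical two-row export; read the real CSV file instead.
raw = "Active substance,Status\naspirin,Authorised\nadalimumab,Authorised\n"
rows = list(csv.DictReader(io.StringIO(raw)))
mapping = {"aspirin": [2244]}  # name -> CIDs from the mapping step

result = add_cid_column(rows, "Active substance", mapping)

# Write to a new file (here an in-memory buffer) so the original is untouched.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(result[0].keys()))
writer.writeheader()
writer.writerows(result)
print(buffer.getvalue())
```

Writing the result to a new file rather than overwriting the download keeps the unmodified original available for later checks.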

>> for later work, what would be the smart and faster way to do clean up just like the last list?

Are you talking about getting InChIKeys or mapping synonyms with CIDs, or something else?  Please tell me more specifically.

 

Sunghwan Kim | Thu, 04/06/2017 - 17:19

>> How many blue or green cells exist in this region of the heat map?

This is the question that you will need to answer after finding the group of assays that target histamine receptors.  (You don't need to report AIDs/GIs, because you were not asked to report them.)  If you still don't have a good idea, think about what anti-histamines are and what the blue/green color means.

Sunghwan Kim | Thu, 04/06/2017 - 16:54

Sorry for this delayed reply. I don't know why, but this platform did not post my reply that I made previously.

 

Long story short,

=== The biological line notations for the two compounds are:

CID 71120 : Ac-D-Asp-Glu-OH

CID 188803 : Ac-Asp-Glu-OH

===

 

Sorry that the biological line notations for these two compounds have been removed by a filter that PubChem introduced to remove biological notations for non-biologic molecules. This filter has an accuracy of 90%, but the two compounds used in this homework happened to be removed.

olcc s16 | Thu, 04/06/2017 - 12:32
And also, for later work, what would be the smart and faster way to do clean up just like the last list? Because I have to do it manually, it would take a while. Thanks