Module 6: Questions


  1. Conceptually, data in a database are stored the same way we would record them in a table or an Excel spreadsheet. The rows in the table correspond to compounds, and the columns correspond to properties or descriptions of those compounds (e.g., melting and boiling points, chemical names, toxicity, bioactivity, target proteins, and so on). These columns are commonly called “data fields”. You may want to search against all data fields or only against a particular field. To search the chemical name field of the records in the PubChem Compound database, suffix the chemical name query with either the “[synonym]” or the “[completesynonym]” index. The “[synonym]” index searches for molecules whose names contain the query name as a part (that is, partial matching), and the “[completesynonym]” index searches for molecules whose names match the query completely (that is, exact matching). If no index is given after the query, PubChem searches all data fields.

Go to the PubChem homepage and select the “Compound” tab above the search box. Enter each of the following queries in the search box and click the “Go” button. How many hits do you get for each search? Clicking the image of a compound takes you to its Compound Summary page, which provides comprehensive information on that compound. On the Compound Summary page of each hit, check the “Depositor-Supplied Synonyms” section to see whether any of the molecule's names contain the string “zyrtec”.

(1) zyrtec
(2) zyrtec[synonym]
(3) zyrtec[completesynonym]
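As a rough illustration of how an index suffix narrows a search, here is a toy Python sketch. It assumes a tiny in-memory table of hypothetical records (not real PubChem data) and simplifies “[synonym]” to case-insensitive substring matching and “[completesynonym]” to case-insensitive equality; PubChem's actual name matching and tokenization may differ.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical records: each dict is a row, each key is a data field.
RECORDS = [
    {"id": "rec-A", "synonyms": ["Zyrtec", "cetirizine hydrochloride"], "notes": ""},
    {"id": "rec-B", "synonyms": ["Zyrtec-D", "cetirizine and pseudoephedrine"], "notes": ""},
    {"id": "rec-C", "synonyms": ["levocetirizine"], "notes": "related to Zyrtec"},
    {"id": "rec-D", "synonyms": ["ibuprofen"], "notes": ""},
]


def parse_query(query: str) -> Tuple[str, Optional[str]]:
    """Split 'term[index]' into (term, index); a bare term has no index."""
    match = re.fullmatch(r"(.+?)\[(\w+)\]", query.strip())
    if match:
        return match.group(1).lower(), match.group(2).lower()
    return query.strip().lower(), None


def field_values(record: dict) -> List[str]:
    """Flatten every field of a record into a list of lowercase strings."""
    values: List[str] = []
    for value in record.values():
        items = value if isinstance(value, list) else [value]
        values.extend(str(item).lower() for item in items)
    return values


def search(records: List[dict], query: str) -> List[str]:
    """Return the ids of records matching the query under the toy index rules."""
    term, index = parse_query(query)
    hits = []
    for record in records:
        names = [name.lower() for name in record["synonyms"]]
        if index == "synonym":
            matched = any(term in name for name in names)  # partial match on names
        elif index == "completesynonym":
            matched = any(term == name for name in names)  # exact match on names
        elif index is None:
            matched = any(term in v for v in field_values(record))  # all fields
        else:
            raise ValueError(f"unknown index: {index}")
        if matched:
            hits.append(record["id"])
    return hits


for q in ("zyrtec", "zyrtec[synonym]", "zyrtec[completesynonym]"):
    print(q, search(RECORDS, q))
```

With these toy records, each of the three queries returns a subset of the previous one's hits, which is the pattern the three searches above are meant to reveal.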




  2. To perform an identity search for Cymbalta (CID 60835), go to the Chemical Structure Search page and select the “Identity/Similarity” tab. Expand the “Options” section by clicking the “plus” button, and select “Identical Structures” and “same connectivity” from the drop-down menus. Expand the “Filters” section and limit the number of covalent units to 1 (by setting the range to “from 1 to 1”). Enter the query CID in the search box and run the search. Repeat the search with the “same isotopical labels” option selected. Explain how the two options affect the identity search results.
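The same two identity options can also be exercised programmatically through PubChem's PUG REST interface. The sketch below assumes the `fastidentity` operation with its `identity_type` parameter (values `same_connectivity` and `same_isotope`) and a JSON response shaped like `{"IdentifierList": {"CID": [...]}}`. It does not reproduce the covalent-unit filter from the web form, so the counts may not match the web results; check the PUG REST documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def identity_search_url(cid: int, identity_type: str) -> str:
    """Build a PUG REST fast identity search URL that returns matching CIDs as JSON."""
    params = urlencode({"identity_type": identity_type})
    return f"{PUG_REST}/compound/fastidentity/cid/{cid}/cids/JSON?{params}"


def fetch_cids(url: str) -> list:
    """Download a CID list from PUG REST (requires network access)."""
    with urlopen(url, timeout=60) as response:
        data = json.load(response)
    return data["IdentifierList"]["CID"]


def compare_identity_options(cid: int) -> dict:
    """Count hits for the two identity options discussed in the question."""
    return {
        option: len(fetch_cids(identity_search_url(cid, option)))
        for option in ("same_connectivity", "same_isotope")
    }


# Example (network access required; results change as PubChem is updated):
# print(compare_identity_options(60835))
```

Comparing the two counts corresponds to the comparison the question asks you to explain.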




  3. Perform a 2-D similarity search using CID 5090 as the query. Select the “Identity/Similarity” tab and expand the “Options” section by clicking the “plus” button next to its heading. Select “Similar Structures” and “95%” from the drop-down menus. Expand the “Filters” section and limit the number of covalent units to 1. Enter the query CID in the search box and click the “search” button. Repeat the search with each of the following similarity thresholds: 90%, 85%, and 80%. How many records are returned for each search?
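PubChem's 2-D similarity search scores compounds with the Tanimoto coefficient computed on PubChem's substructure fingerprints. The toy sketch below uses small, hypothetical bit sets instead of real fingerprints to show why lowering the threshold can only add hits: every compound that passes at 95% also passes at 90%, 85%, and 80%.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits in either set."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def similar(query: Set[int], library: Dict[str, Set[int]], threshold: float) -> List[str]:
    """Return names of library entries with Tanimoto >= threshold, best match first."""
    scored = [(tanimoto(query, fp), name) for name, fp in library.items()]
    hits = [(score, name) for score, name in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in hits]


# Hypothetical fingerprints: each set lists the bit positions that are "on".
QUERY = set(range(20))
LIBRARY = {
    "cpd1": set(range(20)) | {20},      # 20/21 ≈ 0.952
    "cpd2": set(range(20)) | {20, 21},  # 20/22 ≈ 0.909
    "cpd3": set(range(19)) | {20, 21},  # 19/22 ≈ 0.864
    "cpd4": set(range(18)) | {20},      # 18/21 ≈ 0.857
    "cpd5": set(range(17)) | {20},      # 17/21 ≈ 0.810
    "cpd6": set(range(10)),             # 10/20 = 0.500
}

for threshold in (0.95, 0.90, 0.85, 0.80):
    print(threshold, similar(QUERY, LIBRARY, threshold))
```

The hit lists grow from one compound at 0.95 to five at 0.80, and each list contains the one before it.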





The right column of the last search result page (threshold ≥ 80%) shows what kinds of information are available for the returned compounds. Click the “Pharmacological Actions” link under “BioMedical Annotation” to select only the compounds with Pharmacological Action annotations. For each of these compounds, check the information under the “Pharmacology and Biochemistry” section. What pharmacological actions do these compounds have?




  4. Select the “3D Conformer” tab to perform a 3-D similarity search using CID 5090 as the query. Expand the “Options” section and select the “(Sort results by) Shape-then-feature” and “(output to) NCBI Entrez” options from the drop-down menus. Expand the “Filters” section and limit the covalent unit count to 1. Type the query CID in the search box and click the “search” button. How many compounds are returned? How many of the returned CIDs have pharmacological action annotations? Compare the results of the 3-D similarity search with those of the 2-D similarity searches in Question 3.
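PubChem's 3-D similarity compares conformers using two scores: a shape Tanimoto and a feature (color) Tanimoto. The sketch below assumes that “Shape-then-feature” means ranking by shape score first and breaking ties by feature score. The cutoffs of 0.80 (shape) and 0.50 (feature) and all of the scores are illustrative assumptions, not values taken from this exercise.

```python
from typing import List, NamedTuple


class Hit3D(NamedTuple):
    cid: str        # hypothetical identifier
    shape: float    # shape Tanimoto
    feature: float  # feature (color) Tanimoto


def shape_then_feature(hits: List[Hit3D], min_shape: float = 0.80,
                       min_feature: float = 0.50) -> List[Hit3D]:
    """Keep hits passing both cutoffs, then sort by shape score, then by feature score."""
    kept = [h for h in hits if h.shape >= min_shape and h.feature >= min_feature]
    return sorted(kept, key=lambda h: (h.shape, h.feature), reverse=True)


HITS = [
    Hit3D("X1", 0.91, 0.40),  # fails the feature cutoff
    Hit3D("X2", 0.88, 0.62),
    Hit3D("X3", 0.88, 0.71),  # ties X2 on shape, wins on feature
    Hit3D("X4", 0.83, 0.90),  # best feature score, but lower shape score
    Hit3D("X5", 0.75, 0.95),  # fails the shape cutoff
]

ranked = [h.cid for h in shape_then_feature(HITS)]
print(ranked)
```

Under this assumed ordering, a compound with an excellent feature score (X4) still ranks below compounds that match the query's shape more closely.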





Note to instructors: Students' answers may differ slightly from your answer key, because the contents of PubChem are updated regularly (daily, weekly, or monthly). However, the key points that each question is intended to emphasize remain valid.


Comments (16)

Sarah House (not verified) | Thu, 10/22/2015 - 20:59
On questions 3 and 4, when searching the database for CID 5090, I am getting exactly the same result for all of the searches. It takes me directly to the page for the compound rofecoxib. I wanted to make sure I wasn't doing anything wrong.

Sunghwan Kim | Thu, 10/22/2015 - 21:54
It works okay on my PC, and I can't reproduce the results you got. It sounds to me like you were getting identity search results, which should return CID 5090. Would you please tell me what options/filters you used during the similarity search?

Sarah House (not verified) | Fri, 10/23/2015 - 13:43
I was using the identity/similarity structure search and the filters listed in the instructions, but it still always led me to the exact page for the rofecoxib compound.

Sunghwan Kim | Fri, 10/23/2015 - 14:58
I am not able to reproduce the results you describe. I've uploaded a PDF file with step-by-step instructions on how to perform the similarity search in Question 3. Please follow the steps in the PDF file. If you do exactly what is shown in the PDF file and still have the same problem, you can try two additional things: (1) clear your browser's history and cache, and re-run the search, or (2) re-run the search using a different browser (e.g., Internet Explorer, Safari, Google Chrome, Firefox, or Opera). Please let me know whether you can reproduce what's shown in the PDF file.

Sunghwan Kim | Fri, 10/23/2015 - 15:25
I've uploaded a PDF file (screenshots.pdf) with step-by-step instructions for similarity search using the PubChem Structure Search tool (Question 3), in case you have difficulty working on Question 3. The file can be found at the bottom of the Module 6 Questions page.

John House (not verified) | Sun, 10/25/2015 - 22:07
When searching PubChem for all available structures that meet certain substituent parameters, is this considered a substructure search? Also, what databases other than PubChem would you suggest for this purpose?

Sunghwan Kim | Sun, 10/25/2015 - 22:53
What do you mean by “meet certain substituent parameters”? Does it mean that the molecules have certain substituents (e.g., carboxyl, amino, nitro, and halogen groups)? If not, please clarify what it means.

John House (not verified) | Sun, 10/25/2015 - 22:57
Correct. For example, say you have a benzene ring. At carbons 1, 3, and 5 you can have any halogen, while at carbons 2 and 4 you can have a carboxyl group.

Sunghwan Kim | Sun, 10/25/2015 - 23:39
I see. What you are describing is indeed a substructure search. I am sure that many commercial databases support substructure search. Among public chemical databases, I’ve quickly tested PubChem, ChemSpider, and ChEMBL, and all of them support substructure search. However, although all of them accept SMILES strings as queries, I am not sure whether ChemSpider and ChEMBL accept SMARTS strings as substructure queries. If they do not, then with these two databases you will need to input fluorobenzene, chlorobenzene, and bromobenzene (as queries for three independent searches) to search for halobenzene substructures. If SMARTS queries are supported, you need only one search.

John House (not verified) | Thu, 10/29/2015 - 16:58
Is there a way to avoid separate individual searches and instead perform a single search that returns all applicable results containing any of the designated halogens?

Sunghwan Kim | Thu, 10/29/2015 - 18:36
The answer depends on what kind of search you are doing.

(1) For identity or similarity search, you should provide a specific chemical structure as a query to find structures that are identical or similar to it. If you have multiple query compounds, you will need to perform independent identity/similarity searches, and in this case you will need to do them programmatically.

(2) For substructure/superstructure search, you can provide a “pattern” of chemical structures (e.g., for those containing a benzene ring with halogen substituents), if the database search system supports chemical representations for generic structures (i.e., patterns), such as SMARTS. Go to Chemical Structure Search and do a *substructure* search using each of these queries:

a. SMILES string query: c1(c(c(c(c(c1F)N)F)N)F)N
b. SMARTS string query: c1(c(c(c(c(c1[F,Cl,Br,I])N)[F,Cl,Br,I])N)[F,Cl,Br,I])N

The second one looks somewhat complicated, but in essence every F atom is replaced with [F,Cl,Br,I], meaning that each of those atoms can be any of the four halogens. Note that the SMARTS query matches everything the SMILES query matches, plus molecules with heavier halogens at those positions.

One caveat of substructure search with a generic query is that it often takes too long, and the search fails because of the time limit (30 seconds for substructure search, I believe). If the query is too generic (e.g., find any compounds with a hydroxyl group), it will return too many hits (often millions). If your query molecules cannot be represented by a single SMARTS string (meaning they do not share a common structural pattern), or if the database does not support SMARTS as a query language, you will need to do the searches programmatically.
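To see how much work a single SMARTS query saves, the sketch below enumerates the plain SMILES queries you would otherwise have to submit one by one. It treats the two queries above as a template with three halogen slots (the template string and function name are illustrative, not part of any PubChem tool) and fills each slot with F, Cl, Br, or I, giving 4^3 = 64 SMILES strings; the same template filled with the atom list [F,Cl,Br,I] reproduces the SMARTS query.

```python
from itertools import product
from typing import List, Sequence

# Template built from the SMILES/SMARTS queries above; {0}, {1}, {2} are the halogen slots.
TEMPLATE = "c1(c(c(c(c(c1{0})N){1})N){2})N"
HALOGENS = ("F", "Cl", "Br", "I")


def enumerate_smiles(template: str, choices: Sequence[str], n_slots: int) -> List[str]:
    """Fill every slot with every choice: one SMILES query per combination."""
    return [template.format(*combo) for combo in product(choices, repeat=n_slots)]


smiles_queries = enumerate_smiles(TEMPLATE, HALOGENS, 3)

# A SMARTS atom list such as [F,Cl,Br,I] covers all four choices in one slot.
atom_list = "[" + ",".join(HALOGENS) + "]"
smarts_query = TEMPLATE.format(atom_list, atom_list, atom_list)

print(len(smiles_queries), smiles_queries[0])
print(smarts_query)
```

Because the ring's substitution pattern is symmetric, some of the 64 strings describe the same molecule, so the number of distinct compounds is smaller; the point is only that, without SMARTS support, each variant has to be searched separately.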

John Turner (not verified) | Thu, 11/05/2015 - 08:17
For the second question, the link is incomplete. The text suggests it is trying to go to the Chemical Structure Search page, but search/search.cgi is not included in the link, so it goes to a different page instead.

Sunghwan Kim | Thu, 11/05/2015 - 08:35
I've fixed the link. Thank you for notifying us of the error.

Judat Yazigi (not verified) | Thu, 11/05/2015 - 08:51
For question 4, when we're comparing the 3-D and 2-D similarity results, do we compare the results at all the different thresholds (95%, 90%, 85%, 80%) to the 3-D similarity results, or do we just pick one and compare?

Sunghwan Kim | Thu, 11/05/2015 - 09:16
Please compare 3-D similarity search results with each of the 2-D similarity search results at different thresholds.

Sunghwan Kim | Thu, 11/05/2015 - 10:25
In case some of you are not familiar with comparing one long list of compounds with another, I've uploaded a file named "Similarity Search Comparison.pdf" that illustrates how this can be done using "search histories". Like many other databases, PubChem automatically stores your search results as a "history". By using these histories with operators such as AND, OR, and NOT, you can combine results from different searches. The file is available at the bottom of this page.
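Combining search histories with AND, OR, and NOT behaves like set intersection, union, and difference on CID lists. The sketch below illustrates only that logic, using hypothetical placeholder CID lists rather than real search results; it does not call PubChem, where the combination itself is done through the search history in the web interface.

```python
from typing import Iterable, List


def combine(first: Iterable[int], second: Iterable[int], operator: str) -> List[int]:
    """Combine two CID lists the way AND / OR / NOT combine search histories."""
    a, b = set(first), set(second)
    if operator == "AND":
        result = a & b  # CIDs found by both searches
    elif operator == "OR":
        result = a | b  # CIDs found by either search
    elif operator == "NOT":
        result = a - b  # CIDs in the first search but not in the second
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return sorted(result)


# Hypothetical placeholder CIDs, not real search results.
hits_2d = [1001, 1002, 1003, 1004]
hits_3d = [1002, 1004, 2001, 2002]

shared = combine(hits_2d, hits_3d, "AND")
only_2d = combine(hits_2d, hits_3d, "NOT")
only_3d = combine(hits_3d, hits_2d, "NOT")
print(shared, only_2d, only_3d)
```

Running NOT in both directions, as with `only_2d` and `only_3d`, is one way to see which compounds each search method finds that the other misses.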