I see. What you describe is indeed a substructure search. I am sure that many commercial databases support it. Among public chemical databases, I quickly tested PubChem, ChemSpider, and ChEMBL, and all three support substructure search. However, although all of them accept a SMILES string as a query, I am not sure whether ChemSpider and ChEMBL accept a SMARTS string as a substructure query. If they do not, then with these two databases you will need to run three independent searches, with fluorobenzene, chlorobenzene, and bromobenzene as the queries, to find halobenzene substructures. Where SMARTS queries are supported, a single search is enough.
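As a rough illustration of the "one search versus three" point, here is a minimal sketch that only builds the request URLs and does not send them. It assumes PubChem's PUG REST "fastsubstructure" operation, which to my understanding accepts both SMILES and SMARTS input; the helper name `substructure_url` and the SMARTS pattern are my own choices for illustration, so please check them against the PUG REST documentation before relying on them.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def substructure_url(query: str, query_type: str = "smarts") -> str:
    """Build a URL for a substructure search that returns matching CIDs as text.

    Hypothetical helper: the path layout is an assumption based on the
    PUG REST fast-search syntax.
    """
    # Percent-encode the whole query, because SMARTS uses characters
    # such as "[", "]" and "," that are not safe in a URL path.
    encoded = quote(query, safe="")
    return f"{PUG_REST}/compound/fastsubstructure/{query_type}/{encoded}/cids/TXT"


# One SMARTS query: a benzene ring carrying F, Cl, or Br.
halobenzene_smarts = "c1ccccc1[F,Cl,Br]"
single_search = substructure_url(halobenzene_smarts)

# SMILES-only alternative: one query per halobenzene.
smiles_queries = ["Fc1ccccc1", "Clc1ccccc1", "Brc1ccccc1"]
three_searches = [substructure_url(s, "smiles") for s in smiles_queries]

print(single_search)
for url in three_searches:
    print(url)
```

With SMILES queries the three hit lists would still have to be merged and de-duplicated afterwards, which is the extra work a single SMARTS query avoids.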
Correct. For example, take a benzene ring: carbons 1, 3, and 5 can each carry any halogen, while carbons 2 and 4 can each carry a carboxyl group.
What do you mean by “meet certain substituent parameters”? Does it mean that the molecules have certain substituents (e.g., carboxyl, amino, nitro, and halogen groups)? If not, please clarify what it means.
This question is more for any of the chemical search engines, even though this particular sub-section was on PubChem.
Chemical search engines are becoming extremely complex, which is a great thing for researchers because it saves time. My question: when I do background research and later publish it, should I keep detailed notes on every option used in my chemical searches so that they can be included in the citations, or would that be overdoing things?
In the past, it may have been enough to report which search engine and keyword were used. Now that replicating an exact search is so complex, there seems to be a real need for this level of detail in scientific publications.
When doing a search for all available structures on PubChem that meet certain substituent parameters, is this deemed a substructure search? Also, what other databases would you suggest one use other than PubChem for this purpose?
If you know the structure of a molecule (in the form of xyz coordinates, molecular graph, etc.), you can generate a fingerprint from it. Therefore, if you have correct/valid structure-based chemical identifiers (e.g., InChI, SMILES, and so on) that have information on molecular structures, you can generate molecular fingerprints from them.
Molecular fingerprints represent the presence or absence of particular functional groups in a molecule, and they can also encode how many times a particular functional group occurs. Therefore, they can be used to identify which functional groups a molecule contains and how many of each, using the approach that you suggested in your post.
However, in essence, what you are trying to do is a “substructure” search, which is discussed in Section 2.4 of Module 6 (<a href="http://olcc.ccce.divched.org/2015OLCCModule6P1TLO-2-4">http://olcc.ccce.divched.org/2015OLCCModule6P1TLO-2-4</a>). It can also be done by a substructure search with a SMARTS string (representing a functional group) as the query. (Of course, you will need to convert InChI to SMILES before the substructure search.)
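To make the fingerprint idea above a bit more concrete, here is a toy sketch. The key list, the function names, and the hand-entered group counts are all hypothetical; a real fingerprint (for example, the PubChem substructure fingerprint) is computed from the structure by cheminformatics software and has many more bits. Each bit here answers "does the molecule have at least N of group G?", which is one way a fingerprint can encode counts as well as presence or absence.

```python
from typing import Mapping

# Hypothetical keys: (functional group, minimum count). One bit per key.
KEYS = [
    ("halogen", 1), ("halogen", 2), ("halogen", 3),
    ("carboxyl", 1), ("carboxyl", 2),
    ("nitro", 1),
]


def fingerprint(group_counts: Mapping[str, int]) -> int:
    """Pack the key answers into an integer; bit i is set if key i is satisfied."""
    bits = 0
    for i, (group, minimum) in enumerate(KEYS):
        if group_counts.get(group, 0) >= minimum:
            bits |= 1 << i
    return bits


def may_contain(molecule_fp: int, query_fp: int) -> bool:
    """Screen only: every bit set in the query must also be set in the molecule."""
    return (molecule_fp & query_fp) == query_fp


# Query: at least three halogens and two carboxyl groups (counts typed in by hand).
query = fingerprint({"halogen": 3, "carboxyl": 2})

molecule_a = fingerprint({"halogen": 3, "carboxyl": 2, "nitro": 1})
molecule_b = fingerprint({"halogen": 1, "carboxyl": 2})

print(may_contain(molecule_a, query))
print(may_contain(molecule_b, query))
```

A screen like this only says that a molecule may contain the query groups. It says nothing about where the groups sit on the ring, which is why an actual substructure search is still needed for a question about specific positions.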
I've uploaded a PDF file (screenshots.pdf) that contains step-by-step instructions for the similarity search with the PubChem Structure Search tool (Question 3), in case you have difficulty with that question. The file can be found at the bottom of the Module 6 Questions page (<a href="http://olcc.ccce.divched.org/2015OLCCModule6TLO2">http://olcc.ccce.divched.org/2015OLCCModule6TLO2</a>).
I am not able to reproduce the results you said you got. I've uploaded a PDF file that contains step-by-step instructions on how to perform the similarity search in Question 3. Please follow the steps in the PDF file.
If you do the same things as shown in the pdf file and still have the same problem, you can try two additional things:
(1) Clear your browser's history and cache, then re-run the search, or
(2) Re-run the search using a different browser (e.g., Internet Explorer, Safari, Chrome, Firefox, or Opera).
Please let me know whether you can reproduce what's seen in the pdf file.
I was using the identity/similarity structure search. I was using the filters listed in the information, but it always led me to the exact page for the rofecoxib compound.
It works on my PC, and I can't reproduce the results you got. It sounds like you were getting the results of an identity search, which should return CID 5090. Would you please tell me which options/filters you used during the similarity search?
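If it helps to see the difference between the two search types outside the web form, here is a small sketch that builds (but does not send) the two corresponding requests for CID 5090. It assumes the PUG REST operations "fastidentity" and "fastsimilarity_2d" and the "Threshold" option; the helper name and the threshold value of 90 are my own assumptions for illustration and are not the settings from Question 3.

```python
from urllib.parse import urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
ROFECOXIB_CID = 5090  # the CID that the identity search is expected to return


def structure_search_url(search: str, cid: int, **options) -> str:
    """Hypothetical helper: build a structure-search URL that uses a CID as the query."""
    url = f"{PUG_REST}/compound/{search}/cid/{cid}/cids/TXT"
    # Append search options (if any) as a query string.
    return f"{url}?{urlencode(options)}" if options else url


# Identity search: looks for the query compound itself.
identity_url = structure_search_url("fastidentity", ROFECOXIB_CID)

# 2-D similarity search: looks for compounds scoring at or above the threshold.
# Threshold=90 is an assumed example value.
similarity_url = structure_search_url("fastsimilarity_2d", ROFECOXIB_CID, Threshold=90)

print(identity_url)
print(similarity_url)
```

A hit list that contains only CID 5090 suggests that an identity search was run; a similarity search would normally return the query compound together with other, similar compounds.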
Yes, it is a substructure search.
Correct. For example, you
Meet certain substituent parameters?
Search Citing Question
Structure searches
Fingerprints can be used for that purpose, but......
Step-by-step instruction for similarity searches in Question 3
Follow the instructions in the pdf file at the bottom
I was using the identity
Are you running identity search or similarity search?