3. Accessing and Searching Chemical Information Databases

Access to chemical information databases has been primarily subscription-based because curating high-value chemical data and information takes a large amount of manual labor and because the content is often proprietary. In recent years, some open access databases such as PubChem and ChemSpider have emerged as cheminformatics techniques have advanced. However, chemists still need to rely on subscription-based databases for accurate and efficient searches. The table below summarizes some major databases often used in chemistry research, their scopes, and their major functionalities. Please choose the ones available through your institution to explore. The tutorials linked in the last column will help you get started with these databases. For more information on databases covering chemical substance data, please consult Ben Wagner’s book chapter [3] and the poster presented by Ye Li and Leena Lalwani. [4]

| Resource | Access | Link to about page | Coverage and Scope: Discipline | Coverage and Scope: Years | Coverage and Scope: Type of reference indexed | Coverage and Scope: Cited reference included? | Chemical data indexed? | Structure search enabled? | Link to tutorials |
|---|---|---|---|---|---|---|---|---|---|
| SciFinder | Subscription based | About Page | Chemistry related areas | Early 1800's - present | Journal articles, patents, conference proceedings, books | Yes | Yes | Yes | Scifinder Tutorial |
| Reaxys | Subscription based | About Page | Organic, inorganic, organometallic Chemistry | 1771 - present | Journal articles and patents | Partially cross linked to Scopus | Yes | Yes | Reaxys Tutorial |
| Web of Science | Subscription based | About Page | General in science, social science, and humanities | 1898 - present | Journal articles, books, conference proceedings | Yes | No | With additional subscription | Web of Science Tutorial |
| Scopus | Subscription based | About Page | General in physical sciences, life sciences, health science, and social sciences | Varies; as early as 1823 | Journal articles, books, conference papers | No | No | No | Scopus Tutorial |
| PubMed | Open access | About Page | Biomedical research | Varies, as early as 1809 | Journal articles and books | No | No | No | PubMed Tutorial |
| PubChem | Open access | About Page | Biological activities of small molecules | Unclear | Chemical substance, property, biological activity and associated literature | No | Yes | Yes | PubChem Tutorial |
| ChemSpider | Open access | About Page | Chemical structure and associated data and literature | Unclear | Chemical substance, property, spectra, and associated literature | No | Yes | Yes | ChemSpider Tutorial |

Please note: most of these databases index the citations and abstracts of articles; the full text is often linked from the database records, through your institution's link resolver, to the journals your institution has access to. Occasionally, you may discover articles that your institution cannot access online. In that case, consult your institution’s library about access to the print source or about its interlibrary loan service.

3.1 Search Techniques for Databases in Chemistry

Most of the databases, especially the subscription-based ones, provide useful tools to refine your search. You can use the advanced search features and filter options in these databases to optimize your search strategy. Here are a few general tips for searching databases in chemistry.

  • Use Boolean logic (AND, OR, NOT) to connect your search terms instead of typing in a full sentence [5]
  • Pay attention to subject terms (the standard terms a database uses to index literature) and use them for more accurate searches [5]
  • Track down publications by a specific author through an author name search or an author ID such as ORCID [5]
  • Search for the references cited in a chosen article, and for other publications that cite the chosen article, to expand your findings [5]
  • Learn structure, substructure [6], and reaction [7] searching through practice and by understanding how chemical structures are indexed (covered more in later modules)
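The first tip can be made concrete with a short Python sketch. It groups synonyms with OR, joins the concept groups with AND, and excludes unwanted terms with NOT. The helper names (`quote_phrase`, `any_of`, `build_query`) and the example search terms are hypothetical, chosen only for illustration. Each database has its own query syntax, so treat the resulting string as a starting point and check the database's help pages. As one possible use, the sketch also places the query into a PubMed E-utilities `esearch` URL without sending any request.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search endpoint (used here for PubMed).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def quote_phrase(term):
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def any_of(*terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(quote_phrase(t) for t in terms) + ")"


def build_query(required_groups, excluded=()):
    """AND the concept groups together, then append a NOT for each excluded term."""
    query = " AND ".join(required_groups)
    for term in excluded:
        query += " NOT " + quote_phrase(term)
    return query


# Hypothetical topic: viscosity of ionic liquids, leaving out review articles.
query = build_query(
    [any_of("ionic liquid", "molten salt"), any_of("viscosity")],
    excluded=["review"],
)
print(query)

# Build (but do not send) a PubMed search URL that carries the same query.
url = ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmode": "json"})
print(url)
```

Keeping each concept in its own OR group makes it easy to add or remove a synonym and rerun the search, which is usually how a search strategy gets refined.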

3.2 Search Techniques for Physical Properties and Spectra of Chemical Substances

Searching for physical properties and spectra of chemical substances is an important skill for chemists to master. Physical property data and spectral data are often indexed by chemical substance in databases and handbooks, with references to the primary literature that published the data. [3]

  • Databases such as SciFinder, Reaxys, PubChem, and ChemSpider, listed in the table above, index physical properties, spectra, and bioactivities, and can be searched by chemical substance. Once you locate the substance of interest in one of these databases, you can navigate to a particular physical property or spectrum under the relevant section of the substance record. Please note: you may need to follow the link from the database to the primary literature to obtain the actual spectra or the conditions under which a physical property was measured. Another type of database functions as a repository of spectra, such as the Spectral Database for Organic Compounds (SDBS). In SDBS, you can download the spectral data directly.
  • Examples of handbooks include the CRC Handbook of Chemistry and Physics and the ASM Handbooks. You may either browse a handbook by property category or use its index to search for a specific substance. Online versions of many of these handbooks are available and allow you to search for property data and spectra by keyword. Some handbooks available through the Knovel platform may even allow you to download and interact with individual tables and graphs containing substance property data.
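Substance records in an open database such as PubChem can also be reached programmatically. The sketch below builds a PubChem PUG REST request that looks up a compound by name and asks for a few properties. The function names `property_url` and `fetch_properties`, and the example compound, are assumptions made for illustration. The property names requested here are computed descriptors; experimental values and spectra shown in a record are not returned by this request, and the linked primary literature is still where measurement conditions are found.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the PubChem PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name, properties, fmt="JSON"):
    """Build a PUG REST URL that looks up a compound by name and requests properties."""
    safe_name = quote(name, safe="")  # percent-encode spaces and other special characters
    return f"{PUG_REST}/compound/name/{safe_name}/property/{','.join(properties)}/{fmt}"


def fetch_properties(name, properties):
    """Send the request and parse the JSON reply (needs network access; not called here)."""
    with urlopen(property_url(name, properties), timeout=30) as response:
        return json.load(response)


# Hypothetical example: request three computed descriptors for acetic acid.
url = property_url("acetic acid", ["MolecularFormula", "MolecularWeight", "InChIKey"])
print(url)
```

Pasting the printed URL into a browser is a quick way to inspect what the service returns before writing any further code.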

Depending on the databases and handbooks available to you, you can explore the search techniques specific to each resource using its guide or help pages. You may also learn more about them from Ben Wagner’s book chapter. [3]

When you use physical properties and spectra found in the literature in your own writing or presentations, it is often important to include the conditions under which the data were measured, because different conditions can produce very different data for the same substance. Without the original context, your audience will not be able to interpret the data you cite or evaluate the research you base on them.
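One way to keep a value together with its context is to store them in a single record. The sketch below is only an illustration: the class name `PropertyRecord`, its fields, and the `cite` method are hypothetical, and the source label is a placeholder to be replaced with the actual primary reference. The two entries use the commonly tabulated viscosity of water at 20 °C and 25 °C to show how the same substance gives different values under different conditions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PropertyRecord:
    """A property value kept together with its measurement conditions and source."""
    substance: str
    property_name: str
    value: float
    unit: str
    conditions: dict = field(default_factory=dict)  # e.g. temperature, pressure
    source: str = ""  # citation of the primary literature

    def cite(self):
        """Format the value with its conditions and source for use in a report."""
        text = f"{self.property_name} of {self.substance}: {self.value} {self.unit}"
        if self.conditions:
            details = ", ".join(f"{k} = {v}" for k, v in self.conditions.items())
            text += f" ({details})"
        if self.source:
            text += f" [{self.source}]"
        return text


# Same substance and property, different temperature, different value.
# "<primary reference>" is a placeholder, not a real citation.
viscosity_20c = PropertyRecord(
    "water", "viscosity", 1.002, "mPa·s",
    conditions={"temperature": "20 °C", "pressure": "1 atm"},
    source="<primary reference>",
)
viscosity_25c = PropertyRecord(
    "water", "viscosity", 0.890, "mPa·s",
    conditions={"temperature": "25 °C", "pressure": "1 atm"},
    source="<primary reference>",
)

for record in (viscosity_20c, viscosity_25c):
    print(record.cite())
```

A record with empty `conditions` still formats, but the missing parenthetical is a visible reminder that the context has not been captured yet.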

3.3 Learning to Become a Power Searcher While Doing Research

You can always consult publications and learning materials from chemical information professionals for advanced search skills, such as those found in XCITR. [1] It takes practice and patience to become a power searcher in the chemical information world. In practice, you may need to balance the comprehensiveness of your results against the time spent seeking them. Please never hesitate to consult your librarians and the experienced researchers around you for suggestions and tips. Furthermore, with what you will learn in this class, you will be able to see the chemical information world from an insider's point of view. For example, in Module 5 of this class, Comparing and Searching Chemical Entities, you will learn more about how databases like PubChem and ChemSpider are organized and updated, so that you can understand the advanced methods for retrieving chemical information from them.

Nowadays, search engines like Google and Google Scholar and collaborative online references like Wikipedia can often provide quick and easy access to chemical data and literature. They can be used to get started with a topic, as long as you pay attention to the true source of the data and information you discover and evaluate the sources carefully, as described in the next section. For comprehensive research on a topic and for finding credible sources, these quick tools are usually not sufficient. Using the databases mentioned above will help you perform effective and efficient searches to identify data and information for your research.

Comments 6

Judat Yazigi (not verified) | Wed, 09/02/2015 - 17:46
I've always found that these subscription-based databases are much more reliable sources than random websites found through Google. At least you would know for sure that the information presented is credible, and they also go into more detail. I definitely will start using these websites. The more credible the information, the more in-depth the information is....the better!!

Robert Belford | Fri, 09/04/2015 - 15:24

Hi All, I was actually hoping there might be some initial discussion on this post, and I think this is a topic that will come up multiple times this semester. How do we evaluate data? I know some very reputable scientists who will only publish data if it is open. In fact one could argue that in this evolving world of Big Data, the data needs to be open. There are also national funding agencies that require taxpayer funded work to be open. Also, it is my understanding that many subscription based services actually mine the primary literature, which adds a layer of potential error, without consideration of issues like retraction watch, http://retractionwatch.com/. I am really curious what people with actual knowledge and experience in these issues think, and hope this topic will come up in many of the modules as we progress through the semester. I thank you for bringing up this topic.

John House (not verified) | Fri, 09/04/2015 - 17:38
Excellent point. It leads me to ask just how effectively can scientific literature be vetted for accuracy when access is restricted to private libraries. More needs to be done to push the scientific community towards open access of information.

Dr. Briney | Sun, 09/06/2015 - 14:54
There is definitely a lot of work being done to make more data openly available and to increase the quality of this data at the same time. Beyond just posting data when you publish an article - something that funding agencies are now strongly encouraging - there is a new mode of publishing called a "data paper". This is where you publish a paper *about a dataset* (not about an analysis on the dataset) and both the data paper AND the data are then peer reviewed and published. The Nature journal "Scientific Data" is one of the big players in this area but other journals, like PLOS, also publish data papers. Data peer review is still a new topic and generally involves: reviewing the dataset for consistency/errors, checking that the documentation completely and logically describes the dataset, and evaluating the importance/relevance of the data. Between data sharing, data papers, data mining, and other new types of scholarship, we're going to be seeing many changes to the way we research and publish going forward!

Leah Rae McEwen | Mon, 09/07/2015 - 15:57
It might be helpful in this conversation to distinguish a bit between processes associated with evaluating credibility of what is usually meant by "literature" (e.g. journal articles, book chapters, written word), and data sets (often numeric based, direct from research or compiled). Research data and written articles are related types of information in scientific research and not always completely separable, but they have tended to be distinguished in their paths to initial publication, how they are made discoverable by indexing services and available in libraries, how they are evaluated before and after publication and how they are used. As Kristin suggests, these lines are blurring somewhat with new modes of handling data and publishing in the digital environment. However, it is still useful for everyday research and also helpful when evaluating sources to be aware of these distinctions. For example, this Module 1 is primarily focusing on searching, evaluating and organizing written word literature; how to manage data sets is further considered in Module 3 and in other lectures throughout the course. I posted separately on evaluating the secondary literature databases discussed in this Module. The "data" in some of these sources primarily consist of citations and abstracts of articles, such as the Web of Science; other databases reprint numeric and process data from articles and other sources, such as Reaxys. The older types have traditionally been indexed by human professionals in part as another layer of review, such as Chemical Abstracts. More recent databases may be extracted by algorithm, such as PubChem, which reprints data from other compilations "as is" with a full disclosure of the source for the user to consider directly in their own evaluation process. As Bob mentions, databases may not pick up on the types of human sourced errors highlighted in "Retraction Watch", a blog that focuses primarily on data published within articles.
Another vote for researchers to review data, and the rationale published with it either in a more traditional analysis or a "data paper", before using in their own work. A few other approaches to data evaluation related to chemistry in particular that are worth mentioning briefly as they will not be otherwise covered are crystallographic data and materials property data. The crystallographic data research community has formulated over time a process for peer review of these data through a robust standard file format that enables automated validation checks as well as human expert review. The data are concurrently published in a sustainable repository that supports both open retrieval of individual data sets through the original publication and subscription based analysis software (Cambridge Crystallographic Database, http://www.ccdc.cam.ac.uk/pages/Home.aspx). The National Institute of Standards and Technology employs rigorous data evaluation strategies for various types of materials properties reported from various sources based on binary assessment of acceptable vs. non-acceptable for re-use in industry applications. An early version of a decision tree for ceramics materials is the NIST Interactive Data Evaluation Assessment Tool that identifies several levels of evaluated data, including certified, validated, qualified, commercial, typical, research and unevaluated (IDELA, http://www.ceramics.nist.gov/IDELA/IDELA.htm). The more recent NIST ThermoData Engine employs similar types to dynamically evaluate data based on large compilations of experimental data (http://trc.nist.gov/tde.html), the latest effort in a long history to systematically collect and evaluate data for re-use.

Leah Rae McEwen | Mon, 09/07/2015 - 13:11
Hi Judat, thank you for the comment. This presents a great opportunity to discuss evaluating information sources for research purposes. It is not enough to assume credibility of a resource based solely on the fact that it exists. And with seemingly more information available through the Web than ever before, it becomes a critical part of the research workflow to discern the best quality of sources for the research question at hand. It can be helpful to look at a range of characteristics for evaluating the reliability of a source, including depth of content, currency, authority, accuracy and bias. All of these characteristics will vary among sources based in part on the purpose for which they were developed, how they are structured and maintained, and by whom, which can usually be determined from the About Pages. A list of rubrics for assessing information sources is available at: http://railsontrack.info/rubrics.aspx?catid=6 . This quick chart from McHenry College is fairly representative and easy-to-use, http://www.mchenry.edu/library/tutorial/pdf/EvaluatingSourcesRubric.pdf . With increasing access and easy tools and devices for using the Internet and lowering the barriers for public posting of information, more sources are available but at a greater range of potential quality. It is not enough to assume the first free source is the best. The decision of what is most appropriate to use ultimately remains with the researcher, hopefully an informed decision. Considering a few of the specific databases mentioned, Chemical Abstracts (aka, SciFinder) has a goal to be a comprehensive source of information on characterized compounds to support competitive R&D in the chemical industry, among other uses.
This database indexes daily a broad range of publication types, from articles to conference proceedings to patents in chemistry and related sciences in more or less depth depending on the relevance. The information is abstracted and indexed by degreed chemists employed by the largest chemical scientific society based on chemical structures and fairly detailed terminology, and it is a substantial cost for institutions to subscribe. PubMed provides similarly broad coverage of the biomedical research literature with a robust medical focused index maintained by professional staff scientists and librarians at a national library, and the cost is subsidized for public use by a federal government. Both of these sources would pass most criteria for the evaluation characteristics mentioned above, although only one is subscription based.
