Guidance on Searching for Chemical Information and Data
Locating Studies and Data
Looking for information on chemicals is often complicated by the fact that the same chemical may be known by many different names. Because of this complexity, it is recommended that CAS Registry Numbers be used as the primary key when searching electronic databases.
Searches using chemical nomenclature may also be appropriate in some cases, although such searches can miss relevant studies in which a different synonym was used. On the other hand, such searches may sometimes return a large number of "false drops" because other chemicals share parts of the desired chemical's name. In addition, some studies may have looked at the effects of a chemical as part of a mixture or formulated product.
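Both failure modes of name searching can be seen in a toy example. The sketch below uses a small, hypothetical list of record titles and compares a naive substring match on a chemical name with a stricter match that rejects names embedded in longer names. Real search engines tokenize chemical names in their own ways, so this only illustrates the idea.

```python
import re

# Hypothetical record titles for illustration only.
TITLES = [
    "Acute toxicity of chloroethane in rats",
    "Metabolism of 1,2-dichloroethane in mice",
    "Chronic exposure to 1,1,1-trichloroethane",
    "Ethyl chloride: a review",
]


def substring_hits(name, titles):
    """Naive name search: any title containing the name anywhere."""
    return [t for t in titles if name.lower() in t.lower()]


def whole_name_hits(name, titles):
    """Stricter search: the name must not be embedded in a longer name."""
    # Reject matches preceded or followed by a word character or a hyphen.
    pattern = re.compile(
        r"(?<![\w-])" + re.escape(name) + r"(?![\w-])", re.IGNORECASE
    )
    return [t for t in titles if pattern.search(t)]


print(len(substring_hits("chloroethane", TITLES)))  # 3: two are false drops
print(whole_name_hits("chloroethane", TITLES))       # only the first title
```

Neither function finds the last title, which uses the synonym "ethyl chloride" for chloroethane. That is the missed-synonym problem, and it is one reason an identifier that does not depend on naming is preferred.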
CAS Registry Numbers are preferred because they were created specifically to function as unique identifiers to help eliminate the confusion caused by the variety of synonyms that could be used for the same chemical. They have since been adopted by many government agencies and other organizations as a standard. Searching by CAS Registry Numbers for individual chemical substances enables one to specifically identify a substance without needing to know the particular synonym an author or database publisher may have used for the chemical.
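Because a Registry Number is used as the search key, it is worth catching typing errors before searching. A CAS Registry Number has three hyphen-separated parts (two to seven digits, two digits, and a single check digit). To verify the check digit, multiply each preceding digit by its position counted from the right, starting at 1, and take the sum modulo 10. A minimal Python sketch of that check:

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas_rn(rn: str) -> bool:
    """Return True if rn is well formed and its check digit is correct."""
    match = CAS_PATTERN.fullmatch(rn.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)  # digits covered by the check
    check = int(match.group(3))
    # Weight 1 for the rightmost body digit, 2 for the next, and so on.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas_rn("7732-18-5"))  # True (water)
print(is_valid_cas_rn("7732-18-4"))  # False: check digit does not match
```

A passing check only shows that the number is well formed. It does not show that it is the number for the intended chemical, which still has to be confirmed against an authoritative source.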
Limitations of searching by CAS Registry Number
In cases where some articles or studies may not include CAS Registry Numbers or where a publisher may not have indexed by CAS Registry Number, searchers will need to search by some other means. The other possibilities are to search by synonym or by chemical structure (if a system has that capability).
If a database indexes studies by CAS Registry Number, there will be less need to search by synonym, but a sponsor may want to try a variety of synonyms as a safeguard. Most chemical databases index by CAS Registry Number and allow fielded searching by Registry Number, but some do not. Library catalogs and similar databases will generally not be indexed by Registry Number, but will usually be searchable on title and subject headings. For example, while most of the records in the National Library of Medicine's TOXLINE database are indexed by CAS Registry Number, some are not. Searchers should also check a system's documentation to see whether it uses controlled vocabulary terms relevant to their search.
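One way to combine a Registry Number search with a "safety net" of synonyms is to OR them together in a single query. The sketch below only builds the query string. The `[RN]` field tag and the quoting convention are placeholders, not any particular vendor's syntax, so the real format has to come from each system's documentation.

```python
def build_query(cas_rn, synonyms, rn_field_tag="[RN]", indexed_by_rn=True):
    """Combine a Registry Number search with synonym searches using OR.

    rn_field_tag is a hypothetical placeholder; substitute the field
    syntax documented by the system actually being searched.
    """
    terms = []
    if indexed_by_rn:
        terms.append(f"{cas_rn}{rn_field_tag}")
    # Quote each synonym so multi-word names are searched as phrases.
    terms.extend(f'"{name}"' for name in synonyms)
    return " OR ".join(terms)


print(build_query("50-00-0", ["formaldehyde", "methanal"]))
# 50-00-0[RN] OR "formaldehyde" OR "methanal"
```

For a catalog or database that is not indexed by Registry Number, calling `build_query(..., indexed_by_rn=False)` leaves only the synonym terms.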
Another point to consider in developing a search strategy is the possible value of looking for "other forms" of the chemical. For example, if the HPV chemical is a labile salt of an acid or an amine, there may be value in looking for relevant information on other salt forms or on the free acid or amine. One way of approaching this task is to use chemical substructure searching to identify the CAS Registry Numbers of other salt forms of the target chemical. These Registry Numbers and chemical identities could then be run through the search strategy to identify data that may be applicable via a Structure Activity Relationship (SAR)-based argument. (Note that the identity of the counter ion may be an important element to consider in evaluating the relevance of the data. For example, consider data on a lead (Pb) salt of an HPV acid versus data on the sodium salt of that acid.)
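When several related forms are searched, it helps to keep track of which form, and which counter ion, each hit came from, so that relevance can be judged later. In this sketch the form identifiers and the `search` function are hypothetical stand-ins for Registry Numbers found by substructure searching and for a real database query.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Form:
    cas_rn: str       # Registry Number of this form (placeholders below)
    description: str  # e.g. "free acid", "sodium salt"
    counter_ion: str  # empty string for the free acid or amine


def search_all_forms(
    forms: List[Form], search: Callable[[str], List[str]]
) -> List[Tuple[str, str, str]]:
    """Run the same search on every form, tagging each hit with its source form."""
    tagged = []
    for form in forms:
        for hit in search(form.cas_rn):
            tagged.append((hit, form.description, form.counter_ion))
    return tagged


# Placeholder identifiers and results; a real run would query a database.
forms = [
    Form("RN-ACID", "free acid", ""),
    Form("RN-NA", "sodium salt", "Na"),
    Form("RN-PB", "lead salt", "Pb"),
]
fake_results = {"RN-NA": ["Study X"], "RN-PB": ["Study Y"]}
for hit, form, ion in search_all_forms(forms, lambda rn: fake_results.get(rn, [])):
    print(hit, "| from:", form, "| counter ion:", ion or "none")
```

Keeping the counter ion alongside each hit makes it easy to set aside, for instance, data driven by the toxicity of a lead ion when the HPV chemical is a sodium salt.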
Other issues surrounding searching chemical databases
In addition to problems created by authors using different synonyms for the same chemical, another problem arises because database publishers and vendors can construct databases differently and the software used to create the database can operate in different ways, especially in terms of the syntax used to construct search statements. Would-be searchers should realize that advanced training is required to search some databases.
Because of this need for training, interested parties have two basic options: learn to search the databases themselves, or have someone do it for them. We will describe some of the basic approaches you can take in searching. Because of the complexity of the systems, it would be impractical to cover each of them in detail here, so we have included information on contacting the vendors regarding system documentation and appropriate training.
Those who plan on doing large amounts of searching should refer to one of the many books written on this subject (Maizell; Ridley; Wiggins; Wexler).
General search strategies for identifying relevant studies
Searchers may want to search free or relatively inexpensive databases first. A chemical may have been the subject of many studies and articles and still have little or no data on particular SIDS endpoints, so doing such searches first on more expensive databases could result in large charges for little relevant data.
One of the best places to start is with a "pointer" database such as NLM's ChemID database. Pointer databases provide an indication of the specific databases where information on the chemical can be found.
Some systems take this one step further by providing the capability to search across groups or clusters of databases. This way searchers can determine whether a particular database contains any articles or records concerning the chemical in question, and skip databases that contain none. EPA's Chemical Hazard Data Availability Study used this approach to identify which databases in the Chemical Information System (CIS) might contain relevant studies, by submitting a list of CAS Registry Numbers to CIS's Structure and Nomenclature Search System (SANSS).
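The triage that a pointer or cluster search makes possible can be sketched as follows: drop databases that report no records on the chemical, then order the rest so that cheaper ones are searched first. The database names, hit counts, and relative costs below are invented for illustration.

```python
def plan_searches(hit_counts, relative_cost):
    """Return the databases worth searching, cheapest first.

    hit_counts: database name -> records on the chemical, as reported by
                a pointer or cluster search (hypothetical values below)
    relative_cost: database name -> relative cost (lower is cheaper)
    """
    with_hits = [db for db, n in hit_counts.items() if n > 0]
    # Databases with unknown cost go last; ties are broken by name.
    return sorted(with_hits, key=lambda db: (relative_cost.get(db, float("inf")), db))


print(plan_searches({"DB-A": 12, "DB-B": 0, "DB-C": 3},
                    {"DB-A": 5, "DB-B": 1, "DB-C": 0}))  # ['DB-C', 'DB-A']
```

DB-B is skipped even though it is cheap, because the cluster search showed it has nothing on the chemical.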
Once searchers have an idea of the number of articles in question, they can decide whether they need to narrow the scope of their search to a particular endpoint. For example, if a search on a particular chemical only gives three "hits", it would be easier to just display those citations rather than intersect that search set with terms for the various endpoints.
Most systems have a variety of print and display options. These can include "citation only"; "citation and abstract"; "citation, abstract, and subject headings"; or, in some systems, even the entire text of the article. The cost of these options varies by vendor and by database, so searchers should consult the documentation for information on fees.
The titles of some articles will make clear which endpoints a study investigated; for others, it may not be clear whether any included data are relevant to the HPV Challenge Program.
In some cases, a search on a particular chemical may result in dozens, even hundreds, of hits. As noted above, just because one aspect of a chemical has been well documented does not mean that there will be data for all the endpoints requested under the HPV Challenge Program. Searchers can take one of two approaches here.
One, most systems have the capability to display a list of the titles of the articles. (Note: It will sometimes be helpful to include at least the names of journals in citation displays since they may provide clues as to what endpoint was being investigated.) Searchers will need to determine which studies to examine for possible relevance.
Two, the search set can be combined with appropriate terminology for a particular endpoint. For example, a search for articles concerning reproductive effects might include a search for the words "REPRO?" or "TERATO?" (where "?" represents a wildcard character that allows for different forms of the words; different search engines may use different characters as wildcards). Some vendors or database producers may even have indexed the articles in their databases using controlled vocabulary terms or special classification codes. Since it is not possible to include all the details here, searchers should consult the users' guide or other documentation for the particular system they are using. If there are few hits, combining search sets will probably not be needed.
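The effect of truncated terms such as "REPRO?" can be imitated locally on a list of retrieved titles. The sketch below treats the wildcard as "any run of word characters" at the end of (or inside) a word and only filters when the set is large; the cutoff of 10 hits is an arbitrary assumption, and real systems apply their own truncation rules.

```python
import re


def wildcard_to_regex(term, wildcard="?"):
    """Compile a truncated term such as 'REPRO?' into a case-insensitive regex.

    The wildcard stands for any run of word characters. Systems differ in
    which character they use, so it is a parameter.
    """
    pieces = [re.escape(piece) for piece in term.split(wildcard)]
    return re.compile(r"\b" + r"\w*".join(pieces) + r"\b", re.IGNORECASE)


def narrow(titles, endpoint_terms, few_hits=10, wildcard="?"):
    """Keep titles matching any endpoint term, unless there are only a few hits.

    few_hits is an illustrative cutoff: at or below it, every title is
    returned for direct review instead of being filtered.
    """
    if len(titles) <= few_hits:
        return list(titles)
    patterns = [wildcard_to_regex(term, wildcard) for term in endpoint_terms]
    return [t for t in titles if any(p.search(t) for p in patterns)]


titles = [
    "Reproductive toxicity of chemical X in rats",
    "Teratogenicity study of chemical X",
    "Acute oral toxicity of chemical X",
]
print(narrow(titles, ["REPRO?", "TERATO?"], few_hits=2))
```

Filtering on titles alone will miss studies whose titles do not name the endpoint, which is why controlled vocabulary terms and classification codes, where a database has them, are usually the better basis for combining sets.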
One way of identifying relevant search terms is to identify one relevant article, then view the terms and classification codes used to index it. Some database producers include thesauri containing controlled vocabulary terms and related terms. Experienced searchers can use those to narrow or broaden their searches.
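As a sketch of using a thesaurus to broaden a search, assuming a tiny hypothetical thesaurus (real controlled vocabularies are far larger and are documented by each database producer), a term can be ORed with its narrower terms and, optionally, its related terms.

```python
# Hypothetical thesaurus entries for illustration only.
THESAURUS = {
    "reproductive effects": {
        "narrower": ["fertility", "teratogenicity"],
        "related": ["developmental toxicity"],
    },
}


def broaden(term, include_related=False):
    """Return the term plus its narrower (and optionally related) terms."""
    entry = THESAURUS.get(term, {})
    terms = [term] + entry.get("narrower", [])
    if include_related:
        terms += entry.get("related", [])
    return terms


print(" OR ".join(broaden("reproductive effects", include_related=True)))
```

Narrowing works the other way: replacing the broad term with a single narrower term, such as "fertility" alone, when the broad term returns too many hits.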
Searchers should also be aware that studies found in one database may duplicate those found in another. Duplicates may also show up within a single set of search results when a database contains subfiles. In such cases, the number of hits may overstate the number of distinct studies actually cited.
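When results from several databases or subfiles are pooled, a rough first pass at removing duplicates is to compare normalized titles and publication years. The citation layout (`title`, `year`, `source` keys) is a hypothetical format, and matching on title and year is only a heuristic: different studies can share a title, and one study can be cited with variant titles.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def deduplicate(citations):
    """Keep the first citation seen for each (normalized title, year) pair."""
    seen = set()
    unique = []
    for citation in citations:
        key = (normalize_title(citation["title"]), citation.get("year"))
        if key not in seen:
            seen.add(key)
            unique.append(citation)
    return unique


# Hypothetical pooled results from two databases.
pooled = [
    {"title": "Acute toxicity of X.", "year": 1990, "source": "DB-A"},
    {"title": "acute  toxicity of X", "year": 1990, "source": "DB-B"},
    {"title": "Acute toxicity of X", "year": 1995, "source": "DB-B"},
]
print(len(deduplicate(pooled)))  # 2
```

Borderline pairs that this heuristic merges or keeps apart still need a human check against the full citations.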
Verifying relevance of studies identified in a search
It is important to verify that the articles or studies found do indeed deal with the chemical of interest. A study may not have focused on the chemical searched for at all, but may have been indexed under that chemical because it was used as a solvent or a substrate. Verifying the chemical's identity is especially important when searching by synonym or by chemical structure.
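One rough screening aid, offered only as a sketch under stated assumptions, is to flag records whose title and abstract never mention the Registry Number or any known synonym. Such records are candidates for the solvent-or-substrate situation and deserve a closer look. The record fields are hypothetical, and the check cannot replace reading the study.

```python
def needs_identity_check(record, cas_rn, synonyms):
    """Flag a record when neither the Registry Number nor any synonym
    appears in its title or abstract.

    record: dict with hypothetical 'title' and 'abstract' keys.
    This is a screening heuristic, not a test of relevance.
    """
    text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
    mentioned = cas_rn in text or any(s.lower() in text for s in synonyms)
    return not mentioned


focused = {"title": "Inhalation toxicity of formaldehyde in rats", "abstract": ""}
incidental = {"title": "Synthesis of compound Y",
              "abstract": "The reaction was carried out in aqueous solution."}
print(needs_identity_check(focused, "50-00-0", ["formaldehyde", "methanal"]))     # False
print(needs_identity_check(incidental, "50-00-0", ["formaldehyde", "methanal"]))  # True
```

A record that does mention the chemical can still be off target, so an unflagged record is not thereby verified.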
Obtaining copies of studies
For the full text of articles not available electronically, the original publishers may be able to provide copies (for a price). Document delivery services and information brokers can also obtain copies of many articles for a charge. Companies with libraries may be able to get copies through interlibrary loan. Note: With the exception of U.S. government publications, most of the articles searchers will find will be covered by U.S. copyright laws. Searchers are responsible for making sure they are in compliance with these laws.
Basic procedure for searching for chemical information in databases and reference works
Electronic databases (examples: ChemID, TOXLINE)
- Identify the correct CAS Registry Number for the chemical in question.
- Use the database's search function to search for the CAS Registry Number. (Note: If a database is not indexed on CAS Registry Numbers, searchers will need to use synonyms.)
- If the search results in many hits, you will need to narrow the search by combining the search set with appropriate terms.
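The database steps can be sketched as a toy function over an in-memory list of records. Everything here is hypothetical: the record layout (`rns` and `title` keys), the cutoff for "many hits", and the plain keyword narrowing, which stands in for whatever combination syntax a real system uses. Identifying the correct Registry Number is assumed to have been done before the call.

```python
def search_database(records, cas_rn, synonyms, endpoint_words,
                    indexed_by_rn=True, many_hits=20):
    """Toy version of the basic database procedure.

    records: list of dicts with hypothetical keys 'rns' (indexed
             Registry Numbers) and 'title'.
    """
    if indexed_by_rn:
        # Search on the Registry Number field.
        hits = [r for r in records if cas_rn in r.get("rns", [])]
    else:
        # No Registry Number indexing, so fall back to synonyms.
        names = [s.lower() for s in synonyms]
        hits = [r for r in records if any(n in r["title"].lower() for n in names)]
    if len(hits) > many_hits:
        # Too many hits: keep only records that mention an endpoint term.
        words = [w.lower() for w in endpoint_words]
        hits = [r for r in hits if any(w in r["title"].lower() for w in words)]
    return hits


records = [
    {"rns": ["50-00-0"], "title": "Formaldehyde: reproductive effects in rats"},
    {"rns": ["50-00-0"], "title": "Formaldehyde: acute inhalation toxicity"},
    {"rns": ["71-43-2"], "title": "Benzene: a review"},
]
print(len(search_database(records, "50-00-0", ["formaldehyde"], ["reproductive"])))               # 2
print(len(search_database(records, "50-00-0", ["formaldehyde"], ["reproductive"], many_hits=1)))  # 1
```

With only two hits the set is returned as-is, matching the advice that small result sets can simply be displayed; lowering the cutoff triggers the narrowing step.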
Printed reference works (example: Merck Index)
- Identify the correct CAS Registry Number for the chemical in question.
- Check to see whether the publication has a CAS Registry Number index (many do, though not all).
- If the publication has a CAS Registry Number index, use it to determine the entry under which data about the chemical are listed. This will often be the chemical's name.
- If the publication does not have a CAS Registry Number index, searchers will need to use the general index or table of contents to determine where in the publication information about the chemical is located.
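The lookup in a printed reference work follows the same pattern and can be sketched with two dictionaries standing in for the book's indexes. Both indexes and their contents are hypothetical. As a small liberty, the sketch also falls back to the general index when the Registry Number index exists but does not list the number.

```python
def locate_entry(cas_rn, synonyms, rn_index=None, general_index=None):
    """Find where a chemical is covered in a printed reference work.

    rn_index: Registry Number -> entry name (only if the publication has one)
    general_index: lowercase name -> entry or page reference
    Both are hypothetical stand-ins for a book's printed indexes.
    """
    if rn_index and cas_rn in rn_index:
        # The Registry Number index points straight to the entry.
        return rn_index[cas_rn]
    # Otherwise fall back to the general index, trying each synonym in turn.
    for name in synonyms:
        entry = (general_index or {}).get(name.lower())
        if entry is not None:
            return entry
    return None  # not found: try the table of contents or more synonyms


print(locate_entry("50-00-0", ["Formaldehyde"], rn_index={"50-00-0": "Formaldehyde"}))
print(locate_entry("50-00-0", ["Methanal", "Formaldehyde"],
                   general_index={"formaldehyde": "entry 123"}))
```

Returning `None` rather than raising an error leaves the final step, browsing the table of contents, to the searcher.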
IMPORTANT: The fact that a resource is included in this guide does not mean that EPA is endorsing those sources. Nor does it mean that EPA will automatically accept data included in or referenced by those sources. Studies and data will need to meet the requirements as spelled out in the guidance document on data adequacy in order to be accepted under the HPV Challenge Program.
Send comments to the Chemical Right to Know staff (firstname.lastname@example.org)
CAS Registry Numbers are a registered trademark of the Chemical Abstracts Service.