Background

Retrieving the best current evidence for a specific medical discipline when searching in large electronic databases such as MEDLINE can be challenging. This challenge is due to the scatter of relevant articles in low concentration across a large number of journals, inherent limits in indexing, and lack of searching skill on the part of the user of the database [1]. For instance, MEDLINE searches take place in a database containing over 13 million citations from over 4,800 journals with over 571,000 new articles added each year [2]. MEDLINE includes articles on basic biomedical research and the clinical sciences including nursing, dentistry, veterinary medicine, pharmacy, allied health, and pre-clinical sciences and also covers life sciences, including some aspects of biology, environmental science, marine biology, plant and animal science as well as biophysics and chemistry [2]. Attempting to find articles relevant to a specific area or topic can be daunting for the searcher.

Researchers have developed search strategies to help retrieve scientifically sound, clinically relevant articles while searching in MEDLINE. To date, the majority of these search strategies have been developed for therapy, diagnostic, and review articles [3-13]. In addition to these areas, we have also developed search strategies to identify scientifically sound, clinically relevant articles about causation, prognosis, economics, clinical prediction, and studies of a qualitative nature [14-21]. These search strategies have been adapted for use in the Clinical Queries interface of PubMed (http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.html) as well as the limits screen of Ovid (http://gateway.ut.ovid.com/gw1/ovidweb.cgi).

Although these search strategies are helpful in identifying scientifically sound, clinically relevant articles for clinical matters (e.g., treatment), they are not designed to detect content for any particular disorder (e.g., depression). When conducting a "usual" search in MEDLINE, content terms would be "ANDed" to the methodologic search strategies that have been developed (e.g., diabetes mellitus, type I.sh. AND randomized controlled trial.mp,pt.). To date, we are unaware of any studies reporting empirically tested search strategies for identifying articles for a particular disease or clinical discipline combined with methodologic search terms.

The objectives of this study were to develop optimal search strategies to detect articles of interest to the discipline area of mental health and to determine the effect that content search strategies have on the performance of methodologic search strategies for treatment when the strategies are combined using the Boolean "AND".

Methods

We compared the retrieval performance of mental health content search terms in MEDLINE with a manual review (hand search) of each article in each issue of 29 journal titles for the year 2000. Overall, research staff hand searched 170 journal titles. These journals were chosen based on recommendations of clinicians and librarians, Science Citation Index Impact Factors provided by the Institute for Scientific Information, and ongoing assessment of their yield of studies and reviews of scientific merit and clinical relevance for the disciplines of internal medicine, general medical practice, mental health, and general nursing practice (list of journals provided by the authors upon request). Of these 170 hand searched journals, 161 were indexed in MEDLINE. Search strategies for the study we report here were developed using a 29-journal subset comprising the journals that had the highest number of methodologically sound studies in the area of mental health, that is, those that contributed > 1 article to the journal Evidence-Based Mental Health (http://ebmh.bmjjournals.com) during the year 2000 (list of journals provided by the authors upon request).

We compiled a list of 3,395 index terms and textwords (list of terms tested provided by the authors upon request). This list was compiled after surveying 140 mental health specialists from around the world, reviewing the search strategies from 5 mental health focused Cochrane groups, and mapping textwords to MeSH terms. Examples of the search terms tested are '(learn: adj problem)', 'schizoid', 'depression', and 'mania', all as textwords; 'phobic disorders', the index term; and the index term 'aggression', exploded (i.e., a search term that automatically includes closely related indexing terms).

As part of a larger study [22], 6 trained, experienced research assistants read all issues of 170 journals for the publishing year 2000. Each article was rated using purpose and quality indicators and categorized into clinically relevant original studies, review articles, general papers, or case reports. The original and review articles were then categorized as 'pass' or 'fail' for methodologic rigor in the areas of therapy/quality improvement, diagnosis, prognosis, causation, economics, clinical prediction, and review articles. The research staff were rigorously calibrated before reviewing the journals and inter-rater agreement for identifying the format of articles (e.g., original study, review article) was 92% beyond chance (kappa statistic, 95% confidence interval (CI) 0.89 to 0.95). Inter-rater agreement for which articles met all scientific criteria (e.g., treatment study, diagnostic study) was 89% beyond chance (kappa statistic, CI 0.78 to 0.99) [22]. One research assistant then hand searched all articles in each issue of the 29 journal subset and indicated if the article was of interest to the area of mental health. The predetermined criteria for "of interest to mental health" were as follows:

Pharmacological interventions for persons with mental health problems; cognitive and behavioral approaches to helping any patient (e.g., cancer patients); etiology pertaining to mental health; diagnosis pertaining to mental health; or economic issues pertaining to mental health.

The proposed search strategies were treated as "diagnostic tests" for sound studies and the manual review (hand search) of the literature was treated as the "gold standard". We determined the sensitivity, specificity, precision, and accuracy of each single term and combinations of terms in MEDLINE using an automated process. Sensitivity for a given topic is defined as the proportion of high quality articles for that topic that are retrieved; specificity is the proportion of low quality articles not retrieved; precision is the proportion of retrieved articles that are of high quality; and accuracy is the proportion of all articles that are correctly classified.
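The four measures defined above follow directly from a 2x2 table that cross-tabulates a search strategy against the hand search. The following is a minimal sketch only: the `Counts` container, the `metrics` helper, and the example counts are hypothetical illustrations, not the automated process used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 table of a search strategy versus the hand search (hypothetical container)."""
    tp: int  # on-target articles retrieved
    fp: int  # off-target articles retrieved
    fn: int  # on-target articles not retrieved
    tn: int  # off-target articles not retrieved


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 rather than raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def metrics(c: Counts) -> dict[str, float]:
    """Compute the four measures as defined in the text."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # on-target articles retrieved
        "specificity": _ratio(c.tn, c.tn + c.fp),  # off-target articles not retrieved
        "precision": _ratio(c.tp, c.tp + c.fp),    # retrieved articles that are on target
        "accuracy": _ratio(c.tp + c.tn, total),    # articles correctly classified
    }


# Made-up counts, for illustration only.
example = Counts(tp=80, fp=30, fn=20, tn=870)
result = metrics(example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Note that precision depends on how rare on-target articles are in the database, whereas sensitivity and specificity do not.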

Individual search terms with sensitivity > 15% and specificity > 80% for articles of interest to mental health were incorporated into the development of search strategies that included 2 or more terms. All combinations of terms used the Boolean OR, for example, "mania.tw. OR depression.sh.". For the development of multiple-term search strategies to optimize either sensitivity or specificity, we tested all 2-term search strategies with sensitivity at least 75% and specificity at least 50%. For optimizing accuracy, 2-term search strategies with accuracy > 75% were considered for multiple-term development. In total, 11,317 search strategies were tested in the development of mental health content search filters. To enhance the performance of the most sensitive mental health content search strategy, the single search terms with the highest sensitivity were successively added to the top performing 3-term search strategy until the best sensitivity was achieved while keeping specificity ≥50%.
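The 2-term screening step can be pictured as set operations on article identifiers: a Boolean OR of two terms retrieves the union of what each term retrieves. The sketch below assumes each term's retrievals are already available as a Python set; the function names, the toy article IDs, and the term-to-article mapping are invented for illustration and do not reproduce the study's software or data.

```python
from itertools import combinations


def evaluate(retrieved: set, relevant: set, all_ids: set) -> tuple[float, float]:
    """Return (sensitivity, specificity) of a retrieval against the gold standard."""
    tp = len(retrieved & relevant)
    fp = len(retrieved - relevant)
    fn = len(relevant - retrieved)
    tn = len(all_ids) - tp - fp - fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def screen_or_pairs(term_hits, relevant, all_ids, min_sens=0.75, min_spec=0.50):
    """Keep every 2-term OR combination that meets both thresholds."""
    kept = []
    for a, b in combinations(sorted(term_hits), 2):
        hits = term_hits[a] | term_hits[b]  # Boolean OR is a set union
        sens, spec = evaluate(hits, relevant, all_ids)
        if sens >= min_sens and spec >= min_spec:
            kept.append((f"{a} OR {b}", sens, spec))
    return kept


# Toy data: 10 articles, 4 of them of interest to mental health (all invented).
all_ids = set(range(1, 11))
relevant = {1, 2, 3, 4}
term_hits = {
    "mania.tw.": {1, 2, 5},
    "depression.sh.": {3, 4},
    "schizoid.tw.": {1, 6, 7, 8, 9},
}

kept = screen_or_pairs(term_hits, relevant, all_ids)
for query, sens, spec in kept:
    print(f"{query}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because OR can only add retrievals, sensitivity never falls as terms are added, while specificity can only stay the same or fall; this is why a specificity floor is needed when maximizing sensitivity.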

In addition to developing mental health content search strategies as just described, we also evaluated the performance of the methodologic search filters for treatment articles when "ANDed" with the mental health content filters.

Results

Indexing information was downloaded from MEDLINE for 12,233 articles from the 29 journals hand searched. Of these 3,277 (26.8%) were considered to be of interest to mental health. Search strategies were developed using all 12,233 articles. Thus, the strategies were tested for their ability to retrieve mental health articles from all other articles.

Table 1 shows the best single term for high-sensitivity, high-specificity, and best balance of sensitivity and specificity. The single term, exp mental disorders, produced the best sensitivity of 74.7% while keeping specificity at 94.0%. This term also produced the highest specificity and the optimal balance between sensitivity and specificity.

Table 1 Single term with the best sensitivity (keeping specificity ≥50%), best specificity (keeping sensitivity ≥50%), and best optimization of sensitivity and specificity (based on the lowest possible absolute difference between sensitivity and specificity) for detecting mental health content in MEDLINE in 2000

Combinations of terms with the best results for sensitivity, specificity, and optimization of sensitivity and specificity are shown in Tables 2, 3, and 4. Combinations of terms improved on single search term performance for sensitivity. The 29-term search strategy shown in Table 2 achieved a sensitivity of 98.4% (an absolute improvement of 23.7 percentage points over the single term) while keeping specificity at 50.0%. The 3-term strategy shown in Table 3, psychiatr:.mp. OR exp mood disorders OR psycho:.tw., had the highest specificity at 97.1% (an absolute increase of 3.1 percentage points over the single term) while keeping sensitivity at 51.7%. The 4-term combination shown in Table 4, depress:.mp. OR behav:.mp. OR exp mental disorders OR psych:.mp., resulted in the best optimization strategy, achieving above 89% for both sensitivity and specificity.

Table 2 Combination of terms with the best sensitivity (keeping specificity ≥50%) for detecting mental health content in MEDLINE in 2000 and performance when combined with the most sensitive strategy for detecting treatment studies
Table 3 Combination of terms with the best specificity (keeping sensitivity ≥50%) for detecting mental health content in MEDLINE in 2000 and performance when combined with the most specific strategy for detecting treatment studies
Table 4 Combination of terms with the best optimization of sensitivity and specificity (based on the lowest possible absolute difference between sensitivity and specificity) for detecting mental health content in MEDLINE in 2000 and performance when combined with the best optimization strategy for detecting treatment studies
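The three selection criteria named in the table captions (best sensitivity with specificity ≥50%, best specificity with sensitivity ≥50%, and smallest absolute difference between sensitivity and specificity) can be written as simple selection rules. The sketch below is illustrative only: the `Strategy` type, the function names, and the candidate values are hypothetical and are not the figures from Tables 1 to 4.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    sensitivity: float
    specificity: float


def best_sensitivity(strategies, min_specificity=0.50):
    """Highest sensitivity among strategies that keep specificity at or above the floor."""
    eligible = [s for s in strategies if s.specificity >= min_specificity]
    return max(eligible, key=lambda s: s.sensitivity)  # raises ValueError if none qualify


def best_specificity(strategies, min_sensitivity=0.50):
    """Highest specificity among strategies that keep sensitivity at or above the floor."""
    eligible = [s for s in strategies if s.sensitivity >= min_sensitivity]
    return max(eligible, key=lambda s: s.specificity)  # raises ValueError if none qualify


def best_balance(strategies):
    """Smallest absolute difference between sensitivity and specificity."""
    return min(strategies, key=lambda s: abs(s.sensitivity - s.specificity))


# Hypothetical candidates, invented for illustration.
candidates = [
    Strategy("strategy A", 0.98, 0.50),
    Strategy("strategy B", 0.52, 0.97),
    Strategy("strategy C", 0.90, 0.89),
]

print(best_sensitivity(candidates).name)
print(best_specificity(candidates).name)
print(best_balance(candidates).name)
```

One design note: the balance rule as stated in the captions minimizes the difference only, so in practice it would presumably be applied to candidates that already perform well on both measures.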

Each of the top-performing strategies for detecting mental health content was "ANDed" with the top-performing methodologic search strategies for detecting scientifically sound, clinically relevant treatment studies. The results of these combinations are also shown in Tables 2, 3, and 4. Comparing the search results of the most sensitive mental health content strategy alone with the results when it was combined with the most sensitive methodologic treatment strategy, we found a 3-fold decrease in the absolute number of articles to be sorted through to detect those articles on target, that is, those articles with mental health content that were scientifically sound and clinically relevant for evaluating a treatment question (Table 2; 7,700 vs. 2,414). This means that when searching for scientifically sound treatment articles on mental health topics using the mental health content search strategy alone, 1.7% of the retrieved articles were on target (1 out of every 60 articles). However, when searching for scientifically sound treatment articles on mental health topics using the mental health content search strategy combined with the most sensitive methodologic treatment strategy, 5.3% of the retrieved articles were on target (1 out of every 19 articles). This effect was more dramatic when searching using the most specific strategies: a 17-fold absolute decrease was found (Table 3; 1,954 [1 out of every 29 articles was on target] vs. 117 [1 out of every 1.5]), whereas when using the optimization strategies, there was a 13-fold decrease (Table 4; 3,844 [1 out of every 33 articles was on target] vs. 304 [1 out of every 2.5]). Although there was a gain in terms of having to sift through fewer articles to find one on target, these search strategies do lead to some losses. For instance, when searching using the most sensitive combination, just one on-target article was lost. This loss is small because the sensitivity is so high. However, when searching using the most specific combination, the loss was more substantive: 40 on-target articles were lost. The optimal combination led to 10 on-target articles being missed.
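The fold decreases quoted above are ratios of the retrieval counts reported in the text, and the "1 out of every N" figures are the reciprocal of precision. The short sketch below reproduces that arithmetic; the function names are ours, and only the counts and percentages stated in this section are used as inputs.

```python
def fold_decrease(retrieved_alone: int, retrieved_combined: int) -> float:
    """Ratio of articles retrieved by content terms alone to those retrieved after ANDing."""
    return retrieved_alone / retrieved_combined


def articles_per_hit(precision: float) -> float:
    """Average number of retrieved articles to read per on-target article."""
    return 1 / precision


# Retrieval counts as reported in the text (content strategy alone, combined strategy).
reported = {
    "most sensitive (Table 2)": (7700, 2414),
    "most specific (Table 3)": (1954, 117),
    "optimized (Table 4)": (3844, 304),
}

for label, (alone, combined) in reported.items():
    print(f"{label}: {fold_decrease(alone, combined):.1f}-fold decrease")

# Precision of the combined most sensitive strategy, as reported (5.3%).
print(f"about 1 in {articles_per_hit(0.053):.0f} retrieved articles on target")
```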

Discussion

Our study documents search strategies that can help discriminate the literature with mental health content from articles that do not have mental health content. General practitioners, mental health practitioners, and researchers wanting an overview of the best current evidence in the area of mental health will best be served by the most sensitive search strategy when they have time to sort through articles. This search will have the highest probability of retrieving all relevant articles (in this study, one on-target article was missed), but will have the lowest precision, retrieving many irrelevant articles. With less time on their hands, general practitioners, mental health practitioners, and researchers may wish to search with the strategy that optimizes the balance between sensitivity and specificity (10 on-target articles missed) or the strategy that optimizes specificity (40 on-target articles missed).

As indicated in our previous papers [14-21], when searching with the methodologic search filters alone we found that precision was generally low and therefore of concern. This was expected given the low proportion of relevant target articles for a given purpose in a very large, multipurpose database. This means that searchers will continue to need to spend time discarding irrelevant retrievals.

As reported in this paper, we set out to test whether precision would be enhanced by combining the methodologic search strategies with content specific terms using the Boolean 'AND'. We found a 3- to 17-fold decrease in the absolute number of articles that would need to be sorted through to find articles that are on target. This decrease is substantive and shows that combining empirically derived search strategies for enhancing the retrieval of relevant content with search strategies derived for enhancing the retrieval of scientifically sound, clinically relevant articles can have a profound impact on searching.

The example used in this paper is for retrieving high quality treatment papers with mental health content. Treatment was used because the sample size was sufficient to test the performance of combined search strategies (content and methods) in this 29 journal subset (n = 129). Other purpose categories, for example diagnosis, did not lend themselves to this test because the number of scientifically sound diagnostic articles with mental health content in this 29 journal subset was low (e.g., pass diagnosis articles with mental health content, n = 29).

Conclusion

Selected combinations of indexing terms and textwords can achieve high sensitivity or specificity in retrieving articles with mental health content in MEDLINE. Combining content search strategies with methodologic search strategies can lead to a substantive decrease in the absolute number of articles that need to be sorted through to find those articles that are on target.

Conflict of interest statement

No conflicts of interest. Both authors, Nancy L. Wilczynski and R. Brian Haynes, had full access to all the data in the study and had final responsibility for the decision to submit for publication.