Background

Systematic reviews (SR) and meta-analyses (MA) are key elements in both evidence-based healthcare [1] and evidence-based research [2] By synthesizing the available evidence, SRs support clinicians in making well-informed decisions about health care [3] and researchers in deciding which topics are the most relevant for new research [4]. When conducting SRs, it is essential to perform a comprehensive literature search to identify all published studies relevant to the research question as a failure to do so can result in selection bias and distort the conclusion of the review by potentially over- or underestimating of the treatment effect [3, 5]. One of the recommended methods to identify scientific literature in health science is searching electronic databases [3, 6]. However, when doing a high quality search two main questions arise; which databases is necessary to searched, and how many? According to the Cochrane Collaborations Methodological Expectations of Cochrane Intervention Reviews (MECIR) three main databases: MEDLINE, EMBASE and CENTRAL are mandatory electronic databases to search when performing a Cochrane Review [7]. Several studies indicate that searching MEDLINE identifies the highest number of studies [810] and others that the gains from searching beyond MEDLINE and in particular EMBASE are modest [11], however multiple studies have found that searching MEDLINE alone is not sufficient [810, 1219]. In addition, when analysing the use of databases in Cochrane reviews, it was found that between 1 and 27 different databases was used [20], even though some studies indicates that searching no more than 3–5 databases seems to be sufficient,[8, 16, 17] and searching only one database would not be enough [19]. Even though MEDLINE, EMBASE and other major medical databases yield a high proportion of relevant studies, some studies found it necessary to include other sources such as reference- and citation search, browsing conference proceedings, asking experts and alike to identify all the relevant studies [9, 14, 17, 19]. The difference between the results from these studies could be due to their evaluation of different areas or due to the methods used to search the different databases; hence some of the above mentioned studies construct a new search strategy to identify studies in a given area thereby making the evaluation be dependent on not just the database, but also the quality/accuracy of the search strategy constructed [2123]. The great variations thus indicate a need to evaluate if the MECIR guideline recommendations to search MEDLINE, EMBASE and CENTRAL combined would be enough when performing a literature search or whether additional databases should be added to this list.

In order to focus this evaluation, we choose to concentrate on musculoskeletal disorders. The area of musculoskeletal injuries and diseases is the leading cause of long-term pain and physical disability [2426] and are associated with 130 million health care encounters and estimated to cost over $50 billion annually in the United States [27]. In addition, the Cochrane Musculoskeletal Review Group (CMSG) is among the largest review groups in the Cochrane Collaboration, responsible for more than 200 SRs.

The aim of this study is therefore to evaluate the relative recall in the databases recommended by MECIR for systematic literature searches within the area of musculoskeletal disorders. Furthermore, this study addressed the question: What is the increase in recall when searching additional databases to searching MEDLINE, EMBASE and CENTRAL combined?

Methods

Selection of systematic reviews

All SRs from the Cochrane Database of Systematic Reviews (CDSR) published by CMSG were obtained [28]. SRs were excluded if they: (i) did not include at least five randomised controlled trials (RCT), as the consequence of missing one study in reviews with few studies included would affect the overall percentage more than with a higher total of included studies. (ii) had been withdrawn, (ii) did not report any search history in the SR (iii) did not search all of the following sources: MEDLINE, EMBASE, CENTRAL, reference- and hand searching, as recommended by The Cochrane Handbook [3] and MECIR guidelines [7]. This strategy was defined in order to identify systematic reviews, which had included all (or close to all) relevant studies related to at certain research question by both searching electronic databases and other sources.

The recall was used to evaluate the ability of a search strategy to identify all relevant studies [29]. Recall can be defined as the percentage of relevant records retrieved divided by the total number of included studies in the individual systematic reviews. However, to estimate true recall one need to know the total amount of relevant records in a database, which is not an easy (if not impossible) task. Thus often, relative recall is estimated by firstly defining a pool of relevant records (the included studies in a SR) and then determines what proportion of this pool the literature search retrieves.[30, 31] In this study we therefore used the included studies in each of the included SRs as the pool of relevant records.

Identification of databases

From the pool of SRs, a list of databases used was created. Databases were ranked in descending order according to how many SRs that had searched the database. Databases other than MEDLINE, EMBASE and CENTRAL were deemed eligible if they (i) was indexing RCTs, (ii) was in English, (iii) was used by at least three SRs.

Data-extraction

The following data were extracted for each included SR: (i) details of the search strategy as described in the review (ii) date of when the search was performed or updated (iii) full bibliographic details of all primary studies included in the SR (i.e. title of the study, author names, journal title, publications year etc.).

Searching individual databases

The search strategy/strategies reported in each SR was replicated and used for searching all the databases included in the SR. For databases with no reported search strategy, the MEDLINE search strategy was replicated and searched in all included databases. MEDLINE syntax (i.e. fields, truncation, adjacency) was modified to the individual database. When possible, the exact search dates from the SR was used for each database. However, CENTRAL for instance, only allows specification by month and year. End Note ×7.5.3 software (Thomson Reuters™) was used to manage records retrieved from searches of electronic databases.

Statistical analysis

Relative recall for each of the included database was calculated separately and for each of the included SR. Relative median recall for each database was calculated for all included SRs combined. Cumulative median relative recall was estimated for searches in MEDLINE, EMBASE and CENTRAL adding databases in descending order (based on how often the databases was searched in the SRs published by CMSG). Data managing was performed using Microsoft Excel 2016 and data analysis was performed using STATA version 13.1 (StataCorp, College Station, Texas) software package.

Additional analyses

Subgroup analysis was pre-specified and planned to assess the cumulative median recall for subgroups, rehabilitation, medicine or other content (surgery, lifestyle intervention, electrical stimulation etc.) as previous studies have found differences in recommendations depending on the topics searched [32, 33]. One post-hoc sensitivity analysis was conducted to address to what extend the inclusion of SRs with a cut-off of three included RCTs instead of five would affect the result.

Results

Eligible databases and systematic reviews

A timeline is displayed in Fig. 1. A set of 164 SRs where identified and obtained from the CMSG on March 3, 2013 and revisited for an update on July 3 2013 by the first author (Fig. 2). Of the 164 SRs assessed for eligibility by title and abstract, 10 were excluded. Nine for being withdrawn and one for being an overview of reviews. Of the remaining 154 SRs assessed in full-text, 114 were excluded, as they did not search one or more of the following sources: MEDLINE-, CENTRAL-, EMBASE, or reference- and hand searching. 11 SRs were excluded, as they did not include five or more studies in their analysis. Six were excluded for not reporting any search strategy in the SR, neither reporting where one could be acquired. A final set of 23 SRs finally met all inclusion criteria [3456], (Table 1).

Fig. 1
figure 1

A timeline of the selecting, inclusion and analysis process

Fig. 2
figure 2

Flowchart for inclusion of Cochrane reviews and databases

Table 1 Characteristics of the included Cochrane reviews

The generated list of databases other than MEDLINE, EMBASE and CENTRAL included a total of 58 databases identified in the 23 included SRs. Of these 58 databases, 48 databases were excluded; 10 did not index RCTs (i.e. trial registry etc.), nine where included in other databases (i.e. Premedline in MEDLINE etc.), 28 were used in less than three SRs, and one database was not in English. The following 10 databases met the inclusion criteria: AMED (via EBSCOhost), CINAHL (via EBSCOhost), HealthSTAR (via OVID), MANTIS (via OVID), OT-Seeker, PEDro, PsychINFO (via OVID), SCOPUS, SportDISCUS (via EBSCOhost) and Web of Science. Searching MEDLINE was performed using the host specified in the SR (i.e. via OVID or Pubmed), EMBASE via the OVID and CENTRAL via the Wiley InterScience portal.

Characteristics of the included systematic reviews

The 23 SRs included a total of 365 primary studies. Each review included from 5 to 103 studies (median 10) (Table 1). The number of search strategies reported in the SRs ranged from 1 to 7 (median 2). Eleven out of 23 SRs reported only one search strategy; 4 of which reported a “standard search strategy” that was adapted to other databases searched, while 7 reported the search strategy used for MEDLINE (Table 1). Of the 23 SRs, the intervention in 5 was classified as rehabilitation, 14 as medicine and 4 was classified as other content (Table 1).

Synthesis of results

Table 2 displays the median relative recall for the combined search in MEDLINE, EMBASE and CENTRAL and for the additional 10 databases included. Data shown are the median recall and interquartile range (IQR) from the total number of SRs and for the three subgroups separately. MEDLINE, EMBASE and CENTRAL combined yielded a median recall of 88.9% (IQR 81.6–100%), followed by SCOPUS (85.7%) and HealthSTAR (83.3%) (Table 2).

Table 2 Median recall analysis for each database in descending order

Results of the overall cumulative median analysis on relative recall are displayed in Fig. 3 and Table 3. The most exhaustive search (i.e. the minimum number of databases required to be searched to retrieve the maximum number of studies) involved searching MEDLINE, EMBASE and CENTRAL with the addition of SCOPUS and CINAHL. Results show that adding these databases to MEDLINE, EMBASE and CENTRAL increased the median recall by 2.0 percentage points, from 88.9% to a median recall of 90.9% (IQR 83.3–100%). Adding the remaining 8 databases did not increase the median recall.

Fig. 3
figure 3

The accumulating percentage as a boxplot

Table 3 Overall cumulative median analysis on relative recall

Additional analyses

Subgroup analysis

Subgroup analyses according to content area demonstrated some variations. The most exhaustive search for the rehabilitation group involved searching MEDLINE, EMBASE and CENTRAL with the addition of SCOPUS and CINAHL for a cumulative median recall of 100% (IQR 60–100%) (Table 4), the medicine group with the addition of SCOPUS for a cumulative median recall of 87.3% (IQR 83.3–97.7%) (Table 4), and the other content groups with the addition of Scopus and CINAHL for a cumulative median recall of 100% (IQR 97.7–100%) (Table 4).

Table 4 Cumulative subgroup analysis according to content area

Post-hoc analysis

Post-hoc sensitivity analysis showed that with the inclusion of SRs with at least three included RCTs, 4 SRs would be added to the analyses [5760]. The analyses showed that the cumulative median recall increased when adding these SRs, however the IQR remained unchanged (Table 5).

Table 5 Overall cumulative median analysis on relative recall with a cut-off of three included studies

Discussion

This study supports the recommendations by Cochrane Collaboration to prioritize MEDLINE, EMBASE and CENTRAL as the basic databases for literature search to locate RCTs in the musculoskeletal area. Secondly, this study indicates that besides MEDLINE, EMBASE and CENTRAL a literature search to locate RCTs in the musculoskeletal area could also consider SCOPUS and CINAHL. Finally, this study indicates that even with the addition search of 10 other often used databases median recall is not improved noticeably.

Thirteen different databases were not enough to identify all relevant references. Searching MEDLINE, EMBASE and CENTRAL retrieved 88.9%, and searches in 10 additional databases increased the median recall by only 2 percentage point. Thus, results from this study could be interpreted, as an indication that searching databases is not sufficient to identify all relevant references and that other sources must be included in the literature search in order to achieve a larger recall. This study does not evaluate which source that may be the most important. Savoie et al. and Helmer et al. [61, 62] found that 29.2% of all items retrieved for two SRs could be uncovered by extended systematic search methods; searching subject- specific or specialized databases (including trial registries), hand-searching, scanning reference lists, and communicating personally with experts. Yet Robinson et al. [63] recently showed that researchers do not cite all possible previous trials, and that less than half (38%) of RCTs could be identified by citation network searching.

It therefore remains to be evaluated how much higher recall could be achieved by supplementing the database search with reference and/or citation search, and which impact if any these additional sources have on the pooled estimate effect.

Searching SCOPUS and CINAHL increased the median recall by 2 point. Yet, as results from the subgroup analysis showed, each database contributed differently depending on the field groups searched. SCOPUS increased the median recall slightly in the Other group, and had some effect on the IQR in all three groups. This could be due to the fact that SCOPUS is a generic database containing studies from a wide range of subject fields. The large increase in median recall in the rehabilitation group searching CINAHL could be because CINAHL is a database including research from health care professionals often involved with rehabilitation. The fact that CINAHL only increased the recall in the area of rehabilitation are supported by Beckles et al. who strongly recommends that the database should be relegated too selective rather than routine searching due to a very low proportion of unique references [64].

Results from the post-hoc sensitivity analyses showed not surprisingly that the inclusion of studies with a low proportion of included studies could introduce high risk of bias of the results. Adding only four more studies increased the median recall to 100% and by 10% compared to the main results. Yet, as the IQR of the results are unchanged, this reinforce the notion, that searching additional databases is less likely to add more studies.

To our knowledge only one earlier study concluded that one database was enough in order to achieve full recall. Kelly et al. [65] concluded that MEDLINE was enough to identify all relevant studies for their specific question. However, they concluded that to fully capture the complete body of available literature on other subjects might require searching multiple databases [20]. This is strongly supported by a number of other studies [813, 15, 16, 1822, 33, 6678]. Studies evaluating this question within the musculoskeletal field make the same conclusion: searching more than one database is necessary [14, 17, 23, 32, 7981]. Based on results from earlier studies and the results from the present study, recommendation for an optimal systematic literature search to locate RCTs within the musculoskeletal area may be to use the three generic databases: MEDLINE; EMBASE and CENTRAL, and an additional two or more other databases. However, this search should ad other sources such as reference- and citation search, grey literature, conference proceedings, and contact experts within the area as results from this study suggest that 10% could be missed when only searching electronical databases to identify relevant studies.

Limitations of this study

This study has some limitations. An important limitation of this study and other studies evaluating the recall of a systematic literature search is the definition of the true number of studies that should be identified. In this study we defined this as the number of studies deemed relevant (i.e. included) in a SR, yet as a SR seek to answer a well-defined question, there are some limitations to whether the included SRs in this study fully represent the various areas of the musculoskeletal field. Another limitation of any study reproducing an original search strategy at a later date is that the contents and indexing of databases change over time, and not all of these changes can be rewound.

Another limitation to this study is the underlining assumption that the systematic literature search strategy used in each SR did capture all relevant studies in the database searched. The result from the present study does not evaluate this question, yet Sampson et al. [82] found that errors in electronic search strategies reduce the effectiveness of electronic search strategies. Further research is needed to evaluate not only the recall of studies retrieved using a search strategy, but also comparing this to the recall of studies indexed in a database by bibliographic verification: searching for known items [83], thereby addressing the key question, what is indexed in a database? and what is actually retrieved when searching this database?

The aim of this study was to evaluate the relative recall in the databases recommended by MECIR for systematic literature searches to locate RCTs within the area of musculoskeletal disorders. The use and limitations of the method to the musculoskeletal area thus clearly limits the general conclusion from this study. However, our results are in line with other studies evaluating literature search in electronic databases.

The strengths of this study lies in the systematic approach of selecting Cochrane SRs of the highest quality, and combining results of the literature search from these SRs in a way that make SRs with a high number of studies included equal to those with low number of studies included. This is also one of few studies [20, 23, 84] that have combined multiple databases using cumulative analysis, thereby accepting what researchers have urged in the past, that searching one database is not enough but investigating what a combined search would yield.

Conclusions

Searching MEDLINE, EMBASE and CENTRAL is not sufficient for identifying all effect studies within the area of musculoskeletal disorders. Literature searches in ten additional databases only increases the median recall by 2 percentage point.

It remains to be evaluated how much higher the relative recall could be achieved by supplementing the database search with reference and citation search. Further studies where the same methods are applied on different content areas needs to be performed, to see if the assumption that the way to perform a search depends on the content area is true or not. It is possible that searching databases is not sufficient to identify all relevant references, and that reviewers must rely upon additional sources in their literature search, but further research on these additional sources is needed.