Background

Meta-analyses of randomized controlled or N-of-1 trials provide the highest level of evidence to inform guidelines and clinical practice, so their validity is important. Over the past 20 years their methodology [1–3] and reporting have improved, including through the establishment of the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement (http://www.prisma-statement.org/), which has increased their rigor and transparency. Despite this, however, meta-analyses addressing the same research question can arrive at conflicting conclusions or recommendations. The reasons have been explored [4–7]; they include the numerous decision points in the review process, such as which studies to include or exclude, how to assess risk of bias, and which data to extract. Even within the constraints of a strict protocol, subjective decisions are made.

In this paper, we study two different author groups’ meta-analyses of trials investigating the effectiveness of screening for depression. Despite addressing identical research questions, these meta-analyses arrived at opposing recommendations (one supporting screening, the other questioning it) [8–10].

We wondered about the extent of prior belief in one or the other of the possible recommendations, so-called ‘confirmation bias’: if an investigator approaches a question with a strong prior belief, their approach to answering that question may be biased [11].

We aim to explore how meta-analyses addressing the same research question can reach opposing recommendations, using a case-study approach to examine the decision points.

Methods

We chose this example because we were aware of the startling discrepancies in recommendations from different reviews addressing the question. We searched for all systematic reviews and meta-analyses on screening for depression in primary care in MEDLINE, EMBASE, CINAHL, PsycLIT and the Cochrane Database of Systematic Reviews, and hand-searched the relevant reference lists.

The objectives, findings and conclusions of all accessed reviews were compared (Table 1). Two meta-analyses were selected for in-depth exploration of the review process. Two authors (FGS and MLvD) then applied a stepwise approach to unravel the review process followed by the authors of the selected meta-analyses. Each decision moment in the analysis process was recorded, alongside an appraisal of the decisions reported by the authors of the selected meta-analyses. Discrepancies between the authors of this study were recorded, together with the justification of the choices made. The two other authors of this paper commented on the consistency and transparency of the recorded process and findings. The individual randomized controlled trials (RCTs) included in each review were identified, accessed and examined. A table was constructed recording, for each RCT, the sample size, whether or not it favoured screening, and whether it was included and pooled in each of the reviews (Table 2). We also explored the decisions the authors of the two meta-analyses had made regarding which outcomes to analyze and which data to extract from the original studies.

Table 1 Comparison of research objectives, findings and conclusions in the five reviews
Table 2 Comparison of trials included and pooled in the five systematic reviews of depression screening

Results

The results of our explorative analysis are presented in the flowchart (Figure 1). Five systematic reviews (four with pooled data) were identified. Three meta-analyses were conducted by Gilbody and colleagues between 2001 and 2008, including one Cochrane review [8–10]. None of these favoured screening. Two reviews (one meta-analysis) from another author group, the US Preventive Services Task Force (USPTF), in 2002 [12, 13] and 2009 [14, 15] favoured screening (Table 1).

Figure 1 Flowchart of decision points and rationale for choices when comparing contrasting systematic reviews.

The five reviews included a total of 26 RCTs [16–41], and not one of these was included in all reviews (Table 2). For example, for the outcome of providing practitioners with feedback on screening (detection of possible depression) prior to initiation of treatment, Gilbody 2001 [10] pooled four RCTs [17, 21–23], whereas for the same outcome the USPTF pooled a completely different set of seven RCTs [20, 26, 27, 30, 31, 33, 34]. All of these studies would have been available to both author groups, with the exception of the study by Wells [33], which might not have been published when Gilbody et al. conducted their search.

Each of the five reviews considered three different research questions (effectiveness on detection, treatment and patient outcomes), with a different combination of RCTs included for each. Again, none of these combinations was common between reviews, meaning that the five reviews considering the three research questions yielded 15 different combinations of RCTs. For pragmatic reasons we decided to select two reviews with opposing recommendations that addressed the same research question, to determine the factors leading to the discrepant findings.

The two meta-analyses we selected for comparison, one favouring and the other not favouring screening, were the Cochrane review by Gilbody of 2005 [8] and the USPTF 2002 meta-analysis [13]. These two meta-analyses contained the most information on both included and excluded trials, had the most overlapping studies, and both included pooled data. We decided to focus on only one of the three research questions addressed in the meta-analyses. The effect of depression screening on treatment (i.e. whether the patient received treatment for depression) was selected because it is of clinical importance and involved the largest number of studies in the reviews. We identified the RCTs included and pooled in either review and then examined these to determine which most influenced the results favouring screening or not favouring screening.

We found that the opposing recommendations of the two reviews were largely determined by the Lewis study [27], pooled in the Cochrane review but not the USPTF review, and the Wells trial [33], pooled in the USPTF review but excluded from the Cochrane review.

On inspection of the forest plot in the Cochrane review for the outcome of management of depression following feedback (prescription of anti-depressants) [8] (their Analysis 2.2, p 28), the Lewis study [27] has the greatest weighting (37.5%). It can be seen clearly that this study shifts the plot from favouring screening to favouring not screening. The USPTF included this study in their review but did not pool it for this outcome because, they report, the figures “cannot be calculated from available data”. There were 227 patients in each of the control and screened arms. The Cochrane review entered the Lewis study in their forest plot as 100/227 for control and 125/227 for screening; it is unclear how they derived these numbers. The Cochrane review states that for the Lewis study they used published data only [8]. The Lewis study reports that the mean number of psychotropic drug prescriptions was 0.44 (SD 1.58) for the control arm and 0.55 (SD 1.43) for the screened arm, with a p value of 0.6 (their Table 5) [27]. However, the mean number of drugs prescribed does not necessarily equate to the proportion of patients taking psychotropic drugs. Our own attempts to contact the authors of the Lewis paper to obtain their data have so far been unsuccessful.
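To illustrate how a single heavily weighted trial can shift a pooled estimate, the sketch below pools log risk ratios with fixed-effect (inverse-variance) weights. It is a minimal illustration, not a reconstruction of the published analysis: only the Lewis counts (100/227 control, 125/227 screened, as entered in the Cochrane forest plot) come from the reviews; the two companion trials are entirely hypothetical, and the resulting weights will not match those in the published forest plot.

    # Minimal sketch of fixed-effect (inverse-variance) pooling of risk ratios.
    # Lewis counts are as entered in the Cochrane forest plot; the two other
    # trials are HYPOTHETICAL, for illustration only.
    import math

    def log_rr_and_var(e_scr, n_scr, e_ctl, n_ctl):
        """Log risk ratio (screened vs control) and its approximate variance."""
        log_rr = math.log((e_scr / n_scr) / (e_ctl / n_ctl))
        var = 1 / e_scr - 1 / n_scr + 1 / e_ctl - 1 / n_ctl
        return log_rr, var

    trials = {
        "Lewis (as entered by Cochrane)": (125, 227, 100, 227),
        "Hypothetical trial A": (40, 100, 20, 100),
        "Hypothetical trial B": (30, 80, 15, 80),
    }

    weighted_sum, total_weight = 0.0, 0.0
    for name, counts in trials.items():
        log_rr, var = log_rr_and_var(*counts)
        weight = 1 / var  # inverse-variance weight: larger trials dominate
        weighted_sum += weight * log_rr
        total_weight += weight
        print(f"{name}: RR = {math.exp(log_rr):.2f}, weight = {weight:.1f}")

    print(f"Pooled RR: {math.exp(weighted_sum / total_weight):.2f}")
    # Rerunning without the Lewis entry moves the pooled RR markedly,
    # mirroring how pooling one large trial can shift a forest plot.

With these hypothetical companions, the large near-null Lewis entry receives by far the greatest weight and pulls the pooled risk ratio toward 1; dropping it moves the pooled estimate back toward the companion trials’ larger effects. This is the mechanism by which the decision to pool or not pool one large trial can change a review’s bottom line.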

The RCT in the USPTF review [13] with the greatest weighting, which clearly influences the finding in favour of screening, is the Wells study [33]. This study enrolled 1356 patients who screened as depressed on the “stem” items for major depressive and dysthymic disorders from the Composite International Diagnostic Interview (CIDI) [33]. Randomization was by clinic: clinics provided either usual care (providers not informed that their patients were in the trial) or a quality improvement program with either psychotropic medication or a psychological intervention (providers notified that their patients had screened positive for depression). The quality of care, mental health outcomes and retention of employment of depressed patients improved in the intervention group. The Wells study is excluded from the Cochrane review because it is a “Complex quality improvement programme” (Characteristics of excluded studies, p 22) [8].

Discussion

What initially presented as a straightforward task proved increasingly complex when we discovered that the five reviews, each considering three outcomes, used 15 different combinations of RCTs. Our analysis of the process of two meta-analyses that address the same research question but reach contradictory conclusions demonstrates how decisions made during the meta-analysis process can shape the conclusion. This is an important finding, as evidence-based clinical guidelines and practice recommendations rely on evidence from systematic reviews and meta-analyses.

Two questions come to mind: “Who is right?” and “What drove the decisions?” The second question is the more essential one and requires the full attention of meta-analysts. Addressing the fundamental issue of human choices in a methodologically rigorous process might even make an answer to the first, more intuitive, question superfluous.

There is ample literature on the impact of publication bias, the overrepresentation of trials with a ‘positive’ outcome among retrieved studies, on the conclusions of meta-analyses [4, 42]. This type of bias can be addressed by searching for unpublished data or by extending the search to languages other than English [2], although it is not clear whether this is worth the effort [43].

Discrepancies in the outcomes of meta-analyses have been documented and are often attributed to selective inclusion of studies [5, 44, 45]. Felson describes a model for bias in meta-analytic research identifying three stages at which bias can be introduced: finding studies, selecting studies to include, and extracting data [46]. He argues that “selection bias of studies [as opposed to selection bias of individuals within studies] is probably the central reason for discrepant results in meta-analyses.” Cook et al. determined that discordant meta-analyses could be attributed to “incomplete identification of relevant studies, differential inclusion of non-English language and nonrandomized trials, different definitions .., provision of additional information through direct correspondence with authors, and different statistical methods” [47]. Another study, of eight meta-analyses, found “many errors in both application of eligibility criteria and dichotomous data extraction” [48].

While selection bias and differing data extraction may contribute to discrepancy, our study suggests that bias begins before these steps. Across three research questions in five different reviews, we found 15 different sets of included RCTs, yet one author group consistently found against screening while the other consistently found for it. Even though the two author groups cited each other’s earlier publications, this does not appear to have prevented the discrepancies. Deciding which studies to include and which data from those studies to use involves numerous choices. To our knowledge, the issue of choices and decision making in the process of meta-analysis has not been studied empirically before.

The methodology of meta-analysis is well developed and is continuously being refined to address identified threats of bias. The process is well documented in numerous textbooks, of which the Cochrane Collaboration Reviewers’ Handbook [2] may be the most widely used. The Cochrane Collaboration, which maintains the largest database of systematic reviews and meta-analyses of clinical trials in medicine, requires its authors to produce a protocol describing the intended process of the review before embarking on it. Each step is peer reviewed and monitored by editorial groups, ensuring methodological rigor. But no matter how rigorously each step in the process is described, human decisions are being made all the time. When documenting each decision we made in our exploration, we ourselves, although experienced reviewers, were astonished by the number of decision moments that occurred. Moreover, some of these decisions could be traced to ‘subjective’ inclinations. For example, our choice to explore the question of the effect of screening on the number of patients receiving treatment was a compromise between the desire to study a clinically relevant question and the need for enough material for further study. Documenting each of these decisions and the rationale for the choices could add transparency to the process.

However, there might be an even more fundamental, implicit source of “bias” embedded in the review process, as the consistent findings of the two author groups suggest. We hypothesise that authors may hold a belief about the outcome of their meta-analysis before they start, and that this belief may guide the choices made along the way and so shape the review’s results. This is a form of confirmation bias [49, 50].

This could be an important first form of bias in the complex decision process of a meta-analysis. Confirmation bias refers to the tendency of authors or researchers to seek or interpret evidence in ways that fit with their own existing beliefs, expectations, or hypotheses [49]. It takes many different forms according to the context in which it is analysed and has been shown to play a role in clinical decision making [50], but to our knowledge it has not been applied to the risk of bias assessment of meta-analyses. Unravelling this concept and making its impact explicit in the meta-analysis process could contribute to a better understanding of the (often implicit) forms of bias that guide reviewers’ choices along the way.

Meta-analyses with different conclusions may result in opposing recommendations with important consequences, which may be reflected in clinical guidelines, as in our case, where the US guidelines recommended screening but the UK guidelines recommended against it. We recommend that guideline writers and health policy makers check all available systematic reviews to ensure that such discrepancies do not exist. Where contradictory reviews are found, guideline writers should address the discrepancies and justify any stand they take, rather than making a subjective decision to suit their own preconceived beliefs. This is where prior disclosure of a belief about what the outcome will be would be of assistance.

The main limitation of our study is that we chose to compare only two meta-analyses from the many options available, and we introduced subjectivity through the choices we made. However, making these choices and their potential subjectivity explicit is the main strength of the study. Our proposal of confirmation bias to explain the dissonance can only be a hypothesis; it requires further study, comparing and unravelling decision points in other meta-analyses.

Conclusion

No meta-analysis is value-free. PRISMA involves a 27-item checklist (http://www.prisma-statement.org/), and expanding it would not solve the problem of confirmation bias. Nevertheless, we were surprised at the number of decision points in a meta-analysis, and we propose that an additional step of recognising each decision point and being explicit about these choices and their rationale would greatly increase the transparency of the meta-analysis process. An even greater improvement in transparency could perhaps be achieved by asking authors to declare their belief about the outcome before they embark on the review process. This step could easily be built into the review process of the Cochrane Collaboration, where the review protocol precedes publication of the full review. The implicit “subjectivity” of the seemingly “objective” meta-analysis process deserves attention in all published reviews and is an important part of well-informed evidence-based practice.

Funding

Nil