Background

Systematic reviews often use statistical techniques to combine data from the included studies (a meta-analysis) to arrive at a pooled treatment effect. However, the precision of this estimate depends on heterogeneity within and between the included studies. Clinical heterogeneity can be defined as differences in participant characteristics (e.g., age, baseline disease severity, ethnicity, comorbidities), in the types or timing of outcome measurements, and in intervention characteristics (e.g., dose, frequency of dosing, training of interventionists) [1]. Clinical heterogeneity can cause substantial statistical heterogeneity, which can lead to inaccurate conclusions and, ultimately, misguided decision making [1].
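To make "pooled treatment effect" concrete, the sketch below shows one common pooling method, the fixed-effect inverse-variance estimate. The study effects (assumed here to be log odds ratios) and their standard errors are hypothetical, and the function name `pool_fixed_effect` is ours. The paper does not say which pooling model the reviewed meta-analyses used.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Fixed-effect inverse-variance pooled estimate and its standard error.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need at least one study and one SE per effect")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical log odds ratios and standard errors from three trials.
effects = [0.10, 0.30, 0.25]
std_errors = [0.10, 0.10, 0.20]
pooled, se = pool_fixed_effect(effects, std_errors)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled = {pooled:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

This estimate's precision reflects only within-study variance. When studies disagree more than chance would allow, a random-effects model is usually considered instead.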

Healthcare professionals and policy makers do not commonly use systematic reviews as references for decision making, despite their valuable applications [2]. Nevertheless, assessing clinical heterogeneity teases out factors that may modify the treatment effect. This gives decision makers a more reliable estimate of the treatment effect and, ideally, the ability to tailor their interventions to improve the health outcomes of their patients.

Clinical heterogeneity should not be confused with statistical or methodological heterogeneity. Clinical heterogeneity refers to differences in participants, interventions and outcome measurements. Statistical heterogeneity, by contrast, refers to differences in results across studies measuring the same outcome [1]. These may be opposite findings (benefit versus harm) or simply differences in the magnitude of benefit or harm across studies. Indeed, statistical heterogeneity may itself be a product of clinical or methodological heterogeneity. However, the absence of statistical heterogeneity should not preclude the investigation of either clinical or methodological heterogeneity.
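Statistical heterogeneity is commonly quantified with Cochran's Q and the I² statistic. The sketch below computes both from study effects and standard errors under fixed-effect (inverse-variance) weighting. The data and the function name `heterogeneity` are hypothetical illustrations, not values from any reviewed study.

```python
def heterogeneity(effects, std_errors):
    """Return Cochran's Q, its degrees of freedom and I^2 (as a fraction)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of variation beyond what chance (df) would produce, floored at 0.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

# Two hypothetical trials that disagree, and three that agree exactly.
print(heterogeneity([0.0, 0.4], [0.1, 0.1]))
print(heterogeneity([0.2, 0.2, 0.2], [0.1, 0.1, 0.1]))
```

The second call yields I² = 0. Consistent with the text, this says nothing about whether the trials differ clinically in participants, interventions or outcome measurement.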

There is no lack of resources offering reviewers ways to investigate statistical heterogeneity in systematic reviews. However, consensus-based guidelines, statistical papers and expert narrative reviews show little overlap in their recommendations. Only a small number of these sources include a comprehensive set of recommendations regarding clinical heterogeneity [3]. Nine methods manuals were examined, from the organizations or public-sector agencies that produce the largest number of systematic reviews. Only five of them provided definitions of clinical heterogeneity: the U.S. Agency for Healthcare Research and Quality, Evidence-based Practice Centers Program (AHRQ EPC Program); the Centre for Reviews and Dissemination (CRD); the Cochrane Collaboration; the Oregon Health & Science University Drug Effectiveness Review Project (DERP); and the European Network for Health Technology Assessment (EUnetHTA). Of these, Cochrane provides the most detailed discussion [4]. The Cochrane Handbook defines the three types of heterogeneity and explains how they differ. It also offers advice on using forest plots, a priori subgroup analysis and meta-regression to assess clinical heterogeneity [1]. The aim of this study is therefore to determine the extent to which investigators assess clinical heterogeneity in Cochrane and non-Cochrane systematic reviews.

Methods

Study Selection

The 100 most recent systematic reviews published in 2012 were collected from five of the top journals in medicine. These journals were selected by impact factor as listed in the 2011 Thomson ISI Journal Citation Reports for General and Internal Medicine: the Journal of the American Medical Association (JAMA), Archives of Internal Medicine (AIM), the British Medical Journal (BMJ), The Lancet and PLOS Medicine. The New England Journal of Medicine (NEJM) publishes few systematic reviews each year, so it was not included despite its high impact factor. Investigators also retrieved the 100 most recently modified systematic reviews published in 2012 in the Cochrane Database of Systematic Reviews. A total of 317 of the most recently updated or published papers were initially considered for inclusion. Papers were excluded if they were only updated abstracts, author name changes or shortened versions of previously published reviews. Only papers reported as systematic reviews or meta-analyses were included, and they could come from any field of medicine.

Data Extraction

Investigators extracted data on the following items:
- the number of included studies;
- the total number of participants (i.e., the sample size across all included studies in each review);
- the expertise of the review team members, assessed from their degrees and training and from relevant information on their institutions' web pages;
- presence of quantitative synthesis;
- exploration of clinical heterogeneity;
- clinically heterogeneous characteristics explored;
- the basis for exploring clinical heterogeneity;
- methods used to investigate clinical heterogeneity;
- plotting and visual aids;
- author contact;
- inferences from the investigation of clinical heterogeneity;
- reporting assessment;
- a priori vs. post-hoc analysis.

These items are summarized in Table 1 [5]. All data were extracted by one individual (LC) and cross-checked by an experienced methodologist and physician (JG).

Table 1 Criteria for investigating clinical heterogeneity in systematic reviews [5]

Statistical Analyses

Investigators used chi-squared tests to compare Cochrane and non-Cochrane reviews on the extracted items listed above, excluding review team expertise (Table 2). Investigators then performed logistic regression with review type (Cochrane or non-Cochrane) as the response variable and these components of clinical heterogeneity investigations as predictors (Table 3). This was followed by stepwise deletion of predictors (Table 4).
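For a single yes/no item, the chi-squared comparison reduces to a test on a 2×2 table (review type by item score). The sketch below computes Pearson's statistic without continuity correction. The counts are hypothetical and do not reproduce Table 2, and the function name is ours. The paper does not state whether a continuity correction was applied. Note that `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default (`correction=True`), which would give a somewhat larger p-value.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Rows: Cochrane / non-Cochrane reviews; columns: item scored "yes" / "no".
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, the chi-squared upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 30/50 Cochrane vs 20/50 non-Cochrane reviews scored "yes".
stat, p = chi_squared_2x2(30, 20, 20, 30)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```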

Table 2 Chi-squared analysis of Cochrane status by individual item scores
Table 3 Logistic regression for being a Cochrane Review (dependent/response variable) by various predictor variables
Table 4 Logistic regression with stepwise deletion for being a Cochrane Review (dependent/response variable) by various predictor variables

Results

A total of 317 systematic reviews were considered, of which 199 were included in the final analysis. The topics of Cochrane and non-Cochrane reviews differed. Cochrane reviews more often addressed musculoskeletal, pulmonary and reproductive questions, whereas non-Cochrane reviews more often related to cardiac, endocrine and other medical conditions (see Fig. 1). Overall, the assessment and measurement of clinical heterogeneity varied greatly. A total of 81 % of Cochrane reviews and 90 % of non-Cochrane reviews explored characteristics considered to be aspects of clinical heterogeneity. These reviews also described the methods they planned to use to investigate the influence of those characteristics. The most commonly mentioned variables were age, sex, comorbidities, setting, geographic location, severity of disease and dose/dosing frequency. However, only 1 % of non-Cochrane reviews and 8 % of Cochrane reviews explored all of the clinical characteristics they had initially selected. Additionally, very few reviews named the following as covariates to investigate as potential sources of clinical heterogeneity: clinician training, compliance, brand, co-interventions, route of administration, ethnicity, prognostic markers or psychosocial variables.

Fig. 1 Selected Cochrane and non-Cochrane studies arranged by discipline

Several areas of comprehensive heterogeneity assessment were lacking in both Cochrane and non-Cochrane reviews. Only 49 % of non-Cochrane and 40 % of Cochrane reviews performed a sensitivity analysis to assess outliers. However, Cochrane reviews were much more likely than non-Cochrane reviews to contact authors about missing data (83 % vs. 41 %).
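The paper does not specify which sensitivity method the reviews used. One simple, commonly used check for influential outliers is leave-one-out re-pooling: the meta-analysis is repeated k times, omitting one study each time. The minimal sketch below assumes a fixed-effect inverse-variance pool and hypothetical data. The function names are ours.

```python
def pool(effects, std_errors):
    """Fixed-effect inverse-variance pooled estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def leave_one_out(effects, std_errors):
    """Re-pool the studies k times, omitting one study each time."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_ses = std_errors[:i] + std_errors[i + 1:]
        results.append((i, pool(kept_effects, kept_ses)))
    return results

effects = [0.20, 0.20, 1.00]   # hypothetical; the third study is an outlier
std_errors = [0.10, 0.10, 0.10]
print(f"all studies: {pool(effects, std_errors):.3f}")
for omitted, estimate in leave_one_out(effects, std_errors):
    print(f"without study {omitted + 1}: {estimate:.3f}")
```

Here, dropping the third study moves the pooled estimate from about 0.47 to 0.20, flagging that study for closer clinical and methodological scrutiny.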

Few reviews reported assembling a team suited to planning and analysis. Only 5 % of non-Cochrane and 2 % of Cochrane reviews reported including researchers with methodological expertise. Regarding data analysis, 81 % of non-Cochrane reviews included a meta-analysis, compared with 62 % of Cochrane reviews. Most reviews made general statements about the importance of aspects of clinical heterogeneity but did not discuss how these aspects affected their own analysis.

Another important aspect of assessing clinical heterogeneity is transparent and accurate reporting of heterogeneity assessments and the inferences drawn from them. Arguably, limited reporting was a major impediment to our evaluation of how these systematic reviews analyzed clinical heterogeneity. Yet only 22 % of non-Cochrane and 30 % of Cochrane reviews acknowledged reporting as a problem for investigating clinical heterogeneity.

Chi-squared tests comparing the proportion of “yes” item scores showed that Cochrane reviews were more likely to describe how they planned to investigate clinical heterogeneity, even though they were less likely to have performed a quantitative synthesis (Table 2). Non-Cochrane reviews, in contrast, were more likely to perform a parsimonious and a priori analysis and to assess statistical heterogeneity. Cochrane reviews were also proportionally more likely to contact study authors when reporting was insufficient. Lastly, a greater proportion of non-Cochrane reviews were cautious in making inferences from their investigations of heterogeneity. Cochrane reviews, by contrast, were more likely to list characteristics that were selected but never investigated (Table 2).

The logistic regression model compared Cochrane and non-Cochrane reviews (Table 3). Three predictors were associated with lower odds of a review being a Cochrane review: performing a quantitative synthesis (OR = 0.38, CI = 0.20–0.73), a greater number of included studies (OR = 0.94, CI = 0.92–0.97) and aggregate patient data (OR = 0.28, CI = 0.15–0.53). These features were therefore more characteristic of non-Cochrane systematic reviews. Addressing aspects of clinical heterogeneity was not a significant predictor of whether a review was of Cochrane or non-Cochrane origin (OR = 2.14, CI = 0.94–4.87, p = 0.070).
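As a rough consistency check on the reported odds ratios, the sketch below recovers the standard error of log(OR) from a confidence interval and computes a two-sided Wald p-value. It assumes the intervals are symmetric 95 % Wald intervals on the log-odds scale and that number of studies entered the model as a continuous predictor (OR per additional study). The paper states neither assumption, and the function names are ours.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR), assuming a symmetric Wald CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def wald_p_value(odds_ratio, se_log_or):
    """Two-sided p-value for H0: OR = 1 under a normal approximation."""
    z = abs(math.log(odds_ratio)) / se_log_or
    return math.erfc(z / math.sqrt(2))

# Reported result for addressing aspects of clinical heterogeneity.
se = se_from_ci(0.94, 4.87)
p = wald_p_value(2.14, se)
print(f"SE(log OR) = {se:.3f}, p = {p:.3f}")

# A per-study OR compounds multiplicatively: ten more included studies
# multiply the odds of being a Cochrane review by 0.94 ** 10.
print(f"odds multiplier for +10 studies: {0.94 ** 10:.2f}")
```

Under these assumptions the recovered p-value is about 0.07, in line with the reported p = 0.070. The per-study OR of 0.94 implies that ten additional included studies multiply the odds of being a Cochrane review by roughly 0.54.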

Discussion

Studies included in a systematic review will inevitably differ to varying extents. How these differences are identified and assessed influences the overall value of a systematic review. Our assessment reveals room for improvement in assessing clinical heterogeneity in both Cochrane and non-Cochrane reviews. Despite the Cochrane Collaboration's emphasis on methodology, a smaller proportion of Cochrane reviews than non-Cochrane reviews assessed clinical heterogeneity. We suggest that reviewers would benefit from a clear set of universal guidelines on how to assess clinical heterogeneity.

A further problem is that clinical heterogeneity is difficult to measure because of incomplete reporting, poorly described interventions and missing details of participant characteristics. One strength of this study is that it sampled recently published reviews from across various areas of medicine. Drawing on several journals, however, may also introduce differences in how strongly each journal's peer reviewers emphasize reporting of clinical heterogeneity. This study included only the 100 most recently published systematic reviews in selected journals and is not a comprehensive search of all reviews across all areas of medicine. Several of the reviews we examined did not provide enough information to determine whether, and how well, they assessed clinical heterogeneity. Poor reporting is a common problem, and there are also no universal guidelines on assessing clinical heterogeneity. This lack of guidance ultimately impedes the ability to accurately determine a review's generalizability and could result in substantially biased treatment effect estimates. Importantly, incomplete reporting does not necessarily imply poor assessment. Thus, in addition to investigating clinical heterogeneity, authors and peer reviewers should ensure that such assessments are reported clearly and completely.

There are several potential limitations to our study. First, we included only roughly 200 systematic reviews (199 in the final analysis), all published in 2012. Therefore, our results may not generalize to other years, to other journals or to other Cochrane reviews. Second, data were extracted by one individual and cross-checked by an experienced statistician and physician. Ideally, all data would be extracted independently by two reviewers and compared to avoid bias. Third, we did not account for the instructions to authors in these journals. That is, the journals included in our study may have different reporting or methods guidelines for systematic reviews (e.g., the PRISMA guidelines) [6]. Published reviews in these journals may therefore be influenced by such guidance, resulting in different assessments of clinical heterogeneity and, ultimately, of these reviews. Future research could examine the instructions to authors to determine their potential influence on investigations of clinical heterogeneity.

Measuring clinical heterogeneity requires planning. We recommend that investigators pre-specify which variables they will investigate and explain why they chose them. This may entail asking experts in the discipline which variables they consider important to clinical decision making. One way to do this is to invite co-investigators who have clinical or content expertise on the specific question or topic of the review. Investigators should also decide which methods they will use to determine the effects of those variables (e.g., plots, contacting authors, statistical procedures). It is important to decide on these criteria before data analysis and to state in the review that this was done. The planning team should ideally consist of at least two individuals, one with clinical expertise and one with expertise in systematic reviews or meta-analyses. This team should be acknowledged, described and critiqued in the methods section of the review.

We also recommend that investigators perform a post-hoc analysis. This may include examining summary data sheets and forest plots from meta-analyses and/or using L'Abbé plots, dose–response curves, funnel plots, Galbraith plots and influence plots. After identifying all the variables of interest (both pre-planned and post-hoc), authors should ideally have at least 10 trials per variable to ensure adequate power [7, 8]. To explore these variables, investigators should use subgroup analyses and/or meta-regression. These should be followed by sensitivity analyses to test the robustness of findings to decisions made in the review process. Methods for investigating clinical heterogeneity should be reported transparently. Reviews should also propose future investigations that could improve on the methods used. In particular, we recommend that systematic reviews closely follow the reporting guidelines for systematic reviews (PRISMA) [6] and recent guidance for investigating clinical heterogeneity [5].
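The sketch below illustrates two of these recommendations under stated assumptions. The first is a subgroup analysis that tests whether pooled effects differ between subgroups (a Q-between test using fixed-effect inverse-variance pooling). The second is the rule of thumb of at least 10 trials per explored variable. The data, subgroup labels and function names are hypothetical. Random-effects subgroup analyses, as offered in many meta-analysis packages, would give different numbers.

```python
import math

def pool(effects, std_errors):
    """Fixed-effect inverse-variance estimate and its total weight."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, effects)) / total, total

def subgroup_test(groups):
    """Test for differences between subgroup pooled effects (Q-between).

    `groups` maps a subgroup label to (effects, std_errors).
    Returns (q_between, df, p_value); p_value is computed only when df == 1.
    """
    summaries = [pool(effects, ses) for effects, ses in groups.values()]
    total_weight = sum(w for _, w in summaries)
    overall = sum(est * w for est, w in summaries) / total_weight
    q_between = sum(w * (est - overall) ** 2 for est, w in summaries)
    df = len(summaries) - 1
    # With 1 df, the chi-squared upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(q_between / 2)) if df == 1 else None
    return q_between, df, p_value

def enough_trials(n_trials, n_variables, per_variable=10):
    """Rule of thumb from the text: at least 10 trials per explored variable."""
    return n_trials >= per_variable * n_variables

# Hypothetical trials split by a pre-specified characteristic (dose level).
groups = {
    "low dose": ([0.10, 0.10], [0.10, 0.10]),
    "high dose": ([0.50, 0.50], [0.10, 0.10]),
}
q, df, p = subgroup_test(groups)
print(f"Q_between = {q:.2f}, df = {df}, p = {p:.5f}")
print("enough trials for one variable:", enough_trials(4, 1))
```

The subgroups in this toy example appear to differ. Even so, the rule of thumb flags that four trials are too few to explore even one variable, so such a finding would best be treated as exploratory.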

Conclusions

In summary, defining the clinical characteristics that modify the effect of an intervention helps to identify which populations would benefit most from it or, conversely, who would be most harmed by it. We advise that all systematic reviews assess clinical variables and consider potential sources of heterogeneity through statistical analysis and visual aids. This work should involve individuals with methodological expertise in systematic reviews.