Background

Randomized controlled trials (RCTs) and cohort studies are the most common study designs used to assess the treatment effects of medical interventions [1, 2]. RCTs, if well designed and well conducted, are considered the gold standard and are widely accepted as the ideal methodology for causal inference [1,2,3].

However, RCTs may not be available for certain medical treatments due to ethical reasons or may suffer from inherent methodological limitations such as low external validity [4]. On the other hand, cohort studies may often have higher external validity, but also a higher risk of confounding. It is generally considered that systematic reviews should be based on RCTs because these studies are more likely to provide unbiased information than other study designs [5].

According to recent GRADE guidance, cohort studies can be highly valuable and provide complementary, sequential, or replacement evidence for RCTs in a systematic review or other evidence syntheses [6]. However, the potential impact of integrating evidence from cohort studies into meta-analyses of RCTs in the medical field has not yet been investigated.

To close this important research gap, this empirical study pools bodies of evidence (BoE) from RCTs with matched BoE from cohort studies. We investigate the extent to which integrating BoE from cohort studies modifies the conclusions drawn from BoE of RCTs, changes the direction of the effect estimates derived from BoE of RCTs, and affects statistical heterogeneity. Moreover, we evaluate the aggregated weight contributed by RCTs to the pooled estimates, use both random effects and common effects models for pooling, calculate 95% prediction intervals (PIs), and test for subgroup differences between BoE from RCTs and cohort studies.

Methods

The sample of this empirical study was based on a large meta-epidemiological study [7], which was planned, written, and reported in adherence to current guidance for meta-epidemiological methodology research [8]. Eligibility criteria (PI/ECO: patient/population, intervention/exposure, comparator, and outcome) are reported in Table 1. Briefly, we included systematic reviews on medical interventions (or exposures) that included both RCTs and cohort studies for the same patient-relevant outcome and that performed meta-analyses for at least one BoE [7].

Table 1 Detailed description of inclusion and exclusion criteria

Identification of systematic reviews of RCTs and cohort studies

The original search for the meta-epidemiological study was conducted in MEDLINE on 04.05.2020 for the period between 01.01.2010 and 31.12.2019 in the 13 medical journals with the highest impact factor (according to the Journal Citation Report [JCR] 2018 category general and internal medicine). Initially, we planned to include the ten highest impact factor journals, but three journals did not publish any systematic review with an eligible BoE-pair. We therefore included the subsequent three journals according to the JCR 2018. The search strategy including the list of considered journals is given in Additional file 1: Appendix S1. Title and abstract screening was conducted by one reviewer (NB), and potentially relevant full texts were screened for eligibility by two reviewers independently (NB, LS). Any discrepancies were resolved by discussion.

For each included BoE from a systematic review, we included a maximum of three patient-relevant outcomes (e.g., mortality) and a maximum of three intermediate disease markers (e.g., blood lipids). If more than three outcomes were available for a given systematic review, we included the primary outcomes first and thereafter used a top-down approach (outcomes mentioned first). We evaluated the similarity of the PI/ECO criteria between the BoE from RCTs and cohort studies within each systematic review. For each BoE-pair, the similarity of each PI/ECO domain was rated as “more or less identical,” “similar but not identical,” or “broadly similar” (Additional file 1: Table S1). A detailed description of the identification and similarity evaluation of BoE-pairs can be found elsewhere [7].

Data extraction

Two reviewers (NB, LH) extracted the following data for each included BoE-pair into a piloted data extraction sheet: name of the first author, year of publication, type of intervention/exposure (e.g., antiretroviral therapy), description of the comparator, effect estimates (risk ratio [RR], hazard ratio [HR], odds ratio [OR], mean difference [MD], including 95% confidence interval [CI]), and number of studies. A detailed description of data extraction can be found elsewhere [7]. For the current analysis, we additionally extracted all effect estimates and corresponding 95% CI of the primary studies included for a relevant BoE (NB, LH).

Statistical analysis

For our pooling scenario, we re-analyzed the effect estimates of all eligible systematic reviews in a two-step approach: For each identified BoE-pair, we first pooled the effect estimates obtained from RCTs and cohort studies separately using a random effects model. Primary studies based on inappropriate study designs (i.e., case-control, cross-sectional, and quasi-RCTs) were excluded.

Second, we pooled the BoE from RCTs with the BoE from cohort studies with a random effects model for each BoE-pair. Binary outcomes (pooled as RRs, HRs, or ORs) and continuous outcomes (pooled as MDs on the same scale) were considered for analysis. Random effects models were used to account for potential between-study heterogeneity. As a sensitivity analysis, we used a common effects model to evaluate whether this hypothetical scenario is more conservative for pooling BoE from RCTs and cohort studies.
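The two-step approach above can be sketched as a minimal inverse-variance pooling routine. This is an illustrative sketch only (function name and data are hypothetical, not the RevMan implementation), using the DerSimonian-Laird estimator of the between-study variance, a common choice for random effects models:

```python
import math

def pool(effects, ses, random_effects=True):
    """Inverse-variance pooling of study effect estimates (on the log
    scale for RRs/HRs/ORs). Uses the DerSimonian-Laird estimator of the
    between-study variance tau^2 for the random effects model."""
    w = [1.0 / se ** 2 for se in ses]            # common effects weights
    mu_ce = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q statistic and its degrees of freedom
    q = sum(wi * (y - mu_ce) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    # DerSimonian-Laird tau^2, truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if random_effects else 0.0
    # Re-weight including tau^2; with tau2 = 0 this reduces to the
    # common effects model
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    mu = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    ci = (mu - 1.96 * se_mu, mu + 1.96 * se_mu)  # 95% CI
    return mu, se_mu, ci, tau2, q, df
```

Applied twice per BoE-pair (once to the RCTs, once to the cohort studies) and then to the combined set of primary studies, this reproduces the logic of the pooling scenario; when between-study heterogeneity is present, the random effects CI is wider than the common effects CI.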

To explore the impact of including cohort studies on pooled effect estimates by combining BoE from RCTs and cohort studies (with or without subgroups), we compared the results and conclusions (95% CI including vs. excluding the null effect) between the BoE of RCTs only and that including both RCTs and cohort studies. Then, we evaluated the contributed weight of RCTs to the pooled estimates and conducted a statistical test for subgroup differences between the two types of BoE. A p-value < 0.05 was considered statistically significant.
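For two subgroups (BoE from RCTs vs. BoE from cohort studies), the test for subgroup differences can be sketched as a between-subgroup Q statistic with one degree of freedom, which is equivalent to a two-sample z-test on the pooled estimates. The function below is an illustrative sketch under that assumption, not the RevMan implementation:

```python
import math

def subgroup_test(mu_rct, se_rct, mu_cohort, se_cohort):
    """Chi-square test for subgroup differences between two pooled
    bodies of evidence. With two subgroups, Q_between has 1 degree
    of freedom."""
    q_between = (mu_rct - mu_cohort) ** 2 / (se_rct ** 2 + se_cohort ** 2)
    # Survival function of chi-square(1): P(X > q) = erfc(sqrt(q / 2))
    p_value = math.erfc(math.sqrt(q_between / 2.0))
    return q_between, p_value
```

With identical subgroup estimates the p-value is 1; as the pooled RCT and cohort estimates diverge relative to their standard errors, Q_between grows and the p-value falls below 0.05.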

In an additional analysis, we used the effect estimates of cohort studies as a reference and compared the results and conclusions between the BoE of cohort studies only and that including both RCTs and cohort studies.

Heterogeneity in meta-analyses was tested with a standard χ2 test. We quantified inconsistency using the I2 statistic, I2 = ((Q − df)/Q) × 100%, where Q is the χ2 statistic and df is its degrees of freedom [9]. An I2 value greater than 50% was considered to represent considerable heterogeneity [10]. For binary outcomes, we additionally calculated τ2, which is independent of study size and describes the variability between studies in relation to the risk estimates [11]. For continuous outcomes, we did not calculate τ2 because of the different scales used across meta-analyses (blood pressure [mmHg] or body weight [kg]). Meta-analyses were conducted using Review Manager (RevMan) version 5.3 [12].
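The I2 formula above translates directly into code. This illustrative one-liner truncates negative values at zero, as is conventional:

```python
def i_squared(q, df):
    """Higgins' I^2: the percentage of total variability across studies
    attributable to between-study heterogeneity rather than chance [9]."""
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
```

For example, a Q of 10 on 5 degrees of freedom gives I2 = 50%, while any Q at or below its degrees of freedom gives I2 = 0%.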

Whereas in a random effects meta-analysis the focus is usually on the average treatment effect and its 95% CI, the calculation of a prediction interval (95% PI) also considers the potential treatment effect in an individual study setting, as this may differ from the average effect [11]. 95% PIs were calculated for the summary random effects of each meta-analysis, since they further account for the degree of between-study heterogeneity and give a range within which we are 95% confident that the effect of a new study examining the same association will lie [11]. Calculations of 95% PIs were conducted with Stata 15.
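A common formulation of the 95% PI (following Higgins et al. [11]) adds the between-study variance τ2 to the variance of the summary estimate and uses a t distribution with k − 2 degrees of freedom, where k is the number of studies. The sketch below is illustrative and assumes the t critical value is supplied externally (e.g., from a t table):

```python
import math

def prediction_interval(mu, se_mu, tau2, k, t_crit):
    """Approximate 95% prediction interval for the effect in a new
    study: mu +/- t_{k-2} * sqrt(tau^2 + SE(mu)^2). t_crit is the
    97.5% quantile of the t distribution with k - 2 degrees of
    freedom (supplied by the caller)."""
    if k < 3:
        raise ValueError("a prediction interval needs at least 3 studies")
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return mu - half_width, mu + half_width
```

Because τ2 enters the width directly, the 95% PI is always at least as wide as the 95% CI and widens further as between-study heterogeneity grows, which is why a pooled 95% CI can exclude the null effect while the corresponding 95% PI does not.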

Results

Overall, 64 systematic reviews of RCTs and cohort studies were included [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76]. Of the identified 129 outcome pairs, 118 from 59 systematic reviews were included in the present pooling scenario and re-analyzed (Additional file 1: Table S2-S3) [7, 13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76] (109 dichotomous and nine continuous outcomes) (Additional file 1: Figs. S1-118). Eleven outcome pairs from five systematic reviews [13, 26, 73,74,75] were excluded from the current analysis. Reasons for exclusion are provided in Additional file 1: Table S4.

Our sample of 118 BoE-pairs was based on 653 RCTs and 804 cohort studies. Detailed study characteristics including a description of the population, intervention/comparator, outcomes, range of study length, and risk of bias/study quality of primary studies included for each outcome pair have been described elsewhere [7].

Two of the outcome pairs were classified (PI/ECO similarity degree) as “more or less identical,” 82 as “similar but not identical,” and 34 as “broadly similar.” Out of the 118 BoE from RCTs, 39 (33.1%) had a 95% CI excluding no effect, whereas 58 (49.2%) of the BoE from cohort studies had a 95% CI excluding no effect. Twenty-four (20.3%) of the 118 BoE-pairs showed a 95% CI excluding no effect and a concordant direction of effect for both the BoE from RCTs and the BoE from cohort studies. The median I2 was 5% (τ2=0) across BoE from RCTs and 41% (τ2=0.03) across BoE from cohort studies, whereas the mean I2 was 23% (τ2=0.14) and 42% (τ2=0.18), respectively. Table 2 shows the summary effects of the BoE from RCTs, cohort studies, and the pooling scenario.

Table 2 Pooling results of bodies of evidence from RCTs with cohort studies based on random effects and common effects models, 95% prediction intervals, heterogeneity, test for subgroup difference, and population (P), intervention (I)/exposure (E), comparator (C), and outcome similarity degree

Pooling scenarios

By pooling BoE from RCTs and cohort studies with a random effects model, for 61 (51.7%) out of 118 BoE-pairs, the 95% CI excludes no effect. For the common effects model, for 77 (62.3%) out of 118 BoE-pairs, the 95% CI excludes no effect. Approximately half of the binary effect sizes were in the range of 0.75 to 1.25, and 64.2% reported an effect estimate <1. The test for subgroup difference comparing BoE from RCTs and BoE of cohort studies was statistically significant (p<0.05) for 25 BoE-pairs (21.2%). By pooling BoE from RCTs and cohort studies, the median I2 was 51% (τ2=0.05), whereas the mean I2 was 45% (τ2=0.11). The contributed weight of RCTs to the pooled estimates was 40% (median) and 42% (mean). As for the 95% PIs, 21.2% (n=25) of the pooled BoE from RCTs and cohort studies excluded no effect.

The direction of effect between BoE from RCTs and pooled effect estimates was concordant in 94 of 118 BoE-pairs (79.7%). The difference between effect estimates was >0.25 for 4.2% (n=5) and >0.50 for 3.4% (n=4) of the dichotomous effect measures. The integration of BoE from cohort studies modified the conclusion from BoE of RCTs in 32 (27.1%) of the 118 BoE-pairs (i.e., 95% CI excludes no effect changed to 95% CI overlaps no effect or vice versa); in 28 (87.5%) of these 32 BoE, the direction of effect was concordant. In nine (28.1%) of these 32 BoE-pairs, the test for subgroup differences comparing BoE from RCTs and BoE from cohort studies was statistically significant (p<0.05) (in three of these nine associations, the direction of effect was opposite). In 12 (37.5%) of these 32 BoE-pairs, the overall degree of PI/ECO similarity was judged as “broadly similar.” Populations (n=7, 21.9%), interventions (n=5, 15.6%), and comparators (n=4, 12.5%) rated as “broadly similar” accounted for the PI/ECO dissimilarities overall. In 20 (62.5%) of the 32 BoE-pairs, the degree of PI/ECO similarity was judged as “similar but not identical.” Populations (n=18; 56.3%), interventions (n=11; 34.4%), comparators (n=7; 21.9%), and outcomes (n=1; 3.1%) rated as “similar but not identical” accounted for the PI/ECO dissimilarities.

In the additional analysis with cohort studies as reference (Additional file 1: Table S5), the direction of effect between BoE from cohort studies and pooled estimates was concordant in 106 (89.8%) of the 118 BoE-pairs. The integration of BoE from RCTs modified the conclusion from BoE of cohort studies in 24 (20.3%) of the 118 BoE-pairs.

Discussion

Summary of findings

This meta-epidemiological study is the first empirical study in medical research to evaluate the impact of pooling bodies of evidence from RCTs and cohort studies. Overall, 118 BoE-pairs based on 653 RCTs and 804 cohort studies were included. By pooling BoE from RCTs and cohort studies, the 95% CI excluded no effect in about 50% of the BoE-pairs, whereas this was the case for about one-third of the included BoE from RCTs alone. For 21% of the pooled estimates, the test for subgroup differences comparing BoE from RCTs and BoE from cohort studies was statistically significant. The median weight of BoE from RCTs in the pooled estimates was 40%, suggesting that on average the contributed weight did not differ substantially between the two types of BoE. Overall, the degree of statistical heterogeneity was moderate in the pooled analyses (I2=51%, τ2=0.05) and was higher across meta-analyses of cohort studies (I2=41%, τ2=0.03) than across meta-analyses of RCTs (I2=5%, τ2=0.00). The integration of BoE from cohort studies modified the conclusion derived from BoE of RCTs in nearly 30% of the BoE-pairs. The direction of effect between BoE of RCTs and pooled estimates, however, was mainly concordant. This suggests that adding evidence from cohort studies substantially increased statistical precision.

Comparison with other studies

We did not identify any similar empirical study using a pooling scenario of different study designs in the field of medical research. However, a recent methodological study investigated a similar pooling scenario in nutrition research [77]. This large pooling scenario study showed that the integration of BoE from cohort studies modified the conclusion from BoE of RCTs in nearly 50% of the included diet-disease associations, although the direction of effect was mainly concordant between BoE of RCTs and pooled estimates. The median weight of RCTs in the pooled estimates was 34%, and the statistical heterogeneity was substantially higher across meta-analyses of cohort studies (I2=55%, τ2=0.01) compared to RCTs (I2=0%, τ2=0). This finding is in line with our study. However, in our study, the integration of BoE from cohort studies modified the conclusion from BoE of RCTs less often (27% vs. 44%) [77]. Two main reasons may explain this difference. First, it has been suggested that effect estimates between RCTs and cohort studies differ quite often in nutrition research [78]. A recent meta-epidemiological study, however, has shown that on average the effect difference between the two study designs was even smaller than expected [79]. Second, the median weight of RCTs in the pooled estimates was larger in our study (40% vs. 34%) [79].

A recent meta-research study investigated how RCTs and observational studies were combined in meta-analyses [80]. In nearly 40% of meta-analyses, observational studies and RCTs were combined in a single meta-analysis without considering the two designs as subgroups. When the results of those meta-analyses were compared with meta-analyses restricted to RCTs only, the conclusion was modified by the integration of observational studies in nearly 71% of cases, whereas in our study this was the case for 27%. In line with our findings, the authors found that including observational studies frequently increased statistical heterogeneity.

Implications for the broader research field

In a survey investigating the rationale, perceptions, and preferences for the integration of RCTs and observational studies in evidence syntheses by Cuello-Garcia and colleagues [81], it was shown that conducting separate meta-analyses for both study designs was the most frequent approach used. However, nearly half of the experts interviewed reported that they have already, on at least one occasion, pooled RCTs and observational studies in a meta-analysis [81].

According to the recent GRADE guidance on optimizing the integration of RCTs and observational studies in evidence syntheses, observational studies can provide valuable information as complementary, sequential, or replacement evidence for RCTs [6]. In our empirical scenario, evidence from cohort studies was always considered complementary evidence for RCTs. The GRADE guidance suggests that when RCTs already provide high-certainty evidence, searching for observational evidence is unnecessary because the certainty rating cannot be improved further [6]. However, in our sample of 118 BoE-pairs, only six BoE of RCTs were rated as high certainty, 18 as moderate, 11 as low, and two as very low. Thus, evidence from cohort studies seems valuable in the field of medical research [7].

In line with our findings, the Cochrane Handbook indicates that authors should expect greater statistical heterogeneity in a systematic review of observational studies than in a systematic review of RCTs. Reasons include the diverse ways in which observational studies may be designed to investigate the effects of interventions/exposures and the increased potential for methodological variation between primary studies, with the resulting variation in their risk of bias. Therefore, the Cochrane Handbook recommends that RCTs and observational studies not be combined in a meta-analysis (although the power to detect an effect may increase [82]). In contrast to the recommendations of Cochrane, a recent framework for the synthesis of observational studies and RCTs does not reject the pooling of both study designs in principle. It presents recommendations on when and how to combine evidence from different study designs, but also highlights challenges in this process [83]. Moreover, a recent scoping review summarized the methods used to systematically review and meta-analyze observational studies and highlighted that existing guidance conflicts on whether to pool when results are similar across different study designs [84]. Finally, in several meta-analyses published in high-impact factor journals, both study designs were pooled [21, 32, 36, 56].

Overall, further methodological research is needed to shed light on this gray area. On the one hand, further research should address the application of existing guidance in terms of utility, acceptability, and reproducibility and elaborate ways to deal with emerging challenges [83]. On the other hand, factors such as risk of bias/study quality that may contribute to differences in effect estimates between BoE of RCTs and cohort studies and to conflicting results in pooling scenarios should be explored further. Our previously conducted study analyzed disagreement of effect estimates with regard to differences in each PI/ECO domain [7]. In the meta-regression, we showed that differences between interventions were the main drivers of disagreement. The average effect on the other pooled effect estimators, however, was not statistically significant [7].

We assume that methodological trial characteristics are other possible drivers of disagreement, since observational studies are prone to risk of bias by confounding [5]; appropriate adjustment for confounding is thus crucial when integrating RCTs and cohort studies (or other non-randomized studies) in a pooling scenario. In the sample provided in this study, the tools used to assess the quality/risk of bias of the primary studies included across the BoE were heterogeneous, which makes the comparison of results challenging. Future studies should focus on the impact of quality characteristics on pooling scenarios by using similar appraisal tools to increase comparability between RCTs and cohort studies (e.g., ROBINS-I [85] and the Cochrane Risk of Bias Tool [86]). Moreover, future studies should also pay attention to the integration of non-randomized study designs apart from cohort studies. Overall, however, we assume our findings are generalizable, since concordance may not be linked to study design per se, but rather to the quality/risk of bias of the studies included [1].

This paper did not aim to provide insights into how pooling results from different study designs affects the certainty-of-evidence rating and whether it reduces or increases the number of low or very low certainty of evidence ratings. In a recently published hypothetical scenario analysis, we showed that pooling BoE from RCTs and cohort studies for nutrition-related research questions would reduce the number of very low and low certainty of evidence ratings [87]. We recommend that future research also examine the impact of pooling BoE of RCTs and cohort studies for medical research questions on the overall GRADE rating and on individual GRADE domains in order to inform future guidance development.

Strengths and limitations

This study has several strengths. First, we analyzed a large sample of BoE-pairs (n=118), based on 653 RCTs and 804 cohort studies. Second, we selected BoE-pairs from systematic reviews published in high-impact medical journals, which have been shown to be of higher methodological quality [88]. Third, our study was based on a broad methodological repertoire, i.e., including meta-analyses of both binary and continuous outcomes, investigating different statistical measures of heterogeneity, applying random and common effects models, and calculating 95% PIs.

Limitations of this study are as follows. First, although we pooled a large sample of BoE-pairs, our sample may not be representative of all meta-analyses, and the totality of evidence of available associations might provide different results. Second, we did not consider or weight the risk of bias of primary studies in our pooling scenario. Third, only two BoE-pairs were judged as “more or less identical,” indicating that BoE of RCTs and cohort studies differ at least slightly in terms of PI/ECO criteria; caution is therefore required when pooling both BoE. Fourth, the potential for confounding in the individual cohort studies and in the subgroup analyses of the meta-analyses cannot be ruled out. Moreover, several subgroups included only a small number of studies. Fifth, the methodological quality of the systematic reviews included in this study was not assessed. Although we assume that systematic reviews published in high-impact factor journals adhere to high methodological standards, this is nevertheless an important limitation. Due to these limitations, our findings need to be interpreted with caution.

Conclusions

This large pooling scenario study showed that the integration of BoE from cohort studies modified the conclusion from BoE of RCTs in 27% of the included BoE, although the direction of effect was mainly concordant between BoE of RCTs and pooled estimates. The median weight of RCTs in the pooled estimates was 40%, and the statistical heterogeneity was substantially driven by integrating BoE of cohort studies. Our findings provide a first insight into the potential impact of pooling both types of BoE in evidence syntheses. A decision for or against pooling different study designs should always take into account, for example, PI/ECO similarity, risk of bias, coherence of effect estimates, and the trustworthiness of the evidence. Overall, there is a need for more research on the influence of these issues on potential pooling.