Background

Randomized controlled trials (RCTs) are widely accepted as the best available tool for generating scientific evidence, are considered integral to informed clinical decision-making [1], and remain the gold standard for assessing the efficacy of therapeutic agents. However, despite their potential to generate robust evidence, the positive results of single-center randomized controlled trials (sRCTs) may not be replicated in large multicenter randomized controlled trials (mRCTs), particularly in intensive care settings [2, 3]. These discrepancies are often attributed to the inherent limitations of sRCTs, which typically include biases due to local effects, minimal heterogeneity among the enrolled patients, inadequate blinding of personnel and data analysis, and the temporal gap between enrollment completion and publication. In addition, many sRCTs conducted in intensive care settings have a low fragility index, indicating that the positive findings of the study depend on a small number of events [4]. Therefore, clinicians should interpret positive evidence from sRCTs with caution, as clinical practice based on such evidence carries a high risk of bias [2].
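To make the notion of a fragility index concrete, the following minimal sketch computes it for a two-arm trial with a binary outcome. It assumes the commonly used definition (the number of patients in the arm with fewer events whose status must change from non-event to event before a two-sided Fisher exact test is no longer significant at 0.05). The function names and the trial counts are hypothetical and are not taken from any study cited here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of non-event -> event flips, in the arm with fewer events,
    needed for the Fisher exact p-value to reach alpha or above."""
    if events_a > events_b:
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while (
        events_a < n_a
        and fisher_exact_two_sided(
            events_a, n_a - events_a, events_b, n_b - events_b
        ) < alpha
    ):
        events_a += 1
        flips += 1
    return flips


# Hypothetical single-center trial: 10/100 deaths (intervention) vs 25/100 (control).
fi = fragility_index(10, 100, 25, 100)
print(f"Fragility index: {fi}")
```

A small value relative to the sample size suggests that the statistical significance of the result rests on only a handful of events.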

Despite the above considerations, no study has systematically evaluated the discrepancy between positive sRCTs and subsequent mRCTs in the intensive care setting to provide a detailed perspective on the reproducibility of sRCTs. Therefore, we conducted a systematic review to identify sRCTs showing a statistically significant increase or decrease in mortality and to evaluate whether subsequent mRCTs confirmed or refuted the positive findings of these sRCTs. The primary objective of this systematic review was to report whether the significant mortality reduction observed in sRCTs was replicated in subsequent mRCTs. The secondary objective was to observe how clinical guidelines have dealt with these positive sRCTs in their recommendations.

Methods

We performed a systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [5] (see PRISMA checklist in Additional file 1). This systematic review was registered in PROSPERO International Prospective Register of Systematic Reviews (CRD42023455362).

Search strategy and selection criteria

Two investigators independently searched PubMed for all RCTs of any non-surgical intervention influencing unadjusted landmark mortality (assessed > 48 h after randomization) in critically ill patients, published in three medical journals (i.e., New England Journal of Medicine, JAMA, and Lancet) from inception to December 31st, 2016. We did not consider sRCTs published after 2016 because of the time lag between the publication of sRCTs and their corresponding mRCTs.

We considered a difference in mortality as statistically significant when present at a specific point (> 48 h after randomization) with simple statistical tests and without adjustment for baseline characteristics. We selected articles published in NEJM, JAMA, or the Lancet, with a randomized controlled trial design in a single-center setting, presenting a statistically significant reduction or increase in unadjusted landmark mortality in critically ill patients. A quasi-randomized or non-randomized methodology, multicentric trials, pediatric populations, and absence of data on mortality were considered exclusion criteria. The full PubMed search strategy is available in Additional file 1.

After identification of eligible sRCTs, two investigators independently searched for mRCTs addressing the same PICO (population, intervention, control, outcome) frameworks, which were published from inception to December 31st, 2022.

The risk of bias of each included sRCT was assessed using the Cochrane risk-of-bias tool for randomized trials version 2 (RoB 2) [6].

Data extraction

Two investigators extracted the following variables in a standardized data collection form: PubMed unique identifier, journal, first author, year of publication, study population, number of patients enrolled, intervention, control, mortality data with statistical significance, and timepoint of mortality assessment. If a subsequent mRCT was identified, we evaluated whether the mortality findings of the mRCT were consistent with those of the sRCT. Furthermore, we explored whether sRCTs were incorporated into international clinical practice guidelines. We also assessed whether and when guidelines stopped citing such sRCTs or issued recommendations modified by the mRCT findings.

Statistical analysis

First, positive sRCTs with at least one subsequent mRCT were classified into three groups based on the results of mRCTs: significant mortality reduction (positive mRCTs), no significant difference in mortality (neutral mRCTs), and significant mortality increase (negative mRCTs). The proportion of sRCTs within each group was reported accordingly.
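The three-group classification can be sketched as a simple rule on the direction and significance of each mRCT's mortality result. The function name, the 0.05 threshold, and the trial figures below are illustrative assumptions; the source does not describe how the classification was implemented, nor how sRCTs followed by several mRCTs with differing results were handled.

```python
from collections import Counter


def classify_mrct(deaths_int, n_int, deaths_ctrl, n_ctrl, p_value, alpha=0.05):
    """Label an mRCT by the direction and significance of its mortality result."""
    if p_value >= alpha:
        return "neutral"
    risk_int = deaths_int / n_int
    risk_ctrl = deaths_ctrl / n_ctrl
    return "positive" if risk_int < risk_ctrl else "negative"


# Hypothetical mRCT results: (deaths_int, n_int, deaths_ctrl, n_ctrl, p_value).
mrcts = [
    (120, 600, 160, 600, 0.006),   # lower mortality, significant
    (200, 800, 205, 800, 0.77),    # no significant difference
    (310, 1000, 260, 1000, 0.01),  # higher mortality, significant
    (95, 500, 100, 500, 0.69),     # no significant difference
]
labels = [classify_mrct(*trial) for trial in mrcts]
counts = Counter(labels)
for group in ("positive", "neutral", "negative"):
    n = counts[group]
    print(f"{group}: {n}/{len(labels)} ({100 * n / len(labels):.0f}%)")
```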

Second, we categorized included sRCTs that were cited at least once in international clinical guidelines based on the current guideline recommendations: supporting the intervention shown to have survival benefits in the sRCT, withholding recommendation due to insufficient evidence, and opposing the intervention of interest or excluding the sRCT cited in the preceding version of the guidelines.

To confirm the robustness of our findings, we performed a sensitivity analysis including only recent positive sRCTs published after 2001 to describe the mortality results of subsequent mRCTs and the guideline recommendations regarding the intervention assessed in the included sRCTs.

Furthermore, the following data were summarized: the number of randomized patients (sRCTs and mRCTs), the number of participating centers (mRCTs), the duration between publications of the sRCT and subsequent mRCT, and the duration from the initial citation of the sRCTs in the guidelines to an alteration in recommendation against its use or removal from guidelines. Missing data were not imputed throughout this study. Continuous variables were described as median and interquartile range (IQR). Categorical variables were expressed as number (percentage). We used RStudio Version 2023.06.0+421 (RStudio Team, Boston, United States).
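As an illustration of these descriptive summaries, the sketch below computes a median with interquartile range and formats a count as number (percentage). The authors report using RStudio; this Python version is only an equivalent sketch. The quantile method ("inclusive" linear interpolation) is an assumption, since the source does not state which quantile definition was used, and the patient counts are hypothetical.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using linear interpolation between data points."""
    q1, med, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return med, q1, q3


def n_percent(count, total):
    """Format a categorical summary as 'n (percentage)'."""
    return f"{count} ({100 * count / total:.0f}%)"


# Hypothetical numbers of enrolled patients in seven trials.
enrolled = [40, 90, 120, 231, 300, 430, 900]
med, q1, q3 = median_iqr(enrolled)
print(f"median {med:.0f} (IQR {q1:.0f}-{q3:.0f})")

# 16 of the 19 included sRCTs had at least one subsequent mRCT (from the Results).
print(n_percent(16, 19))
```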

Results

We identified 19 sRCTs published in the three high-impact-factor journals (7 in New England Journal of Medicine, 7 in JAMA, and 5 in Lancet), which showed a statistically significant mortality difference in critically ill patients [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25] (Fig. 1). Major exclusions and reasons for exclusion are detailed in Additional file 1: Table S1. These trials were published from 1984 to 2016. The median number of enrolled patients was 231 (IQR 90–430). Acute kidney injury was the most common condition of interest (4 studies [18, 20, 22, 25]), followed by cardiac arrest (3 studies [11, 17, 21]) and sepsis (3 studies [14, 16, 19]). Standard care or conventional therapy was used as control in 7 studies [8, 9, 14, 16, 19, 21, 23]. The most common timepoint at which a significant mortality difference was reported was hospital discharge (8 studies [9,10,11, 13, 17, 18, 21, 23]). The characteristics of the included sRCTs are described in Table 1. The vast majority of the sRCTs included in this study (18 out of 19) were assessed as having a low risk of bias, while the remaining trial was judged as having some concerns (see Additional file 1: Table S2).

Fig. 1

Flow chart of study selection. NEJM New England Journal of Medicine

Table 1 Single-center randomized trials with statistically significant survival benefits

Most of the included sRCTs (16/19, 84%) [7, 9,10,11,12,13,14,15,16,17, 19, 20, 22,23,24,25] were followed by at least one subsequent mRCT (in total 24 studies [26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49]), while no mRCT was available for the remaining three studies [8, 18, 21]. The mRCTs enrolled more patients (median, 1192 [IQR 488–3021] vs. 231 [IQR 90–430] in sRCTs). The median number of participating centers was 29 (IQR 8–37) and one-third of the mRCTs involved multiple countries [27, 29, 32, 34, 40, 41, 45, 48]. The median interval between the publication of an sRCT and its relevant subsequent mRCT was 8 years (IQR 5–13 years). Survival or mortality was the primary outcome in 10 sRCTs and 17 mRCTs (as listed in Additional file 1: Table S3).

The survival benefits of one intervention (epinephrine during out-of-hospital cardiac arrest [17]) were confirmed by a subsequent mRCT [46]. Fourteen studies [9,10,11,12,13, 16, 19, 22, 24, 25] were followed by neutral mRCTs (no statistically significant mortality difference between groups) [26,27,28,29,30,31,32,33, 35,36,37,38,39,40,41,42,43,44,45, 47,48,49]. One sRCT reporting a survival benefit of intensive insulin therapy [23] was contradicted by a large mRCT documenting a statistically significant mortality increase in patients randomized to the intensive insulin therapy arm [34]. Figure 2 and Table 2 describe the mortality findings of these mRCTs.

Fig. 2

Mortality findings in multicenter randomized trials following positive single-center trials. RCT randomized controlled trials, NEJM New England Journal of Medicine

Table 2 Subsequent multicenter randomized trials and their mortality findings

Figure 3 and Table 3 summarize how clinical guidelines have considered the survival benefits shown in sRCTs. Among the 19 included sRCTs, 14 were cited in clinical guidelines at least once (13 sRCTs with subsequent mRCTs and one without) [7, 9, 11,12,13, 15, 17, 19,20,21,22,23,24,25]. Among the 13 sRCTs followed by mRCTs, the guidelines initially provided recommendations or suggestions based on the positive results of seven sRCTs [7, 11, 13, 15, 19, 23, 24]. However, the current guidelines no longer support two of these interventions [19, 23]. Treatments assessed in the remaining five sRCTs [7, 11, 13, 15, 24] remained as suggestions for use in clinical guidelines, even after the publication of mRCTs that reported neutral mortality findings.

Fig. 3

Current guideline recommendations of positive single-center trials

Table 3 Citation of single-center randomized trials in international clinical guidelines

Of the remaining six sRCTs, for which guidelines did not initially support the intervention investigated, five [9, 12, 17, 20, 22] were cited in the guidelines without a clear recommendation, primarily due to inadequate evidence. However, one of these interventions, epinephrine for cardiac arrest, which exhibited survival benefits in the sRCT [17] and the subsequent mRCT [46], is currently recommended in guidelines. The remaining sRCT [25] was not endorsed by the initial relevant guideline as a result of other mRCTs that showed no significant mortality reduction.

Finally, among the three sRCTs that have not had a subsequent mRCT, only one study [21] was cited in guidelines, without any recommendation, and it is not referenced in the current guidelines.

Consequently, among the 14 sRCTs originally referenced in clinical guidelines, six (43%) are still cited in support of the intervention in current international guidelines [7, 11, 13, 15, 17, 24]. Conversely, six other sRCTs (43%) were either omitted or considered contraindicated in subsequent guideline versions [9, 19, 22, 23, 25]. Among these six studies, the median duration from the initial citation in the guidelines to a change in recommendation against the intervention or removal from guidelines was 9 years (IQR 6–12 years). Regarding the remaining two studies [12, 20], no recommendation was made due to insufficient evidence.

A sensitivity analysis restricted to recent sRCTs confirmed the overall results: survival benefits were infrequently replicated in subsequent mRCTs, and half of the positive sRCTs were omitted or considered contraindicated in the current guidelines (detailed in Additional file 1: Table S4).

Discussion

Key findings

Our systematic review found 19 sRCTs with a statistically significant mortality decrease in critically ill adult patients. Most of these were followed by at least one subsequent mRCT. Survival benefits observed in sRCTs were rarely corroborated by mRCTs, with most mRCTs reporting neutral results on mortality, and one mRCT finding a significant mortality increase with intensive glucose control. Treatment recommendations based on the initial citation of sRCTs with survival benefits were included in international guidelines and typically remained unchanged for a decade before any revisions were made based on subsequent relevant mRCTs.

Relationship with previous literature

RCTs in intensive care medicine tend to deliver neutral results in terms of mortality for several reasons, including heterogeneity of patient characteristics, underlying practice variation, insufficient power, and likely small treatment effects [3]. This poses an important challenge for clinicians because they must practice without robust evidence supporting their decisions. As a result, positive trials, namely RCTs reporting statistically significant reductions in mortality attributable to the intervention of interest, look attractive and are often taken up by physicians to change their routine management. Unfortunately, such positive RCTs frequently suffer from methodological problems, which can limit the applicability of their findings. Furthermore, the single-center design itself carries many other limitations [2, 3].

One of the major challenges of sRCTs is that, to show a statistically significant effect on mortality with a small sample size, they must achieve an implausibly large effect size. Single-center trials are typically conducted by advocates of the intervention under investigation [2]. The delivery of such interventions generally requires specialized expertise and dedication, which may not be readily transferable to other centers involved in subsequent large mRCTs. Such discrepancies may limit the feasibility of the interventions, potentially diminishing the magnitude of the treatment effects observed in mRCTs compared to sRCTs. In fact, a meta-epidemiological study evaluated the differences in treatment effects between sRCTs and multicenter RCTs and found that single-center trials showed statistically significantly larger treatment effects than multicenter trials (ratio of odds ratios, 0.73; 95% confidence interval 0.64–0.83) [68]. This finding was confirmed by a systematic review assessing treatment effects on mortality in critical care settings [69]. By pooling 82 eligible RCTs, this systematic review found that a single-center design resulted in larger treatment effects than a multicenter design (ratio of odds ratios, 0.64; 95% confidence interval 0.47–0.87) [69]. Our selection criterion of sRCTs with significant mortality differences is a unique approach; nonetheless, the findings of the present systematic review were consistent with previous work showing that survival benefits in sRCTs were rarely replicated in mRCTs. Moreover, we identified one mRCT demonstrating a significant mortality increase with an intervention (intensive glucose control strategy) [34] that had reduced mortality in a previous single-center trial [23].
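For readers unfamiliar with the ratio of odds ratios (ROR), the sketch below shows how such a ratio and its confidence interval can be obtained from two pooled odds ratios on the log scale. It is a simplification that assumes the two pooled estimates are independent; the cited meta-epidemiological analyses may have used more elaborate models. The input values are hypothetical and are not the data from [68] or [69].

```python
import math


def ratio_of_odds_ratios(or_single, se_log_single, or_multi, se_log_multi, z=1.96):
    """ROR = OR_single / OR_multi with a Wald interval on the log scale.
    Assumes the two pooled estimates are independent."""
    log_ror = math.log(or_single) - math.log(or_multi)
    se = math.sqrt(se_log_single**2 + se_log_multi**2)
    return (
        math.exp(log_ror),
        math.exp(log_ror - z * se),
        math.exp(log_ror + z * se),
    )


# Hypothetical pooled estimates (not taken from the cited studies).
ror, lower, upper = ratio_of_odds_ratios(0.60, 0.10, 0.90, 0.05)
print(f"ROR {ror:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

When an odds ratio below 1 indicates benefit, an ROR below 1 with a confidence interval excluding 1 indicates that single-center trials report more favorable treatment effects than multicenter trials.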

Nearly half of the clinical guidelines that cite sRCTs recommend the relevant intervention based on their positive results, although some of these endorsements were subsequently refuted in light of accumulated evidence. In addition, a decade was typically required to amend such recommendations in clinical guidelines. Given the widespread application of interventions examined in RCTs, these initial recommendations may have had substantial consequences for patient outcomes, healthcare resources, and economic costs. For example, early goal-directed therapy was recommended in the Surviving Sepsis Campaign guidelines in 2004 [56]. However, three later mRCTs found no benefits in clinically relevant outcomes [42, 45, 49]. Furthermore, an economic evaluation using one of the mRCTs revealed that early goal-directed therapy was associated with increased health care costs without improving outcomes [70].

Despite these methodological challenges, positive sRCTs have made changes in clinical practice. The early goal-directed therapy for septic shock is a typical example. The initial sRCT [19] showed survival benefits of this protocolized management, which was not replicated in subsequent mRCTs [42, 45, 49]. However, given the difference in patient severity between the sRCT and subsequent mRCTs (e.g., reduced vs. normal central venous oxygen saturation [ScvO2]), clinicians now pay more attention to ScvO2 values than before the sRCT [71]. Furthermore, the lack of multicentric confirmation of survival benefits implies a restricted external validity of sRCTs rather than an indication of them producing false positive results.

As the intensive care community advances the methodology of randomized trial design and execution, there remains a notable lack of evidence demonstrating improved mortality from interventions. These disappointing results have been obtained by using frequentist statistics, where the conclusion is dichotomized to yes or no based on confidence intervals and p values. In contrast, Bayesian analysis provides a probabilistic assessment of the magnitude and direction of true treatment effects, which allows clinicians to augment the interpretation of the trial results. Interestingly, there are several intensive care trials where frequentist statistics denied significant mortality reduction, followed by a Bayesian reanalysis revealing a high probability of survival benefits [72, 73]. Therefore, the integration of Bayesian analysis in intensive care trials may offer a solution to the limitations commonly encountered with frequentist approaches.
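The contrast between a dichotomized frequentist conclusion and a probabilistic Bayesian statement can be illustrated with a minimal sketch. It assumes independent uniform Beta(1, 1) priors on the mortality risk in each arm and estimates, by Monte Carlo sampling, the posterior probability that mortality is lower with the intervention. The trial counts, prior choice, and function name are hypothetical; the reanalyses cited above [72, 73] used their own models and priors.

```python
import random


def prob_mortality_reduction(deaths_int, n_int, deaths_ctrl, n_ctrl,
                             draws=20000, seed=0):
    """Posterior probability that mortality is lower in the intervention arm,
    under independent Beta(1, 1) priors on each arm's mortality risk."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Beta posterior for a binomial outcome: Beta(1 + deaths, 1 + survivors).
        risk_int = rng.betavariate(1 + deaths_int, 1 + n_int - deaths_int)
        risk_ctrl = rng.betavariate(1 + deaths_ctrl, 1 + n_ctrl - deaths_ctrl)
        wins += risk_int < risk_ctrl
    return wins / draws


# Hypothetical trial: 40/100 deaths (intervention) vs 50/100 (control).
p_benefit = prob_mortality_reduction(40, 100, 50, 100)
print(f"P(mortality reduction) = {p_benefit:.2f}")
```

In this hypothetical example, a conventional two-sided test at the 0.05 level would not declare the difference significant, yet the posterior probability of some mortality reduction is high. The result depends on the prior, which should be specified before the analysis.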

Implications

This systematic review found that mortality reduction was rarely replicated in mRCTs despite the existence of previous positive sRCTs. This implies that there are potential risks when incorporating novel interventions into routine practice based on positive sRCTs without mRCT confirmation. Importantly, no intervention is free from complications. In addition, new interventions tested in randomized trials often consume more human resources and incur higher costs. Therefore, a change in management will inevitably result in complications, increased workload, and costs, none of which existed with previous usual care. Given the high likelihood of no mortality difference or even a mortality increase in subsequent mRCTs, our findings imply that clinicians should wait for a large-scale trial before changing practice or at least be very careful in interpreting the results of positive sRCTs.

Strengths and limitations

This systematic review is the first study to comprehensively identify sRCTs reporting statistically significant reductions in mortality and their corresponding subsequent mRCTs in the field of intensive care medicine. The infrequent replicability of survival benefits in mRCTs corroborated the limited generalizability of sRCTs’ findings. Evaluating the impact of sRCTs on clinical guidelines may be a novel approach, but it yields important insights into the development and interpretation of guidelines.

We acknowledge several limitations. First, given that our focus was solely on intensive care sRCTs, our findings may not translate to other medical disciplines. Nevertheless, the generic limitations of sRCTs are universal, regardless of the targeted population or intervention type. As such, sRCTs need to be perceived as hypothesis-generating and clinicians ought to assess the results of sRCTs with a balanced consideration of their strengths and weaknesses. Second, our study included only sRCTs published until 2016, thereby excluding more recent sRCTs. However, our primary objective was to compare the mortality findings of sRCTs with those of subsequent mRCTs, which necessitated an intervening period between them. In addition, the median duration between sRCT and mRCT publication was 8 years, providing justification for our inclusion criteria. Third, we included positive sRCTs published in the three renowned general medical journals, excluding those in intensive care specialty journals (as listed in Additional file 1: Table S5). As a result, the number of eligible studies was relatively small; nonetheless, we employed this strategy to evaluate whether subsequent mRCTs could replicate the survival benefits observed in sRCTs with rigorous methodologies. Given the high standards of the included studies and the concordance of the results with previous literature, it is plausible that our findings could be extrapolated to positive sRCTs reported in specialty journals. Finally, our search for citations of sRCTs was confined to international guidelines. This approach was chosen to ensure the quality of evidence synthesis and a generalizable perspective.

Conclusions

Our systematic review found that the statistically significant survival improvement shown in sRCTs was rarely confirmed by multicenter randomized evidence in intensive care settings. Clinicians should be cautious in altering routine clinical practices until well-conducted multicenter randomized trials are available. Given their substantial implications for global clinical practice, international guidelines should refrain from issuing a clear recommendation based solely on the positive results of sRCTs.