Take-home message

A telemedical quality improvement programme increased adherence to seven evidence-based German performance indicators in acute ICU care. These results need further confirmation in a broader setting of regional, non-academic community hospitals and in other healthcare systems.


In Germany, the number of patients admitted to intensive care units (ICU) increased from 2.15 million in 2015 to 2.52 million in 2019 [1]. Roughly one-third of these patients were treated in hospitals with fewer than 400 beds [1]. Having access to large academic hospitals with full-time intensivist staffing is associated with increased adherence to evidence-based intensive care and improved outcomes [2,3,4]. It is currently unclear, however, if telemedicine can improve adherence to comprehensive evidence-based practice.

Though several studies from the United States (US) have shown that telemedical treatment support might reduce ICU and hospital mortality and length of stay (LOS) [5,6,7], only two studies (one single-centre and one multicentre trial) revealed that telemedicine improved adherence to best-practice measures like deep vein thrombosis prevention and infectious disease management [8, 9].

In 2010, the German Interdisciplinary Society of Intensive Care Medicine (DIVI) established quality indicators (QIs) to promote adherence to evidence-based principles [10,11,12]. The third edition, published in 2017, identified ten indicators that include key elements of intensive care medicine: (1) daily multi-professional and interdisciplinary clinical ward rounds with documentation of daily goals, (2) management of sedation, analgesia and delirium, (3) patient-adapted ventilation, (4) early weaning from invasive ventilation, (5) monitoring of infection prevention measures, (6) infection management measures, (7) early enteral nutrition, (8) documentation of structured patient and family communications, (9) early mobilisation, and (10) ICU leadership. All except QIs 5 and 10 refer to process quality elements and can be implemented at the individual patient level [10].

Studies report poor implementation of QI-based key performance parameters [13, 14]. To the best of our knowledge, no randomised controlled trials have specifically investigated the effectiveness of telemedicine on a comprehensive evidence-based medicine bundle like QI-implementation. To address these knowledge gaps, the Enhanced Recovery after Intensive Care (ERIC) study was initiated to establish and evaluate whether telemedicine support can be used as a vehicle to increase adherence to evidence-based medicine compared to standard of care, and thus improve intensive care medicine quality.


Study design, setting and participants

ERIC was an investigator-initiated, multicentre, superiority stepped-wedge cluster randomised controlled trial (SW-CRT) with three sequence groups (waves) and a continuous-recruitment short-exposure design. It investigated the effectiveness of a structured, telemedical quality improvement intervention in ICUs on adherence to evidence-based QIs.

This pragmatic trial was conducted at adult ICUs of academic, non-academic and community hospitals in the metropolitan area of Berlin, Germany, with approximately six million inhabitants and 150,000 ICU admissions annually [1]. Due to the unidirectional crossover in a stepped-wedge design, all trial sites commenced treating patients on control condition (standard of care). Sites were randomly selected to transition to the experimental intervention sequentially at three different protocol-defined time points scheduled for 3, 6 and 9 months after trial commencement, until all clusters had received the intervention.

The decision to use a stepped-wedge design was supported by the following aspects: the intervention was considered potentially beneficial for all sites; practical interest to implement the technical infrastructure of the intervention and related training tools, mimicking a natural (non-experimental) implementation process; statistical interest regarding a potentially more effective control for intra-cluster correlation which might enable a statistical power greater than that of a conventional cluster trial with the same number of patients; exploration of underlying temporal trends; and utility of obtaining balance in characteristics of sequence groups after randomisation, especially in the case of a small number of sequences (steps).

Eligible hospitals were located in the metropolitan area of Berlin and the surrounding federal state of Brandenburg, provided adult intensive care medicine, adhered to the legal obligations, and complied with cluster randomisation and the staggered schedule. Participating hospitals were selected according to their letter-of-intent written during the project application and based on their anticipated recruitment and technical prerequisites (e.g. wireless network) to implement the secure, privacy-compliant infrastructure for the telemedical intervention.

Patients were eligible if they were admitted to a participating ICU, were aged ≥ 18 years, had an expected ICU LOS ≥ 24 h, and were covered by a German statutory health insurance policy. Written informed consent was obtained from the patient or a legal representative. In case of readmission to a participating ICU, eligible patients could be enrolled multiple times. Screening and enrolment of eligible patients were continuously performed by local ICU staff. The study took place within the framework of statutory health insurance in Germany.

The study was approved by the ethics committees of Charité (EA1/006/18) and Brandenburg Medical School (Z-01-20180828). This article adheres to the Consolidated Standards of Reporting Trials (CONSORT) statement extended for stepped-wedge cluster randomised trials [15] and related guidelines [16]. Details of the study protocol, including the intervention, have been published elsewhere [17]. The study was prospectively registered with ClinicalTrials.gov (NCT03671447).

Randomisation and masking

A total of 16 ICUs were organised into geographical clusters on an institutional level. One cluster (randomisation unit) consisted of one to three ICUs. There was no structural exchange of ICU staff between clusters, minimising the risk of intervention contamination. A cluster design rather than individual randomisation was chosen to facilitate a change in management at a hospital level rather than at an individual patient level, thus allowing the comparative effectiveness of the intervention to be evaluated pragmatically. Before trial commencement, all 12 clusters were randomly allocated to one of three sequence groups of four units to switch from the control to the intervention according to a staggered timetable. The randomisation list defining the order of the treatment switch was generated by the independent trial statistician using a computer-generated algorithm (nQuery Advisor V.7.0, block size = 3, without stratification by prespecified characteristics).

Due to a 3-month training period before the sequential switch to the intervention period, ICU staff were not masked to the allocated sequence group and were notified prior to the study start about their crossover date. Since study personnel and patients knew when they were and were not engaged in the telemedical intervention, blinding was not possible. General practitioners and investigators collecting data during post-ICU follow-up visits may have been aware of the treatment condition on the patient-level.


Patients treated on intervention condition received a tailored telemedical intervention in addition to standard of care. In brief, the complex intervention, i.e., the ERIC programme, targeting healthcare providers as well as patients, encompassed two key components. (1) Daily, QI-guided, structured ward rounds for study patients were conducted using a telemedical cart (Appendix, Fig. S1). These telemedical ward rounds were carried out by a specialist and an ICU nurse from the hub (i.e. tele-ICU) at Charité, together with the treating physician (consultant and/or physician in training) and bedside nurse of the respective unit (i.e. local-ICU). Specialists in the tele-ICU were senior, board-certified consultants in intensive care at the site of the tele-ICU (Appendix, p. 2). A secure, bedside connection for audio-visual, face-to-face communication between the tele-ICU and the local-ICU was established. The tele-ICU specialists were able to inspect the patient and monitoring devices (e.g., ventilator settings, monitors and infusion pumps), and microphone and speakers allowed for direct dialogue between both parties. Each ward round was scheduled for 20 min and guided by the predefined QIs, which were then discussed between staff at the tele-ICU and local-ICU. The tele-ICU consultant also gave non-QI-related medical advice if required. Treatment decisions were ultimately taken by the local-ICU staff; in case of disagreement, the tele-ICU consultants did not have autonomy to make treatment decisions themselves. After rounding, tele-ICU consultants assessed the criteria for QI adherence based on patient-specific parameters obtained from documentation of routine data at the local-ICU. (2) Expert teleconsultations: the tele-ICU offered a 24/7 service staffed with ICU consultants to respond to urgent medical issues. On request, the tele-ICU consultant established an audio-visual connection to the respective local-ICU.
To address intervention fidelity, connection success rates were continuously monitored by the provider of the telemedical carts, and tele-ICU staff ensured that study patients received daily telemedical ward rounds.

The implementation of telemedicine at the cluster level started 3 months before the planned switch to the intervention with setup of the required hardware infrastructure at the respective ICUs. During this 3-month transition period at the end of the control period of the respective site, clinical experts received a blended-learning training programme (ten e-learning modules, followed by four simulator-based workshops on the QIs and on-the-job training) to ensure that teams at the local-ICUs were familiar with the telemedical cart and the QIs. One physician (any qualification level) and one ICU nurse (any qualification level) per local-ICU were invited to participate in the training. It was at the discretion of the local-ICU to send the same person or different people to each simulator-based workshop.

The control condition was standard of care according to local standards of the hospital site without telemedical support, and was also delivered throughout the 3-month training period. In both phases (control, intervention), local study personnel routinely documented QI-related key performance parameters on the patient's medical record.

Patients received two follow-up examinations scheduled 3 and 6 months after their index (i.e., first study-related) ICU discharge to assess post-ICU impairment including quality of life and functional outcomes [17]. These outcome data based on self-administered patient questionnaires and tests were recorded by the patient's general practitioner and/or trained study personnel at Charité, during home visits, via mail, or via telephone, depending on in-person visiting restrictions due to the coronavirus disease 2019 (COVID-19) pandemic.


The eight co-primary outcomes were binary composite measures defined based on patient-individual raw parameter measurements related to eight performance indicators with predetermined definitions (Appendix, p. 8) [10]. For each of these QIs, the adherence (fulfilled yes/no) was assessed daily (within a 24-h time frame) on a patient level starting from the date of enrolment (ICU admission; or the following day) until ICU discharge (or the previous day). Seven tele-ICU consultants affiliated to the coordinating investigator at Charité participated in a central endpoint adjudication and retrospectively rated the adherence to a single QI for patient i on day t (denoted as QI day) using the local-ICU’s documentation of routinely collected clinical data. Although the independent raters were aware of the respective treatment condition, assessing QI adherence to derive primary outcomes can be considered objective and reliably reproducible. This process was applied irrespective of whether the patient was treated on control or on intervention condition (Appendix, p. 2).

Secondary outcomes were assessed during the patient's ICU stay and at two post-ICU follow-up visits. Key secondary outcomes included all-cause mortality up to 180 days after index ICU discharge (validated with data from the municipal personal records database) and study-related ICU LOS (per ICU stay, in days). Remaining secondary endpoints including health–economic outcomes will be analysed separately and made available in subsequent papers.

Statistical analysis

A fixed sample size calculation was performed, considering eight co-primary outcomes. Applying a Bonferroni correction for multiple testing yields a one-sided type 1 error of alpha/8 = 0.625% for confirmatory testing of a single QI. Assuming a minimum clinically relevant absolute difference in QI adherence of 10% (for all QIs), a two-group χ2 test has a power of 82% to detect the difference between a proportion of 60% on control and 70% on intervention (odds ratio, OR 1.556) with a sample size of 530 independent patients on each treatment condition (nQuery Advisor V.7.0). To deal with the correlation between individuals from the same cluster, we further prespecified at the design stage a variance inflation factor of 1.35, and a patient-level intra-cluster correlation coefficient (ICC) of 0.117 (estimated from preliminary data on QI 2 for a small number of patients from site Charité only; independency of time assumed) which measures the correlation between observations within the same cluster [17]. This yields a total target sample size of 1431 patients required for the CRT design (neglecting stepped-wedge design-specific methodological issues). This pragmatic sample size calculation which was performed during the planning phase in 2015 neither accounted for a transition period between standard of care and intervention, nor for variable cluster sizes. Several changes in prespecified design features throughout the course of the trial may have rendered the initial sample size estimates invalid.
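The arithmetic behind these planning figures can be retraced with a short sketch. It assumes an overall alpha of 5% and uses a normal approximation to the two-group test of proportions rather than nQuery's exact routine, so the power value is approximate; the function names are illustrative and not part of the trial's analysis code.

```python
from statistics import NormalDist


def odds_ratio(p_control: float, p_intervention: float) -> float:
    """Odds ratio of intervention vs. control for two adherence proportions."""
    odds_int = p_intervention / (1 - p_intervention)
    odds_ctl = p_control / (1 - p_control)
    return odds_int / odds_ctl


def two_proportion_power(p1: float, p2: float, n_per_group: int,
                         alpha_one_sided: float) -> float:
    """Approximate power of a two-group test of proportions (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha_one_sided)
    p_bar = (p1 + p2) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    se_alt = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group) ** 0.5
    return nd.cdf((abs(p2 - p1) - z_alpha * se_null) / se_alt)


# Bonferroni-adjusted level for one of eight co-primary outcomes (assumes alpha = 5%)
alpha_per_qi = 0.05 / 8

# Target effect: 60% adherence on control vs. 70% on intervention
or_target = odds_ratio(0.60, 0.70)

# Power with 530 independent patients per condition
power = two_proportion_power(0.60, 0.70, 530, alpha_per_qi)

# Inflate the total of 2 x 530 patients by the prespecified variance inflation factor
n_total = round(2 * 530 * 1.35)

print(f"alpha per QI: {alpha_per_qi:.5f}")
print(f"target OR:    {or_target:.3f}")
print(f"power:        {power:.2f}")
print(f"total n:      {n_total}")
```

Under these assumptions the sketch returns an odds ratio of about 1.556, a power of roughly 82%, and a total of 1431 patients, in line with the figures stated above.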

The eight co-primary effectiveness outcomes were analysed by logistic mixed-effects models with random intercepts for cluster and patient, and fixed effects for intervention (assuming level change), cluster-specific linear 'exposure time' (in months; ≥ 0 at or after the start of the intervention, < 0 otherwise), and the interaction between both (assuming time-dependent slope change). To account for deviations from the staggered randomisation schedule, an 'as-implemented' analysis was conducted defining intervention periods according to the actual start of the first local telemedical-based QI visit. The cluster-specific 3-month training period was analysed as part of the control period, and data contributed to the primary endpoint analysis. All patients with at least one QI assessment were included in the full analysis set (FAS). A few patients who were enrolled under control condition shortly before the crossover date and received the intervention after the crossover were analysed as control patients. Missing values were handled under the assumption of missing at random, and no imputation methods were applied (number of affected QI visits negligibly small) [18,19,20].
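To make the fixed-effects part of this model concrete, the following sketch derives the three covariates for a single QI day from a cluster's actual switch date. It is only an illustration: the function name, the average month length of 30.4375 days, and the example dates are assumptions, and the sketch ignores the rule that patients enrolled shortly before the crossover were analysed as control patients.

```python
from datetime import date

# Assumption: months are approximated by the average calendar month length.
DAYS_PER_MONTH = 30.4375


def fixed_effect_covariates(qi_day: date, cluster_switch: date) -> dict:
    """Return intervention indicator, exposure time and their interaction.

    Exposure time is measured in months relative to the cluster's actual
    switch to the intervention: negative before the switch, >= 0 afterwards.
    """
    exposure_months = (qi_day - cluster_switch).days / DAYS_PER_MONTH
    intervention = 1 if exposure_months >= 0 else 0
    return {
        "intervention": intervention,        # level change at the switch
        "exposure_months": exposure_months,  # linear time trend
        # slope change that applies only after the switch
        "interaction": exposure_months if intervention else 0.0,
    }


# Hypothetical cluster switching on 1 January 2019
switch = date(2019, 1, 1)
before = fixed_effect_covariates(date(2018, 12, 1), switch)
after = fixed_effect_covariates(date(2019, 3, 1), switch)
print(before)
print(after)
```

In a fitted model, each of these three columns would receive its own coefficient, and the random intercepts for cluster and patient would be added on the log-odds scale.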

Sensitivity analyses for each QI were performed based on five additional mixed-model specifications by changing the variance structure and/or adjusting for two different time effects to address robustness of results. Thus, a covariate for sequence group (together with group-by-treatment effect interactions) was incorporated as fixed effect to account for underlying secular trends (i.e., clusters classified according to their allocated step, defining 'early' [first; coded as − 1], 'middle' [second; chosen as reference group 0], or 'late' [third group; coded as 1] adopter sites to adjust for (categorical) 'time period of implementation'), or patient- or centre-specific random intercepts omitted (Appendix, p. 13).

For each QI, adjusted ORs for QI adherence on intervention compared to standard of care are reported, expressing the relative effects of primary interest. In addition, the endpoint-specific cluster-level ICCs were calculated [21]. The results of the co-primary endpoints were considered statistically significant if the two-sided p < 0.00625 (eight relative effects for level change at the crossover time defined as primary estimates of interest; remaining time-related estimates tested hierarchically).

The secondary endpoint all-cause mortality within 180 days post-enrolment was analysed using a Cox regression model with frailty term for cluster, adjusting for baseline Simplified Acute Physiology Score (SAPS) II and Sequential Organ Failure Assessment (SOFA) Score (SAPS II and SOFA measurements at the first QI day were defined as pseudo-baseline values in the case of missing documentation at the day of ICU admission, assuming first observation carried backward). Results were reported as hazard ratios (HRs). The ICU LOS in days was compared between the two conditions by using a negative binomial mixed model with cluster-specific random intercepts. Because of the potential for type I error due to multiple comparisons, the findings of analyses of secondary endpoints should be interpreted as exploratory. For primary outcomes, 99.375% CIs were reported; 95% CIs were reported otherwise.

The study database was stored on REDCap (Research Electronic Data Capture; version 10.6.16; Vanderbilt University, Nashville, Tennessee, USA) hosted at Charité. Statistical analyses were carried out using R (version 4.0.4); mixed-effects regression analyses were performed using the lme4 package (version 1.1-25).


Participating units and patients

Of the 12 clusters randomised, one (allocated to the third sequence group) withdrew consent before the global trial start, and one (allocated to the first sequence group) dropped out after the training period without recruiting any patients, so that no data were collected from it (Fig. 1). All 14 ICUs at 12 hospitals enrolling patients received the intervention, but not all started recruitment simultaneously. Further basic characteristics of the randomisation units together with crossover times and sequence groups are provided in the Appendix (Table S2, Fig. S2). There was considerable variation in the hospitals' involvement and adherence to the randomisation schedule.

Fig. 1 Flow diagram for clusters and patients in the ERIC trial. Post-ICU mortality was determined with reference to the discharge date of the first (index) ICU stay. *One patient was enrolled on control and died during a subsequent ICU stay on the intervention condition. FU follow-up, ICU intensive care unit, IQR interquartile range, QI quality indicator

Given underrecruitment, the planned recruitment period was extended from 12 to 19 months, while postponing the prespecified third crossover date by 3 months (extension of the rollout period) to further enhance the number of patients treated on control condition, and lengthening the post-rollout period.

During the 19-month recruitment period between September 4, 2018, and March 31, 2020, 1463 patients comprising 1554 ICU stays (433 control, 1120 intervention) were enrolled into the trial. Of these, one patient discharged from ICU at the next day was excluded from the FAS population since no QI evaluation was performed, and no post-enrolment data documented. The final numbers for analysis thus included 414 participants randomised to control and 1048 to intervention according to their first ICU admission. A single ICU stay was documented for 1386 (94.8%) patients, and up to six ICU readmissions were documented for 76 patients during the trial (66 patients had 2, 7 had 3, 2 had 4, and 1 patient had 6 ICU stays over time [one single cluster, each time on the control condition]). Overall, a total number of 14,783 QI days were evaluated (Appendix, Figs. S3–4). The median number of ICU stays across clusters was 100 (IQR 73–230). The median number of QI days per patient was 5 (IQR 2–11) (Table S3).

Patient-level characteristics at the time of first ICU admission are displayed in Table 1. There were no major differences between both cohorts with respect to demographic characteristics and primary admission diagnosis. However, the severity of illness at baseline was lower for patients treated on standard of care vs. on intervention [median SOFA at the first QI day: 4 (IQR 1–7) vs. 6 (3–9); median SAPS II at the first QI day: 28 (IQR 16–42.25) vs. 35 (22–48)]. On control condition, patients were more frequently admitted due to medical reasons (50.2% vs. 42.8%) and admitted from emergency medical services (32% vs. 24.8%) and wards (19.8% vs. 15.5%), whereas postoperative admissions were less frequent (38% vs. 44.8%) than during the intervention period.

Table 1 Population characteristics of the intention-to-treat (ITT) population according to date of first ICU admission

Primary outcomes

For 97.4% (1512/1553) of ICU stays, QI adherence data were assessed daily throughout the patient's ICU stay, indicating a high level of completeness regarding primary outcome data. In the confirmatory model-based principal analysis, the intervention, as implemented, significantly increased the odds for adherence on seven of eight QIs (Table 2). For QI 1 (daily multi-professional and interdisciplinary clinical visits), no evidence of a difference in adherence between both treatment conditions was found in the principal model (adjusted OR 1.606, 99.375% CI 0.780–3.309; p = 0.073). However, we observed a positive effect of the exposure time (i.e., the cluster-specific time since the beginning of the intervention) on adherence to QI 1 (OR 1.394, 1.228–1.582), which was attenuated after the start of the telemedical care by 20.9% per month (OR 0.791, 0.69–0.908). Results were not robust across all supportive analyses taking into account the time of adopting the intervention (assuming categorical temporal effects) and differing variance components. However, the sensitivity analysis with both sequence group and exposure time (Appendix, sensitivity analysis model SM.4) supported the findings of the principal model, i.e., the lack of an intervention effect and a significant temporal effect which was considerably diminished after the switch to the intervention.
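As a plausibility check, the p value for QI 1 can be approximately recovered from the reported OR and its 99.375% CI. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale, which the text does not state explicitly; the function name is illustrative.

```python
from math import log
from statistics import NormalDist


def wald_p_from_ci(or_hat: float, ci_low: float, ci_high: float,
                   ci_level: float) -> float:
    """Two-sided Wald p value recovered from an odds ratio and its CI.

    Assumes the interval was built as exp(log(OR) +/- z * SE).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - (1 - ci_level) / 2)
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)  # SE on the log-odds scale
    z = log(or_hat) / se
    return 2 * (1 - nd.cdf(abs(z)))


# Reported principal-model estimate for QI 1
p_qi1 = wald_p_from_ci(1.606, 0.780, 3.309, ci_level=0.99375)
print(f"recovered p value for QI 1: {p_qi1:.3f}")
```

The recovered value is close to the reported p = 0.073 and well above the Bonferroni-adjusted threshold of 0.00625.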

Table 2 Primary efficacy end points in the principal analysis

A beneficial intervention effect on the guideline adherence was revealed for ICU performance indicators QI 2 (OR 5.328, 3.395–8.358), QI 3 (OR 2.248, 1.198–4.217), QI 4 (OR 9.049, 2.707–30.247), QI 6 (OR 4.397, 1.482–13.037), QI 7 (OR 1.579, 1.032–2.416), QI 8 (OR 6.787, 3.976–11.589), and QI 9 (OR 3.161, 2.160–4.624).
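The count of seven out of eight indicators can be checked directly from the reported intervals: at the Bonferroni-adjusted level, an effect is significant when its 99.375% CI excludes an OR of 1. The sketch below simply encodes the reported principal-analysis estimates (QI 1 from the preceding paragraph); the variable names are illustrative.

```python
# Reported level-change estimates: (OR, lower, upper) of the 99.375% CI
reported = {
    "QI 1": (1.606, 0.780, 3.309),
    "QI 2": (5.328, 3.395, 8.358),
    "QI 3": (2.248, 1.198, 4.217),
    "QI 4": (9.049, 2.707, 30.247),
    "QI 6": (4.397, 1.482, 13.037),
    "QI 7": (1.579, 1.032, 2.416),
    "QI 8": (6.787, 3.976, 11.589),
    "QI 9": (3.161, 2.160, 4.624),
}

# An interval that does not contain 1 indicates a significant effect
significant = [qi for qi, (_, low, high) in reported.items()
               if low > 1 or high < 1]

print(f"{len(significant)} of {len(reported)} QIs significant:", significant)
```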

Furthermore, we found a negative temporal trend attenuating the adherence to QI 4 (OR 0.833, 0.681–1.017), to QI 8 (OR 0.815, 0.737–0.901), and to QI 9 (OR 0.834, 0.769–0.904) in the course of time after start of the intervention. For QI 9, however, the negative effect of exposure time was partially compensated, as indicated by the positive interaction between exposure time and treatment (OR 1.106, 1.015–1.205). Quantitative results of the confirmatory analyses for each QI and cluster are displayed in Fig. 2 illustrating statistically significant level changes in adherence for all QIs except QI 1 with start of the intervention, and positive (QI 1) or negative (QI 4, 8, and 9) confounding temporal effects suggesting the intervention effect was not consistent over time. Additionally, the figure reveals an already high QI adherence during the control phase for all indicators except QI 2, and a substantial heterogeneity between clusters. The ICC for most of the QIs (except QI 4 and QI 6) was far higher than expected in the planning stage showing a high degree of similarity among patients from the same cluster which reduces overall power and resulting precision (Appendix, p. 21–24).
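Reading the principal model as described in the statistical analysis section, the monthly time trend after the switch is the product of the exposure-time OR and the interaction OR, because the two coefficients add on the log-odds scale. The sketch below applies this to the reported point estimates for QI 1 and QI 9; it ignores the uncertainty of the estimates, and the function name is illustrative.

```python
def monthly_or_after_switch(or_time: float, or_interaction: float) -> float:
    """Per-month odds ratio for adherence once the intervention has started.

    On the log-odds scale the slope after the switch is the sum of the
    exposure-time and interaction coefficients, so the ORs multiply.
    """
    return or_time * or_interaction


# QI 1: positive time trend, dampened after the switch
qi1_trend = monthly_or_after_switch(1.394, 0.791)  # about 1.10 per month

# QI 9: negative time trend, partially compensated after the switch
qi9_trend = monthly_or_after_switch(0.834, 1.106)  # about 0.92 per month

print(f"QI 1 monthly OR after switch: {qi1_trend:.3f}")
print(f"QI 9 monthly OR after switch: {qi9_trend:.3f}")
```

For QI 1 the trend stays slightly positive after the switch, whereas for QI 9 it remains below 1, which is consistent with the description of a partial rather than full compensation.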

Fig. 2 Mean QI adherence [percentages] over time before and after implementation of the intervention (model-based principal analysis for each patient QI). Black continuous lines display the global (marginal) effect (fixed effects without random effects) for each QI. Coloured dashed lines display the fixed effects and cluster-specific intercepts (without patient-specific intercepts). Vertical dashed grey line: actual switch from control to the intervention period. For QI 4 and QI 6, the principal model could not estimate the variance component for the cluster-specific intercepts (due to lack of variability). Therefore, estimates of the sensitivity analysis model SM.5 are displayed (see Appendix for further details)

Secondary outcomes

Table 3 provides details of mechanical ventilation during patients’ index ICU stay. Patients treated on intervention condition were more frequently mechanically ventilated than patients treated on control condition [74.3% (779 patients) vs. 60.6% (251 patients)]. The median duration of mechanical ventilation was, however, similar among patients on intervention and control (79 (IQR 20–251) hours vs 71 (IQR 16–204) hours).

Table 3 Mechanical ventilation and ICU discharge position of patient’s index (first) ICU stay in the ITT population, stratified by treatment condition (control vs. intervention)

With respect to the patients’ discharge position after their index ICU stay, more patients were referred to another ICU after being treated on intervention compared to standard of care [20.2% (212 patients) vs. 11.3% (47 patients)]. 1313 patients were discharged alive from the index ICU stay, and 258 patients died within 240 days after index ICU discharge (73 patients treated on control, 185 on intervention). 922 survivors received at least one follow-up. 814 patients received the first follow-up (median 93 (IQR 81–117) days after discharge), and 786 patients received a second follow-up visit (median 199 (IQR 181–238) days after discharge). Last post-ICU follow-up assessment took place on November 17, 2020.

Altogether, 385 deaths were reported up to day 180 post-enrolment, 107/414 patients (25.85%) treated on control, and 278/1048 patients (26.53%) on intervention condition. A Cox proportional hazards model with frailty term for cluster and adjusting for baseline SAPS II and SOFA score revealed no significant beneficial effect of the telemedical intervention on overall 180-day mortality compared with standard of care (HR 0.847, 95% CI 0.668–1.073; p = 0.170), see Appendix (Fig. S5) for estimated cumulative incidence of death. There was no statistically significant difference in the intervention vs control condition regarding median (IQR) ICU LOS (6 [4–13] days vs 5 [3–11] days; unadjusted ratio 1.079, 95% CI 0.967–1.204; p = 0.173).


The ERIC trial demonstrated the comparative effectiveness of a telemedical programme in a network of 12 clusters of ICUs in the area of Berlin and Brandenburg. The odds for adherence to seven of eight QIs were significantly increased for patients receiving the intervention vs. standard of care (although below the predefined minimum clinically important absolute difference of 10% [intervention minus control]). The daily intervention as implemented was most effective on the domains for sedation, analgesia and delirium (QI 2), early weaning from invasive ventilation (QI 4), and documentation of patient and family communication (QI 8). Only for QI 1 (daily multi-professional and interdisciplinary clinical visits) was no significant difference in adherence between telemedicine-based care and standard of care found.

Previous studies have primarily explored short-term mortality and hospitalisation time as primary outcomes. In two systematic reviews with meta-analyses, tele-ICU programmes were associated with a reduction in mortality (ICU and hospital) and ICU LOS [5, 6], but only one meta-analysis revealed a statistically significant and clinically relevant lower hospital LOS [5]. Another systematic review, however, revealed a reduction in ICU mortality and LOS, but no beneficial effect regarding in-hospital mortality and LOS [22]. These contradictory findings may be explained by the observational design of these studies and differing confounders in the before-after designs [5]. There were notable differences between the studies with respect to the technology used and the hospitals involved. Most importantly, tele-ICU intensivist autonomy and ICU practice prior to tele-ICU implementation are study-specific features that should be considered [6]. Only one study that was included in the abovementioned reviews on tele-ICU programmes investigated the effectiveness of telemedical care on adherence to evidence-based practice [8]. Using a before-versus-after design, the authors found that patients undergoing telemedicine were more likely to receive best-practice therapy for the prevention of deep vein thrombosis, stress ulcers, ventilator-associated pneumonia, and cardiovascular protection. In another study, pharmacological, telemedical consultations during night hours resulted in significantly more guideline-conforming daily sedation interruptions [23]. A German multicentre SW-CRT recently revealed that tele-ICU support significantly improved sepsis management guideline compliance, but sepsis-related mortality (subgroup of 276 sepsis patients) was not significantly reduced [9]. It remains to be seen if increased QI adherence translates into better patient outcomes; the current evidence appears insufficient and conflicting.
For example, one study showed that increasing guideline adherence to more than 70% is necessary for quantitative outcome effects to be observed in infectious disease management [24], but data also suggest that improvements from a very high baseline adherence (> 90%) result in better patient outcomes. We also observed an attenuation of adherence to three QIs over time that is comparable to educational programmes and might indicate a demand to repeat the blended learning or the on-the-job training [25].

The strengths of this study include its innovative stepped-wedge design which allows for rigorous evaluation of a large-scale intervention implemented in different types of hospitals, ranging from large academic centres to small community hospitals. The heterogeneous study population consisted of adults with, e.g., cardiovascular, sepsis/infectious, oncological, or trauma primary admission diagnoses, supporting the real-world pragmatic trial character. Hence, our study patients reflect the multidisciplinary nature of intensive care medicine, and our findings can be generalised, as the sample is representative of the German ICU population.

We investigated the feasibility of this telemedical implementation in a defined local network of ICUs. Upon the end of recruitment (last patient first visit on March 31, 2020), the network was scaled up for the management of the COVID-19 pandemic, and international programmes adopted the telemedical approach, which is indicative of a high acceptance within the critical care community.

This pragmatic superiority trial has several limitations. First, there may have been selection bias as cluster-level participation was on a voluntary basis. We might have recruited a population of clusters that were already highly motivated to improve their quality of care before the study. Clinical sites already showed a high level of performance during the control period prior to the implementation of the intervention (Fig. 2), which was above the average baseline level of adherence documented in the literature and expected during the planning phase [13, 14, 26,27,28,29,30]. Second, 44.6% (652/1462) of patients were enrolled at two highly recruiting academic clusters affiliated to the sponsor of the trial providing the tele-ICU. Thus, trial findings may not be fully applicable to Germany's hospital landscape. Third, clusters’ trial participation and the consequent focus on the QIs may have already resulted in better QI adherence, irrespective of the treatment condition. This so-called Hawthorne effect observed in previous quality improvement studies may have been aggravated by the training of QI experts for each cluster already during the control phase [31]. These experts may have paid more attention to the QIs in the study centres, even before transitioning to the intervention phase. Hence, we may have overestimated QI adherence on standard of care. Fourth, the sample size calculation performed in 2015 was rather pragmatic due to difficulties obtaining reliable values for the correlation structure at the design stage. However, the resulting 99.375% confidence intervals for estimators for intervention effects regarding all 8 QIs indicate a high level of precision based on the actual design features. Therefore, we believe that our assumptions were rather conservative. Fifth, the trial was not powered with respect to secondary endpoints (e.g. survival), which limits the interpretation of these endpoints.
We evaluated the immediate effect of the telemedical intervention on the QI adherence and the sustained effect on the 6-month all-cause mortality. The trial did not analyse if the effect regarding QIs translates into a survival benefit. This so-called surrogacy between QI adherence and survival was not the focus of our trial [32, 33]. Even if higher QI adherence were associated with better survival, its surrogacy in the context of the telemedical intervention is still not evident. If QI adherence only has a short-term effect on survival but no effect on the 6-month or 12-month survival rates, a general survival benefit may not be seen by a standard survival analysis. Sixth, patients enrolled on intervention showed a significantly higher severity of illness (according to baseline SAPS II and SOFA scores). Participating centres may have selected patients in the intervention phase who they considered to benefit most from the telemedical intervention. This identification and recruitment bias may have diluted a beneficial effect in survival between patients treated on intervention vs. control condition.

In conclusion, a structured, bundled telemedical intervention implemented in a diverse local network of hospitals in Germany improved the quality of care compared to standard of care. Although the primary efficacy endpoints were met, further research is needed to evaluate the generalisability outside the German healthcare sector and in a broader setting of regional, non-academic community hospitals. It is also important to explore long-term intervention sustainability. Therefore, future controlled trials in Germany should be designed to investigate the effectiveness of virtual care networks on long-term survival and early and late post-ICU functional impairments in a well-defined ICU population.