Background

Advance care directives are becoming increasingly important in modern-day medical practice. The possibility of successful cardiopulmonary resuscitation (CPR) in case of cardiac arrest is the quintessential directive to discuss. Expected prognosis after attempted CPR for in-hospital cardiac arrest (IHCA) is an increasingly important part of the dialogue. Providing adequate guidance can be challenging, especially as patients tend to overestimate their likelihood of survival [1], even though the likelihood of survival and the chance of a good neurological outcome after IHCA remain poor [2]. Ideally, clinicians would be able to identify both patients who have a good chance of good-quality survival after cardiopulmonary resuscitation and patients with a low chance of survival, in whom futile resuscitation attempts could be avoided.

Compared to out-of-hospital cardiac arrest (OHCA), there is limited data on outcome after IHCA [3]. Although evidence from OHCA is often extrapolated to IHCA, the epidemiology is different and the determinants of survival and outcome differ accordingly [4]. There is a need for prognostication tools to guide clinicians in decision-making and counselling of patients regarding IHCA. Although several significant peri-arrest prognostic factors for IHCA have been identified, patients and clinicians must rely on pre-arrest factors to establish a CPR-directive [5].

Several risk models addressing this clinical dilemma have been published over the years. However, there is still little evidence supporting clinical decision-making [4] and no model has so far been implemented in clinical practice. An overview of the developed prognostic tools has recently been published; however, its focus lay on establishing diagnostic accuracy [6]. The aim of this study was to summarize and appraise prediction models for any clinical outcome after attempted CPR for IHCA using pre-arrest variables, to assess the extent of validation in external populations, and to perform a meta-analysis of the performance of the prognostic models. Clinicians could thus improve the prediction of outcome after IHCA in order to better inform their patients and enhance clinical decision-making.

Methods

This systematic review was designed according to the Checklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) and the guidance as described by Debray et al. [7, 8]. A protocol was registered in the International Prospective Register of Systematic Reviews PROSPERO (CRD42021269235). Data reporting and review are consistent with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [9]. The review question was formulated using the PICOTS scheme (Population, Intervention, Comparator, Outcome, Timing and Setting) (Additional file 1: Appendix, Table S1).

Literature search

A systematic search was performed in MEDLINE (via PubMed), Embase and the Cochrane Library for studies published from inception to 22-10-2021. An experienced librarian assisted in developing the search strategy, which included synonyms for [in-hospital cardiac arrest], combined with [prognostic model/prognosis/prediction/outcome assessment] (Additional file 1: Appendix). The recommendations by Geersing et al. [10] regarding search filters specifically for finding prediction model studies for systematic reviews were followed, as well as those by Bramer et al. using single paragraph searches [11]. Two authors (CGvR, MS) independently screened titles/abstracts and full-text articles, and discrepancies were resolved by a third author (SH). References of each eligible article were hand searched for potential further inclusion.

Selection criteria

Studies specifically developing, validating and/or updating a multivariable prognostic model for any clinical outcome after attempted resuscitation for IHCA were included. A study was considered eligible following the definition of prognostic model studies as proposed by the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [12]. Eligible studies had to report specifically the development, update or recalibration, or external validation of prognostic models to predict outcome after in-hospital cardiac arrest using pre-arrest factors, and had to report model performance measures. No language restrictions were imposed.

Outcome assessment

Eligible outcomes were any possible clinical outcome after IHCA, such as the return of spontaneous circulation (ROSC), survival to discharge (or longer term survival) and neurological outcome (Cerebral Performance Category: CPC). Studies only including peri-arrest factors were excluded, as these prognostic factors are not available at the time of advance care planning. Studies exclusively describing data of patients after ROSC or studies of mixed OHCA/IHCA populations without separate reporting for IHCA patients were also excluded.

Definitions and terminology

A prognostic model was defined as ‘a formal combination of multiple prognostic factors from which risks of a specific end point can be calculated for individual patients’ [13]. A good clinical prediction model should discriminate between patients who do and do not experience a specific event (discrimination), make accurate predictions (calibration) and perform well across different patient populations (generalisability) [14, 15]. Discrimination is often expressed by the concordance statistic (C-statistic): the chance that a randomly selected patient who experiences an event has a higher score in the model than a randomly selected patient who does not. For binary outcomes, the C-statistic is equal to the area under the receiver operating characteristic curve (AUC). Calibration compares the predicted probability of survival with actual survival [16]. It is often visualised with a calibration plot and/or assessed by goodness-of-fit (GOF) as quantified by the Hosmer–Lemeshow test. Other measures of model performance are sensitivity, specificity, positive and negative predictive value, accuracy, R²-statistic and Brier score.
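As an illustration of the definitions above, the following minimal Python sketch computes a C-statistic by direct pairwise comparison and a Brier score (the mean squared difference between predicted risk and observed outcome). The function names and the six example patients are hypothetical and are not taken from any included study; ties are counted as half-concordant, which is a common convention and an assumption here.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Share of (event, non-event) patient pairs in which the patient with
    the event has the higher predicted risk; ties count as 0.5."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(risks, outcomes):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(risks)


# Hypothetical predicted risks and observed outcomes (1 = event occurred)
risks = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0]

print(round(c_statistic(risks, outcomes), 3))
print(round(brier_score(risks, outcomes), 3))
```

In this toy data set, eight of the nine event/non-event pairs are ordered correctly, so the C-statistic is 8/9. A lower Brier score indicates predictions closer to the observed outcomes.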

Data extraction

A standardised form following the CHARMS checklist was developed in which two authors independently extracted data (CGvR, MS) [7]. Articles were categorised into development, updating/recalibration and validation subgroups. For all eligible articles, the following information was extracted: first author and year of publication, model name, study population, sample size, source of data (i.e. study design, date of enrolment), number of centres, countries of inclusion, predicted outcome, factors in the model, model performance and information on validation. For development/update studies, model development method, number of prognostic factors screened and final model presentation were collected. Separate individual prognostic factors of the models were tabulated.

Statistical analysis

For prediction models that had been externally validated in multiple studies, a random-effects meta-analysis of the reported AUCs was performed to yield a pooled AUC for each prediction model [8]. 95% confidence intervals (CI) and (approximate) 95% prediction intervals (PI) were calculated to quantify uncertainty and the presence of between-study heterogeneity. Analyses were performed in R version 4.2.1 using the package metamisc.
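To make the pooling step concrete, the sketch below shows one way a random-effects meta-analysis of AUCs can be computed. It is not the metamisc analysis used in this review: it assumes a DerSimonian-Laird estimator on the logit scale, derives each study's standard error from its reported 95% CI, and uses a normal quantile for the prediction interval (the commonly used t-distribution with k − 2 degrees of freedom would give a wider interval). The function name and the three input studies are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pool_auc(studies):
    """studies: list of (auc, ci_lower, ci_upper), each with a 95% CI.
    Returns pooled AUC, 95% CI, approximate 95% PI and tau^2 (logit scale)."""
    k = len(studies)
    if k < 2:
        raise ValueError("need at least two validation studies")
    y = [logit(auc) for auc, _, _ in studies]
    # Assumption: the reported CI is symmetric on the logit scale
    se = [(logit(hi) - logit(lo)) / (2 * Z95) for _, lo, hi in studies]
    w = [1 / s ** 2 for s in se]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1 / (s ** 2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    pi_half = Z95 * math.sqrt(tau2 + se_mu ** 2)
    return {
        "pooled": inv_logit(mu),
        "ci": (inv_logit(mu - Z95 * se_mu), inv_logit(mu + Z95 * se_mu)),
        "pi": (inv_logit(mu - pi_half), inv_logit(mu + pi_half)),
        "tau2": tau2,
    }


# Hypothetical external validation results: (AUC, 95% CI lower, 95% CI upper)
example = [(0.75, 0.70, 0.80), (0.82, 0.78, 0.86), (0.70, 0.62, 0.78)]
result = pool_auc(example)
print(result)
```

The confidence interval describes uncertainty about the average performance, whereas the prediction interval also incorporates between-study heterogeneity and indicates the range of performance that might be expected in a new validation population.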

Quality assessment

The Prediction model Risk Of Bias Assessment Tool (PROBAST) was used to assess the risk of bias of the studies developing or validating prognostic models [17]. Methodological quality was assessed independently by two authors (CGvR, MS).

Results

A total of 2678 studies were screened (Fig. 1) and 33 studies were included in the qualitative synthesis of this systematic review: 16 model development studies [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33], five model updating studies [34,35,36,37,38] and 12 model validation studies [39,40,41,42,43,44,45,46,47,48,49,50] (Tables 1, 2 and 4). All studies included patients that received CPR for IHCA. In five studies [20, 21, 25, 26, 31], multiple models were developed resulting in a total of 22 developed models in 16 studies. Of these, seven studies reported (internal) validation of the developed model and three of the five model updating studies reported validation in the original paper (Table 2). Most models were developed or updated using registries as source of data (9/21 studies) or data from retrospective cohorts (7/21 studies). Three studies used a prospective cohort, and in two studies, the source of data was not mentioned (Table 1).

Fig. 1 Flow diagram of literature search and included studies

Table 1 Model development and updating studies
Table 2 Model characteristics of model development and updating studies

Model development and updating studies

The most frequently predicted outcome was survival to discharge (11 studies), followed by ROSC (8 studies), as shown in Table 1. Survival to discharge with a CPC of 1 (2 studies) [21, 22] or ≤ 2 (3 studies) [34,35,36] was also reported as predicted outcome. Two studies included 3-month survival in their outcomes [25, 32]. Sample size varied from 122 to 92,706 patients. The model updating studies either updated the Good Outcome Following Attempted Resuscitation (GO-FAR) Score (n = 3) [34,35,36] or the Pre-Arrest Morbidity (PAM) Index (n = 2) [37, 38].

Of the model development studies, 10 included pre- and intra-arrest factors and six exclusively pre-arrest factors (Table 2). All model updating studies only included pre-arrest factors, according to the models they were based on. A tabular overview is provided of the most frequently included pre-arrest factors affecting clinical outcome after attempted resuscitation: age, dependent functional status, (metastatic) malignancy, heart disease, cerebrovascular event, respiratory, renal or hepatic insufficiency, hypotension and sepsis (Table 3) (a full overview of the parameters per model is included in the Additional file 1: Appendix).

Table 3 Overview of included predictors

Half of the developed/updated models were validated in the same paper either by split-sample (internal) validation [19, 22,23,24, 35] or temporal (external) validation [20, 21, 28, 34]. In one study, a bootstrapping technique was used [36]. For the remaining 11 studies, no internal validation or recalibration had taken place [18, 25,26,27, 29,30,31,32,33, 37, 38].

Formally, the absence of performance measures was an exclusion criterion. However, as the Modified PAM Index (MPI) [37] and Prognosis After Resuscitation (PAR) [38] are frequently validated externally, these studies were included, with a note to that effect in the overview.

Model validation studies

In the 12 model validation studies, a total of seven risk models were independently validated in external populations (Table 4): the GO-FAR score [22], the PAM Index [32], the PAR score [38], the MPI [37], two classification and regression tree models (CARTI and CARTII) [21] and the APACHE III score [47]. The most frequently externally validated models are the PAM (n = 6) [45,46,47,48,49,50], GO-FAR (n = 5) [39,40,41,42, 44], and PAR (n = 5) [45,46,47,48, 50]. The source of data was most frequently a retrospective cohort (n = 10); registry data were used in two studies [40, 42]. Sample size varied from 86 to 62,131 patients. In six instances, the validation study was fully independent, meaning the authors of the original score were not involved in the validation study [39, 41, 45, 46, 49, 50]. There does not seem to be a difference in reported validation performance between the fully independent validation studies and the other external validation studies. Calibration performance was reported in two studies [40, 42]. Estimates of the area under the receiver operating characteristic curve were reported in 10 validation studies.

Table 4 Characteristics of model validation studies

Meta-analysis

It was possible to calculate a pooled performance of the GO-FAR [39,40,41,42, 44], PAM [45, 47, 48, 50] and PAR [45, 48, 50, 51] scores (Fig. 2). The GO-FAR score showed the best performance with a pooled AUROC of 0.78 (95% CI 0.69–0.85), versus 0.59 (95% CI 0.50–0.68) for the PAM and 0.62 (95% CI 0.49–0.74) for the PAR.

Fig. 2 Forest plots of c-statistics in external validation studies

PROBAST

The assessment of quality with the PROBAST tool showed most risk of bias was present in the ‘analysis’ domain (full assessment in the Additional file 1: Appendix): the number of participants was not always satisfactory, and frequently the way in which missing data were handled was not reported.

Discussion

This systematic review describes prognostic models that use pre-arrest factors to predict outcome of in-hospital cardiac arrest. A comprehensive overview of developed, updated and validated models is presented. Using the best available evidence, i.e. the best performing model, could aid patients and clinicians in making an informed decision whether to attempt or refrain from CPR. Only six models have been validated in external populations. Of these, the GO-FAR score shows the most acceptable performance.

Model development and updating

This systematic review shows that a plethora of prognostic models has been developed in recent years to predict outcome after IHCA. A total of 27 different prognostic models were published using pre-arrest factors to predict any clinical outcome after IHCA. Approximately half use pre- and intra-arrest factors and the remaining half exclusively pre-arrest factors; the latter are the models that would be more useful in a clinical setting. However, the time at which the factors are assessed often differs from the moment at which the model would be used, as is illustrated by the validation study of the GO-FAR score from Rubins et al. [41]. The authors found the lowest AUC of the GO-FAR score when using it with admission factors, instead of data collected close to the IHCA, which is to be expected as the score was not developed for this moment. However, this demonstrates a potential pitfall of the prognostic models if used in clinical practice. The clinical course of a patient admitted to the hospital is a dynamic process, which in an ideal situation the models would reflect: initially only including pre-arrest factors known at admission and gradually incorporating peri-arrest factors as the clinical situation evolves. A potential problem of prognostic models including both pre-arrest factors and peri-arrest factors is that the peri-arrest factors carry a lot of weight in the model but are not known at the time of initial counselling. Their importance becomes evident in later clinical decision-making, when deciding whether to (dis)continue a resuscitation attempt. The impact of peri-arrest variables on outcome should be reflected in decision models for termination of CPR, whereas the pre-arrest variables studied in this review should allow for better patient counselling on advance care directives.

As Lauridsen et al. rightly note in a recent review of test accuracy studies for IHCA prognosis, there is a need for models that aid in prognostication at an early stage of hospital admission, so that patients (and/or family) can be properly informed. They concluded that no score was sufficiently reliable to support its use in clinical practice. Our study provides a comprehensive review of model development, updating and validation, rather than only of the diagnostic accuracy of the tools, where no distinction between model development and validation is made [6]. We have critically appraised the methodology behind each model using PROBAST, as is appropriate in this instance, and have performed a meta-analysis of the models’ performances, using methodological guidance on meta-analysis of prediction model performance [8]. We specifically did not include Early Warning Scores, as they comprise physiological parameters that are not available at the time of counselling and are used to estimate the risk of deterioration in hospitalised patients rather than the prognosis after IHCA. These scores proved to be highly inaccurate for prediction of patient survival. Excluding the studies investigating Early Warning Scores, Lauridsen et al. included 20 studies, whereas this systematic review includes 33 studies, possibly owing to a search strategy more specific for a systematic review of prediction models [10].

Age was included in almost all models as a prognostic factor for outcome after IHCA. Dependent functional status was also a frequently included factor, as were the comorbidities (metastatic) malignancy, renal insufficiency and sepsis. This corresponds with findings of a recent systematic review evaluating the association of single pre-arrest and intra-arrest factors with survival after IHCA, where the pre-arrest factors age, active malignancy and chronic kidney disease were all independently associated with reduced survival [52]. Male sex was also found to be an independent prognostic factor, but it was included in only three of the models in this systematic review. Frailty has recently been found to be a robust prognostic factor for in-hospital mortality after IHCA, which is reflected in this study by dependent functional status or admission from a nursing facility being frequently included prognostic factors [53]. It is, however, debatable whether frailty and functional dependence are the same thing. This was recently demonstrated in an observational study, where it was found that moderately frail adults demonstrate heterogeneity in functional status [54].

A wide diversity of predicted outcomes is present in the included models, ranging from the occurrence of ROSC to survival to discharge with a good neurological outcome. And although CPC is not a patient-centred outcome measure, it does provide an extra dimension over survival. Given that the GO-FAR performance is still better than other models, future research should attempt to correlate this model’s variables to health-related quality of life (HRQoL). And as previously argued by Haywood et al., all future cardiac arrest research should use uniform reporting of long-term outcomes and HRQoL to allow for better comparison between studies and represent more clinically relevant outcomes [55,56,57].

Model validation

To maximize the potential and clinical usefulness of prognostic models, they must be rigorously developed and—internally and externally—validated, and their impact on clinical practice and patient outcomes must be evaluated. Model development studies should adjust for overfitting by performing internal validation and recalibration. Several techniques for internal validation (reproducibility) are used and include apparent validation (development and validation in the same population), split-sample validation (random division of data in training and test sets) and bootstrapping (random samples of the same size are drawn with replacement). Only half of the studies in this systematic review which developed scores engaged in some form of (mainly internal and split-sample) validation.
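The resampling schemes named above can be sketched as follows. This is only an illustration of how the samples are drawn: fitting the model and evaluating its performance on each sample are omitted, and the function names, the 70/30 split and the ten patient identifiers are hypothetical. In apparent validation no resampling takes place, as the model is simply evaluated on the records it was developed on.

```python
import random


def split_sample(records, train_fraction, rng):
    """Split-sample validation: randomly divide records into a training
    set (model development) and a test set (model evaluation)."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def bootstrap_sample(records, rng):
    """Bootstrapping: draw a sample of the same size with replacement,
    so some records appear several times and others not at all."""
    return [rng.choice(records) for _ in records]


rng = random.Random(2021)          # fixed seed for reproducibility
patients = list(range(10))         # hypothetical patient identifiers

train, test = split_sample(patients, 0.7, rng)
boot = bootstrap_sample(patients, rng)
# Records never drawn into the bootstrap sample ("out-of-bag")
out_of_bag = [p for p in patients if p not in boot]

print(train, test)
print(boot, out_of_bag)
```

In practice the bootstrap step is repeated many times, with the model refitted in each resample, so that the optimism of the apparent performance can be estimated without setting aside part of the data.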

However, no score should be applied in clinical settings unless it has been externally validated. External validation (generalisability) of a model can be performed via geographical or temporal validation or a fully independent validation (with other researchers at another centre) [14]. Only six models were subsequently validated in external populations, and only a minority of the model studies assessed calibration or mentioned recalibration of the presented model. This could mean an overall overestimation of the performance of the other reported prognostic models. Performance is easily overestimated when there is only apparent validation. Therefore, external validation studies are needed to ensure the generalisability of a prognostic model for medical practice [58].

Based on the prognostic models identified through this systematic review, the GO-FAR score has the best performance when validated in external populations and is at this time the most robust and tested model. The performance of the PAM, PAR and MPI in external validation studies limits their consideration for clinical use.

As for generalisability, models were predominantly developed in the USA (using GWTG Registry data) and the UK. Several external validation studies were performed in Sweden in the same relatively small retrospective cohort. This emphasises a need for external model validation and updating in different populations, as many countries are not represented in the current body of literature and cultural differences play an important role in establishing advance care directives [59].

Strengths and limitations

This study contains a comprehensive search and extensive analysis using current guidelines for reviewing and assessing bias of prediction model studies [7, 17]. Methodological assessment revealed that the most frequent risk of bias was introduced in the domains source of data, sample size, number of outcomes and analysis (Additional file 1: Appendix). Limitations pertain mainly to the design of the included studies. Only two models were developed with prospectively collected data, which are reported to be the superior source of data for the development of prognostic models [13, 17]. Most models were developed using registry data or relatively small retrospective cohorts. Another limitation of this study is the low sensitivity of the search, due to a lack of search terms and indexing for prognostic model studies [10].

Methodological recommendations

An important caveat in interpreting these results and implementing them in practice is that the time at which the factors are assessed often differs from the moment at which the model would be used in a clinical setting. A prognostic model meant to be used before starting CPR (at hospital admission, or even before that) might be more practical and would better reflect the moment when decision-making in advance care planning takes place and when such a model could be most helpful.

Imputation techniques should be used when data are missing, and the full equation of the prognostic model should be presented to allow for external validation and updating by independent research teams, ideally in large prospective cohorts. Calibration is an important aspect of performance and should be assessed in future studies, as poorly calibrated models can be unreliable even with good discrimination [16].
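The point that discrimination and calibration are separate properties can be illustrated with a small sketch. It assumes the ratio of observed to expected events (O/E, sometimes called calibration-in-the-large) as a simple calibration summary, which is not necessarily the measure used in the included studies; the eight patients and both sets of predicted risks are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            score += 1.0 if e > n else 0.5 if e == n else 0.0
    return score / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Observed number of events divided by the sum of predicted risks;
    1.0 means the model predicts the right number of events overall."""
    return sum(outcomes) / sum(risks)


# Hypothetical cohort: 3 events among 8 patients
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]
model_a = [0.8, 0.7, 0.5, 0.4, 0.25, 0.2, 0.1, 0.05]
# Model B ranks patients identically but predicts half the risk
model_b = [r / 2 for r in model_a]

for name, risks in (("A", model_a), ("B", model_b)):
    print(name,
          round(c_statistic(risks, outcomes), 3),
          round(observed_expected_ratio(risks, outcomes), 2))
```

Both models have the same C-statistic because they rank patients identically, yet model B predicts only half of the events that actually occur. A patient counselled with model B would therefore receive a systematically distorted risk estimate despite the unchanged discrimination.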

There seems to be a gap between the development of prognostic models and research into their possible effect on clinical decision-making and perhaps even on patient outcomes. Furthermore, clinicians may be eschewing the use of scores because of a lack of clear guidance on which score(s) to use or barriers to practical use, or because they find the utility of the scores limited in clinical practice. In spite of the prevalence of risk models, it is known that few models have been validated, and even fewer are used regularly in clinical settings [58, 60]. Future research should focus on updating and validating existing prediction models in large external populations, rather than developing new models. After extensive external validation studies of prognostic models, implementation studies are needed to assess their influence in clinical practice [61].

Conclusions

Several prediction models for clinical outcome after attempted resuscitation for IHCA have been published; most have a moderate risk of bias and have not been validated externally. The GO-FAR score is the only prognostic model included in multiple external validation studies with a decent performance. Future research should focus on updating existing models in large external populations and on their influence on clinical decision-making.