Background

Potentially reversible cardiac arrest is a major public health issue faced by patients and health systems worldwide [1,2,3]. Progress in post-resuscitation care has improved survival significantly [4], but neurological sequelae due to hypoxic-ischaemic brain injury remain a concern among patients with return of spontaneous circulation (ROSC) after cardiac arrest [5, 6]. However, early prognostication of neurological outcome in cardiac arrest survivors remains difficult [7]. Current guidelines recommend delaying neurological prognostication in comatose cardiac arrest patients until 72 h after ROSC [8]. To provide additional guidance for discussions about goals of care and the extent of therapeutic effort, several scoring models have been developed, which use different clinical and laboratory values to calculate the probability of poor neurological outcome [9, 10]. This probability might be integrated into an overall clinical judgment together with professional experience and clinical and neurological assessment.

Two of the most promising and thoroughly validated scoring models for the prognostication of neurological outcome after cardiac arrest are the Out-of-Hospital Cardiac Arrest (OHCA) and the Cardiac Arrest Hospital Prognosis (CAHP) scores [9, 11,12,13]. Both of these scores have shown good prognostic accuracy in numerous previous validations [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]. A drawback of both models is that they require the no-flow time, which is often inaccurate or unknown, especially if the cardiac arrest was unwitnessed. A South Korean research group recently developed the PROLOGUE (PROgnostication using LOGistic regression model for Unselected adult cardiac arrest patients in the Early stages) score to address this issue [14]. The PROLOGUE score includes resuscitation, clinical and laboratory parameters and omits the no-flow duration as a predictor variable. Box 1 gives an overview of the parameters included in the OHCA, CAHP and PROLOGUE scores. In the internal validation and one external validation, both conducted in South Korea, the PROLOGUE score showed excellent prognostic performance with areas under the receiver operating characteristic curve (AUROC) of 0.94 and 0.92, respectively [14, 30]. Furthermore, the probability of poor neurological outcome can easily be calculated at the bedside using a nomogram provided in the original publication. The PROLOGUE score therefore seems to be a promising new scoring model to assist with early prognostication after cardiac arrest. However, a recent Austrian validation study including 1051 adult cardiac arrest patients failed to reproduce the excellent performance of the score in the Korean publications [31]. This highlights the need for further external validation in different settings and countries.
Therefore, the aim of this study is to provide an independent validation of the PROLOGUE score in a large European cohort of unselected adult cardiac arrest survivors and to compare it to the thoroughly validated OHCA and CAHP scores.

Box 1 Description of included scores

Methods

Study setting

We analysed prospectively collected data of adult cardiac arrest patients who were included in the COMMUNICATE/PROPHETIC cohort between October 2012 and July 2022 at the University Hospital Basel, a tertiary teaching hospital in Switzerland. The details of the study have been published previously [32]. Informed consent was obtained from the patients or from their relatives, depending on the patient's decision-making capacity. The study was approved by the Ethics Committee of North-western and Central Switzerland (www.eknz.ch) and followed the principles of the Declaration of Helsinki and its amendments. Analysis and reporting for this study were conducted in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement [33, 34].

Participants

The COMMUNICATE/PROPHETIC registry included unselected cardiac arrest patients ≥ 18 years of age treated in the ICU of the University Hospital Basel. Eligible were all patients with ROSC after out-of-hospital (OHCA) or in-hospital (IHCA) cardiac arrest. Excluded were patients suffering a cardiac arrest while being monitored (e.g., ICU, intermediate care unit, operating theatre, cardiac catheterisation laboratory) and patients for whom informed consent was denied. The treatment of the patients was conducted according to the standardised local treatment protocol and followed the respective current guidelines of the European Resuscitation Council [8, 35, 36].

Outcomes

The primary endpoint was neurological outcome at hospital discharge as measured by the Cerebral Performance Category (CPC) scale [37]. The CPC scale differentiates five levels of functional outcome: a score of 1 indicates good recovery with resumption of normal life, a score of 2 indicates moderate disability with independence in daily life, a score of 3 indicates severe disability with the need for daily support, a score of 4 indicates a persistent vegetative state and a score of 5 equals death or brain death [37]. A CPC score of 1 to 2 was defined as good neurological outcome, a score of 3 to 5 as poor neurological outcome. The secondary outcome was in-hospital mortality.
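The dichotomisation of the CPC scale described above can be written down compactly. The following is a minimal sketch in Python; the function name `is_poor_neurological_outcome` is a hypothetical helper introduced for illustration and is not part of the study's analysis code.

```python
def is_poor_neurological_outcome(cpc: int) -> bool:
    """Dichotomise the CPC scale as described in the text.

    CPC 1-2 is treated as good and CPC 3-5 as poor neurological outcome.
    Hypothetical helper for illustration only.
    """
    if cpc not in (1, 2, 3, 4, 5):
        raise ValueError("CPC must be an integer from 1 to 5")
    return cpc >= 3


# Map every CPC level to the outcome label used for the primary endpoint.
labels = {cpc: ("poor" if is_poor_neurological_outcome(cpc) else "good")
          for cpc in range(1, 6)}
print(labels)
```

Note that death (CPC 5) counts as poor neurological outcome under this definition, so the primary endpoint includes both survivors with severe disability and patients who died.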

Data collection

The following data were extracted for each patient from the electronic patient records: sex category (as assigned at birth or as reported by the patients/relatives), age, pre-existing chronic diseases (coronary artery disease, heart failure, neurologic disease, diabetes, hypertension, chronic obstructive pulmonary disease, chronic kidney disease, liver cirrhosis and malignancy), resuscitation parameters (location of arrest, presence of a witness to the collapse, whether bystander cardiopulmonary resuscitation [CPR] was performed, first monitored rhythm, dose of epinephrine [adrenaline] administered during CPR, no-flow duration, low-flow duration), cause of cardiac arrest, clinical and laboratory parameters at ICU admission (Glasgow Coma Scale [GCS] including the three sub-scores, presence of pupillary light reflex, C-reactive protein, blood glucose level, blood pH, lactate, phosphate, potassium, sodium, haemoglobin, creatinine), interventions performed during the ICU stay (mechanical ventilation, coronary angiography, administration of vasoactive agents and targeted temperature management [TTM]), and CPC score at hospital discharge.

Score risk categories

The OHCA and CAHP scores were categorised as described in previous publications: the OHCA score was divided into four categories (≤ 20; > 20–40; > 40–60; > 60 points) [18], the CAHP score into three categories (< 150; 150–200; > 200 points) [12]. For the PROLOGUE score, no such categories have been suggested. Instead, we assessed prognostic accuracy at each decile of predicted risk in accordance with the original publication [14].
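As an illustration of the categorisation above, the cut-offs can be expressed as two small functions. This is a sketch only: the function names and the returned labels are assumptions made for this example, and the calculation of the score values themselves (as defined in the original publications) is not reproduced here.

```python
def ohca_category(points: float) -> str:
    """Assign an OHCA score to one of the four categories given in the text."""
    if points <= 20:
        return "<=20"
    if points <= 40:
        return ">20-40"
    if points <= 60:
        return ">40-60"
    return ">60"


def cahp_category(points: float) -> str:
    """Assign a CAHP score to one of the three categories given in the text."""
    if points < 150:
        return "<150"
    if points <= 200:
        return "150-200"
    return ">200"


# Invented example values, chosen only to exercise the category boundaries.
for value in (20, 20.5, 60, 61):
    print("OHCA", value, ohca_category(value))
for value in (149, 150, 200, 201):
    print("CAHP", value, cahp_category(value))
```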

Statistical analysis

Continuous variables were described using mean and standard deviation (SD) or median and interquartile range (IQR); categorical and binary variables were described by counts and proportions. Continuous variables were checked visually for normality of distribution using Q-Q plots. For comparisons between groups, Pearson’s χ2 test (binary and categorical variables), ANOVA (continuous, normally distributed variables) and the Wilcoxon rank-sum test (continuous, skewed variables) were applied as appropriate. The PROLOGUE, OHCA and CAHP score values were calculated as indicated in the original publications. The prognostic performance of the PROLOGUE, OHCA and CAHP scores was assessed using measures of discrimination and calibration. Discriminatory performance was analysed using the area under the receiver operating characteristic curve (AUROC). An AUROC of 0.7–0.8 was defined as acceptable, an AUROC of 0.8–0.9 as good and > 0.9 as excellent. Comparison of AUROC between the PROLOGUE, OHCA and CAHP scores was conducted using the approach by DeLong et al. [38]. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were assessed for the cut-off values (i.e., categories) specified in the ‘Score Risk Categories’ section. Calibration was assessed graphically using a calibration plot depicting the event rates predicted by the respective score vs. the event rates observed in our cohort. Subgroup analyses comparing the performance of the PROLOGUE, OHCA and CAHP scores in IHCA vs. OHCA patients as well as in female vs. male sex category were conducted. A two-sided p-value of < 0.05 was considered to represent statistical significance.
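The discrimination and cut-off measures named above can be illustrated with a short, dependency-free Python sketch. It is not the analysis code of this study: the function names are hypothetical, the AUROC is computed through its rank-based (Mann-Whitney) interpretation, the handling of the shared band boundaries (0.8 and 0.9) is an assumption, and the DeLong comparison and confidence intervals are not implemented. The data in the usage example are invented toy values.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen patient with poor
    outcome (label 1) has a higher score than a randomly chosen patient
    with good outcome (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_band(value):
    """Verbal label for an AUROC using the bands stated in the text.
    Assumption: boundary values 0.7 and 0.8 fall into the higher band,
    and only values strictly above 0.9 are 'excellent'."""
    if value > 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "acceptable"
    return "not classified"  # the text defines no label below 0.7


def _ratio(numerator, denominator):
    # Avoid a ZeroDivisionError when a cell of the 2x2 table is empty.
    return numerator / denominator if denominator else float("nan")


def cutoff_metrics(predicted_risk, labels, threshold):
    """Sensitivity, specificity, PPV and NPV when a predicted risk at or
    above the threshold is read as a prediction of poor outcome."""
    tp = fp = tn = fn = 0
    for risk, y in zip(predicted_risk, labels):
        if risk >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Toy data (invented): predicted risk of poor outcome and observed outcome.
toy_risk = [0.95, 0.92, 0.60, 0.30, 0.20]
toy_outcome = [1, 0, 1, 1, 0]
print(auroc(toy_risk, toy_outcome))
print(cutoff_metrics(toy_risk, toy_outcome, threshold=0.9))
```

The pairwise AUROC computation is quadratic in the number of patients, which is unproblematic for cohorts of this size; a rank-based implementation would be preferable for very large datasets.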

Results

Baseline characteristics

Of 708 eligible patients with ROSC after cardiac arrest, 21 patients were excluded due to screening failure or missing informed consent. The remaining 687 patients were included in the final analysis. The baseline characteristics of our cohort are shown in Table 1 along with the baseline characteristics of the original PROLOGUE development cohort. Factors significantly associated with poor neurological outcome were higher age, male sex, chronic comorbidities (coronary artery disease, chronic obstructive pulmonary disease, diabetes, cancer, neurological disease), longer no-flow and low-flow durations, unwitnessed cardiac arrest, no bystander CPR, non-shockable initial rhythm, higher dose of adrenaline administered before ROSC, non-cardiac cause of cardiac arrest, as well as lower GCS score, non-reactive pupils, higher levels of C-reactive protein, creatinine, blood glucose, phosphate and lactate, lower pH and lower haemoglobin at ICU admission.

Table 1 Baseline characteristics of the study population and comparison with the development cohort

Neurological outcome and mortality

Three-hundred and twenty-one patients (46.7%) survived to hospital discharge with good neurological outcome, 68 (9.9%) survived with poor neurological outcome and 298 (43.4%) died. A Kaplan–Meier survival estimate of the whole population is shown in Additional file 1: Figure S1.
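The proportions above, and the overall proportion of poor neurological outcome (CPC 3 to 5, which includes death), follow directly from the reported counts. The short check below uses only numbers stated in the text; the variable names are chosen for this example.

```python
total_included = 687
outcomes = {
    "good neurological outcome": 321,
    "survived with poor neurological outcome": 68,
    "died": 298,
}

# The three groups must add up to the analysed cohort.
assert sum(outcomes.values()) == total_included


def percent(count, total=total_included):
    """Share of the cohort in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: percent(count) for name, count in outcomes.items()}

# Poor neurological outcome (CPC 3-5) combines poor survivors and deaths.
poor_outcome_total = (outcomes["survived with poor neurological outcome"]
                      + outcomes["died"])
poor_outcome_percent = percent(poor_outcome_total)

print(shares)
print(poor_outcome_total, poor_outcome_percent)
```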

Score performance

The prognostic performance of the PROLOGUE, OHCA and CAHP scores for the primary and secondary outcome is summarised in Table 2. The PROLOGUE score showed an AUROC of 0.83 (95% CI 0.80 to 0.86) and good overall calibration for the prediction of poor neurological outcome at hospital discharge. The AUROCs of the OHCA and CAHP scores were 0.83 (95% CI 0.80 to 0.86) and 0.84 (95% CI 0.81 to 0.87), respectively. The differences between the AUROCs of all three scores were not significant (p = 0.495). For the primary endpoint, a graphical comparison of the ROC curves of the three scores is shown in Fig. 1. The calibration of the OHCA score was poor with overestimation of poor outcome; the CAHP score showed acceptable calibration, also with a tendency to overestimate poor outcome. Calibration plots of all three scores for the primary outcome are shown in Fig. 2. The PROLOGUE score showed the highest AUROC for the prognostication of in-hospital mortality; however, the differences between the three scores were not significant (p = 0.275). The ROC curves and calibration plots for the secondary outcome are shown in Additional file 1: Figures S2 and S3, respectively. The highest decile of the PROLOGUE score’s predicted risk (≥ 0.9) predicted poor neurological outcome with a specificity of 99.1%, but a poor sensitivity of 17.8%. The prognostic accuracy of the PROLOGUE score at each decile of predicted risk is shown in Table 3. A Kaplan–Meier survival estimate stratified by quartiles of risk of poor outcome as predicted by the PROLOGUE score is shown in Additional file 1: Figure S4. The prognostic accuracy of the OHCA and CAHP scores at the pre-defined cut-offs is shown in Additional file 1: Tables S1 and S2, and Kaplan–Meier survival estimates stratified by OHCA and CAHP risk categories are shown in Additional file 1: Figures S5 and S6, respectively. The PROLOGUE score performed similarly in OHCA and IHCA patients (AUROC 0.83 [95% CI 0.80 to 0.87] vs. 0.80 [95% CI 0.72 to 0.88], p = 0.437) as well as in men and women (AUROC 0.83 [95% CI 0.80 to 0.87] vs. 0.82 [95% CI 0.76 to 0.88], p = 0.777). The OHCA and CAHP scores performed similarly in men and women, but significantly worse in IHCA patients than in OHCA patients (AUROC 0.75 [95% CI 0.66 to 0.84] vs. 0.85 [95% CI 0.82 to 0.88], p = 0.045 and 0.76 [95% CI 0.67 to 0.85] vs. 0.86 [95% CI 0.83 to 0.89], p = 0.049, respectively). The results of all subgroup analyses are summarised in Additional file 1: Table 3.
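Overestimation of poor outcome, as described above for the OHCA and CAHP scores, is commonly summarised by two numbers that also appear in the calibration plots: the expected-to-observed (E:O) ratio and calibration in the large (CITL). The following Python sketch shows one common way to compute them; it is an illustration under stated assumptions, not the code used for this study. The function names are hypothetical, CITL is taken to be the intercept of a logistic model with the logit of the predicted risk as an offset, predicted risks are assumed to lie strictly between 0 and 1, and the example values are invented.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def expected_observed_ratio(predicted_risk, observed):
    """E:O ratio: sum of predicted risks divided by the number of observed
    events. Values above 1 indicate overestimation of poor outcome."""
    events = sum(observed)
    if events == 0:
        raise ValueError("at least one observed event is required")
    return sum(predicted_risk) / events


def calibration_in_the_large(predicted_risk, observed, max_iter=100):
    """CITL: intercept a of logit(P(y=1)) = a + logit(predicted risk),
    estimated by Newton-Raphson. Negative values indicate that the score
    overestimates risk on average, positive values underestimation."""
    events = sum(observed)
    if events == 0 or events == len(observed):
        raise ValueError("both outcome classes are required")
    offsets = [_logit(p) for p in predicted_risk]
    intercept = 0.0
    for _ in range(max_iter):
        fitted = [_sigmoid(intercept + o) for o in offsets]
        gradient = events - sum(fitted)               # score function
        information = sum(f * (1.0 - f) for f in fitted)
        step = gradient / information
        intercept += step
        if abs(step) < 1e-10:
            break
    return intercept


# Invented toy example: a score predicting 80% risk when 50% have the event.
toy_risk = [0.8, 0.8, 0.8, 0.8]
toy_outcome = [1, 1, 0, 0]
print(expected_observed_ratio(toy_risk, toy_outcome))
print(calibration_in_the_large(toy_risk, toy_outcome))
```

A perfectly calibrated score would give an E:O ratio of 1 and a CITL of 0; the intercept estimated here is also the correction that a simple recalibration of the score's baseline risk would apply.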

Table 2 Comparison between scores
Fig. 1

Comparison of ROC curves of the PROLOGUE, OHCA and CAHP scores for the primary endpoint. The differences between the three scores were not statistically significant (p = 0.495). AUROC Area under the receiver operating characteristic curve; CAHP Cardiac arrest hospital prognosis; OHCA Out-of-hospital cardiac arrest score; PROLOGUE Prognostication using logistic regression model for unselected adult cardiac arrest patients in the early stages

Fig. 2

Calibration plots of the PROLOGUE (A), OHCA (B) and CAHP (C) scores for the primary endpoint. AUC Area under the receiver operating characteristic curve; CAHP Cardiac arrest hospital prognosis; CITL Calibration in the large; E:O Expected vs. observed ratio of poor outcome; OHCA Out-of-hospital cardiac arrest score; PROLOGUE Prognostication using logistic regression model for unselected adult cardiac arrest patients in the early stages

Table 3 Performance of the PROLOGUE score at different cut-off points

Discussion

This study aimed to externally validate the PROLOGUE score in a large, unselected population of cardiac arrest patients and to compare it to the two thoroughly validated scoring systems OHCA and CAHP for the prognostication of neurological outcome after cardiac arrest. All scores showed good prognostic accuracy in our cohort, with the differences between the scores’ performances being minor and not statistically significant. In our sample, the PROLOGUE score was well-calibrated. The OHCA and CAHP scores, in contrast, both showed a tendency to overestimate poor outcome. This is a major limitation of the OHCA and CAHP scores, as overestimation of poor outcomes might lead to premature withdrawal of life-sustaining treatment in patients who otherwise might have survived with a good neurological outcome.

To our knowledge, this study is the third external validation of the PROLOGUE score overall and the second one performed in Europe [30, 31]. Our findings are in line with the other European validation study, performed in Austria, in which the score showed an AUROC of 0.82 (95% CI 0.80 to 0.85) [31]. In both European studies, the PROLOGUE score failed to reach the outstanding performance it showed in the internal validation and one external validation, both conducted in South Korea. There are several possible explanations for the difference in prognostic accuracy between the South Korean and the European studies. First, there are important differences in the baseline characteristics and predictor values between the development cohort and our study sample, including higher proportions of IHCA patients and lower proportions of shockable initial rhythm, witnessed cardiac arrests and cardiac aetiology in the South Korean studies. Second, the proportion of poor neurological outcome at hospital discharge was higher in the South Korean studies (64.3 to 69.3% in the South Korean cohorts vs. 53.3% in our cohort), which might be due to said differences between the populations studied and/or differences in the clinical management of the patients. Third, none of the patients in the South Korean studies underwent withdrawal of life-sustaining therapy (WLST), since this was not allowed in South Korea until 2018 [14]. Fourth, pre-hospital management of cardiac arrest and the organisation of the rescue chains might differ substantially between countries. All of this highlights the importance of external validation of scoring systems such as the PROLOGUE, OHCA and CAHP scores before applying them in a region or country different from the one in which they were developed. Hence, before applying the PROLOGUE score in Europe, North America or South America, further external validation studies will be necessary.

However, the PROLOGUE score has some advantages over the OHCA and CAHP scores. First, it does not include the no-flow time, which is often missing or incorrectly estimated in clinical routine because cardiac arrests are often not witnessed. Second, it was explicitly developed for use in IHCA and OHCA patients. Thus, it can be applied to unselected cardiac arrest patients, whereas the OHCA and CAHP scores were developed for OHCA patients only and consequently showed significantly worse performance when applied to IHCA patients in the subgroup analysis, a finding which is in line with previous publications [16, 39]. The PROLOGUE score thus might be a promising alternative to the OHCA and CAHP scores, mainly due to its better clinical applicability (it omits the no-flow time as a parameter) and its broader suitability for OHCA and IHCA survivors, with similar prognostic accuracy in our study. Further validations are needed to confirm the reliability and generalisability of these findings. Prognostic scores are not infallible and should be regarded as decision aids only. As a next step, interventional studies should evaluate whether the application of such scores improves patient management [40].

Our study has some limitations. First, a certain risk of self-fulfilling prophecy has to be acknowledged, a limitation that always has to be kept in mind when conducting or interpreting prognostic factor studies [41, 42]. However, this problem is difficult to address since treating physicians cannot be blinded to predictor variables that are a necessary part of clinical decision making, such as findings of clinical examination or laboratory values. In our study, the treating physicians had access to all parameters included in the PROLOGUE score individually but not to the score value or the resulting prediction of the probability of poor outcome for their patient. Nevertheless, clinicians should be aware that clinical prediction models may be statistically valid on average, whereas prognostication for any individual patient remains a complex clinical decision based on different parameters. Second, our cohort is from a single centre, thus limiting generalisability to other regions or countries. Third, there were some differences in data acquisition between the original study and ours. Whereas in the Korean development cohort the first available values after hospital admission were used for the calculation of the score, in our cohort the values nearest to ICU admission were used. Thus, in our study, the parameters might tend to have been assessed a little later, which might be the reason for some differences in predictor values between our population and the South Korean development cohort. However, such differences can always occur when a scoring system is applied in a different setting or hospital, which is why we recommend validating and, if necessary, recalibrating the score before application in a certain population and setting. Fourth, in Switzerland, if poor outcome is evident early in the treatment course (e.g., brain herniation due to excessive cerebral oedema), life-supporting measures are frequently withdrawn and changed to a palliative regimen. This might explain the rather low number of patients surviving with poor neurological outcome (n = 68, 9.9%) in our cohort. As validation studies should generally aim for ≥ 100 events of a particular outcome, this might reduce the certainty of our results. Fifth, we did not test the results for subgroups based on important socioeconomic factors such as social status or belonging to an ethnic minority group. Also, we did not change the model to improve fit, but only validated the original score.

Finally, we focused on poor outcome at hospital discharge as our primary outcome, which is in line with the original paper, but did not report other patient-centred outcomes.

Strengths of our study include the large population of unselected cardiac arrest patients and the treatment modalities, which are in line with current European guidelines; both indicate a high external validity of our results. Furthermore, analysis and reporting of our study followed current methodological guidelines [33], which is essential for the comparability of our results with other studies and for the usability of our results for evidence synthesis in the form of a systematic review and/or meta-analysis of score performance in the future.

Conclusion

In our prospective cohort of unselected adult cardiac arrest patients, the PROLOGUE, OHCA and CAHP scores all showed good prognostic accuracy. The PROLOGUE score performs well in predicting poor neurological outcome in IHCA and OHCA patients and, if validated in qualitative or interventional studies, might support early discussions about goals of care and the extent of therapeutic effort between physicians and next of kin in the ICU. However, the outstanding performance of the PROLOGUE score in the South Korean studies could not be reproduced, which highlights the importance of external validation studies in the evaluation of prognostic models.