Introduction

Out-of-hospital cardiac arrest (OOHCA) is a significant public health issue worldwide. The estimated annual incidence of OOHCA treated by emergency medical services (EMS) in countries in North America, Europe, Asia, and Oceania is 28–244 per 100,000 persons1. The management of cardiac arrest, including extracorporeal cardiopulmonary resuscitation, emergency cardiovascular treatment, and targeted temperature management, is progressing2. Despite advances in resuscitation care, the long-term prognosis for patients with OOHCA remains poor, and OOHCA is one of the leading causes of death worldwide3.

Accurate prognostication of survival and good neurological outcomes after the return of spontaneous circulation (ROSC) is critical for reducing unnecessary treatments and counseling patient families. Several clinical scores have been reported to predict the neurological outcomes of patients with OOHCA at an early stage4.

Although numerous predictive scores have been developed, few have been externally validated5,6,7,8,9. Only one study has evaluated multiple cardiac arrest prognostic scores simultaneously in a dataset with the same patient background5; the evidence on the superiority or inferiority of the scores is very limited. Recent studies have recommended the “Nonshockable rhythm, Unwitnessed arrest, Long no-flow or Long low-flow period, blood PH < 7.2, Lactate level > 7.0 mmol/L, End-stage chronic kidney disease on dialysis, Age ≥ 85 years, Still resuscitation, and Extracardiac cause” (NULL-PLEASE) clinical score as the superior score to predict outcomes in patients with OOHCA because of its range of external validation, ease of use, and high predictive value4,5. However, there is no clinician consensus regarding which score to use.

To date, no study has compared prognostic scores for patients with cardiac arrest using a large set of patients. Since each prognostic score was derived from a different target patient population, any single validation cohort will differ from the original populations. Nevertheless, the simultaneous validation of numerous prognostic scores is of interest to many clinicians, as it is vital to know which scores are easy to use and accurate in a realistic clinical setting with a diverse patient population. The present study compared and validated multiple scores simultaneously using data from a nationwide multihospital prospective repository.

Methods

Ethics approval and consent to participate

The Ethics Committee of Kyoto University approved the Japanese Association for Acute Medicine OHCA (JAAM-OHCA) registry (approval number: R1045, approval date: May 5, 2019). The ethics committee of each participating institution, including Hokkaido University Hospital (approval number: 022-0315, approval date: January 10, 2023), approved the research protocol. The study was conducted in compliance with the Declaration of Helsinki and the ethical guidelines of each institution. Because the registry for this study is an epidemiological study with no treatment intervention, the requirement for informed consent was comprehensively waived by the Ethics Committee of Kyoto University, the representative institution of the JAAM-OHCA registry.

Study design

We conducted a retrospective analysis of the patients who were included in the JAAM-OHCA registry. The current validation study included the Out-of-Hospital Cardiac Arrest (OHCA); Cardiac Arrest Hospital Prognosis (CAHP); NULL-PLEASE; revised post-Cardiac Arrest Syndrome for Therapeutic hypothermia (rCAST); and MIRACLE2 scores, which were selected in consideration of the variables used, ease of use, and reported accuracy8,9,10,11,12. The OHCA, CAHP, and NULL-PLEASE scores were intended to identify patients with a favorable prognosis9,10,11, while the MIRACLE2 and rCAST scores were intended to identify patients with an unfavorable prognosis8,12.

Patients

The JAAM-OHCA registry is a nationwide multihospital prospective repository of prehospital data collected according to the Utstein template and in-hospital data, including treatments, arterial blood gas levels, and outcomes13,14. The JAAM-OHCA registry includes 103 participating facilities (79 university hospitals and/or critical care medical centers and 24 community hospitals providing emergency care in their respective regions). The registry includes all patients with OOHCA who required resuscitation by EMS and were transported to participating facilities.

Data collection

The list of variables required for each score is shown in Table 1. These data were collected from the JAAM-OHCA registry database containing the Utstein template. However, because the presence of pupillary reflexes was not available, a modified MIRACLE2 (mMIRACLE2) score was calculated using six items, excluding the presence of pupillary reflexes. The mMIRACLE2 score is our own definition, and there is no existing literature on it. This approach is similar to that taken for the NULL-PLEASE score, which was validated as NULL-EASE without pH and lactate9. All variables for the other four scores were included in the registry, so no score other than the MIRACLE2 score lacked a component variable. Since information on medical history was not included, chronic renal failure in the NULL-PLEASE score was defined as an estimated glomerular filtration rate (eGFR) of less than 30 mL/min/1.73 m², calculated from age and serum creatinine (Cr) levels (male: $\mathrm{eGFR_{creat}} = 194 \times \mathrm{Cr}^{-1.094} \times \mathrm{age}^{-0.287}$; female: $\mathrm{eGFR_{creat}} = 0.739 \times 194 \times \mathrm{Cr}^{-1.094} \times \mathrm{age}^{-0.287}$), and was used as a reliable substitute15,16. Because some data were missing for individual cases in the registry, each score was calculated only for the cases with no missing data for the variables used.
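The eGFR-based substitute for chronic renal failure described above can be sketched in a few lines. This is a minimal illustration, not the registry's actual processing code: the function names are hypothetical, and the input units (serum creatinine in mg/dL, age in years) are assumptions, since the text does not state them.

```python
def egfr_creat(creatinine_mg_dl: float, age_years: float, female: bool) -> float:
    """Creatinine-based eGFR using the coefficients quoted in the text.

    Assumed units: creatinine in mg/dL, age in years,
    result in mL/min/1.73 m^2.
    """
    if creatinine_mg_dl <= 0 or age_years <= 0:
        raise ValueError("creatinine and age must be positive")
    egfr = 194.0 * creatinine_mg_dl ** -1.094 * age_years ** -0.287
    # The female equation multiplies the male result by 0.739.
    return 0.739 * egfr if female else egfr


def renal_failure_surrogate(creatinine_mg_dl: float, age_years: float,
                            female: bool, threshold: float = 30.0) -> bool:
    """True when eGFR falls below the threshold (30 in the text)."""
    return egfr_creat(creatinine_mg_dl, age_years, female) < threshold


if __name__ == "__main__":
    # Illustrative inputs only; these are not registry data.
    print(round(egfr_creat(1.0, 60, female=False), 1))
    print(renal_failure_surrogate(3.0, 70, female=False))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the surrogate definition is to the chosen cut-off.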

Table 1 The variables needed to calculate each predictive score.

Patient selection and outcomes

This study included adult patients with OOHCA enrolled in the JAAM-OHCA registry between June 1st, 2014, and December 31st, 2019. The following patients were excluded from the analysis: (a) those aged < 18 years, (b) those with unknown initial rhythms, (c) those who had achieved ROSC upon contact with the EMS, (d) those with an unknown prognosis 1 month after cardiac arrest, and (e) those who died in the emergency room without hospitalization.

Neurological outcomes 30 days after OOHCA were evaluated using the cerebral performance category (CPC) scale17,18. The CPC scale ranges from 1 to 5 according to the degree of neurological dysfunction; patients in categories 1 and 2 are able to live without assistance, and these categories are generally considered to represent a favorable neurological outcome. Therefore, patients with a CPC of 1 or 2 were designated as having favorable neurological outcomes.

Statistical analysis

Data for categorical variables are presented as frequencies and percentages. Data for continuous variables are presented as medians with interquartile ranges. Patient characteristics and outcomes were compared between the two groups using the Mann–Whitney U test (for numerical variables) and Fisher’s exact test (for categorical variables). Since none of the prediction models used in this validation was explicit about the mathematical relationship between the score and the prognostic probability, we mainly evaluated their discrimination ability. Multiplicity was not adjusted for, as this was an exploratory study. The overall discrimination abilities of the various scores for favorable neurological outcomes were tested using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. In addition, this study used partial AUCs (pAUCs), which have been reported to be a useful tool for assessing regions of interest19. To evaluate the discrimination ability for favorable neurological outcomes with high specificity, the pAUC of the ROC curve over a specificity range of 0.8–1.0 was calculated for each predictive score. Furthermore, to evaluate the discrimination ability for unfavorable neurological outcomes with high sensitivity, the pAUC of the ROC curve over a sensitivity range of 0.8–1.0 was calculated for each predictive score. These ranges of sensitivity and specificity were defined as in previous studies5, as the subregions of greatest importance to clinicians in predicting prognosis in patients with cardiac arrest. The DeLong test was used to compare AUCs and pAUCs, although this entailed a reduction in power20.
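To make the AUC and pAUC definitions concrete, the following is a minimal sketch of an empirical ROC curve with trapezoidal areas restricted to a specificity or sensitivity range of 0.8–1.0. It is not the authors' R code: all function names are hypothetical, and the sketch assumes that a higher score means the positive class is more likely (if a higher score indicates a worse prognosis, the score would be negated before predicting favorable outcomes). The DeLong comparison is not implemented here.

```python
def roc_points(scores, labels):
    """Empirical ROC points (fpr, tpr) from (0, 0) to (1, 1).

    labels: 1 for the positive class, 0 otherwise.
    Tied scores are grouped so that ties produce diagonal segments.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under(points, x_lo=0.0, x_hi=1.0):
    """Trapezoidal area under a piecewise-linear curve on [x_lo, x_hi]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lo, hi = max(x0, x_lo), min(x1, x_hi)
        if hi <= lo:  # vertical segment or outside the range
            continue
        slope = (y1 - y0) / (x1 - x0)
        y_lo = y0 + slope * (lo - x0)
        y_hi = y0 + slope * (hi - x0)
        area += 0.5 * (y_lo + y_hi) * (hi - lo)
    return area


def auc(scores, labels):
    return area_under(roc_points(scores, labels))


def pauc_specificity(scores, labels, spec_lo=0.8):
    """Area under sensitivity for specificity in [spec_lo, 1]."""
    return area_under(roc_points(scores, labels), 0.0, 1.0 - spec_lo)


def pauc_sensitivity(scores, labels, sens_lo=0.8):
    """Area under specificity for sensitivity in [sens_lo, 1]."""
    swapped = [(tpr, 1.0 - fpr) for fpr, tpr in roc_points(scores, labels)]
    return area_under(swapped, sens_lo, 1.0)


if __name__ == "__main__":
    # Toy data for illustration only.
    s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    y = [1, 1, 0, 1, 0, 0]
    print(auc(s, y), pauc_specificity(s, y), pauc_sensitivity(s, y))
```

With a range of 0.8–1.0, each partial area is bounded above by 0.2, so values are only comparable between scores evaluated over the same range.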

To further understand the characteristics and accuracy of the scores, we calculated the predicted rate from a logistic model with favorable neurological outcomes as the outcome and compared the predicted rate to the actual rate for each score. The deviation between the predicted and actual probabilities was evaluated using the Spiegelhalter z-test.
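As an illustration of the calibration check, the Spiegelhalter z statistic can be computed directly from observed outcomes and predicted probabilities. The sketch below uses the standard form of the statistic; the function name is hypothetical, and it assumes the predicted probabilities have already been obtained (for example, from a logistic model fitted on each score).

```python
import math


def spiegelhalter_z(outcomes, probabilities):
    """Spiegelhalter z statistic and two-sided p-value.

    outcomes: 0/1 observed outcomes.
    probabilities: predicted probabilities of outcome 1, same order.
    """
    numerator = sum((y - p) * (1 - 2 * p)
                    for y, p in zip(outcomes, probabilities))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in probabilities)
    if variance == 0:
        raise ValueError("statistic undefined: zero variance")
    z = numerator / math.sqrt(variance)
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Toy data for illustration only.
    z, p = spiegelhalter_z([1, 0, 1, 0], [0.8, 0.2, 0.8, 0.2])
    print(z, p)
```

A non-significant result indicates no detected deviation between predicted and observed probabilities, which is not the same as demonstrating good calibration.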

All analyses were performed using the R statistical software version 4.2.1 (R Core Team 2022). All reported p-values were two-tailed, and differences with p < 0.05 were considered statistically significant.

Results

Adult patients with OOHCA (n = 56,537) were identified from those included in the JAAM-OHCA registry (n = 57,754). Patients < 18 years of age (n = 1217), with an unknown initial rhythm (n = 5048), an unknown prognosis 1 month after cardiac arrest (n = 1354), who had already attained ROSC at the time of EMS contact (n = 2366), and who could not be admitted to the hospital and died in the emergency room (n = 35,845) were excluded. Finally, the analysis included the remaining 11,924 adult patients with OOHCA (Fig. 1).
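The patient flow above can be checked with simple arithmetic. The sketch below uses only the counts reported in this paragraph and assumes the exclusion categories were applied as mutually exclusive groups; the variable names are illustrative.

```python
# Counts as reported in the text.
registry_total = 57_754
under_18 = 1_217

# Exclusions applied to the adult cohort (assumed mutually exclusive).
exclusions_among_adults = {
    "unknown initial rhythm": 5_048,
    "unknown prognosis at 1 month": 1_354,
    "ROSC at EMS contact": 2_366,
    "died in emergency room without admission": 35_845,
}

adults = registry_total - under_18
analyzed = adults - sum(exclusions_among_adults.values())

print(f"adults: {adults}, analyzed: {analyzed}")
```

Under that assumption, the reported figures are internally consistent with the 56,537 adults and the 11,924 patients analyzed.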

Figure 1

Flow chart of the study population. The patient groups surrounded by thick lines are those included in the present analysis. CPC: cerebral performance category scale, ECG: electrocardiogram, EMS: emergency medical services, ER: emergency room, OOHCA: out-of-hospital cardiac arrest, ROSC: return of spontaneous circulation.

The backgrounds of the analyzed patients are presented in Table 2. The distribution of patients in each score and the predicted and actual rates in each category are shown in Fig. 2. The distributions of patients across the mMIRACLE2, CAHP, and NULL-PLEASE scores were approximately normal, while for the rCAST and OHCA scores, patients were concentrated in the categories with low rates. The actual and predicted percentages of favorable neurological outcomes within each category were similar for the CAHP and NULL-PLEASE scores. The Spiegelhalter z-test was not significant for any of the scores, indicating no significant deviations between the predicted and actual probabilities (Supplementary Table 1).

Table 2 Baseline characteristics of out-of-hospital cardiac arrest in patients admitted after return of spontaneous circulation.
Figure 2

Distributions of patients for each score and the predicted and actual rates of favorable neurological outcomes in each category. Right vertical axis, number of patients; left vertical axis, percentage; horizontal axis, classification of each score. The NULL-PLEASE and mMIRACLE2 scores are shown as the measured values, while the OHCA, rCAST, and CAHP scores are shown in tenths of the range of scores obtained in this validation. NULL-PLEASE: Nonshockable rhythm, Unwitnessed arrest, Long no-flow or Long low-flow period, blood PH < 7.2, Lactate > 7.0 mmol/L, End-stage chronic kidney disease on dialysis, Age ≥ 85 years, Still resuscitation, and Extracardiac cause; OHCA: Out-of-Hospital Cardiac Arrest; rCAST: revised post-Cardiac Arrest Syndrome for Therapeutic hypothermia; CAHP: Cardiac Arrest Hospital Prognosis.

The AUC and pAUC of the ROC curves for each score are presented in Fig. 3 and Table 3. The NULL-PLEASE score had the largest AUC, which was significantly larger than those for the OHCA, mMIRACLE2, and rCAST scores. The next largest AUC was that of the CAHP score, which was significantly larger than those for the OHCA and mMIRACLE2 scores but did not differ significantly from those for the NULL-PLEASE and rCAST scores (p = 0.060 and 0.356, respectively).

Figure 3

Comparison of receiver operating characteristic (ROC) curves for each score. a, P value < 0.05 compared with the OHCA score; b, P value < 0.05 compared with the mMIRACLE2 score; c, P value < 0.05 compared with the rCAST score. AUC: area under the curve, ROC: receiver operating characteristic, CI: confidence interval, NULL-PLEASE: Nonshockable rhythm, Unwitnessed arrest, Long no-flow or Long low-flow period, blood PH < 7.2, Lactate > 7.0 mmol/L, End-stage chronic kidney disease on dialysis, Age ≥ 85 years, Still resuscitation, and Extracardiac cause; OHCA: Out-of-Hospital Cardiac Arrest; rCAST: revised post-Cardiac Arrest Syndrome for Therapeutic hypothermia; CAHP: Cardiac Arrest Hospital Prognosis.

Table 3 Analysis of partial area under the curve to assess the ability to precisely determine neurological outcome.

In the current study, the pAUC of the ROC curve was used to analyze the discriminative performance for unfavorable neurological outcomes in high-sensitivity areas and for favorable neurological outcomes in high-specificity areas. The former was evaluated based on the pAUC values for the ROC curves over sensitivities of 0.8–1.0 and the latter based on the pAUC values over specificities of 0.8–1.0. The results showed that the CAHP, rCAST, and NULL-PLEASE scores were significantly more accurate than the other two scores in their ability to discriminate unfavorable neurological outcomes (Table 3). The NULL-PLEASE score also showed a significantly higher ability to discriminate favorable neurological outcomes compared with the other four scores.

Discussion

While several prognostic scores for patients with cardiac arrest have been published, few studies have simultaneously validated the accuracy of each score using the same patient group, especially using large datasets. This study is the first to simultaneously validate various cardiac arrest prognostic scores on a large data set5. All prognostic scores evaluated in this study had sufficiently high discriminative power. In particular, the CAHP and NULL-PLEASE scores were highly accurate in discriminating the neurological outcomes of patients with OOHCA at 30 days. In addition, the NULL-PLEASE score was highly specific and helpful in identifying patients with favorable outcomes.

Previous studies have identified various poor prognostic factors in patients with OOHCA, including older age, cardiac arrest occurring at home, initial rhythm other than ventricular tachycardia/ventricular fibrillation, long no-flow time, long low-flow time, high epinephrine dosage, no pupillary response, and high serum lactate levels21,22,23,24,25. Although several prognostic scores using these predictors have been proposed4, the prognosis of patients with OOHCA remains difficult to predict; moreover, no single assessment method for prognostic classification of patients with OOHCA has been recommended. The scores that were the subject of this validation were also created using the above prognostic factors (Table 1). However, the target patients and prognostic evaluation methods for these scores differ (Supplementary Table 2). In addition, the scores differ in the prognosis they aim to identify: the OHCA, CAHP, and NULL-PLEASE scores were intended to identify patients with a favorable prognosis9,10,11, while the MIRACLE2 and rCAST scores were intended to identify patients with an unfavorable prognosis8,12.

As shown in Table 1, each score uses different variables8,9,10,11,12, which may have limited the number of eligible patients. The patients and prognostic evaluation for each score used in this validation are shown in Supplementary Table 2. For example, the targets for the NULL-PLEASE score were all patients with OOHCA9, whereas those for the OHCA and CAHP scores were restricted to patients with bystander-witnessed OOHCA10,11. Additionally, even where the score calculation was not hindered, the target patients in the original study may have been restricted. For example, the MIRACLE2 score12 is limited to patients with cardiogenic cardiac arrest and a Glasgow Coma Scale score other than 15. Similarly, the rCAST score is restricted to patients with OOHCA in whom therapeutic hypothermia was induced8. The accuracy of each score in the current validation may have been influenced by differences from the patients included in the original studies, i.e., in the prior probabilities.

The results showed that the NULL-PLEASE score had a high comprehensive discrimination ability for both favorable and unfavorable neurological patient outcomes (Fig. 3 and Table 3). The CAHP score was nearly as accurate as the NULL-PLEASE score overall but was significantly less accurate in discriminating patients with favorable neurological outcomes (Fig. 3 and Table 3). These results are consistent with the validation of the NULL-PLEASE score in a large cohort recently published by Byrne et al.26, who reported an AUC of the ROC curve for 30-day survival for the NULL-PLEASE score of 0.827 (95% confidence interval [CI] 0.814, 0.840). In our study, the AUC of the ROC curve for favorable neurological outcomes was 0.831 (95% CI 0.802, 0.860). Although the NULL-PLEASE score requires as many as 10 variables for calculation (Table 1), the calculation method is simple9. Since all of the variables can be easily collected in a clinical setting, the score can be applied to a large number of patients with OOHCA. Therefore, the NULL-PLEASE score is a useful predictive score in a variety of clinical settings. Although the CAHP score can only be used for patients with witnessed OOHCA, it is as useful as the NULL-PLEASE score in situations other than identifying patients with favorable neurological outcomes. The mMIRACLE2 score did not have sufficient discriminative accuracy in the present validation, and its AUC in the present study (0.727 [95% CI 0.689, 0.764]) deviated from that in the original study (median AUC 0.83 [95% CI 0.818, 0.840])12. This may be largely because the pupillary reflex was not included in the variables. The pupillary reflex is useful in predicting prognosis in patients with OOHCA25, and our finding that the AUC of the mMIRACLE2 score, which omits the pupillary reflex, was not high strongly supports previous findings.

This validation also revealed that the distribution of eligible patients differed greatly depending on the scores (Fig. 2), and each score had strengths and weaknesses in discriminating the patient outcomes. Since different scores can be used depending on the patient's condition and the information obtained, it may be useful in clinical practice to compare different scores simultaneously and select the appropriate score for the patient with cardiac arrest. However, there are only a limited number of situations in which any single score can be used to make an absolute treatment decision.

Limitations

This validation analyzed patients with OOHCA in Japan within the Japanese emergency medical system. Therefore, results may differ in other populations. The data used in this study were obtained from a subset of acute care hospitals participating in the registry, which may introduce selection bias with respect to hospitals and transported patients. In addition, there may be unmeasured confounders owing to the retrospective design and the limited number of measures in the registry. For the mMIRACLE2 score, the accuracy may differ slightly due to fewer measurement variables compared to the original12. The selected patients included those with non-cardiogenic cardiac arrest and patients without therapeutic hypothermia, and therefore may differ from the populations included in the original studies8,10,11. Therefore, the rCAST, MIRACLE2, and CAHP scores may perform better than the validation results in the present study suggest if the target patient population is limited.

Conclusions

The NULL-PLEASE score was useful and accurate in discriminating neurological prognoses at 30 days in patients with OOHCA. Compared to the other scores, the NULL-PLEASE score showed superior discriminative performance for favorable neurological outcomes.