Background

For patients who are unconscious after out-of-hospital cardiac arrest (OHCA) and treated with hypothermia, survival rates around 50% are reported [1, 2]. Pre-hospital factors such as time from collapse to start of cardiopulmonary resuscitation (CPR), time from collapse to return of spontaneous circulation (ROSC), initial rhythm, bystander CPR and lactate levels are all strongly correlated with outcome at a group level [2–7]. However, these correlations are based on large retrospective cohorts with no control for differences in patient treatment and may therefore be subject to bias. Furthermore, single predictors may not be reliable in individual cases, owing to difficulties in obtaining or recording precise information during pre-hospital management of cardiac arrest (CA) [8, 9]. Currently, neurological prognostication in patients remaining unconscious is not recommended earlier than 72 h after the CA [10]. Nevertheless, in the OHCA patient population, with its elevated mortality, an earlier prediction of poor outcome is desirable for the process of care and information provided to the patient’s relatives. An assessment of the risk for a poor outcome would also be of great value for comparing populations and for defining patient risk when assessing effects in interventional studies. Efforts to construct a prediction score based on the set of data available at hospital admission have yielded the OHCA score [11] and the Cardiac Arrest Hospital Prognosis (CAHP) score [12]. When applied prospectively, the OHCA score showed moderate predictive accuracy [13]; however, neither of these scores has been validated for clinical use.

The aim of this study was to establish a clinically useful association between parameters available from patient history and status at intensive care admission and outcome in comatose survivors of OHCA. The first step in the analysis was directed at determining the association between these factors and poor outcome at 6 months, defined as Cerebral Performance Category (CPC) 3–5 [14]. The second step was directed at constructing an easy-to-use risk score for prediction of a poor outcome.

Methods

We performed a post hoc analysis of data obtained in the Target Temperature Management (TTM) trial [15], in which researchers recruited patients from 36 intensive care units (ICUs) in Europe and Australia. The trial included adult patients (≥18 years) resuscitated from OHCA of a presumed cardiac cause who remained unconscious (Glasgow Coma Scale [GCS] score ≤8) more than 20 minutes after ROSC. The main exclusion criteria were unwitnessed asystole as the initial rhythm and refractory shock at hospital admission defined as sustained systolic blood pressure less than 80 mmHg despite administration of fluids, vasopressors, inotropes and/or treatment with an intra-aortic balloon pump or left ventricular assist device [16].

Pre-hospital data, including initial rhythm, witnessed arrest, administration of bystander CPR and time from collapse to ROSC, were systematically collected at admission according to the Utstein guidelines [17]. Time from CA to initiation of basic life support (BLS; administered by bystanders or first responders) and advanced life support (ALS) was recorded. No-flow and low-flow times were defined as the time from CA to the start of CPR (BLS or ALS) and the time from the start of CPR to ROSC, respectively. Time to ROSC was defined as the time from CA to the first recorded time point of sustained spontaneous circulation. Patients were included in the present analysis if their CPC was recorded at follow-up 6 months after CA. All sites participating in the TTM trial registered patient data in a common electronic case report form. The process was monitored at each site by external reviewers who visited the centres and verified the correctness of registered data. All the centres used the same study protocol that defined target temperature management over time and prompted multimodal investigations for neurological prognostication. The results of the main trial were subjected to sensitivity analyses for time, study centre and other possible biases, all of which turned out negative.

The TTM trial demonstrated no difference in mortality and neurological outcome between a target temperature of 33 °C and 36 °C. The result has been further elaborated in post hoc analyses and sub-studies, which have so far shown similar outcomes in the two target temperature groups [18–21]. Therefore, data were pooled for the present analysis.

Statistical analysis

A description of original data is given in Tables 1, 2 and 3, where categorical variables are presented as crude numbers and percentages and continuous variables are presented as medians with 25th and 75th percentiles. Logistic regression was used to calculate age-adjusted ORs with corresponding 95% CIs and p values. Continuous variables not fulfilling the linearity assumption were transformed using either natural logarithm or square root transformation.
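As a minimal sketch of how an odds ratio and its 95% CI follow from a fitted logistic regression coefficient, the snippet below exponentiates the coefficient and its Wald interval. The coefficient and standard error are hypothetical values chosen for illustration, not results from this study, and the function name is an assumption.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.959964):
    """Convert a logistic regression coefficient (log-odds) and its
    standard error into an odds ratio with a Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient for a 1-unit increase in some predictor.
or_, lower, upper = odds_ratio_with_ci(beta=0.05, se=0.01)
print(f"OR {or_:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```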

Table 1 Patient characteristics in relation to outcome
Table 2 Circumstantial factors
Table 3 Patient factors based on examination on arrival at hospital

Owing to the amount of missing data for several of the variables, multiple imputation was used for the multivariable analysis. Missing data were assumed to be missing at random (p < 0.01 for Little’s test of missing completely at random), and 50 imputed datasets were generated with the Markov chain Monte Carlo method and using the expectation-maximisation algorithm. Rubin’s rules were used when pooling the results from the imputed datasets.
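The pooling step can be illustrated with a short sketch of Rubin's rules for a single coefficient. The three estimates and variances below are hypothetical, and the function name is an assumption; the study itself pooled over 50 imputed datasets in SAS.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: per-imputation point estimates (e.g. log-odds)
    variances: per-imputation squared standard errors
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations of equal length")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical coefficients from three imputed datasets, each with SE 0.10.
q_bar, total = pool_rubin([0.50, 0.54, 0.46], [0.01, 0.01, 0.01])
print(q_bar, total ** 0.5)  # pooled estimate and pooled standard error
```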

To identify independent predictors of a poor outcome, we started with a full model including all variables in Tables 1, 2 and 3. We excluded some continuous variables from the statistical analysis because of collinearity: CA-BLS time, CA-ALS time, CA-ROSC time, lactate, blood glucose and base excess on arrival to the hospital, as well as the dichotomous variables, intravenous drug abuse, immunodeficiency, cirrhosis and AIDS, that were present in fewer than five cases. Multiple logistic regression was performed in each of the 50 imputed datasets, and the variable with the highest p value in the pooled result was excluded from the model. A new regression analysis was then performed in each imputed dataset, and of the remaining variables, the one with the highest p value in the pooled result was excluded. This procedure was repeated until all remaining variables yielded a p value below 0.01 in the pooled result. These variables were then used to develop our prognostic risk score (TTM risk score). To facilitate clinical use of the model, we used an approach similar to that adopted in the development of the Framingham Risk Score [22]. We let the increase in risk associated with a 5-year increase in age, reflected by five times the β-coefficient for age in the final model, correspond to 1 point. We then determined points associated with each of the other categories of the identified risk factors by how far in regression units each category was from the corresponding factor’s base category (i.e., when points = 0), dividing that distance by five times the β-coefficient for age and rounding to the nearest integer.
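The point assignment described above can be sketched as follows. The β-coefficients are hypothetical placeholders rather than the values in Table 4, the function name is an assumption, and rounding halves upward is one possible reading of "rounding to the nearest integer".

```python
import math

def framingham_points(beta_distance, beta_age, years_per_point=5):
    """Convert a distance in regression units from the base category into
    integer points, where 1 point equals the risk increase associated with
    `years_per_point` years of age."""
    unit = years_per_point * beta_age
    if unit <= 0:
        raise ValueError("beta_age must be positive")
    # floor(x + 0.5) rounds to the nearest integer (halves rounded up),
    # avoiding Python's round-half-to-even behaviour.
    return math.floor(beta_distance / unit + 0.5)

BETA_AGE = 0.04  # hypothetical log-odds increase per year of age

print(framingham_points(0.62, BETA_AGE))   # a category 0.62 units above base
print(framingham_points(-0.25, BETA_AGE))  # a protective category gets negative points
```

Negative points are possible under this scheme, which is consistent with a total score whose minimum lies below zero.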

The AUC and the concordance percentage were used to evaluate discrimination, and the Hosmer-Lemeshow goodness-of-fit test was used to evaluate calibration. This was done for all 50 imputed datasets, and the median and range of these are presented. The first five imputed datasets were used to calculate predicted versus observed risk as well as sensitivity, specificity, and positive and negative predictive values. Youden’s J statistic was used to select the optimum cut-off of the score. Internal validation was also performed in the first five imputed datasets using bootstrapping (1000 resamples in each set), and the maximum of these was used as an estimate of optimism. No external validation was performed.
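The discrimination measure and the cut-off selection can be sketched on toy data as follows. The scores and outcomes are invented for illustration, the function names are assumptions, and counting ties as one half in the AUC is a common convention rather than something stated in the text.

```python
def auc(scores, outcomes):
    """Probability that a randomly chosen poor-outcome patient has a higher
    score than a randomly chosen good-outcome patient (ties count 1/2)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(scores, outcomes):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1
    for the rule 'predict poor outcome if score > cutoff'."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if y == 1 and s > c)
        tn = sum(1 for s, y in zip(scores, outcomes) if y == 0 and s <= c)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best

# Toy data: risk score points and outcome (1 = poor, 0 = good).
toy_scores = [2, 5, 8, 10, 12, 14, 15, 18, 20, 25]
toy_outcomes = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

toy_auc = auc(toy_scores, toy_outcomes)
toy_cutoff, toy_j = youden_cutoff(toy_scores, toy_outcomes)
print(toy_auc, toy_cutoff, toy_j)
```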

Two-sided tests were used, and p values below 0.01 were considered statistically significant. All analyses were performed using SAS version 9.3 for Windows software (SAS Institute, Cary, NC, USA), except for Little’s test, for which IBM SPSS Statistics version 23 software (IBM, Armonk, NY, USA) was used.

Results

A total of 939 patients were included in the original TTM trial. Neurological outcome data 6 months after CA were available for 933 (99%) patients, who consequently were included in the present analysis. Of these patients, 493 (52%) had a poor outcome of CPC 3–5. Their median age was 65 years (IQR 57–73 years), and 177 (19%) were female.

Variables associated with a poor outcome

Older age, female sex, higher body mass index and a previous history of alcoholism were associated with a poor outcome (Table 1). Longer time from collapse to arrival of providers of BLS and ALS, as well as initial rhythm other than ventricular tachycardia or ventricular fibrillation (VT/VF), was associated with a poor outcome. CA location at home, longer duration of no flow and low flow, increasing time to ROSC and pre-hospital intubation and administration of adrenaline were also associated with a poor outcome (Table 2).

Lower pH, base excess and initial body temperature were associated with a poor outcome. Absence of brainstem reflexes, such as spontaneous breathing as well as pupillary, corneal and cough reflexes, was also associated with a poor outcome, as were higher lactate; initial shock; higher glucose, potassium and creatinine in plasma; and absence of motor response at pain stimulation (Glasgow Coma Scale motor score 1 [GCS-M1]). Abnormal partial pressure of carbon dioxide in arterial blood (PaCO2; <4.5 or >6.0 kPa) in initial blood gas samples was also associated with a poor outcome (Table 3).

Independent predictors of a poor outcome

Six variables were excluded from the multivariable analysis because of multicollinearity: CA to BLS and CA to ALS were highly correlated with no flow; CA to ROSC was highly correlated with low flow; and lactate, blood glucose and base excess were all correlated with pH. After stepwise backward elimination, ten variables remained significantly associated with a poor outcome: older age, CA occurring at home, initial rhythm other than VF/VT, longer duration of no flow, longer duration of low flow, treatment with adrenaline, bilateral absence of corneal and pupillary reflexes, GCS-M1, a lower pH and a PaCO2 lower than 4.5 kPa on admission (Table 4). This final model showed good discrimination with a median (range) AUC of 0.844 (0.842–0.846) and a median AUC of 0.820 by internal validation with bootstrap-derived samples correcting for optimism. The Hosmer-Lemeshow goodness-of-fit test demonstrated overall good calibration with p > 0.40 for all 50 imputations.

Table 4 Independent predictors of Cerebral Performance Category 3–5 at 6 months

A simple risk score for a poor outcome: performance and validation

The TTM risk score was developed using the final selection of variables above. The points assigned to different variables are listed in Table 5. The minimum sum of points was −2, and the maximum was 35. The performance of the TTM score using quartiles as cut-offs is described in Table 6. The median (range) AUC was 0.842 (0.840–0.845), and corrected for optimism by internal validation it was 0.818 (0.816–0.821). The Hosmer-Lemeshow goodness-of-fit test yielded a p value >0.10 in all 50 imputations, showing good calibration. The median concordance percentage was 82.2 (range 81.0–82.5). In patients with a score above 13 points, the sensitivity for poor outcome was 69% to 70% with a corresponding specificity of 83% to 84%.

Table 5 Target temperature management risk score points (range −2 to 35)
Table 6 Discrimination performance of the three risk scores

Risk for a poor outcome assessed by OHCA and CAHP scores

On the basis of the formula presented by Adrie et al. [11], we calculated the OHCA risk score in our cohort for each of the 50 imputations. This rendered a median (range) AUC of 0.746 (0.739–0.752), and 48 of the 50 imputations (96% of the imputed datasets) had a p value <0.05 for the Hosmer-Lemeshow goodness-of-fit test, indicating poor calibration. Median (range) concordance was 74.4 (73.8–75.1). For patients with a high risk according to the original paper (score >32.5), the sensitivity was 44% to 45% and the specificity was 89% to 89% for a poor outcome.

By visually estimating the parameters in the risk score model presented by Maupain et al. [12], we also calculated the CAHP risk score in our cohort for each of the 50 imputations, yielding a median (range) AUC of 0.746 (0.743–0.747), and 20 (40%) of the 50 imputations had p values <0.05 for the Hosmer-Lemeshow goodness-of-fit test, indicating poor calibration. The median (range) concordance percentage was 74.3 (74.0–74.6). In patients with a score above 200 points, which Maupain et al. defined as high risk, the sensitivity for poor outcome was 48% to 49% with a corresponding specificity of 82% to 83%.

The performance of the OHCA and CAHP scores is shown in Table 6. A comparison of the performance using ROC curves of the three risk scores is presented in Fig. 1.

Fig. 1 Comparison of the performance of the Target Temperature Management (TTM), out-of-hospital cardiac arrest (OHCA) and Cardiac Arrest Hospital Prognosis (CAHP) risk scores. ROC curves for a poor outcome at 6 months for the TTM (red), OHCA (blue) and CAHP (green) risk scores in our imputed sample (one curve per score for each of the 50 imputations). The median AUC for the TTM score was 0.842 (0.818 corrected for optimism); for the OHCA score, it was 0.746, and for the CAHP score, it was 0.746.

Discussion

The aim of this study was to identify independent parameters from patient history and status available at intensive care admission that could be used for early prediction and risk stratification of a poor outcome in comatose survivors following OHCA. Age, time to ROSC and initial temperature were previously reported from the TTM cohort [21, 23, 24]. Our findings using multivariable analysis are also in line with earlier studies showing that older age, CA occurring at home, initial rhythm other than VF/VT, longer duration of no flow, longer duration of low flow, treatment with adrenaline, absence of corneal and pupillary reflexes, GCS-M1, a lower pH, and a PaCO2 lower than 4.5 kPa on admission were independent predictors of a poor outcome [2, 6, 11, 18–20, 25–36]. Low-flow time from the initiation of CPR to ROSC has previously been associated with a poor outcome in various studies of OHCA [11, 29, 37], including the present cohort [21]. However, effective low flow may have been longer than registered because BLS was provided before arrival of ALS in 73% of the patients. Notwithstanding that BLS efficacy may vary considerably and is difficult to assess, registered low-flow time in the present study was somewhat longer than reported in the cohorts used to calculate OHCA and CAHP risk scores [11, 12]. The role of adrenaline in the resuscitation of OHCA victims is controversial, and in a recent meta-analysis, no benefit could be found [30]. In a Japanese study of 400,000 patients, researchers reported increased ROSC but decreased 1-month survival in patients who received adrenaline compared with those who did not [31]. Adrenaline is still part of European resuscitation guidelines [38], and it is possible that the association between adrenaline and a poor outcome that we found was related to more complicated resuscitations. Among variables potentially available at ICU admission, co-morbidity may be expected to be important. However, as in this analysis, it has previously been shown not to be associated with mortality in the present patient cohort [39].

To assess the risk for poor outcome after OHCA at an early stage, we constructed the TTM risk score on the basis of individual variables available at ICU admission. Several prediction methods have previously been investigated. The Acute Physiology and Chronic Health Evaluation II score [40], which is not disease-specific, has repeatedly been shown to be a poor predictor of outcome in OHCA [41]. The OHCA risk score presented in 2006 using variables available at hospital admission showed a similar performance. However, the OHCA risk score was based on a small cohort (n = 130). Also, the patients were relatively young compared with other OHCA cohorts, with a median age of 55 years and no patients older than 69 years [11]. The recently published CAHP risk score, based on a large number of patients, has performed best so far with an AUC of 0.93. However, some steps in its underlying calculations, such as the assumption of a linear association between outcome and time to ROSC, as well as outcome and pH, are open to discussion [12]. When applied in our cohort, the OHCA and the CAHP scores lost performance as assessed by AUC, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). These discrepancies reflect the problem of low external validity for scores that are constructed in a single cohort. This may also apply to our score in spite of our use of a multicentre international cohort, and it needs to be validated in further trials. For the TTM risk score, a high value (>16 points, representing the fourth quartile of the patients) showed a promising PPV of 91% and a specificity of 95% to 96% and therefore indicated an acceptable margin not to predict a poor outcome in a patient with a good prognosis. However, the corresponding sensitivity was only 43%. Selecting a performance-optimal cut-off of >13 points by use of Youden’s J statistic, yielding a sensitivity and specificity of 69% to 70% and 83% to 84%, respectively, does not render the score more useful, because the PPV is reduced to 82% to 83%. A low sensitivity when reaching a satisfactory specificity was also an issue for the CAHP risk score (sensitivity 46% to 56% in the development cohort). The high-risk cut-off OHCA score at 32.5 points (development cohort) described in the paper had an inadequate specificity of only 77% for poor outcome but a better sensitivity of 77% [11]. These results raise the question of whether better performance could be achieved with more, or contrarily, with fewer but highly discriminative, variables in a risk score model. Interestingly, Jabre et al. [42], using a development cohort of 1771 patients, found 0% (95% CI 0.0% to 0.5%) survival in 772 patients who fulfilled three criteria during pre-hospital resuscitation. These criteria were (1) OHCA not witnessed by emergency medical services personnel, (2) non-shockable initial cardiac rhythm and (3) no ROSC before receipt of a third 1-mg dose of epinephrine. However, comparison with our TTM trial cohort is difficult because the latter included only patients hospitalised alive after OHCA. Among 618 patients admitted alive in the Jabre et al. study, 22% were discharged alive. There is no information about the 180-day survival, which was close to 50% in the TTM cohort [15].
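The trade-off between the two TTM score cut-offs discussed above can be checked with a short calculation of PPV from sensitivity, specificity and prevalence. This is a sketch under stated assumptions: the midpoints of the reported ranges are used as inputs, the prevalence of a poor outcome is taken as 493/933 from the Results, and the function name is hypothetical. The study's own values were computed on the imputed datasets, so small differences are expected.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

prevalence = 493 / 933  # proportion with CPC 3-5 at 6 months

# Cut-off > 16 points: sensitivity 43%, specificity 95% to 96% (midpoint assumed).
ppv_high = ppv(0.43, 0.955, prevalence)

# Cut-off > 13 points (Youden): sensitivity 69% to 70%, specificity 83% to 84%.
ppv_youden = ppv(0.695, 0.835, prevalence)

print(f"PPV at >16 points: {ppv_high:.2f}")
print(f"PPV at >13 points: {ppv_youden:.2f}")
```

Under these assumptions the calculation gives values close to the reported PPVs of 91% and 82% to 83%, showing how the gain in sensitivity at the lower cut-off is paid for with more false-positive predictions of a poor outcome.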

Strengths and limitations

A significant advantage of the present study is the well-defined cohort of patients who were carefully evaluated. In particular, on one hand, a major strength was that patients in the TTM trial were subject to strict rules on neurological prognostication and withdrawal of life-sustaining therapy [15], which may be a significant source of bias in cohort studies and other randomised clinical trials. On the other hand, our risk score is valid only under these clinical conditions. A large number of clinically relevant variables and outcome measures were registered in similar pre-hospital and emergency health care systems and ICUs following a published trial protocol [16]. This may favour evaluation of heterogeneous early predictors of outcome [9]. Treating all patients included in a randomised trial of two interventions as one cohort carries a principal risk of hiding differences between the groups. However, the two groups were well balanced with regard to background characteristics and the main outcome, and outcomes in numerous sub-studies have shown no differences due to the intervention. Nevertheless, it needs to be stressed that a major limitation of the external validity of our study is that the cohort consists of selected patients with OHCA of a presumed cardiac origin who were not found in unwitnessed asystole, who were unconscious on admission to hospital and who were not in a refractory circulatory shock state. Furthermore, the modality or the degree of respiratory support before ICU admission constitutes a treatment bias because it was not registered in this study and could have affected predictors such as PaCO2 and pH.

Conclusions

In a cohort of 933 patients with OHCA of a presumed cardiac cause extracted from the 939 patients included in the TTM trial, we found ten independent predictors of a poor outcome, defined as CPC 3–5 six months after CA. These included older age, CA occurring at home, initial rhythm other than VT/VF, longer duration of no flow, longer duration of low flow, administration of adrenaline, bilaterally absent pupillary and corneal reflexes, absent motor response to pain, a lower pH, and a PaCO2 lower than 4.5 kPa at admission.

The predictors readily available at ICU admission were used to construct a simple, easy-to-use risk score that showed good association with outcome 6 months after the arrest. The score could further represent a helpful tool for treatment allocation and stratification in randomised studies as well as for comparison of cohorts in epidemiological studies. However, it is important to stress that the proposed risk score is not yet precise enough to be used for individual prognostication of outcome after OHCA, and it needs further validation in a large cohort of patients with OHCA.