Background

Care of patients with post-cardiac arrest syndrome (PCAS) is mainly aimed at improving the neurological prognosis of the patients after the return of spontaneous circulation (ROSC) [1, 2]. To improve the outcomes of PCAS patients, many hospitals perform induced therapeutic hypothermia, based on the results of previous randomized controlled trials [3,4,5,6,7]. Predicting the prognosis of PCAS patients at the time of their arrival in the Emergency Room, before the initiation of induced hypothermia, would be useful for stratifying patients for precise clinical research, as well as for providing a baseline estimate of their prognosis.

Some studies have examined several factors that are quantifiable before the initiation of induced hypothermia to determine their relationships to the neurological outcomes in PCAS patients. For example, the duration of the resuscitation effort has been shown to be correlated with a good functional outcome in patients with PCAS [8, 9]. Other studies have revealed correlations of the pH [10], serum lactate [11, 12] and Glasgow Coma Scale (GCS) score [13, 14] with the neurological prognosis in PCAS patients. However, none of these factors has been found to be capable, by itself, of satisfactorily separating patients with a good outcome from those with a poor outcome, suggesting that the establishment of a “suitable scale” based on a combination of prognostic factors might be useful [15, 16]. The aim of the present study was to develop a prognostic scoring system for predicting the neurological prognosis in PCAS patients treated with induced hypothermia, based on the results of examinations carried out prior to the initiation of induced hypothermia. A summary of this study was previously reported in a letter format [17].

Methods

Study design

We conducted a multi-center, retrospective, observational study examining adult patients who were treated with induced hypothermia after experiencing cardiac arrest. We retrospectively reviewed the clinical management charts of the eligible patients who were admitted to our hospitals during a period spanning the last 3-5 years: 54 patients treated at Nagoya University Hospital between April 2011 and March 2016, 23 patients treated at Chutouen General Medical Center between April 2013 and March 2016, 64 patients treated at Japan Red Cross Maebashi Hospital between April 2011 and March 2016, and 10 patients treated at Komaki City General Hospital between April 2012 and March 2016. All patients who were treated with induced hypothermia after experiencing cardiac arrest were eligible (induced hypothermia was considered for cardiac arrest patients who were in a coma (GCS ≤ 8) after ROSC without remarkable hemodynamic instability or a “Do Not Attempt Resuscitation” directive). Patients were excluded if they had experienced traumatic cardiac arrest, were pediatric patients (age < 18 years), or had not lived independently prior to experiencing cardiac arrest. For the purpose of developing and validating the prognostic scoring system, we divided all 151 patients into a learning set (77 cases treated at Nagoya University Hospital or Chutouen General Medical Center) and a validation set (74 cases treated at Komaki City General Hospital or Japan Red Cross Maebashi Hospital). These sets were created with the aim of developing and validating a scoring system for a broad population of patients from city hospitals and general medical centers in the countryside, so each set contained one hospital located in a city and one located in the countryside. This study was approved by the research ethics boards of Nagoya University Hospital, Chutouen General Medical Center, Komaki City General Hospital and Japan Red Cross Maebashi Hospital.

Participating hospitals

The four participating hospitals are tertiary emergency medical centers (Japanese centers for emergency patients with serious or life-threatening conditions): Nagoya University Hospital is an academic hospital, while Chutouen General Medical Center, Japan Red Cross Maebashi Hospital, and Komaki City General Hospital are general hospitals. Nagoya University Hospital and Japan Red Cross Maebashi Hospital are both located in cities and have 1,035 and 592 beds, including 26 and 12 ICU beds, respectively; these hospitals treat about 12,000 and 20,000 emergency patients each year, respectively. Chutouen General Medical Center and Komaki City General Hospital are both located in the countryside and have 500 and 558 beds, including 10 and 30 ICU/CCU beds, respectively; these hospitals treat about 20,000 and 30,000 emergency patients per year, respectively.

Dataset

Data were collected retrospectively from electronic chart reviews, including the clinical histories (age, sex, situation surrounding the cardiac arrest), cardiac rhythms, physical examinations performed upon the patient’s arrival in the Emergency Room (GCS, mydriasis), results of blood examinations (C-reactive protein [CRP], albumin [Alb], hemoglobin [Hb], glucose, creatinine, pH, lactate), cranial CT scan images, and clinical courses after admission. Scoring variable candidates were selected from among parameters measurable before admission to the ICU and the initiation of induced hypothermia.

The gray matter attenuation to white matter attenuation ratio (GWR) was measured using the method described in Torbey’s report [18]. CT scan images were obtained using an Aquilion64 (TOSHIBA) or SOMATOM Definition Flash (SIEMENS) within 6 h after the patient’s cardiac arrest event. As shown in Fig. 1, the intensities of circular areas of interest (about 10 mm²) were measured for both the gray matter and the white matter on three axial slices (5-mm slice thickness) at a basal ganglia level, a centrum semiovale level, and a high convexity level. Then, the GWR was calculated as shown below [19]:

$$ \mathrm{GWR}=\left(\frac{\mathrm{Th}}{\mathrm{PIC}}+\frac{\mathrm{MC1}}{\mathrm{MWM1}}+\frac{\mathrm{MC2}}{\mathrm{MWM2}}\right)/3 $$

where Th represents the thalamus, MC1 represents the medial cortex at the centrum semiovale, MC2 represents the medial cortex at the high convexity level, PIC represents the posterior limb of the internal capsule, MWM1 represents the medial white matter at the centrum semiovale, and MWM2 represents the medial white matter at the high convexity level. Each value was the average of the right and left hemisphere values.
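The GWR formula above can be sketched in a few lines of Python. This is only an illustration: the function names `side_average` and `gwr` and the Hounsfield unit values in the usage line are hypothetical and are not taken from the study data.

```python
def side_average(right_hu, left_hu):
    """Average the right- and left-hemisphere Hounsfield unit readings."""
    return (right_hu + left_hu) / 2


def gwr(th, pic, mc1, mwm1, mc2, mwm2):
    """Gray-to-white matter attenuation ratio from six regions of interest.

    Each argument is a (right, left) pair of Hounsfield units:
    th/pic    thalamus and posterior limb of the internal capsule
    mc1/mwm1  medial cortex and medial white matter, centrum semiovale level
    mc2/mwm2  medial cortex and medial white matter, high convexity level
    """
    ratios = [
        side_average(*th) / side_average(*pic),
        side_average(*mc1) / side_average(*mwm1),
        side_average(*mc2) / side_average(*mwm2),
    ]
    return sum(ratios) / 3


# Illustrative (made-up) readings: gray matter 36 HU, white matter 24 HU.
example = gwr((36, 36), (24, 24), (36, 36), (24, 24), (36, 36), (24, 24))
```

With these made-up readings, every regional ratio is 1.5, so the averaged GWR is also 1.5.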

Fig. 1

Three imaging slices used to calculate the gray matter attenuation to white matter attenuation ratio. Left, high convexity level (a). Middle, centrum semiovale level (b). Right, basal ganglia level (c). In each slice, the Hounsfield units (HU) were measured within circle 1 on the gray matter and within circle 2 on the white matter. Each HU value was then calculated as the average of the values for the right and left hemispheres. The slice thickness was 5 mm, and the circle size was 10 mm².

Before developing the scoring system for the learning set, we conducted some unit imputations because the baseline data involved some missing values. The mean value was used for the missing values of continuous variables, while 0 was used for the missing values of binary variables. Of note, we confirmed that the accuracy of the prediction did not change even if a value of 1 was used for the missing values.
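The imputation rule described above (mean for continuous variables, a fixed value for binary variables) might look like the following minimal sketch. The function name `impute`, the use of `None` to mark a missing value, and the sample values are assumptions made for illustration only.

```python
from statistics import mean


def impute(values, binary=False, binary_fill=0):
    """Fill missing entries (None) in one variable.

    Continuous variables receive the mean of the observed values;
    binary variables receive a fixed value (0 by default).
    """
    if binary:
        return [binary_fill if v is None else v for v in values]
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical continuous variable and hypothetical binary variable.
continuous_filled = impute([2.0, None, 4.0])
binary_filled = impute([1, None, 0], binary=True)
```

Passing `binary_fill=1` corresponds to the sensitivity check mentioned above, in which 1 was substituted for the missing binary values.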

Protocol for therapeutic hypothermia

Induced hypothermia was performed for eligible patients according to each hospital’s protocol. Induced hypothermia was considered for cardiac arrest patients who were in a coma (GCS ≤ 8) after ROSC without remarkable hemodynamic instability or a “Do Not Attempt Resuscitation” directive. At Nagoya University Hospital, Chutouen General Medical Center, and Komaki City General Hospital, a temperature of 34 °C was targeted by the infusion of cold fluids in combination with surface cooling, an ice pack and cold blanket, or a surface cooling device with a computerized automatic temperature control (Arctic Sun 2000 TTM; Bard Medical, Louisville, CO). After the targeted temperature had been maintained for 24 h, rewarming to 36 °C was performed at a rate of 0.2 °C/4 h. Propofol, dexmedetomidine, fentanyl and rocuronium were used for sedation, analgesia, and muscle relaxation according to individual clinician preferences. At the Japan Red Cross Maebashi Hospital, the target temperature was mostly 34 °C, but the target was changed to 35 °C or 36 °C if the patient experienced hemodynamic instability. Induced hypothermia was performed using a surface-cooling device with a computerized automatic temperature control (Arctic Sun 2000 TTM; Bard Medical, Louisville, CO). After the targeted temperature had been maintained for 24 h, rewarming to normothermia was performed at a rate of 1 °C/24 h, stopping at 36 °C. Propofol, dexmedetomidine, midazolam, fentanyl and rocuronium were used for sedation, analgesia, and muscle relaxation according to individual clinician preferences. At all the participating hospitals, the ventilator settings, fluid infusion, and doses of vasopressors, sedatives, and analgesics were adjusted so that the mean arterial pressure, pCO2, and urine output were ≥80 mmHg, 35-45 mmHg, and ≥0.5 mL/kg/h, respectively, to maintain cerebral perfusion.

Neurological outcome

We used the Cerebral Performance Categories (CPC) at 30 days to estimate the neurological outcomes as follows: CPC 1, full recovery; CPC 2, moderate disability; CPC 3, severe disability; CPC 4, coma or vegetative state; and CPC 5, died [20]. We calculated the CPC score at 30 days by reviewing the electronic charts or interviewing the patient’s family and the institutions where the patients were admitted at 30 days. The categories were grouped into either a good outcome (1-2) or a poor outcome (3-5) [20].
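The grouping of CPC values into the binary outcome used in the analysis can be written as a short sketch; the function name `dichotomize_cpc` is hypothetical.

```python
def dichotomize_cpc(cpc):
    """Map a Cerebral Performance Category (1-5) to 'good' (1-2) or 'poor' (3-5)."""
    if cpc not in (1, 2, 3, 4, 5):
        raise ValueError("CPC must be an integer from 1 to 5")
    return "good" if cpc <= 2 else "poor"


# Outcome label for each possible category.
labels = {cpc: dichotomize_cpc(cpc) for cpc in range(1, 6)}
```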

Sample size for external validation

We determined the sample size for the external validation set so as to ensure a high precision in estimating the proportion of correct classification (PCC). Specifically, we set the target value of PCC as 85%, which is the point estimate for the classifier based on the logistic regression obtained in the internal validation. We required that the width of a two-sided confidence interval with a 95% confidence level based on a normal approximation be less than 10%. In this manner, we determined that at least 51 patients were needed for the external validation.
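The arithmetic behind this figure is not spelled out. A minimal sketch of the usual normal-approximation sample-size formula, n ≥ z²p(1 − p)/d², is given below, where d is the half-width of the confidence interval. Reading the 10% as the half-width and rounding z to 2 are assumptions made here; they are chosen because they give 51, whereas z = 1.96 gives 49 under the same reading. The function name `n_for_margin` is hypothetical.

```python
from fractions import Fraction
from math import ceil


def n_for_margin(p, margin, z):
    """Smallest n for which z * sqrt(p * (1 - p) / n) does not exceed margin.

    Arguments are decimal strings so that the arithmetic is exact.
    """
    p, margin, z = Fraction(p), Fraction(margin), Fraction(z)
    return ceil(z**2 * p * (1 - p) / margin**2)


n_z2 = n_for_margin("0.85", "0.10", "2")       # assumed z rounded to 2
n_z196 = n_for_margin("0.85", "0.10", "1.96")  # conventional z for 95%
```

Exact rational arithmetic is used so that a boundary case such as n = 51 is not pushed up or down by floating-point rounding.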

Statistical analysis

During the development of the prognostic scoring system, we first identified a set of prognostic factors that could be used to predict the clinical outcome, that were both clinically important and routinely measurable in the Emergency Room, and that also exhibited some correlation with the outcome variable in the learning set. In developing classifiers using the resulting set of prognostic factors, we applied the standard logistic regression and decision tree algorithms to the learning set. For the internal validation of these classifiers, we conducted a 10-fold cross-validation using the learning set to estimate the indices of the predictive accuracy, including the proportion of correct classification, specificity, and sensitivity. Here, specificity measures the proportion of patients with poor outcomes who were correctly identified as such. We repeated the cross-validation analysis 50 times with different random sample splits in the learning set to obtain stable estimates of these indices. We then applied the prediction algorithm with the higher accuracy in the internal validation to all the patients in the learning set to obtain a classifier for the external validation. In the cross-validation, we also identified the most appropriate number of variables for the scoring system by comparing the predictive accuracies between different numbers of variables. In the external validation, we estimated each 95% confidence interval (95% CI) using an exact method based on the beta distribution (a normal approximation was not used). Finally, we applied the prediction algorithm to all 151 patients to create a novel scoring system for use with future patients. R software was used for all the statistical analyses. We used the “glmnet” package for logistic regression (http://www.jstatsoft.org/v33/i01/) and the “rpart” package for the decision tree (http://CRAN.R-project.org/package=rpart). With the “glmnet” package, the logistic regression algorithm identifies a “good prognosis” if the posterior probability of a good prognosis is greater than 50%; otherwise, a “poor prognosis” is identified.
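An exact confidence interval for a proportion based on the beta distribution is commonly known as the Clopper-Pearson interval. Assuming that this is the method meant, the sketch below computes it with the Python standard library by inverting the binomial tail probabilities, which is equivalent to taking beta quantiles. The analyses in this study were run in R; the function names and the counts in the usage line are hypothetical.

```python
from math import comb


def binom_cdf(j, n, p):
    """P(X <= j) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(j + 1))


def solve_p(j, n, target, iterations=60):
    """Find p with binom_cdf(j, n, p) == target (the CDF decreases as p grows)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if binom_cdf(j, n, mid) > target:
            lo = mid  # p is too small: the CDF is still above the target
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    lower = 0.0 if k == 0 else solve_p(k - 1, n, 1 - alpha / 2)
    upper = 1.0 if k == n else solve_p(k, n, alpha / 2)
    return lower, upper


# Hypothetical counts: 10 correct classifications out of 10 patients.
low, high = clopper_pearson(10, 10)
```

When every case is classified correctly, the upper limit is 1 and the lower limit reduces to (α/2)^(1/n), which is a convenient check on the implementation.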

Results

A flow chart illustrating the subject enrollment and development of the score is shown in Fig. 2. A total of 151 consecutive adults were divided into a learning set from two hospitals and a validation set from the other two hospitals. From the learning set, we extracted several factors for developing the scoring system and performed an internal cross-validation to decide the optimum statistical algorithm and number of variables. A tentative scoring system was created from the data, and the predictive accuracy of the scoring system was examined by external validation. Finally, using all of the data, we created the post-Cardiac Arrest Syndrome for induced Therapeutic hypothermia (CAST) score to predict the neurological prognosis in PCAS patients prior to the initiation of induced hypothermia.

Fig. 2

Flow chart illustrating the subject enrollment and development of the score

The baseline characteristics of the patients in the learning and validation sets are summarized in Table 1. The correlations between the baseline variables and the patient outcomes in the learning set are shown in Table 2. Candidate variables for inclusion in the scoring system were selected from among the variables that were examined before the patients were admitted to the ICU for the initiation of induced hypothermia.

Table 1 Baseline Characteristics of the Learning and Validation Data Sets
Table 2 Correlation Coefficients and P Values between Each Variable and Patient Outcomes in the Learning Data Set

From among the variables that were considered, we selected 8 factors (initial rhythm, witnessed status and time until ROSC, GCS-M score, pH, serum lactate, Alb, Hb and GWR) that showed significant correlations (P <0.01) with the patient outcomes. For convenience, while using the variables to create the scoring system, we converted the continuous variables into categorical variables in such a manner that higher scores implied poorer outcomes (Table 3).

Table 3 Categorical Classification of Each Variable

The cross-validated predictive accuracies (sensitivity, specificity, percentage of correct classifications) of the tentative scoring system created using the learning set were 0.85, 0.84 and 0.85 for the logistic regression algorithm, and 0.82, 0.73, and 0.78 for the decision-tree algorithm, respectively. We used the logistic regression algorithm to construct the scoring system, because it yielded higher values for the sensitivity, specificity, and percentage of correct classifications in the internal cross-validation than the decision-tree algorithm. The most appropriate number of variables was also determined using internal cross-validation. We compared the predictive accuracies of three scoring systems created with the logistic regression algorithm using different numbers of variables (6, 8, and 11 variables, which showed correlations with the outcomes at P <0.001, <0.01 and <0.05, respectively). The predictive accuracies of the three scoring systems were (0.86, 0.82, 0.84), (0.85, 0.84, 0.84), and (0.84, 0.83, 0.83), respectively. We decided to create the scoring system using 8 variables, because the highest specificity was obtained with this number of variables (specificity was prioritized because ethically, it may be more acceptable to misjudge a patient as being likely to have a good outcome, even if they actually have a poor outcome, than to misjudge a patient as being likely to have a poor outcome when they actually have a good outcome). An additional figure file illustrates this in greater detail [see Additional file 1].

Next, the 8-variable tentative scoring system that was developed from the learning set using the logistic regression algorithm was externally validated using a validation set consisting of 74 cases. For establishing the classification, we set 50, 30, and 15% as the cutoff values; for example, a 50% cutoff value meant that a “good prognosis” was identified if the probability of a good prognosis according to the score was greater than 50%; otherwise, a “poor prognosis” was identified. Based on these cutoff values, the sensitivity and specificity estimated using the validation set were (50%: 0.95 (95% CI, 0.82–0.99), 0.90 (0.77–0.99)), (30%: 0.87 (0.69–0.96), 0.98 (0.88–1.00)) and (15%: 0.67 (0.47–0.83), 1.00 (0.92–1.00)), respectively. Then, we plotted the receiver operator characteristic (ROC) curve and found an area under the curve (AUC) of 0.97 [see Additional file 2].
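The cutoff rule described above can be sketched as follows. The sketch uses the definition of specificity given in the Statistical analysis section (the proportion of patients with poor outcomes who are classified as poor); taking sensitivity as the corresponding proportion among patients with good outcomes is an assumption. The function names, probabilities and outcomes are hypothetical and are not study data.

```python
def classify(prob_good, cutoff):
    """Label a patient 'good' if the predicted probability exceeds the cutoff."""
    return "good" if prob_good > cutoff else "poor"


def sensitivity_specificity(probs, outcomes, cutoff):
    """Return (sensitivity, specificity) for one cutoff.

    sensitivity: share of good-outcome patients classified as 'good' (assumed)
    specificity: share of poor-outcome patients classified as 'poor'
    """
    predicted = [classify(p, cutoff) for p in probs]
    pairs = list(zip(predicted, outcomes))
    good = [pred for pred, true in pairs if true == "good"]
    poor = [pred for pred, true in pairs if true == "poor"]
    sensitivity = good.count("good") / len(good)
    specificity = poor.count("poor") / len(poor)
    return sensitivity, specificity


# Made-up predicted probabilities of a good outcome and observed outcomes.
probs = [0.9, 0.6, 0.4, 0.2, 0.1]
outcomes = ["good", "good", "good", "poor", "poor"]
at_50 = sensitivity_specificity(probs, outcomes, 0.50)
at_15 = sensitivity_specificity(probs, outcomes, 0.15)
```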

Since the results of evaluation of the predictive accuracy of the scoring system by both internal and external validations implied that the process of creating the scoring system was appropriate, we finally developed a novel scoring system to predict the neurological outcomes of PCAS patients prior to their undergoing induced hypothermia (the post-Cardiac Arrest Syndrome for induced Therapeutic hypothermia (CAST) score), by applying the logistic regression algorithm to all the 151 patients (including both the learning and validation sets; Fig. 3).

Fig. 3

Calculation used for determining the post-Cardiac Arrest Syndrome for induced Therapeutic hypothermia (CAST) score. Using the correlation coefficients from all the data (a), the resultant scores and the probability of a good outcome were calculated (b)

Discussion

The considerable advances in neurological critical care medicine, including the use of induced hypothermia, have led to remarkable improvements in the neurological prognosis of PCAS patients. Such critical care treatments are, however, expensive and time-consuming (for example, the average total cost of admission of a PCAS patient treated by induced hypothermia at our hospital in 2015 was about $49,410 USD). Accurate prediction of the neurological outcomes before the initiation of induced hypothermia will help us to carefully consider the indications for these intensive and expensive therapies.

Although a few previous studies have attempted to establish a scoring system to estimate the neurological outcomes of PCAS patients at the time of their arrival in the Emergency Room, none of these studies took into account whether the patients included in the study had or had not undergone induced hypothermia [10, 21,22,23,24]. Today, many ICU doctors employ induced hypothermia in an attempt to obtain good recovery in patients in whom intensive treatment is indicated. It is noteworthy that the CAST score in this study was created using only data from PCAS patients who had undergone induced hypothermia, and not from all PCAS patients. Moreover, the factors included in the scoring systems attempted in these previous studies were mostly limited to clinical history items, such as the time until ROSC, the initial rhythm, and the witness status; no blood examination, physical examination or imaging findings were included. For the present study, we collected data from the patients’ clinical history, blood examination, physical examination and imaging findings, all of which were available prior to the initiation of induced hypothermia, and some of these parameters showed strong correlations with the patient outcomes. Using these findings, we created the CAST score to predict the neurological prognosis of PCAS patients prior to the initiation of induced hypothermia.

When creating a scoring system, the convenience of the calculation should be emphasized, and until now, a simple decision-tree algorithm [22] or scoring system utilizing a simplified odds ratio [25] had been considered to be optimal. Today, however, electronic devices allow more complex formulae to be used, and utilize statistical algorithms with more logical and statistically verified backgrounds, enabling the creation of more accurate, and possibly more complex, scoring systems. In the present study, we judged that a logistic regression algorithm was optimal for establishing the CAST score system based on the results of an internal validation. To calculate the score more easily, we developed application tools for calculating the CAST score as iOS applications for the iPad [26] and iPhone [27].

Studies on predictive scores, while extremely interesting, can also be contentious [28]. Predictive scores should be interpreted carefully, since they only show the probability of an outcome in a general population, and not the precise probability for an individual patient. Although predictive scores can be useful for guiding decision-making and risk assessments for individual patients [28], their results are not absolute. The final therapeutic strategy should be decided based not only on the results of the scoring system, but also on other factors, such as the results of discussions with family members, the patient’s own wishes, and societal considerations. Most importantly, a patient’s exact neurological prognosis cannot be predicted without decisive examinations, such as an electroencephalogram, or long-term observation.

The predictive accuracy of the CAST score is limited, because it was created using retrospective data, even though its generalizability is likely to be high, since it was developed using data from multiple centers. We have to take into account that there were some missing values, for which we conducted some unit imputations. Moreover, it is possible that minor differences between hospital protocols (such as the kinds of sedatives or methods used for inducing and maintaining induced hypothermia) could have had some influence on the patient outcomes. The differences in the baseline characteristics of the patients among the participating hospitals also cannot be ignored, because of the limited sample size of the study. It would be of interest to conduct further validation of the score using data from more participating hospitals. The endpoint used in this study was the outcome at 30 days. Although the outcome at 30 days has been used in a few other studies that have attempted to establish a predictive score for cardiac arrest patients [10, 29], it may be better to set longer-term endpoints, such as the outcome at 90 days, for more accurate prediction of the future clinical course [30]. Prospective validation of the CAST score and a study to examine the usefulness of this score for predicting the long-term prognosis of PCAS patients are warranted.

Conclusions

The CAST score was developed to predict the potential neurological outcomes of PCAS patients treated by induced hypothermia. According to our results, in PCAS patients with a CAST score of ≤15%, the likelihood of a good recovery at 30 days is extremely low. Prospective validation of the CAST score is needed in the future.