Introduction

In Europe, out-of-hospital cardiac arrest is one of the most common causes of death, with an annual incidence of 275,000 cases [1]. Even after successful resuscitation and restoration of stable circulation, mortality in the early period remains high because of hypoxic ischemic encephalopathy (HIE) [2]. In recent years, advances in emergency care and new therapeutic options such as hypothermia treatment have improved these patients' chances of survival with good neurological outcome. Most notably, mild therapeutic hypothermia has been shown to significantly improve neurologic outcome after cardiac arrest [3]. However, therapeutic hypothermia affects several parameters that are used within the first few days to predict poor outcome in patients successfully resuscitated after cardiac arrest: (1) elevated serum levels of neuron-specific enolase (NSE), (2) loss of cortical somatosensory evoked potentials (SEP), and (3) a motor response to painful stimuli of two points or less on the Glasgow Coma Scale on day 3 after resuscitation. Several studies have indicated that therapeutic hypothermia affects the validity of these prognostic parameters [4–10]. For this reason, additional diagnostic tests and parameters have been evaluated that may be used alongside the established parameters to predict poor outcome with a high degree of confidence. Furthermore, some of the established diagnostic tests, such as SEPs, are not available in all hospitals treating patients after cardiac arrest, results of neurological examination may differ between examiners, and NSE may be elevated in patients with pre-existing but undiagnosed tumors. Hence, establishing additional, widely available diagnostic tests could help to reduce the risk of wrong decisions in individual patients. Cranial computed tomography (CT) is performed in many patients who are comatose after cardiac arrest, and its potential for predicting outcome has been evaluated in several recent studies [11–16].

HIE is known to be associated with cerebral edema, which reduces the attenuation of gray matter on unenhanced CT scans and results in a loss of distinction between gray and white matter [17, 18]. In severe HIE, there may be inversion of the normal attenuation relationship between gray and white matter—a phenomenon known as the reversal sign.

Based on these observations, it has been proposed that the ratio of attenuation of gray matter to attenuation of white matter—termed the gray–white matter ratio (GWR)—might be quantified and used as an objective prognostic parameter to predict neurologic outcome [14] (Fig. 1).
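Written out, the ratio is the mean gray matter attenuation divided by the mean white matter attenuation. The attenuation values in the worked example below are hypothetical and chosen only to reproduce the three GWR values illustrated in Fig. 1; they are not measurements from this study.

```latex
\mathrm{GWR} = \frac{\overline{\mathrm{HU}}_{\text{gray}}}{\overline{\mathrm{HU}}_{\text{white}}},
\qquad \text{e.g.}\quad
\frac{36\ \mathrm{HU}}{30\ \mathrm{HU}} = 1.2\ (\text{normal}),\quad
\frac{30\ \mathrm{HU}}{30\ \mathrm{HU}} = 1.0\ (\text{distinction lost}),\quad
\frac{24\ \mathrm{HU}}{30\ \mathrm{HU}} = 0.8\ (\text{reversal})
```

Because edema lowers gray matter attenuation while white matter attenuation changes little, a falling numerator drives the ratio toward and below 1.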

Fig. 1

Examples illustrating different gray–white matter ratios (GWR) in comatose patients resuscitated after cardiac arrest. a Illustration of the normal attenuation relationship between gray and white matter (GWR = 1.2). The attenuation distinction is lost in b (GWR = 1.0) and reversed in c (GWR = 0.8)

For GWR estimation, investigators measure the attenuation of gray and white matter in Hounsfield units (HU) in up to 16 regions of interest (ROIs) placed in different brain areas. This approach has the advantage of being independent of the reader's experience in evaluating HIE on CT scans and of providing an objective marker with high interrater reliability. However, the current method is time consuming and hence difficult to integrate into routine clinical practice. We therefore investigated whether GWR estimation can be simplified without a loss of prognostic reliability. To this end, we evaluated four methods of determining GWR, each based on a different set of ROIs, and compared their prognostic accuracy.

Material and Methods

Study Population

After obtaining ethical approval for this retrospective analysis, we searched our database for eligible patients. The search retrieved 111 patients who had been treated with mild hypothermia after cardiac arrest and had undergone CT within 7 days of resuscitation between December 2005 and October 2011. In an earlier study of the same patient population, we compared the prognostic value of GWR with that of NSE and SEP for predicting poor outcome [13].

The reasons for performing CT were unrelated to the study. Most patients underwent CT to rule out a primary intracranial event (e.g., subarachnoid hemorrhage) or intracranial complications after coronary angiography involving anticoagulation. All CT examinations were performed on one of three CT scanners from the same manufacturer using a standardized protocol (120 kV, 5-mm section thickness; GE LightSpeed Pro 16, LightSpeed Ultra, LightSpeed CVT; GE Healthcare, Little Chalfont, UK). Patients whose CT datasets were not suitable for GWR determination were excluded from the analysis: those with only postcontrast CT images available (n = 3), hydrocephalus and shunt artifacts (n = 3), severe motion artifacts (n = 2), intracerebral hemorrhage (n = 3), a large old ischemic lesion (n = 1), or massive calcification of the basal ganglia (n = 1). In total, 98 patients were included in the analysis (Table 1).

Table 1 Demographic data given as absolute numbers and percentages or medians and interquartile ranges. (From Scheel et al. [13], with permission)

Clinical outcome was assessed using the Cerebral Performance Category (CPC) score [19]. The CPC was determined by the treating physician in the intensive care unit at the time of the patient's transfer to a normal ward or discharge to rehabilitation. Good outcome was defined as CPC 1–2 and poor outcome as CPC 3–5. Table 1 summarizes the demographic data of the study population.
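The binary outcome used in the ROC analyses follows directly from this definition. A minimal sketch of that coding, assuming the CPC is recorded as an integer from 1 to 5 (the function name is hypothetical):

```python
def is_poor_outcome(cpc: int) -> bool:
    """Dichotomize a Cerebral Performance Category score as defined in the text.

    CPC 1-2 -> good outcome (False); CPC 3-5 -> poor outcome (True).
    """
    if cpc not in (1, 2, 3, 4, 5):
        raise ValueError(f"CPC must be an integer from 1 to 5, got {cpc!r}")
    return cpc >= 3


# Usage with hypothetical scores
labels = [is_poor_outcome(s) for s in (1, 2, 3, 5)]
```

Encoding poor outcome as True keeps the positive class aligned with what the prognostic parameters are meant to detect.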

GWR Determination

ROIs were independently placed by two readers (M. Scheel with 5 years and A. Gentsch with less than 1 year of training in neuroradiology). Both readers were blinded to outcome and other patient data and to ROI placement by the other reader.

ROI measurements were performed using a standardized protocol [14] and comprised bilateral manual placement of circular ROIs (area = 0.1 cm²) in the following areas: caudate nucleus, corpus callosum, putamen, posterior limb of the internal capsule (PIC), medial cortex and white matter at the level of the semioval center, and medial cortex and white matter in the high frontoparietal area (Fig. 2).

Fig. 2

Circular regions of interest were placed bilaterally in the following regions: 1 corpus callosum, 2 caudate nucleus, 3 putamen, 4 posterior limb of internal capsule, 5 and 6 cortex and white matter at the centrum semiovale level, and 7 and 8 white matter and cortex at the high convexity level

The mean HU values within these ROIs were used for calculating 4 different GWR values: (1) GWR-AV using all 16 bilateral ROIs, (2) GWR-BG using 8 ROIs at the level of the basal ganglia, (3) GWR-CO including cortical ROIs and white matter ROIs in the centrum semiovale, and (4) GWR-SI, a simplified GWR estimation method using only 4 ROIs. Table 2 shows a summary of the different ROIs and their usage in the respective GWR estimates.
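The four estimates can be sketched as ratios of pooled gray and white matter ROI means. Table 2 is not reproduced here, so the ROI groupings below are assumptions: GWR-SI as putamen over PIC follows the rationale given in the Discussion, GWR-BG and GWR-CO use the basal ganglia and cortical groupings implied by the text, and GWR-AV is assumed to be the average of GWR-BG and GWR-CO, as in the method of [14]. The ROI keys reuse the Table 2 abbreviations; the function names and example HU values are hypothetical.

```python
from statistics import mean

# Hypothetical data layout: ROI abbreviation (Table 2) -> (left, right) mean HU.
RoiValues = dict[str, tuple[float, float]]


def _ratio(rois: RoiValues, gray: list[str], white: list[str]) -> float:
    """Mean HU of the gray matter ROIs divided by mean HU of the white matter ROIs,
    pooling both hemispheres."""
    g = mean(hu for name in gray for hu in rois[name])
    w = mean(hu for name in white for hu in rois[name])
    return g / w


def gwr_bg(rois: RoiValues) -> float:
    # 8 ROIs at the basal ganglia level (assumed grouping)
    return _ratio(rois, ["CN", "PU"], ["CC", "PIC"])


def gwr_co(rois: RoiValues) -> float:
    # 8 cortical and white matter ROIs (assumed grouping)
    return _ratio(rois, ["MC1", "MC2"], ["WM1", "WM2"])


def gwr_si(rois: RoiValues) -> float:
    # 4 ROIs: putamen over posterior limb of internal capsule
    return _ratio(rois, ["PU"], ["PIC"])


def gwr_av(rois: RoiValues) -> float:
    # All 16 ROIs; assumed here to be the average of GWR-BG and GWR-CO
    return (gwr_bg(rois) + gwr_co(rois)) / 2


# Hypothetical HU values for one patient
example: RoiValues = {
    "CN": (32.0, 32.0), "PU": (34.0, 34.0), "CC": (30.0, 30.0), "PIC": (30.0, 30.0),
    "MC1": (33.0, 33.0), "MC2": (33.0, 33.0), "WM1": (30.0, 30.0), "WM2": (30.0, 30.0),
}
print(round(gwr_si(example), 3))  # 1.133
```

With equal ROI counts on both sides, pooling means in this way is equivalent to dividing the sum of the gray matter values by the sum of the white matter values.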

Table 2 Overview of regions of interest (ROIs) placed for calculating the gray–white matter ratio: putamen (PU), caudate nucleus (CN), medial cortex at the level of the semioval center (MC1), medial cortex in the high frontoparietal area (MC2), posterior limb of the internal capsule (PIC), corpus callosum (CC), white matter in the semioval center (WM1), and white matter in the high frontoparietal area (WM2). Crosses/gray fields indicate the ROIs used for the respective gray–white matter ratio estimation method

Statistical Analysis

A descriptive analysis of all parameters was performed using IBM SPSS Statistics (V 2.0). Interrater reliability was assessed by calculating intraclass correlation coefficients (ICC).
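The text does not state which ICC model was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-measures form (Shrout and Fleiss ICC(2,1)) and computes it from the ANOVA mean squares; other models available in SPSS (e.g., consistency instead of absolute agreement) would give different values. The function name and example ratings are hypothetical.

```python
import numpy as np


def icc_2_1(ratings) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_subjects, k_raters) array, e.g. one GWR value per
    patient (rows) for each of the two readers (columns).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-patient means
    col_means = x.mean(axis=0)  # per-reader means

    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return float(
        (ms_rows - ms_err)
        / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
    )


# Hypothetical GWR values from two readers for four patients
readers = [[1.20, 1.21], [1.10, 1.12], [1.30, 1.29], [1.05, 1.07]]
icc = icc_2_1(readers)
```

Absolute agreement penalizes a systematic offset between readers, which matters here because both readers' values feed into a single fixed cutoff.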

For predicting poor outcome, prognostic parameters should ideally have 100 % specificity, because in the context of HIE they are used to decide whether further intensive care treatment after cardiac arrest should be limited. We therefore determined and compared the sensitivities of the four methods at cutoff values providing 100 % specificity. In addition, we performed receiver-operating characteristic (ROC) curve analysis and compared the area under the curve (AUC) values of the four methods [20]. These analyses were performed using MedCalc (version 11.0, MedCalc Software bvba, Belgium). Statistical significance was assumed at p < 0.05.
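Both quantities can be sketched for a single GWR method, assuming that a lower GWR indicates poor outcome. The AUC is computed here via the Mann-Whitney statistic, and the 100 %-specificity cutoff is assumed to be the lowest GWR observed in any good-outcome patient, with poor outcome predicted strictly below it; MedCalc may define its reported criterion slightly differently. The statistical comparison of correlated AUCs [20] is not sketched. Function names and example data are hypothetical.

```python
import numpy as np


def auc_low_predicts_poor(gwr, poor) -> float:
    """ROC AUC for 'lower GWR indicates poor outcome' (Mann-Whitney form).

    Probability that a random poor-outcome patient has a lower GWR than a
    random good-outcome patient; ties count one half.
    """
    gwr = np.asarray(gwr, dtype=float)
    poor = np.asarray(poor, dtype=bool)
    p, g = gwr[poor], gwr[~poor]
    lower = (p[:, None] < g[None, :]).sum()
    ties = (p[:, None] == g[None, :]).sum()
    return float((lower + 0.5 * ties) / (p.size * g.size))


def sensitivity_at_full_specificity(gwr, poor) -> tuple[float, float]:
    """Return (cutoff, sensitivity) for predicting poor outcome if GWR < cutoff.

    The cutoff is the lowest GWR among good-outcome patients, so no
    good-outcome patient is misclassified (specificity 100 %).
    """
    gwr = np.asarray(gwr, dtype=float)
    poor = np.asarray(poor, dtype=bool)
    cutoff = gwr[~poor].min()
    sensitivity = float(np.mean(gwr[poor] < cutoff))
    return float(cutoff), sensitivity


# Hypothetical GWR values and outcomes (True = poor outcome)
gwr = [1.05, 1.10, 1.20, 1.25, 1.18, 1.30]
poor = [True, True, True, False, False, False]
auc = auc_low_predicts_poor(gwr, poor)
cutoff, sens = sensitivity_at_full_specificity(gwr, poor)
```

Because this cutoff is set by a single extreme good-outcome value, it is sensitive to sample size, which is why 100 % specificity in a cohort of this size needs external confirmation.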

Results

The overall ICC for interrater reliability (all values) was 0.97, indicating very good agreement. GWR-CO had the lowest ICC (0.76); the ICCs of the other methods were very good (GWR-AV, 0.94; GWR-BG, 0.95; GWR-SI, 0.93).

Most CT scans were performed within the first 6 h; the median time from arrest to CT was 5 h (interquartile range, 2.0–24.3 h). Table 3 shows the mean HU values of all ROIs in patients with good and poor outcome.

Table 3 Absolute attenuation values of different ROIs in Hounsfield units. (From Scheel et al. [13], with permission)

The AUC values of all GWR methods ranged from 0.75 to 0.81. ROC curve comparison of the simplest method, GWR-SI, with the other methods revealed no statistically significant differences (Fig. 3 and Table 4). Descriptively, GWR-SI even had the highest AUC of all methods (0.810). At 100 % specificity, GWR-SI and GWR-BG had the highest sensitivity of all methods (44.3 %).

Fig. 3

Receiver-operating characteristic curves for GWR-AV, GWR-BG, GWR-CO, and GWR-SI (AUC indicates area under the curve)

Table 4 Results of receiver-operating characteristic curve comparison

Discussion

The possible relationship between GWR and clinical outcome after cardiac arrest has been investigated in several studies [11–16]. We investigated whether the current GWR estimation method can be simplified without a loss of prognostic reliability. Our findings suggest that a simpler method quantifying GWR from 4 instead of 16 ROIs has similar prognostic validity and interrater reliability. It must be noted, however, that the different GWR estimation methods have different optimal cutoff values: for GWR-AV and GWR-BG, the cutoff value at 100 % specificity was 1.16, whereas for GWR-SI it was 1.11. Other investigators reported similar cutoff values of 1.15–1.21 [11, 14]. Our analysis corroborates earlier reports [14] suggesting that GWR determined in the cortical regions (GWR-CO) is less accurate in predicting clinical outcome. There are two possible explanations for this observation: either the basal ganglia are more severely damaged by hypoxia, or HU measurements in the cortical gray matter are less reliable because of partial volume effects. The second explanation is supported by the fact that GWR-CO had the lowest interrater reliability in our study (ICC = 0.76).

The change in the gray-to-white matter attenuation ratio results from a drop in the HU values of gray matter in patients with brain edema. The greater vulnerability of gray matter to hypoxia can be explained by its higher metabolic rate. The basal ganglia in particular have a high metabolic turnover and are among the first structures affected under hypoxic conditions [21, 22]. Edema also appears to be more severe in the basal ganglia in general and in the putamen in particular [23, 24]. The putamen is anatomically well defined, which allows its reliable identification on CT. We therefore chose the putamen as the gray matter measurement site for our simplified GWR estimation method. For the same reason, we chose the PIC as the white matter measurement site. White matter structures such as the PIC appear to be nearly unaffected during the acute stage of HIE, and their HU values do not differ between patients with good and poor outcome [13].

Our study has several limitations. Although we investigated a larger number of patients than earlier studies, the sample is still relatively small for a comparison of different diagnostic methods. Because of this, our study population may not have included rare cases of good clinical recovery despite the development of brain edema in the early phase after cardiac arrest. Therefore, the 100 % specificity of the cutoff values we identified must be interpreted with caution and needs to be confirmed in further studies. Another limitation is the greater variability resulting from the use of three different CT scanners. However, our cutoff values for GWR-AV are very similar to values identified in earlier studies [11, 14], suggesting that our results apply across different CT systems.

A previous study also demonstrated that GWR values change over time [13]. Unfortunately, the majority of CT scans in our study were performed within 6 h, and the time distribution did not allow an additional time-dependent analysis. Future studies should address the optimal time point for obtaining the CT scan for GWR estimation.

In our study, GWR was determined retrospectively and therefore was not used to predict outcome or make therapeutic decisions. However, treating physicians were informed about the CT findings (e.g., whether the radiologist had identified relevant edema), and it is conceivable that their therapeutic decisions in HIE patients were influenced by the presence of severe brain edema.

It is known that hypothermia treatment influences the prognostic reliability of diagnostic tests that were established in patients who did not receive hypothermia therapy. For this reason, our clinic has implemented a waiting period of at least 7 days and combines the information from several tests before a decision to limit intensive care is made. If several indicators point to a poor outcome, as suggested by a prognostic algorithm established by interdisciplinary consensus, limiting treatment is discussed with the patient's family [25]. As in earlier studies, we cannot definitely rule out a self-fulfilling prophecy resulting from limiting treatment in patients who are expected to have a poor outcome. In our opinion, however, this restrictive algorithm for treatment limitation makes substantial bias from a self-fulfilling prophecy unlikely.

In summary, the results presented here show that the prognostic accuracy of a simplified algorithm for determining GWR is virtually the same as that of the standard method, which is far more complex and time consuming. The simplified method has high interrater reliability and could be readily implemented in clinical practice, although further studies in larger patient populations are needed before a definite recommendation for routine diagnostic use can be made.