Take-home message

In adult patients who are comatose after return of spontaneous circulation (ROSC), the following indices predict good neurological outcome (no, or mild to moderate neurological disability) with > 80% specificity and > 40% sensitivity in most studies:

  • a withdrawal or localisation motor response to pain immediately or at 72–96 h after ROSC

  • normal blood values of neuron-specific enolase (NSE) at 24–72 h after ROSC

  • a short-latency somatosensory evoked potentials (SSEPs) N20 wave amplitude > 4 µV or a continuous background without discharges on electroencephalogram (EEG) within 72 h from ROSC

  • absent diffusion restriction in the cortex or deep grey matter on diffusion weighted imaging (DWI) of brain magnetic resonance imaging on days 2–7 after ROSC

Most studies had moderate or high risk of bias, mainly from selection or confounding.

Introduction

About 80% of patients who are successfully resuscitated from cardiac arrest are comatose on arrival at hospital [1] because of post-cardiac arrest brain injury (PCABI) [2], and their prognosis is uncertain. An accurate prediction of poor neurological outcome in these patients is important to avoid pursuing futile treatments in patients with irreversible PCABI.

Much information is available about predictors of poor neurological outcome after cardiac arrest, while little is known about predictors of good outcome. In 2020, on behalf of the European Resuscitation Council (ERC) and the European Society of Intensive Care Medicine (ESICM), we systematically reviewed the evidence concerning predictors of poor neurological outcome [3]. This review was used to inform the 2021 ERC-ESICM Guidelines on Post-Resuscitation Care [4, 5]. Based on these guidelines, poor neurological outcome, defined as severe neurological disability, persistent vegetative state, or death at 1 month or later after cardiac arrest, is predicted to be likely when at least two unfavourable signs from clinical examination, biomarkers, electroencephalography (EEG), somatosensory evoked potentials (SSEPs), brain computed tomography (CT) or brain magnetic resonance imaging (MRI) are present. However, when none or only one of these predictors is present, the prognosis remains indeterminate, and further observation is needed.

Predicting good neurological outcome after cardiac arrest has several advantages. Firstly, it can reduce uncertainty in prognostication. Recent evidence shows that the prognosis remains indeterminate in about half of cases when using an algorithm based uniquely on prediction of poor neurological outcome [6, 7]. Secondly, detecting a chance of good neurological recovery can reassure patients’ relatives and inform the discussions they have with clinicians. Thirdly, it may help inform decisions about escalation of organ support. Finally, it may counterbalance a falsely pessimistic signal from predictors of poor neurological outcome. No single test predicts poor outcome with absolute certainty [3]. The 2021 guidelines for post-resuscitation care acknowledge these limits and suggest using caution and repeating the assessment when discordance is present, i.e., if signs indicating a poor outcome coexist with signs indicating a potential for recovery. However, there are no systematic reviews on predictors of good neurological recovery to support their incorporation in guidelines.

The aim of this study is therefore to systematically review the predictors of good neurological outcome in adults who are admitted to hospital in a coma after resuscitation from cardiac arrest. This review complements the 2020 review on predictors of poor neurological outcome, and is part of the same project, which was endorsed by both the ERC and ESICM on June 19, 2019.

Methods

The methods of this review have been reported previously [3] and are summarised here.

The review protocol was registered on PROSPERO (CRD 42019141169) on September 11, 2019. For data reporting, we adopted the recommended format for diagnostic test accuracy (DTA) [8] of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [9].

Population, intervention, comparator, outcome, time frame and study design (PICOST) framework

Based on the PICOST template, the review question was formulated as follows: in adult patients who are comatose following resuscitation from cardiac arrest in all settings (P), does the use of predictors based on clinical examination, electrophysiology, serum biomarkers or neuroimaging (I) recorded within 1 week after cardiac arrest (T), allow accurate prediction of good outcome (O)? We selected prognostic accuracy studies (S), i.e., those in which sensitivity and specificity of the index test were reported. The accuracy of the index test was assessed by comparing the predicted outcome with the final outcome, which represented the comparator (C).

Eligibility criteria

Patient population

We considered for inclusion all studies on adult (≥ 16 years) patients who were comatose following successful resuscitation from cardiac arrest. Patients defined as unconscious, unresponsive, and/or having a Glasgow Coma Score (GCS) ≤ 8 at the time of study enrolment were considered as comatose. Studies on patients in hypoxic coma from causes that did not lead to cardiac arrest (e.g., respiratory arrest, carbon monoxide intoxication, drowning, hanging) were excluded.

Interventions (Index tests)

This review includes prognostic accuracy studies. In these studies, sensitivity and specificity measure how well the results of an index test correctly identify patients who will subsequently develop or not develop the target condition, respectively [10]. Four types of index tests were assessed: clinical examination, biomarkers, electrophysiology, and neuroimaging. These predictor categories are defined in Appendix 1 of the Electronic Supplementary Material (ESM).

For assessment of prognostic accuracy, in the previous review we considered an abnormal result of the index test indicating a likely poor outcome as a positive result. In the present review, we considered the test results indicating a likely good outcome as a positive result. For predictors whose results are expressed as a continuous variable in a spectrum of values, e.g., the blood values of a biomarker, this usually corresponds to test values within the normal range; for predictors whose results are expressed as a categorical variable, e.g., the presence of specific EEG patterns, this corresponds to the result categories that are closer to normality. We did not include in this review the predictors whose results are dichotomised in only two categories, e.g., present vs. absent pupillary light reflex, because their accuracy for prediction of good outcome corresponds to the inverse of their accuracy for prediction of poor outcome (i.e., the specificity for prediction of good neurological outcome corresponds to the sensitivity for prediction of poor neurological outcome, and vice versa). Therefore, the accuracy of these indices to predict good outcome was already indirectly reported in our previous review on prediction of poor neurological outcome [3].
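The symmetry described above for two-category predictors can be illustrated with a minimal sketch. The function names and the counts are hypothetical and are not taken from any included study; the sketch only shows that relabelling which outcome is treated as 'positive' swaps sensitivity and specificity.

```python
from typing import NamedTuple


class Table(NamedTuple):
    tp: int  # test positive, target outcome present
    fp: int  # test positive, target outcome absent
    fn: int  # test negative, target outcome present
    tn: int  # test negative, target outcome absent


def sensitivity(t: Table) -> float:
    return t.tp / (t.tp + t.fn)


def specificity(t: Table) -> float:
    return t.tn / (t.tn + t.fp)


def flip_target(t: Table) -> Table:
    """Relabel the same patients so that the opposite outcome (and the
    opposite test result) is treated as 'positive'."""
    return Table(tp=t.tn, fp=t.fn, fn=t.fp, tn=t.tp)


# Hypothetical counts for a two-category sign used to predict POOR outcome
poor = Table(tp=30, fp=2, fn=50, tn=68)
# The same patients, with the sign now read as a predictor of GOOD outcome
good = flip_target(poor)

print(sensitivity(good) == specificity(poor))  # True
print(specificity(good) == sensitivity(poor))  # True
```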

Outcome

In compliance with current recommendations [10, 11], good functional outcome was defined as absent or mild to moderate neurological disability, corresponding to a Cerebral Performance Category (CPC) [12] 1 or 2 or a modified Rankin Scale score (mRS) [13] from 0 to 3. Studies adopting CPC 1–3 or mRS 0–4 as a threshold for good outcome were also considered for inclusion, but the certainty of their evidence was lowered because of indirectness. Wherever possible, the study authors were contacted to enable recalculation of test accuracy with a CPC 1–2 or mRS 0–3 threshold. Current standards [11] acknowledge that improvement frequently continues to 6 months and beyond and suggest measuring neurological outcome after 90 days and later. However, the opposite is not true, i.e., patients who recover consciousness after the arrest only very rarely deteriorate neurologically later [14, 15]. For this reason, we included studies in which the predicted outcome was measured earlier than hospital discharge or 1 month after cardiac arrest, e.g., at discharge from intensive care unit (ICU).

Study design

We included only studies where sensitivity and specificity could be calculated, i.e., those where the 2 × 2 contingency table of true/false negatives and positives for prediction of good outcome was reported or could be calculated from reported data. Studies where the test result was expressed on a continuous scale or ordinal variable were included only if a threshold allowing dichotomisation and therefore calculation of a contingency table was provided; when multiple thresholds were available, these were reported.
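As an illustration of how a threshold turns a continuous test result into a 2 × 2 table, the following minimal sketch counts true/false positives and negatives for prediction of good outcome. The function name, the biomarker values, the outcomes and the threshold are all hypothetical; the direction of the comparison (values below the threshold counted as a positive test) is an assumption that suits a biomarker for which low values are favourable.

```python
def contingency_table(values, good_outcome, threshold):
    """Return (TP, FP, FN, TN) when a value below `threshold` is taken
    as a positive test, i.e. a prediction of good outcome."""
    tp = fp = fn = tn = 0
    for value, good in zip(values, good_outcome):
        positive = value < threshold
        if positive and good:
            tp += 1
        elif positive and not good:
            fp += 1
        elif not positive and good:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical biomarker values and outcomes for eight patients
values = [12.0, 15.5, 16.9, 25.0, 40.2, 14.1, 80.3, 17.5]
good_outcome = [True, True, True, True, False, False, False, False]

tp, fp, fn, tn = contingency_table(values, good_outcome, threshold=17.0)
print(tp, fp, fn, tn)  # 3 1 1 3
```

When a study reports several thresholds, the same function would simply be called once per threshold, giving one table for each.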

Study type

We considered for inclusion all clinical studies published as full-text articles. No language restriction was imposed. We excluded reviews, case reports, studies including fewer than ten patients, letters, editorials, conference abstracts, and studies published in abstract form. In case of overlapping populations on the same index test, we included the study with the larger population.

Search strategy

MEDLINE via PubMed, EMBASE, Web of Science and the Cochrane Database of Systematic Reviews were searched using the strings included in ESM Table E1. The reference lists of relevant papers and narrative reviews were searched for additional studies. The automatic alert system of the databases was activated to identify further studies published during the process of data extraction and analysis. No language or time restrictions were imposed. Our last search was on October 13, 2021.

Records extracted were entered into a reference management software (EndNote X6, Thomson Reuters, Inc.) to be screened. Two authors (SC, SD’A) performed an independent eligibility assessment. Disagreements were resolved by consensus.

Data extraction

Data extraction was performed by two authors (SC, SD’A) using a standardised form. When needed, the authors of the original studies were contacted to retrieve missing data. The list of the extracted variables is in the ESM Appendix 1.

Evidence appraisal

Two study authors (CH, MK) rated the methodological quality of the included studies using the Quality in Prognosis Studies (QUIPS) tool [16]. The tool (ESM Appendix 1) was customised by adding some specific items pertinent to neuroprognostication after CA, such as self-fulfilling prophecy or confounding from sedation. In studies on prognostication of poor outcome, self-fulfilling prophecy is a bias that occurs when the treating team is not blinded to the results of the outcome predictor so that withdrawal of life sustaining treatment (WLST) decisions are influenced by or based on the predictor being investigated. Given the importance of the risk of self-fulfilling prophecy [10], the bias was graded as high when the index test was not assessed blindly or when the index test was part of the WLST criteria, regardless of the presence of other limitations. Grading was performed for individual predictors rather than the study as a whole because the risk of bias may be predictor-dependent and may differ between predictors within a study. When a quality assessor was also a co-author of the study, quality assessment was assigned to another assessor. Disagreements between quality assessors were resolved by consensus. Persisting disagreements were resolved by a senior author.

Data analysis

According to the method described by Wan et al. [17] demographic data (e.g., age) were merged where they were presented divided by subgroups. The contingency two-by-two tables reporting true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) were extracted from each study. Sensitivity and specificity, along with their 95% confidence intervals (95% CIs) were calculated using the Clopper–Pearson method [18]. Coupled forest plots of sensitivity and specificity at comparable time points and outcome definitions were plotted and the presence of heterogeneity was detected by visual assessment. Publication and reporting bias were not assessed, because of the lack of specific recommendations in these types of studies [19].
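The calculation of sensitivity and specificity with exact (Clopper–Pearson) confidence intervals can be sketched as follows. This is a minimal illustration using only the Python standard library, not the software used for this review: the limits are found by bisection on the binomial distribution, the function names are our own, and the counts in the example are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(f, iterations: int = 100) -> float:
    """Bisection on [0, 1] for a function that falls from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower limit: the p at which P(X >= x) equals alpha / 2 (0 when x == 0).
    lower = 0.0 if x == 0 else _root_of_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit: the p at which P(X <= x) equals alpha / 2 (1 when x == n).
    upper = 1.0 if x == n else _root_of_decreasing(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


def accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity, each as (estimate, lower, upper)."""
    return {
        "sensitivity": (tp / (tp + fn), *clopper_pearson(tp, tp + fn)),
        "specificity": (tn / (tn + fp), *clopper_pearson(tn, tn + fp)),
    }


# Hypothetical 2 x 2 table, for illustration only
result = accuracy(tp=45, fp=10, fn=55, tn=90)
for name, (estimate, low, high) in result.items():
    print(f"{name}: {estimate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The exact interval is conservative, which is one reason it is often preferred for the small subgroups that are common in prognostic accuracy studies.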

Results

Study selection

The initial search identified 2014 records from online databases, while 147 records were identified through forward search. After duplicate removal and abstract screening, a total of 304 studies were considered for full-text assessment, of which 37 fulfilled inclusion criteria (Fig. 1). A list of excluded studies with reasons is reported in the ESM Table E2.

Fig. 1 Flowchart illustrating the process of selection of the studies

Study characteristics

The characteristics of the 37 included studies (n = 7149 patients, males 74.5%) are listed in Table 1. Twenty-four studies (64.9%) were prospective, while 13 (35.1%) were retrospective.

Table 1 Study characteristics

Temperature control was used in 100% of patients in 32/37 (86.5%) studies, while five studies reported a percentage ranging from 41.7% to 89.2%. Two studies [20, 21] were included for clinical examination, 4 for biomarkers [22,23,24,25], 26 for electrophysiology [26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51] and 3 for imaging [52,53,54], while 2 studies [55, 56] were included for more than one prognostic modality. In all but three studies [23, 44, 53] good neurological outcome was reported as CPC 1–2 (mRS 0–3). The neurological outcome was assessed at 6 months in most studies (25 out of 37 studies, 67%), at 3 months in 3 studies [20, 38, 43], at 1 month in 1 study [52], at hospital discharge in 5 studies [27, 40, 41, 46, 47] and at ICU discharge in 3 studies [23, 34, 35].

Sensitivity and specificity for index tests based on clinical examination, biomarkers, electrophysiology, and imaging are reported in Tables 2, 3, 4, 5 and 6. QUIPS evidence profiles of included studies and predictors are reported in ESM Appendix 2. Almost all individual predictors were assessed as being at moderate or high risk of bias, mainly in the domain of study participation (e.g., selection bias) and of confounding (e.g., interference from sedation or lack of blinding). Pooling was not performed because of heterogeneity, especially in the timing of assessment, for both index test and outcome. Coupled forest plots of sensitivity and specificity are reported as ESM Figures.

Table 2 Accuracy of clinical examination
Table 3 Biomarkers
Table 4 Somatosensory evoked potentials (SSEPs)
Table 5 EEG
Table 6 Imaging

Clinical examination

In one study [20] on 302 OHCA patients, a GCS motor score of 4 or 5 (withdrawal or localisation) on hospital admission was present in 23/302 patients, among whom 20 had good neurological outcome at 90 days (98% specificity). In a post hoc analysis of the TTM trial [21] the same sign assessed at 72–96 h from ROSC predicted good outcome with 84% specificity and 77% sensitivity (Table 2).

Biomarkers

Blood neuron-specific enolase (NSE) values within the upper limit of the normal range (17–18 µg/L) at 24 h [22, 25] and 48 h [22, 25] predicted good neurological outcome at 6 months with 85–88.7% and 83.6–89.4% specificity, respectively (Table 3.1). Sensitivity ranged from 26 to 57.8%. At 72 h, normal NSE values predicted good outcome with 80.4% specificity and 74.9% sensitivity in one large multicentre study [22]. In another large study (1053 patients) [23] the specificity of a normal NSE value at 72 h for good outcome at ICU discharge was 96.6% with only 32.8% sensitivity. However, in that study the definition of good outcome included severe neurological disability (CPC 3).

In one study [25], blood values of the glial protein S-100B below 0.61 µg/L on admission predicted good neurological outcome with 90% specificity. However, the sensitivity was only 31%. The same results were observed in that study with an S-100B value below the normal threshold of 0.12 µg/L at 24 h. In one large multicentre study [22] the specificities of normal values of S-100B for good outcome at 24 h, 48 h and 72 h were 74.1%, 71.9% and 63.4%, respectively (Table 3.2). Sensitivity ranged from 69.3 to 81.2%.

In one study [22], the ability of normal blood values of neurofilament light chain (NFL), glial fibrillary acidic protein (GFAP), tau protein, and ubiquitin carboxy-terminal hydrolase-L1 (UCH-L1) at 24 h, 48 h, and 72 h from ROSC to predict good outcome was investigated. The specificities of NFL and GFAP were the highest, above 95% at all time points, but NFL had a higher sensitivity than GFAP (50.8–97.1% vs. 35.3–44.4%) and than the other biomarkers at comparable specificities. The specificity of tau protein ranged from 93 to 95%, while UCH-L1 had the lowest specificity, ranging from 70.3 to 85.2% (Table 3.3–3.6).

In another study [24] the NFL thresholds for 100% specificity for good outcome were investigated. That threshold was < 30 pg/mL at 24 h and 48 h, and < 27 pg/mL at 72 h. Corresponding sensitivities ranged between 68.6% and 78.6% (Table 3.3).

Electrophysiology

Short-latency somatosensory evoked potentials (SSEPs)

The predictive value of a high amplitude of the N20 wave of the median nerve SSEP for predicting good neurological outcome was reported in four studies [34, 36, 45, 56] at time points ranging from 12 to 96 h after ROSC (Table 4.1). The amplitude was calculated in microvolts (µV) as the difference between the voltage of the N20 negative wave and the voltage of the following positive P25 wave (N20/P25), but in one study [34] the baseline N20 amplitude was occasionally used if it was larger than the N20/P25 difference. The largest amplitude of the two sides was used, except in one study [36], where the smallest amplitude was used.
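The amplitude definition given above can be written as a short sketch. The function names, the sign convention (N20 recorded as a negative and P25 as a positive voltage) and the example values are assumptions made for illustration; the `use` parameter reflects the two per-patient rules described above (largest of the two sides in most studies, smallest in one).

```python
def n20_p25_amplitude(n20_uv: float, p25_uv: float) -> float:
    """Peak-to-peak amplitude in microvolts: the difference between the
    voltage of the N20 negative peak and the following P25 positive peak."""
    return abs(p25_uv - n20_uv)


def patient_amplitude(left, right, use="largest"):
    """Combine the two sides into one value per patient.

    `left` and `right` are (N20, P25) voltage pairs in microvolts.
    """
    amplitudes = [n20_p25_amplitude(*left), n20_p25_amplitude(*right)]
    if use == "largest":
        return max(amplitudes)
    if use == "smallest":
        return min(amplitudes)
    raise ValueError("use must be 'largest' or 'smallest'")


# Hypothetical recordings: (N20, P25) in microvolts for each side
left, right = (-2.1, 1.4), (-3.0, 1.8)

print(round(patient_amplitude(left, right), 1))                   # 4.8
print(round(patient_amplitude(left, right, use="smallest"), 1))   # 3.5
```

With these hypothetical values the same patient would exceed a 4 µV threshold under the 'largest side' rule but not under the 'smallest side' rule, which is one way in which the choice of side can affect comparability between studies.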

In one study [45], an amplitude threshold above 4 µV at 12 h, 24 h, and 72 h after ROSC predicted good outcome with specificities between 86 and 91%, with 48–51% sensitivity. In two other studies [34, 36] specificities were 96% for a threshold above 3.6 µV at 48–72 h, and 92% for a threshold above 4.2 µV at 24–96 h, with 32% and 28% sensitivity, respectively. Higher amplitude thresholds above 5 µV and up to 10 µV were investigated [34, 45, 56] yielding specificities up to 100%. However, results were less consistent. Sensitivities ranged from 9 to 37%.

In one study [35] the predictive value of high-frequency oscillation (HFO) bursts of SSEP to predict good neurological outcome from 24 to 96 h after ROSC was investigated (Table 4.2). The presence of an early or late HFO burst predicted good outcome with 60.1% and 80.4% specificity, respectively. Large-amplitude, late HFO bursts (> 70 and > 120 µV) had the highest specificities for good neurological outcome (87.1% and 95.7%, respectively). Corresponding sensitivities were 26% and 16%.

EEG

In total, 15 studies used full-montage EEGs, either routine EEG or continuous EEG monitoring, and defined patterns predictive of good outcome from original EEG-signals by conventional visual assessment. In most studies (10/15; 66%), the EEG patterns were described based on the 2012 American Clinical Neurophysiology Society’s (ACNS) standardised critical care EEG terminology [57]. The recording time-points and the combinations of patterns chosen to define EEG as favourable varied across studies. We aggregated the EEG predictors according to the similarities among favourable patterns (Table 5A–C).

Continuous normal-voltage background without discharges

Ten studies included both a continuous or nearly continuous background and absence of superimposed discharges when defining favourable (‘benign’) EEG-patterns.

Six of these studies (Tables 5A.1a, b) also included a normal-voltage background criterion (≥ 20 µV) and followed classifications proposed by Westhall et al. [58] or by Hofmeijer et al. [37]. The approach to superimposed discharges differed between these two classifications. In the four studies [26, 28, 32, 51] based on the Westhall classification (Table 5A.1a), superimposed discharges were defined as unequivocal electrographic seizures, or abundant (> 50%) periodic discharges or abundant rhythmic spike-and-wave using the ACNS terminology, while in the three studies [32, 37, 48] (Table 5A.1b) based on the Hofmeijer classification, absence of superimposed discharges was defined as no evolving seizures or generalised periodic discharges, with no reference to the ACNS terminology. When applying these criteria at an early time-window during ongoing sedation and temperature control (12–24 h after ROSC) the sensitivities and specificities were similar between the two classifications. At a later time-window (beyond 48 h after ROSC) specificities were generally higher in studies using the Westhall classification, and ranged from 80 to 100%, vs. 51–80% in studies using the Hofmeijer classification. Two studies [37, 48] using continuous EEG monitoring assessed the predictive value of a favourable EEG according to the Hofmeijer classification at several time-points (12, 24, 48, and 72 h after ROSC). In these studies, the specificity for good outcome prediction gradually decreased from 88 to 51% over time.

Two studies using the Westhall classification [32, 51] used more restrictive criteria, adding absence of a reversed antero-posterior gradient and reactivity to external stimuli (Table 5A.1a). Specificity for good outcome was particularly high in these studies. One study [32] compared the restrictive Westhall classification (Table 5A.1a) with the Hofmeijer classification (Table 5A.1b) in the same population and showed that specificities were higher (90–100% vs. 80–97%) and sensitivities were lower (39–46% vs. 57–63%) with the Westhall classification.

Continuous background (normal-voltage or low-voltage) without discharges

In four studies, including two from the same group [30, 31, 43, 45], the definition of continuous background with no superimposed discharges was not restricted to a normal-voltage background but also included low-voltage (10–20 µV) tracings (Table 5A.2). In one study [43] reactivity was a required criterion for defining EEG as favourable. Sensitivities and specificities were comparable with those of studies in group 5A.1.

Continuous or discontinuous normal-voltage background without discharges

Two studies [29, 47] defined a normal-voltage EEG with no discharges as favourable, but also included a discontinuous background (Table 5A.3). In one study [47], specificity for good outcome was 97% within 72 h from ROSC, which decreased to 84% if superimposed discharges were included among favourable EEG patterns. In the other study [29], a discontinuous EEG background was defined as favourable if reactivity was present. Specificity was 77% at days 0–5 from ROSC. Both studies had high (> 70%) sensitivity for good outcome.

Other favourable EEG patterns

Other favourable EEG patterns or grading scales (Table 5B) were mostly based on the dominant frequencies of the background activity (theta or alpha vs. delta) or on reactivity to stimulation. Their definitions varied, and none of these studies adopted the ACNS terminology. In one study [27], favourable EEG was defined as continuous background, but a definition was not provided. No study excluded EEGs with superimposed discharges from favourable patterns. Specificities ranged from 64 to 100%, with 25–96% sensitivity.

Continuous background defined with reduced electrode montages or aEEG

Some studies (Table 5C) investigated the predictive value of a continuous normal-voltage background defined from quantitative trend analysis using amplitude-integrated EEG (aEEG) [59, 60], or defined from original-EEG signals using reduced electrode montages [44, 50] at a time ranging from 6 to 72 h after ROSC. Standard ACNS definitions for continuity or amplitude were not applicable. Specificity ranged from 56 to 96%, with sensitivities ranging from 53 to 100%.

In one study [33], the original EEG tracing of a bispectral index (BIS) monitor recorded between 6 and 48 h from ROSC from four frontotemporal channels was analysed. A slow diffuse theta and/or delta activity, as opposed to epileptiform, burst-suppression, or suppression (< 5 µV), predicted good neurological outcome with 79% specificity at all time points, with 55–86% sensitivity.

EEG-derived indices

Bispectral index (BIS)

The predictive value of BIS (Table 5D.1) was evaluated in three studies [40, 42, 46]. In two studies, a BIS value greater than 21 at 1–3 h [42] after ROSC or 24 at 3–6 h after ROSC [46] predicted good neurological outcome with 94% and 86% specificity, respectively. In another study [40], the ability of BIS to predict good neurological outcome at 24 h from ROSC was assessed at different BIS thresholds. Specificity ranged from 41% at BIS 30 to 93% at BIS 60. Sensitivities ranged from 95 to 20%, respectively.

Cerebral recovery index

One study [49] investigated the predictive value of cerebral recovery index (CRI) (Table 5D.2). In that study, a CRI above 0.57 at 18 h or 0.69 at 24 h predicted good neurological outcome with 100% specificity.

Imaging

Brain CT

The ability of brain CT to predict good neurological outcome was assessed in one study [52]. Hypoxic-ischaemic changes due to cardiac arrest were quantified using the density ratio between the grey and white matter (GWR), and the quantitative regional attenuation (QRA) score at 2 h after ROSC. QRA is the sum of hypoattenuations due to ischaemic damage in 12 parenchymal areas on brain CT, and it is calculated bilaterally, with a maximum score of 24 (lower scores indicate fewer hypoattenuations). In that study, a GWR equal to or above 1.25 and a QRA ≤ 5 each predicted good neurological outcome at 1 month with 77% specificity and 25% sensitivity (Table 6.1). The study also assessed the Alberta Stroke Program Early CT Score–bilateral (ASPECTS-b) as a predictor. ASPECTS-b provides a semiquantitative assessment of hypodense ischaemic changes on brain CT in the middle cerebral artery territories bilaterally. The score is calculated by subtracting 1 point per affected area from the maximum score of 20 points (higher scores indicate fewer changes). In that study, an ASPECTS-b ≥ 15 predicted good neurological outcome with 89% specificity and 75% sensitivity.
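The ASPECTS-b calculation and the three CT thresholds reported above can be expressed as a minimal sketch. The function names and the example patient are hypothetical, and the thresholds are simply those reported for the single study [52]; the sketch is not a validated decision rule.

```python
ASPECTS_B_MAX = 20  # maximum score, as stated in the text


def aspects_b(affected_areas: int) -> int:
    """Subtract 1 point per affected area from the maximum score of 20."""
    if not 0 <= affected_areas <= ASPECTS_B_MAX:
        raise ValueError("affected_areas must be between 0 and 20")
    return ASPECTS_B_MAX - affected_areas


def favourable_ct_signs(gwr: float, qra: int, affected_areas: int) -> dict:
    """Flag each CT index against the threshold reported for study [52]."""
    return {
        "GWR >= 1.25": gwr >= 1.25,
        "QRA <= 5": qra <= 5,
        "ASPECTS-b >= 15": aspects_b(affected_areas) >= 15,
    }


# Hypothetical patient, for illustration only
signs = favourable_ct_signs(gwr=1.31, qra=7, affected_areas=4)
print(signs)
# {'GWR >= 1.25': True, 'QRA <= 5': False, 'ASPECTS-b >= 15': True}
```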

Brain MRI

Brain MRI for the prediction of good neurological outcome was assessed in four studies [53, 54, 56, 59]. In one study [56] absence of restricted diffusion on diffusion-weighted imaging (DWI), or the presence of a single focus of restricted diffusion immediately after rewarming, predicted good outcome with 95% and 92% specificity, respectively (Table 6.2). In another study [54], the absence of restricted diffusion was assessed at two time points, 77.6 h and 3.1 h after ROSC. The study showed that a later MRI assessment predicted good outcome with higher specificity (93% vs. 60%), while sensitivity was equally high (100%). A study [59] conducted at a similar later time point (74.5 h after ROSC) confirmed a high specificity (93%) and sensitivity (92%) of absent DWI lesions for predicting good outcome. Another study [53] assessed the absence of DWI or fluid-attenuated inversion recovery (FLAIR) lesions within 8 days from ROSC at three different anatomical sites: cortex, deep grey nuclei, and cerebellum and pons. Specificity for predicting good outcome was higher for absence of lesions in the cortex (80%) and deep grey nuclei (87%) vs. the brain stem and cerebellum (20%).

Finally, in one study [59], phase-images of the T2-weighted gradient-recalled echo (GRE) sequence were assessed at a mean of 74.5 h after ROSC. GRE was measured in three venous structures of the brain, the superior sagittal sinus, the thalamostriate veins, and the cortical veins to assess changes in cerebral venous oxygen content. The absence of GRE abnormalities predicted good outcome with 100% specificity and 75% sensitivity (Table 6.3).

Discussion

This is the first systematic review to be conducted on predictors of good neurological outcome after cardiac arrest. Our results showed that in adult patients with PCABI, a GCS motor score 4 or 5 immediately or at 72–96 h after ROSC, normal NSE blood values at 24–72 h after ROSC, and absent diffusion restriction in the cortex or deep grey matter on MRI on days 2–7 after ROSC predicted good neurological outcome with more than 80% specificity. The same specificity was observed in almost all studies for a favourable EEG background (continuous, normal voltage, or both) with no abundant discharges at 12–72 h after ROSC. An SSEP N20 wave amplitude above 4 µV on at least one side predicted good outcome with > 85% specificity at 12–96 h after ROSC. However, thresholds varied across studies.

Clinical examination

In patients who are comatose after resuscitation from cardiac arrest, sedation and paralysis are widely used to facilitate ventilation and control body temperature. Sedation may interfere with clinical examination, potentially reducing its accuracy for predicting good neurological outcome. However, in both studies we included, clinical examination was performed off sedation, which is usual clinical practice. In one large study [20] motor response was assessed on hospital admission before the start of targeted temperature management (TTM) at 32–34 °C and before any sedative or neuromuscular blocking drug was administered. In that study, a withdrawal or localising response to pain (motor score of the Glasgow Coma Scale [GCS-M] 4–5) was a relatively uncommon but highly specific (98%) predictor of good neurological outcome. If confirmed by further studies, this sign may be considered to screen patients destined for neurological recovery early after arrest and potentially rationalise post-resuscitation interventions.

In the second study [21] motor score was investigated in 342 TTM-treated patients who survived up to day 4 but were still unconscious after suspension of sedation at 36 h from ROSC. In these patients, specificity of a GCS-M 4–5 for good outcome was lower (84%) than in the previous study. One possible reason for this discrepancy is that the second study included patients with prolonged unconsciousness, which itself suggests a greater PCABI severity and poorer neurological recovery [61, 62]. This is the population of patients in whom prognostic uncertainty is maximal in ICU, and who are the target of currently recommended prognostic algorithms [5]. In these patients, combining multiple prognostic tests may reduce uncertainty. In the study above [21], among 39 patients (Table 2) in whom GCS-M 4–5 yielded a falsely optimistic signal, 11 (28%) had other prognostic signs suggesting a poor outcome, vs. 2/75 (3%) patients with true positive prediction of good outcome.

Biomarkers

NSE is the most studied biomarker of post-cardiac arrest brain injury. In the current ERC-ESICM guidelines for post-resuscitation care, NSE values above 60 µg/L at 48 h and 72 h from ROSC are recommended as poor outcome predictors after cardiac arrest [5]. Unfortunately, achieving a consistent NSE threshold for predicting poor outcome with 100% specificity is very difficult because of the presence of outliers who achieve neurological recovery despite very high NSE values [63]. This is often due to release of biomarkers from extracerebral sources [64]. This important bias does not occur with good outcome prediction, for which normal values of biomarkers are used. However, normal blood values of biomarkers did not predict good outcome with 100% specificity either. One of the reasons for this was death from non-neurological causes. In a study we included, 3/14 (21%) of patients who died despite NSE values below 17 µg/L did so from a secondary deterioration after having recovered consciousness [23]. In an observational multicentre study [65] of 4646 comatose resuscitated patients, this ‘death after awakening’ occurred in 4% of cases.

NSE predicted good outcome with a specificity consistently above 80%, even when measured at just 24 h after ROSC. This trend is different from that observed for poor outcome prediction, where the best accuracy is achieved at 48–72 h from ROSC [3], which coincides with the peak blood NSE values [64].

S-100B is a biomarker released from astrocytes and is less well documented than NSE as an outcome predictor after cardiac arrest. S-100B was investigated in two studies included in our review. In the larger study (650 patients at study inception) [22], the specificity of S-100B was lower than that of NSE at the corresponding time points of 24, 48, and 72 h after ROSC in the same population. In the smaller study [25], the accuracy of S-100B was comparable to that of NSE. Interestingly, in that study S-100B at hospital admission had 89% specificity for good outcome prediction, with 31% sensitivity. Such an early prediction is consistent with the fact that S-100B rises more rapidly, reaching its peak blood values within 24 h from ROSC, and has a shorter half-life compared with NSE [64]. This could make S-100B a potential test for good outcome prediction early after arrest. However, these results need confirmation from other studies.

Other biomarkers, such as NFL, GFAP, Tau protein, and UCH-L1, have been recently investigated as an alternative to NSE or S-100B. Of these, NFL and GFAP had the highest specificities in our review. In one study [22], NFL had the highest overall accuracy when compared with other biomarkers after cardiac arrest. In another study [24], NFL showed a sensitivity above 50% even at the low thresholds corresponding to 100% specificity for good outcome. However, since only a small fraction of NFL released from the injured brain enters the blood [66], measurement of its blood values requires research-grade assays, whose limited availability at present represents a challenge for a wide application of NFL in clinical practice.

Electrophysiology

Somatosensory evoked potentials

The accuracy of a bilaterally absent SSEP N20 wave for predicting poor neurological outcome is well known [67,68,69]. However, the ability of SSEP to predict good outcome has only recently received attention, despite early reports [70] establishing a potential relationship between SSEP N20 amplitude and outcome after cardiac arrest. The quantitative analysis of SSEP represents a remarkable change from the previous dichotomous interpretation of SSEPs, based exclusively on the presence or absence of the N20 wave. Moreover, these studies showed that a very low-amplitude N20 wave excludes good outcome with high likelihood [45, 56], therefore narrowing the area of uncertainty. An indirect confirmation of the reliability of the SSEP N20 wave amplitude for outcome prediction came from a recent study showing that this amplitude was inversely proportional to the severity of PCABI detected on autopsy [71].

However, the reproducibility of SSEP N20 wave amplitude as an outcome predictor is still limited by several factors. The first is that a universally recognised normal range has not been established. This may have been partly due to the previous limited interest in SSEP amplitude. The second limitation is that the N20 amplitude is affected by recording parameters, such as the electrode position or montage, the filter bandwidth, and the stimulus intensity [72]. In our review, the methods for measuring the N20 amplitude were not entirely consistent across the four studies included, which may partly explain the variability of the SSEP thresholds. These methods need standardisation. Finally, although one of the studies we included [45] showed that combining an SSEP N20 wave ≥ 3 µV with a continuous normal-voltage EEG increased the sensitivity for good outcome prediction, the added value of the N20 amplitude when compared with other predictors of good outcome remains to be established.

One study [35] investigated the accuracy of HFO bursts of SSEPs, both as a dichotomous and as a quantitative predictor. HFO bursts are low-amplitude bursts of electrical activity with a frequency around 600 Hz superimposed on conventional (low-frequency) short-latency SSEPs. They are elicited with standard median nerve stimulation and are isolated using digital filtering [73]. Interestingly, the amplitude of the HFO bursts is positively correlated with that of the SSEP N20 wave [74]. An important limitation of HFO bursts is their inconsistent presence, even in healthy subjects, which may significantly reduce their sensitivity as a good outcome predictor. The reason for this is unknown. HFO bursts are not currently used in clinical practice and their applicability for outcome prediction after cardiac arrest deserves further investigation.

EEG

We included several different EEG patterns associated with good outcome in our review. However, most of them had recurrent favourable features in common, the most consistent of which was a continuous or nearly continuous, normal-voltage background with no superimposed discharges (Table 5A.1a-b). Most of these studies adopted the 2012 ACNS terminology [57] (see ESM Table E3 for details). The recently updated 2021 version of the ACNS terminology [75] was not used in any of the included studies, but the changes concerning the favourable features we assessed in this review are very minor. Presence of a low-voltage or discontinuous EEG was also considered to be compatible with a favourable EEG in other studies (Table 5A.2–A.3). Although a direct comparison between these studies and those restricting the definition to a continuous normal-voltage pattern cannot be made, the accuracy for prediction of good outcome was comparable.

Continuous, normal-voltage EEG patterns without discharges (Table 5A.1a-b) were proposed in two main studies: the Hofmeijer 2015 study [37] and the Westhall 2016 study [51]. These two studies classified EEG tracings in three grades (unfavourable—intermediate—favourable, and highly malignant—malignant—benign, respectively; see ESM Table E4). The definitions of favourable EEG in the ‘Hofmeijer model’ and the ‘Westhall model’ were similar. However, the ‘Hofmeijer model’ was designed to be most accurate early after ROSC (12–24 h) while the ‘Westhall model’ was proposed to be used beyond 48 h. For instance, early after ROSC a low-voltage, discontinuous background with absent or even reversed anterior–posterior gradient can be seen in patients with a good outcome and if applying the more restrictive ‘Westhall model’ such a pattern would not be categorized as a benign EEG.

One reason for higher specificity in the most restrictive definition of favourable EEG could be the inclusion of EEG reactivity. When reactivity was not available, specificity was lower at comparable time points. This was observed across different populations [26, 32], different subpopulations of the same study cohort [28, 51], and in the same dataset when EEG reactivity was not considered [51] (Table 5A.1a). However, these findings will need confirmation from further studies. In addition, the assessment of EEG reactivity was not standardised in the studies we included, and its interrater reliability was only fair [76].

Four studies [30, 31, 43, 45] defined favourable EEG as a continuous or nearly continuous background without discharges, but the definitions also included low-voltage (10–20 µV) background. The specificities for good outcome prediction of these patterns were still high (up to 98% within 12 h in one study [45]) using this more inclusive definition. Sensitivity was also high, over 70% in all but one of the studies [77] (Table 5A.2). These findings contrast with results of our previous review, where a low-voltage EEG was associated with poor neurological outcome in most patients [3]. This apparent discrepancy could be explained by the fact that in the studies included in the present review the low-voltage background was documented within a continuous or nearly continuous EEG background without epileptiform discharges, and in one study [43] the definition also included reactivity. Interestingly, in two studies using the ‘Westhall model’ [28, 51], all patients with good outcome despite a low-voltage background (therefore classified as false negatives for good outcome according to that model) had otherwise favourable features (continuous or nearly continuous reactive EEG without abundant periodic/rhythmic discharges). This suggests that a low-voltage EEG background (10–20 µV) in PCABI should not be considered as an unfavourable pattern in isolation.

In two studies [29, 47] good outcome was predicted with high specificity by a definition of favourable EEG pattern that included a discontinuous background. Similarly to the previous group, this result can be explained by the presence of other important favourable features, such as a normal-voltage background and absence of superimposed discharges. If discharges were present over an otherwise favourable pattern, specificity decreased remarkably (Table 5A.3).

From this review, it would be difficult to conclude which combination of favourable features (background continuity, voltage, organisation, reactivity, frequency content, amount/type of discharges accepted, and time-point/method of recording) is optimal and thus yields the highest accuracy for predicting good outcome. A more continuous background, preferably appearing early after ROSC and with fewer discharges, seems the most important feature in most studies, and these variables have recently been shown to have an independent prognostic value [78].

Another feature suggesting good outcome is the time to recovery of a favourable EEG background. We did not directly assess this sign because studies including it measured it as a continuous variable and were therefore excluded from the present review. The EEG measures the ‘electrophysiological functional recovery’ of the cortex and its connections with subcortical structures after cardiac arrest [79]. Immediately after ROSC the EEG background is suppressed and then gradually recovers towards a continuous normal-voltage pattern in most patients [80]. This process is not specific for good neurological outcome, but its timing is, since in patients with good neurological outcome recovery typically occurs earlier. In one study [60], EEG background recovered to a continuous normal-voltage pattern within 24 h in 95% of patients with good outcome vs. 11% of patients with poor outcome. This occurred in 75% of patients with good outcome in another study [78]. This confirms that timing after ROSC is an important criterion to be considered when interpreting EEG patterns in PCABI. Sedation can be an important confounder, since sedative drugs may suppress the amplitude and decrease the continuity of the EEG in a dose-dependent manner and interfere with the process of EEG recovery after ROSC.

aEEG monitoring and BIS are quantitative trend analysis tools based on a few EEG recording channels and aimed at enabling non-specialists to interpret the EEG. While aEEG results report voltage and continuity, BIS is based on a proprietary technology that returns a single number, from zero (corresponding to an isoelectric EEG) to 100 (‘full consciousness’). In the four studies using aEEG we included, the identification of a continuous normal-voltage background was possible and its specificity for good outcome was high up to 96 h after ROSC (Table 5C). In our review, a BIS value of 21–24 had high specificity (86–94%) and sensitivity (88–94%) at 2–5 h from ROSC in two studies [42, 46], but its accuracy was lower at 24 h [40], possibly reflecting a partial recovery of EEG background activity in patients with poor outcome, a trend confirmed by other studies based on continuous EEG monitoring [78].

Neither aEEG nor BIS directly enables a morphological assessment of the original EEG signals, so the identification of superimposed activity is difficult unless the original EEG channels are also displayed. In one study [41], status epilepticus was excluded by reviewing the original EEG tracings displayed together with the aEEG trend curves. All studies were conducted in centres where neurophysiology expertise was available, and their findings have not been externally validated by less experienced readers.

CRI (Table 5D.2) is a summary score which combines five features derived from automated quantitative EEG analysis. The score ranges from 0 to 1 (the higher, the better). In the only study on CRI we included [49], CRI at 18 h and 24 h from ROSC had a high AUC (0.94 and 0.87, respectively) and allowed prediction of good outcome with 100% specificity. Interestingly, the study showed that the CRI of patients with good outcome improved faster than that of patients with poor outcome, confirming signals from both standard EEG and aEEG [60, 78]. CRI has the advantage of being based on an automated and quantitative EEG analysis, which makes the interpretation simpler and more objective. However, the availability of this technique is still limited, and these results need to be validated in a larger patient cohort.

Imaging

PCABI leads to cytotoxic oedema which manifests on brain CT as a decreased density of the brain parenchyma mainly affecting the grey matter symmetrically [81], with a consequent reduction of the density ratio between the grey matter and the white matter (GWR). Other signs of brain oedema from PCABI include an effacement of the cerebrospinal fluid spaces, and the pseudo-subarachnoid haemorrhage and white cerebellum signs [82]. All these signs suggest a poor neurological outcome [3].

Prognostic studies reporting the distribution of individual GWR values in post-cardiac arrest patients according to their neurological outcome [83, 84] showed that, while the lowest GWR values were observed only in patients who died or had severe disability, no clear GWR threshold above which good outcome could be predicted was identified. The single study on brain CT included in the present review (Table 6.1) subdivided the GWR into tertiles and showed that the accuracy of GWR for predicting good outcome was not particularly high: just 77% specificity with 25% sensitivity for a GWR ≥ 1.25. It must be noted that in this study brain CT was performed early, on average 2 h after ROSC, when the discriminative value of GWR for PCABI is low [85]. This has been confirmed in studies using a nonquantitative assessment of brain CT [86].

The accuracy of the quantitative regional abnormality (QRA) score was not better than that of GWR. A possible reason is that some of the cerebral regions assessed by QRA include white matter, which is less affected by PCABI than the grey matter, and the cerebellum, whose density is more challenging to assess due to beam hardening resulting from the surrounding skull base. Conversely, the ASPECTS-b score, which focuses on the supratentorial grey matter, was more accurate for predicting good neurological outcome in that study, with 75% sensitivity and 89% specificity at a threshold of 15/20 points [52]. Given the paucity of evidence about prediction of good outcome using brain CT, the use of ASPECTS-b score appears interesting. However, ASPECTS-b has been designed for assessing ischaemic injury from stroke, which is usually unilateral. Conversely, brain damage from PCABI is usually bilateral, which deprives the reader of the CT scan of a contralateral reference when detecting ischaemic changes. The feasibility of this method needs confirmation from other studies.

MRI

Acute PCABI is characterised by cytotoxic oedema, cellular swelling, and restriction of water diffusion in affected brain areas, which appears as a hyperintensity on diffusion-weighted imaging (DWI) with a corresponding low apparent diffusion coefficient (ADC). MRI has high sensitivity for PCABI [3]; the absence of DWI changes is therefore a potentially valuable predictor of neuronal integrity and good clinical outcome. Indeed, both sensitivity and specificity for absence of DWI lesions were high in our review. The lowest specificity (60%) was observed in a study [54] where DWI was assessed at a median of 3.1 h after ROSC. However, when MRI was repeated at a median of 78 h after ROSC, specificity increased to 93% (Table 6.2). The reason is that development of brain oedema after PCABI is time-dependent, and the extent of post-anoxic changes may not be evident before 3–7 days after ROSC [53]. The spatial distribution of brain injury is also of relevance when prognosticating using imaging studies, due to the selective vulnerability of specific brain areas to PCABI. One study [53] assessing supra- and infratentorial regions separately showed that the absence of DWI and FLAIR changes in the cortex or basal ganglia predicted good outcome much more accurately than the absence of these lesions in the infratentorial structures.

Besides the specific points made above, some caveats regarding the use of imaging studies for the evaluation of PCABI should be considered: the interpretation of their results is partly subjective, being dependent on the experience of those reporting the scans. Moreover, even when the interpretation is based on quantitative measurements, ensuring reproducibility within and across studies of measurements from multiple brain parenchymal regions is challenging because of variations in the measurement methods (e.g., location of the region-of-interest) and differences in CT and MRI scanners and scanning protocols. Finally, the influence of comorbidities (e.g., presence of white matter lesions, or brain atrophy) on quantitative and functional imaging techniques has not been sufficiently investigated. Standardisation and normalisation of the imaging techniques [52] may therefore be of value.

Overall, our review showed that good neurological outcome can be predicted with high (> 80%) specificity early after cardiac arrest. This specificity was lower than the 100% value (0% false positive rate [FPR]) reported for many poor outcome predictors [3]. However, we cannot exclude that self-fulfilling prophecy bias causes the specificity of these predictors to be overestimated. On the other hand, achieving a zero FPR when predicting good neurological outcome is less important than when predicting poor outcome because good outcome predictors are not used in WLST decisions. In an international survey conducted in 2019, 19% of clinicians considered errors in recommending long-term support in patients who will not ultimately recover to be unimportant [87]. Most responders in that survey thought a maximum 1% FPR would be desirable when predicting good outcome, an expectation which is not matched by any of the predictors included in our review. However, achieving such a low FPR would be difficult because of a bias typically affecting prediction of good neurological outcome after cardiac arrest, namely death from non-neurological causes [65].
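The relationship between specificity and FPR referred to above follows directly from the 2 × 2 contingency table of test result against outcome. The following Python sketch illustrates it for a test in which a 'positive' result predicts good outcome; the class name `TwoByTwo` and the counts are hypothetical and were chosen only to make the arithmetic easy to follow, not taken from any included study.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Contingency table in which a 'positive' test predicts GOOD outcome."""
    tp: int  # favourable test, good outcome
    fp: int  # favourable test, poor outcome (falsely optimistic prediction)
    fn: int  # unfavourable test, good outcome
    tn: int  # unfavourable test, poor outcome

    def sensitivity(self) -> float:
        # Share of good-outcome patients correctly identified by the test
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of poor-outcome patients in whom the test was not favourable
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:
        # Complement of specificity: FPR = 1 - specificity
        return self.fp / (self.fp + self.tn)

# Hypothetical counts, for illustration only
table = TwoByTwo(tp=45, fp=10, fn=55, tn=90)
print(table.sensitivity(), table.specificity(), table.false_positive_rate())
```

With these illustrative counts, a specificity of 90% corresponds to an FPR of 10%, which is ten times the 1% ceiling that most survey responders considered desirable.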

Sensitivity was 40% or more in most studies included in our review, but it exceeded 70% with ≥ 90% specificity for some predictors, such as normal blood values of NFL or normal brain MRI after rewarming. Evidence concerning NFL had a low risk of bias, being based on a blinded assessment in two multicentre studies. However, blood samples were tested in the same laboratory and external validation is necessary. Evidence regarding MRI was based on a wider range of studies, but assessment was not blinded. Moreover, there is a risk of selection bias, since haemodynamically unstable patients with the most severe whole-body ischemia–reperfusion injury may not be suitable for MRI.

There is no consensus on the optimal sensitivity for prognostication of good outcome in post-cardiac arrest coma. On the one hand, achieving 100% sensitivity is desirable to ensure that no patient with a potential for recovery is missed. On the other hand, this would inevitably come at the cost of decreasing specificity. The trade-off between sensitivity and specificity in prognostication depends on a series of factors, including costs and legal and ethical considerations. For predictors based on continuous variables, such as blood values of biomarkers or N20 SSEP amplitude, it may be possible to select a specific threshold based on whether sensitivity or specificity should be prioritised. However, further studies and standardisation will be necessary to establish consistent thresholds for these predictors. For EEG-based predictors, using more restrictive definitions may increase specificity for good outcome (Table 5A.1a).
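To make the threshold trade-off for continuous predictors concrete, the following Python sketch scans candidate cut-offs for a biomarker in which low values are favourable and returns the cut-off with the highest sensitivity among those meeting a required specificity. The function names and the biomarker values are hypothetical and purely illustrative; they are not data from any study in this review, and the sketch is not a validated method for deriving clinical thresholds.

```python
def accuracy_at(threshold, good, poor):
    """Sensitivity and specificity for good outcome when value <= threshold is 'favourable'."""
    sens = sum(v <= threshold for v in good) / len(good)
    spec = sum(v > threshold for v in poor) / len(poor)
    return sens, spec

def best_threshold(good, poor, min_specificity):
    """Return (threshold, sensitivity, specificity) with the highest sensitivity
    among observed values whose specificity meets the target, or None."""
    best = None
    for t in sorted(set(good) | set(poor)):
        sens, spec = accuracy_at(t, good, poor)
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (t, sens, spec)
    return best

# Hypothetical biomarker values (arbitrary units), NOT data from any study
good_outcome = [8, 10, 12, 14, 15, 17, 20, 25]
poor_outcome = [13, 22, 30, 45, 60, 80, 120, 200]

print(best_threshold(good_outcome, poor_outcome, min_specificity=1.0))
print(best_threshold(good_outcome, poor_outcome, min_specificity=0.85))
```

In this toy dataset, relaxing the required specificity from 100% to 85% moves the selected cut-off upwards and raises sensitivity, which is the trade-off described above.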

Timing of assessment affected the accuracy of the predictors we evaluated. In studies assessing the same predictor at multiple time points, the specificity for good outcome (or the sensitivity, at comparable values of specificity) decreased progressively over time. This occurred for biomarkers [22, 24], SSEP N20 wave amplitude [45], and EEG [26, 30, 32, 45]. As discussed above, this may be partly because, even with a serial evaluation, the population assessed in these studies was not consistent, since patients who awakened or died after their initial evaluation were not subsequently reassessed, and survivors with moderate or severe PCABI in whom prognosis is more uncertain prevailed in the population assessed at later time points.

Although drawing conclusions about timing of assessment of these predictors is premature, favourable EEG, higher-voltage N20 SSEP wave and normal values of biomarkers appear suitable for predicting good outcome as early as the first 24–48 h after ROSC, while a normal MRI could be used to detect late awakeners. As for prediction of poor neurological outcome, using multiple predictors and repeating assessment at multiple time points is the most reasonable strategy [5].

Predicting good neurological outcome has the potential to reduce uncertainty in the prognostication process, which is currently almost entirely focused on poor outcome prediction. In one study [88] on 486 comatose resuscitated patients, 330 (68%) had an indeterminate outcome after the application of the 2015 ERC-ESICM prognostication algorithm at 72 h. Of these, 250 (76%) had a favourable EEG (continuous or nearly continuous, normal voltage background without seizures or abundant discharges), which was associated with neurological recovery in 184 (74%) patients. Future prospective studies are needed to assess the potential of good outcome prediction to reduce uncertainty in patients assessed using the 2021 prognostication algorithm. Although the 2021 guidelines for post-resuscitation care [5] do not recommend any specific strategy for predicting good outcome, they mention low NSE or NFL blood values and normal MRI as signs suggesting a potential good outcome and recommend caution when these signs coexist with others predicting poor outcome. This cautionary recommendation, based on expert opinion, is confirmed by the present review. However, direct evidence on the prognosis of patients showing discordant signals from neurological predictors is lacking. Investigation in this field is warranted to validate current recommendations.

Some limitations of our study should be acknowledged. First, although this review included 37 studies, there were rarely more than four studies assessing an individual predictor. The most documented predictor was standard EEG, reported in 15 studies. Even if three main favourable EEG patterns were identified (Table 5A.1–A.3), there were slight differences among them, whose prognostic relevance requires further investigation. We hope the results of this systematic review and the increasing adoption of the standardised ACNS EEG terminology will encourage future research on specific homogeneous EEG patterns. The interrater variability in the assessment of these EEG patterns should also be prospectively investigated.

Second, temperature management after arrest and the use of sedatives or neuromuscular blocking drugs may have affected the accuracy of some predictors, especially those based on clinical examination or EEG. Sedation was not standardised in the studies we included. Although the use of short-acting sedatives may affect time to awakening in post-cardiac arrest patients [89], the specific effects of the different sedation protocols on the accuracy of prognostication remain to be investigated.

Third, publication bias and selective outcome reporting may have affected the certainty of evidence in this review. This risk in prognostic studies is substantial because, unlike randomised controlled trials, registration for prognostic studies is not mandatory [90]. However, there is no consensus on how publication or reporting bias should be assessed in these studies. Tests like funnel plot asymmetry, designed primarily for randomised controlled trials, are not appropriate [19].

Fourth, evidence concerning predictors of good neurological outcome after cardiac arrest may be biased by non-neurological causes of death, which may cause censoring of patients destined for a good neurological recovery or cause death after this recovery has occurred. Documenting causes of death in prognostic studies could reduce this bias [10, 68]. Some of the studies we included [26,27,28, 30, 31, 46, 47, 49,50,51] assessed the ‘best CPC’ achieved in their patient populations. Unfortunately, this was investigated for different predictors, and that heterogeneity precluded a sensitivity analysis to investigate whether the use of ‘best CPC’ resulted in higher accuracy vs. CPC at the scheduled time point of these studies.

Fifth, our review was aimed at neurological prognostication only and did not consider other potentially predictive variables of neurological outcome such as 'downtime', age, and pre-arrest functional status. Including these variables requires a multivariate analysis which is beyond the scope of this review.

Conclusions

Our systematic review showed that in adult comatose survivors of cardiac arrest, a GCS motor score of 4 or 5 immediately or at 72–96 h after ROSC, normal NSE blood values at 24–72 h after ROSC, an SSEP N20 wave amplitude above 4 µV or a continuous EEG background without discharges within 72 h from ROSC, and absent diffusion restriction in the cortex or deep grey matter on MRI on days 2–7 after ROSC predicted good neurological outcome with high specificity and a sensitivity above 40% in most studies. EEG, SSEP, NSE and MRI were the most widely documented predictors, but the number of supporting studies is still lower than that of predictors of poor neurological outcome. Further prospective studies of predictors of good outcome are needed to help reduce uncertainty when applying current guidance for prognostication in comatose survivors of cardiac arrest.