Take-home message

Prespecified thresholds of quantitatively assessed pupillary light reflex < 4% and the Neurological Pupil index ≤ 2 predict unfavorable neurological outcomes with zero-percent false-positive rate in comatose out-of-hospital cardiac arrest survivors. Combined with neuron-specific enolase, pupillometry thresholds offer a robust multimodal neuroprognostication strategy, especially relevant for centers facing challenges in obtaining neurophysiological data.

Introduction

Despite hospitalization, out-of-hospital cardiac arrest (OHCA) in Europe averages a 26% survival-to-discharge rate (41% in Denmark, rising to 81% if bystander-witnessed with initial shockable rhythm), reaching 35% for those admitted with return of spontaneous circulation (ROSC) [1]. Most die from hypoxic–ischemic brain injury, leading to prolonged coma and active withdrawal of life-sustaining treatment (WLST) [2, 3].

European countries practicing WLST report over 90% favorable long-term neurological outcomes in survivors, in contrast to 50% unfavorable outcomes, including 33% in a persistent vegetative state, in countries where WLST is not implemented [4]. To avoid futile treatment and prevent inappropriate WLST, the European Resuscitation Council (ERC) and the European Society of Intensive Care Medicine (ESICM) recommend a multimodal neuroprognostication strategy in survivors remaining comatose at ≥ 72 h after ROSC [5]. This strategy involves clinical, electrophysiological, biomarker, and neuroimaging predictors, with an unfavorable outcome considered likely when at least two indicators are positive.

Assessment of pupillary reflexes is a crucial predictor in patients with acute brain injury [6, 7]. Due to significant inter-observer variability when manually assessing pupillary reflexes with a penlight, guidelines recommend automated pupillometers for neuroprognostication [5]. The automated pupillometer quantitatively measures pupil reaction, including the percentage of pupillary constriction (quantitatively assessed pupillary light reflex, qPLR) and several other reactivity variables. These can be integrated into an algorithm-based composite risk score, such as the Neurological Pupil index (NPi), ranging from 0 to 5 [8, 9].

Several studies have confirmed the significant prognostic value of qPLR and NPi and proposed specific thresholds predicting neurological outcomes with no false predictions [10,11,12,13,14,15,16,17,18]. However, because these studies used different devices and unaligned assessment and analysis methods, the reported absolute values differ across studies. Further, as no studies have reproduced and validated the findings with a uniform methodology, guidelines have been unable to provide standardized threshold recommendations [19].

We aimed to perform an external validation, using a methodology similar to that of the previous studies, of the proposed pupillometry thresholds of qPLR < 4% and NPi ≤ 2, shown to predict unfavorable outcomes from admission to 72 h with a zero-percent false-positive rate (FPR) in comatose OHCA survivors [10, 13]. As the recommended multimodal approach demands at least two positive predictors in unconscious patients at ≥ 72 h, we further aimed to investigate the prognostic effect of pupillometry combined with neuron-specific enolase (NSE) > 60 μg/L, a biochemical marker of neurologic damage (with higher levels associated with more extensive brain injury) [20].

Early preliminary results from this substudy have previously been presented [21, 22].

Methods

Study design and patients

We conducted a protocolized [23], prospective, multi-center prognostic substudy within the Blood Pressure and Oxygenation Targets after Cardiac Arrest (BOX) trial. The BOX-trial, an investigator-initiated, multi-center, randomized trial, compared blood pressure, oxygen, and temperature targets [24,25,26]. From March 2017 to December 2021, we consecutively included 789 comatose OHCA survivors (≥ 18 years) of presumed cardiac origin from two Danish tertiary cardiac arrest centers. The BOX-trial design, data collection, management, and informed consent handling have been previously published [27]. This substudy adhered to the overall protocolized parameters of the main BOX-trial [23, 24, 27].

Subjects from the BOX-trial were eligible (criteria outlined in supplementary Table S1) but were excluded if pre-existing ophthalmic conditions affecting pupil movement were present (e.g., cataract and eye surgery) [28, 29]. Pre-hospital data were systematically collected according to Utstein-style guidelines [30] and reported in accordance with the STARD and TRIPOD statements.

Post-cardiac arrest care and study procedures

All patients received protocolized guideline-suggested post-resuscitation care [23, 24, 27]. This included blood-pressure management with vasoactive agents (primarily with norepinephrine and dopamine) and oxygenation during mechanical ventilation (primarily by FiO2 and positive end-expiratory pressure [PEEP] adjustments). Sedation was achieved with propofol and fentanyl to a Richmond Agitation-Sedation Scale score of −4 or lower, and temperature control targets were managed with active cooling devices. After 24 h of temperature control, core temperature was gradually increased to normothermia at a rewarming rate of < 0.5 °C per hour, and sedation was tapered.

Pupillometry assessments were conducted at admission (t0) and at 24 (t24), 48 (t48), and 72 h (t72), approximated to within ± 6 h as part of clinical practice. Clinical nursing staff used NPi®-200 pupillometers (NeurOptics®, Irvine, CA, USA) until awakening or death. Plasma NSE was measured at t48 using a COBAS analyzer system (Roche Diagnostics).
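The ± 6 h window can be made explicit when mapping a timestamped measurement to a nominal time point. The following is a minimal sketch, not the study's data pipeline; the function name `assign_time_point` and the choice to return `None` for measurements outside every window are assumptions made for illustration.

```python
from typing import Optional

# Nominal assessment times in hours after admission (t0, t24, t48, t72).
NOMINAL_HOURS = (0, 24, 48, 72)


def assign_time_point(hours_since_admission: float,
                      tolerance_h: float = 6.0) -> Optional[str]:
    """Map a measurement time to t0/t24/t48/t72 if it lies within +/- tolerance_h.

    Returns None when the measurement falls outside every window
    (hypothetical handling; the source does not describe this case).
    """
    for nominal in NOMINAL_HOURS:
        if abs(hours_since_admission - nominal) <= tolerance_h:
            return f"t{nominal}"
    return None


# Usage with invented example times.
labels = [assign_time_point(h) for h in (0.5, 26.5, 36.0, 70.0)]
```

Because the windows are 12 h wide and the nominal times are 24 h apart, no measurement can match more than one time point.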

We included continuously comatose patients with Glasgow Motor Score (M) ≤ 3 at ≥ 72 h after admission in the analysis for the neuroprognostication strategy algorithm with combined predictors of 48-h plasma NSE > 60 μg/L and pupillometry [5].

Automatic pupillometry assessments were performed with the same device for each eye, and measurements were automatically imported into the electronic database through SmartGuard® devices. The data remained unviewed until post-trial analysis, and outcome assessors were blinded to them. The lower value of the two eyes was used when applying the thresholds associated with an unfavorable outcome, as in similar studies [10, 16, 18].

This study used qPLR (%) and NPi to predict outcomes. qPLR corresponds to the percentage of pupillary constriction to a calibrated light stimulus (1000 Lux) from the NPi-200. The NPi, a composite score from 0 to 5, integrates multiple pupillary variables from the device (size, latency, constriction, and dilation velocity), with lower values indicating less reactivity and greater pathology [14, 18, 31, 32]. NPi values ≥ 3 are considered normal, while < 3 indicates abnormal reactivity. An NPi value of 0 indicates a non-reactive/absent response.
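The worst-eye rule and the NPi categories described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, both eyes are assumed to have been measured, and the thresholds (qPLR < 4%, NPi ≤ 2) are the ones validated in this study.

```python
def worst_eye(left: float, right: float) -> float:
    """Return the lower (less reactive) of the two eyes' values."""
    return min(left, right)


def npi_category(npi: float) -> str:
    """Classify an NPi value (0 to 5) using the categories given in the text."""
    if not 0 <= npi <= 5:
        raise ValueError("NPi is defined on the range 0 to 5")
    if npi == 0:
        return "non-reactive"
    if npi < 3:
        return "abnormal"
    return "normal"


def below_thresholds(qplr_left: float, qplr_right: float,
                     npi_left: float, npi_right: float) -> dict:
    """Apply the qPLR < 4% and NPi <= 2 thresholds to the worst eye."""
    qplr = worst_eye(qplr_left, qplr_right)
    npi = worst_eye(npi_left, npi_right)
    return {"qplr_lt_4": qplr < 4.0, "npi_le_2": npi <= 2.0}


# Usage with invented values: one eye reacts poorly, the other normally.
flags = below_thresholds(qplr_left=3.0, qplr_right=12.0,
                         npi_left=2.0, npi_right=4.1)
```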

Outcome measures

The primary outcome was neurological function defined by Cerebral Performance Category (CPC) score [33, 34] assessed at a follow-up visit no earlier than 90 days post-randomization. We invited patients to the follow-up that included neurological evaluation by CPC (ranging from 1 to 5, increasing with more significant disability; 3 or 4 being severe disability, coma, or vegetative state, and 5 being death), modified Rankin scale (mRS) score [35, 36] (range 0–6, with 0 indicating no symptoms, 1 no clinically significant disability, 2 slight disability, 3 moderate disability, 4 moderately severe disability, 5 severe disability, and 6 death), and Montreal Cognitive Assessment (MOCA) score (range, 0 to 30, with a score of 26 or higher being normal), performed by trained personnel blinded to the pupillometry results [35,36,37]. A CPC or mRS score of ≥ 3 defines an unfavorable neurological outcome.

For non-participating outpatients, we conducted telephone interviews or reviewed medical records, excluding MOCA scoring. Some visits were delayed due to the coronavirus disease 2019 (COVID-19) pandemic hospital restrictions [24].

Secondary outcomes included death from any cause within 365 days, median 48-h plasma NSE, and follow-up mRS and MOCA. In addition, we evaluated the primary outcome after excluding all extracerebral deaths occurring during the initial admission (no sign of severe brain injury in neuroprognostication, with hemodynamic or multi-organ failure, sepsis, or other causes more likely).

Prognostication and withdrawal of life-sustaining treatment

The protocolized prognostication from the BOX-trial [24, 27] is referenced in the supplementary appendix. Decisions on WLST were made no earlier than 72 h after ending sedation, based on criteria including brain death due to cerebral herniation, severe myoclonus status ≤ 24 h with bilateral absence of N20-peak on median nerve somatosensory-evoked potentials (SSEP), persisting M ≤ 3 with bilateral absence of N20-peak on median nerve SSEP < 72 h, and persisting M ≤ 3 with treatment-refractory status epilepticus < 72 h. Prior to prognostication, electroencephalograms (EEGs) were performed on all patients who were persistently comatose 24 h after sedation and analgesics had been tapered. If inconclusive, subsequent EEGs were performed.

Automated pupillometry was used for research only (it was neither used by treating physicians nor included in clinical neuroprognostication/WLST decisions), as it was only recently recommended in guidelines [5].

Statistical methods

Categorical variables are expressed as counts (percentage), and continuous variables as mean (± standard deviation [SD]) or median (interquartile range [IQR]). We assessed differences using Chi-square or Fisher’s exact test for categorical variables and t tests for continuous variables. We used a linear mixed-effects model with an unstructured covariance for repeated measures to analyze mean value differences over time and between the primary outcome groups at follow-up. We incorporated outcome group, time point, and their interaction term as fixed effects, with random intercepts for each patient. We employed logistic regression models to estimate the performance of quantitative pupillometry in predicting the outcomes of CPC 3–5 and mRS 3–6 at follow-up after 90 days, and death from any cause within 365 days. Predictive accuracy was evaluated using Receiver-Operating Characteristic (ROC) curves and Area Under the Curve (AUC) statistics. Differences in AUCs were tested with the DeLong method [38]. Models were adjusted for age, sex, time-to-ROSC, lactate level at admission, witnessed arrest, NSE, bystander cardiopulmonary resuscitation (CPR), shockable primary rhythm, and randomization site.
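For a single predictor, the AUC can be read as the probability that a randomly chosen patient with an unfavorable outcome has a lower pupillometry value than a randomly chosen patient with a favorable outcome, with ties counted as one half. The sketch below computes this unadjusted, empirical AUC directly from that definition. It is not the study's analysis (which used logistic regression models in R); the function name and the data are invented for illustration.

```python
from typing import Sequence


def auc_lower_is_worse(values: Sequence[float],
                       unfavorable: Sequence[bool]) -> float:
    """Empirical AUC when LOWER values indicate an unfavorable outcome.

    Counts, over all (unfavorable, favorable) patient pairs, how often the
    unfavorable patient has the lower value; ties contribute 0.5.
    """
    poor = [v for v, u in zip(values, unfavorable) if u]
    good = [v for v, u in zip(values, unfavorable) if not u]
    if not poor or not good:
        raise ValueError("both outcome groups must be non-empty")
    concordant = 0.0
    for p in poor:
        for g in good:
            if p < g:
                concordant += 1.0
            elif p == g:
                concordant += 0.5
    return concordant / (len(poor) * len(good))


# Invented NPi values for six patients (not study data).
npi = [0.0, 1.5, 3.8, 4.2, 4.5, 3.8]
outcome_unfavorable = [True, True, True, False, False, False]
auc = auc_lower_is_worse(npi, outcome_unfavorable)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a rank-based formulation would scale better.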

We calculated specificity (%), sensitivity (%), FPR (%), positive predictive value (PPV, %), negative predictive value (NPV, %), Youden Index, and area under the curve (AUC) and presented results with 95% confidence intervals (95% CI) when appropriate. The combination of pupillometry and NSE thresholds was performed sequentially with a conditional subsequent retest (qPLR < 4% or NPi ≤ 2) if the first test was negative (NSE ≤ 60 μg/L). The overall test was positive if either test was positive. As a sensitivity analysis, we performed interaction analyses and assessed the characteristics of patients with missing quantitative pupillometry data. If data were judged to be missing at random, we performed multiple imputation by chained equations.
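The performance measures and the sequential combination rule described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names and the six-patient cohort are invented, confidence intervals are omitted, and only the qPLR variant of the retest is shown.

```python
from typing import Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance(test_positive: Sequence[bool],
                unfavorable: Sequence[bool]) -> dict:
    """Compute standard diagnostic accuracy measures from paired booleans."""
    pairs = list(zip(test_positive, unfavorable))
    tp = sum(1 for t, u in pairs if t and u)
    fp = sum(1 for t, u in pairs if t and not u)
    fn = sum(1 for t, u in pairs if not t and u)
    tn = sum(1 for t, u in pairs if not t and not u)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": _ratio(fp, fp + tn),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "youden": sensitivity + specificity - 1,
    }


def combined_positive(nse: float, qplr: float,
                      nse_cutoff: float = 60.0,
                      qplr_cutoff: float = 4.0) -> bool:
    """Sequential rule: NSE first; retest with qPLR only if NSE is negative."""
    if nse > nse_cutoff:
        return True
    return qplr < qplr_cutoff


# Invented cohort: (48-h NSE in ug/L, worst-eye qPLR in %, unfavorable outcome).
cohort = [(80.0, 10.0, True), (30.0, 2.0, True), (30.0, 15.0, True),
          (20.0, 20.0, False), (10.0, 25.0, False), (70.0, 1.0, True)]
truth = [u for _, _, u in cohort]
nse_alone = performance([n > 60.0 for n, _, _ in cohort], truth)
combined = performance([combined_positive(n, q) for n, q, _ in cohort], truth)
```

In this toy cohort, the retest picks up one additional patient with an unfavorable outcome without adding any false positives, which is the pattern the combined strategy aims for.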

As per protocol [23], we calculated sample size using the Jones et al.’s method, similar to that of Oddo et al. [10, 39]. Based on a specificity of 95%, a 95% CI of 3%, and a prevalence of unfavorable outcomes of 38%, we needed at least 534 patients. We used R statistical software (version 4.2.2) for all analyses, with a two-sided significance level of 5%.
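The reported sample size can be reproduced arithmetically under assumptions that are ours rather than stated in the text: a normal-approximation precision formula, the 3% figure read as the half-width of the 95% CI, and the resulting group size divided by the 38% outcome prevalence. The protocol [23] remains the authoritative description; the sketch below only illustrates the arithmetic, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_specificity(specificity: float,
                                half_width: float,
                                group_fraction: float,
                                confidence: float = 0.95) -> int:
    """Total sample size for estimating a proportion with a given CI half-width.

    group_n is the number of patients needed in the relevant outcome group;
    dividing by group_fraction scales it to the whole cohort (assumed here
    to be the 38% prevalence quoted in the text).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    group_n = z ** 2 * specificity * (1 - specificity) / half_width ** 2
    return math.ceil(group_n / group_fraction)


n_required = sample_size_for_specificity(specificity=0.95,
                                         half_width=0.03,
                                         group_fraction=0.38)
```

Under these assumptions the function returns 534, matching the minimum stated above; with 710 patients analyzed, the study exceeded that target.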

This trial is registered with ClinicalTrials.gov, NCT03141099.

Results

Baseline demographic and characteristics of patients

Of the 789 comatose OHCA survivors in the BOX-trial [24], 71 lacked pupillometry data, and 8 were excluded due to a history of cataract surgery. The final study population included 710 patients (mean age: 63 ± 14 years; 82% males). The mean time-to-ROSC was 21 ± 14 min, 91% had a primary shockable rhythm, 85% had witnessed OHCA, and 89% received bystander CPR (Table 1). Patients with qPLR < 4% or NPi ≤ 2 at t48 had fewer occurrences of shockable rhythm, longer time-to-ROSC, significantly lower pH, and higher lactate levels on admission (supplementary Table S2). Patient flow and exclusions are summarized in supplementary Fig. S1.

Table 1 Demographic and clinical characteristics of the patients

The 71 patients lacking pupillometry were older and had fewer primary shockable rhythms and fewer instances of bystander CPR. However, there were no differences in outcome data, and multiple imputations yielded prognostic results similar to the complete cases.

Outcomes

As the 710 (100%) patients at t0 gradually woke up or died, we obtained pupillometry at t24 (n = 682, 96%), t48 (n = 543, 76%), and t72 (n = 301, 42%). After 72 h, 39 (5%) patients had died, and 187 (26%) were still unconscious with M ≤ 3.

At follow-up (median time of 134 days), unfavorable neurological outcomes were observed in 266 (37%) patients according to CPC (median 1 [IQR 1–5]) and in 280 (39%) patients according to mRS (median 1 [IQR 0–6]), similar to other Danish studies [13, 15]. Median MOCA was 27 (24–29), and median plasma NSE level at 48 h was 18 (11–37) μg/L. Within 365 days, 257 (35%) patients died (extracerebral deaths, n = 63), with a median survival of 7 (4–21) days from admission. Patients with qPLR < 4% or NPi ≤ 2 at t48 had a significantly higher incidence of primary and secondary outcomes than patients with values above the thresholds (Table 2).

Table 2 Primary and secondary outcomes according to t48 pupillometry thresholds

Pupillometry

In patients with favorable outcomes, values were significantly lower at admission and t24, where they clustered tightly, than at t48 and t72. Patients with unfavorable outcomes showed no significant NPi increase between time points, which widened the gap to patients with favorable outcomes at later time points. Similar trends were observed for qPLR, although they were not statistically significant. However, median values for both predictors remained significantly lower in patients with unfavorable outcomes at all time points. The distribution of qPLR and NPi according to time points by outcome at follow-up is illustrated in Fig. 1.

Fig. 1

Distribution of quantitative pupillometry measurements during the first 72 h (t0–t72) after cardiac arrest according to outcome at follow-up. The rainfall plots show the measurements of qPLR (A) and NPi (B) at time points, with the density and distribution depicted in the violin plot. Measures are divided by primary outcome group, dichotomized as favorable (green) and unfavorable (purple). qPLR quantitatively assessed pupillary light reflex, NPi Neurological Pupil index

Prognostic performance, assessed by the area under the ROC curve (AUC), peaked at t48 for qPLR (0.81 [0.77–0.85]) and NPi (0.83 [0.80–0.87]) predicting primary outcome (supplementary Table S3), with consistent findings for secondary outcomes (supplementary Table S4). Adjusted for clinical predictors, the highest AUC occurred at t24 for qPLR (0.93 [0.91–0.96], p < 0.001) and at t48 for NPi (0.93 [0.90–0.95], p < 0.001). Both qPLR and NPi had similar significant contributions to AUC at all time points in the multivariable model. When extracerebral causes of death were excluded, the increase in AUC compared to the total population was not significant for either predictor.

An NPi ≤ 2 from t0 to t72 predicted unfavorable neurological outcomes at follow-up and death within 365 days, with 0% (0–0%) FPR. Similarly, a qPLR < 4% from t24 to t72 predicted outcomes with a 0% FPR (95% CI ranging from 0–0% to 0–1%) (Table 3). The sensitivity was highest at 26% for qPLR and 12% for NPi at t24. When exploring thresholds that predicted outcome with maximum sensitivity and 0% FPR, qPLR < 3% (t0), < 4% (t24–t48), and < 6% (t72) showed a corresponding sensitivity of 16–26%. Similarly, NPi < 2.8 (t0), < 2.9 (t24), < 3.3 (t48), and < 3.1 (t72) resulted in a sensitivity of 10–18% (supplementary Table S5a + b).

Table 3 Prognostic performance of pupillometry thresholds and NSE

With an AUC of 0.81 (0.80–0.82), 48-h plasma NSE > 60 μg/L predicted the primary outcome with 2% (0–3%) FPR and 39% (33–46%) sensitivity. When combined, pupillometry significantly improved the prognostic performance of NSE (from t0 to t48 for qPLR and from t0 to t72 for NPi) (Fig. 2). In the multimodal neuroprognostication strategy for comatose patients, with M ≤ 3, at ≥ 72 h, NSE > 60 μg/L predicted an unfavorable outcome with 0% (0–0%) FPR and sensitivity of 42% (35–51%). When t48 pupillometry was implemented as the second predictor, it significantly increased sensitivity to 55% (47–63%) for qPLR and 50% (42–58%) for NPi. When exploring the performance of other pupillometry thresholds, qPLR < 9% and NPi < 3.3 maintained predictions with 0% FPR and improved sensitivity to 70% (62–77%) and 57% (49–65%), respectively.

Fig. 2

Prognostic performance of pupillometry and 48-h NSE. Receiver-operating characteristic curves depicting the prognostic performance by the area under the curve of 48-h plasma neuron-specific enolase level alone and combined with quantitatively assessed pupillary light reflex and Neurological Pupil index, respectively, measured at t0 (A), t24 (B), t48 (C), and t72 (D). The P value depicts the significance of the contribution of pupillometry to neuron-specific enolase in predicting unfavorable outcomes at follow-up. NSE neuron-specific enolase, qPLR quantitatively assessed pupillary light reflex, NPi Neurological Pupil index

Discussion

In this prospective multi-center study of 710 OHCA survivors, we present the largest cohort to investigate the prognostic efficacy of automated quantitative pupillometry.

We successfully validated proposed pupillometry thresholds for predicting unfavorable neurological outcomes at follow-up with 0% FPR. Assessments from admission (NPi ≤ 2) and from day 1 (qPLR < 4%) were 100% specific for outcome at all remaining time points. Notably, both parameters remained robust as independent predictors, and thresholds of qPLR < 3 and NPi < 2.8 achieved a 0% false-positive rate immediately after admission.

These findings align with the previous studies [10, 13] and the recent ORANGE study [18], an extensive, international, multi-center, prospective, observational study that demonstrated clinically and statistically significant prognostic value of quantitative pupillometry for neurological outcomes and mortality in 514 patients with traumatic brain injury, aneurysmal subarachnoid hemorrhage, and intracerebral hemorrhage. The present study emphasizes the importance of NPi as a very early predictor, as proposed by Oddo et al. [10].

Sedatives/analgesics (especially propofol and fentanyl used in this study) minimally affect NPi, whereas deep sedation may decrease qPLR [40, 41]. This may explain NPi's consistent predictability across all time points, with qPLR achieving 100% specificity only after day 1.

A recent systematic review and meta-analysis by Wang et al. [12] proposed thresholds of qPLR ≤ 7% and NPi ≤ 2 derived from pooled data median values. The NPi threshold was consistent with the validated data from this study. However, both Wang et al. and another recent systematic review [11, 12] identified variability in proposed qPLR thresholds, ranging from 0% at 72 h to < 7% and < 13% at 48 h [10, 14, 17]. Studies using a Neurolight® (IDMED, Marseille, France) pupillometer reported qPLR thresholds of < 7–13% but also higher median values, with previous research indicating an approximately 10% variance between the Neurolight and the NPi-200 used in this study [42]. Oddo et al. [10], using an NPi-200, reported median qPLR values similar to this study (± 0–1%) but with an even more restrictive threshold of qPLR = 0% for predicting outcomes with 0% FPR at day 3. Their population included cardiac and non-cardiac causes for arrest, and patients with favorable outcomes had longer time-to-ROSC and a higher rate of non-shockable primary rhythm compared to the general Danish OHCA population [1, 43]. This could have yielded more false-positive qPLR measurements due to an extended “recovery period” from transient brain stem dysfunction after OHCA [44, 45]. As qPLR can be decreased when influenced by anesthetics [40], a lower qPLR threshold predicting outcome would be expected when pupillometry was performed regardless of sedation [10].

The sensitivity corresponding to the extreme FPR found in this study (NPi 9–12% and qPLR 8–26%) was lower than previously reported [11]. However, exploratory analyses for optimal thresholds showed increased sensitivity (NPi 10–18% and qPLR 16–26%). Accepting a slightly higher FPR of 1–2% increased sensitivity significantly (NPi to 25–46% and qPLR to 41–50%). The lowest sensitivity was observed at t72, possibly influenced by the change in population composition (most patients with true-positive favorable outcomes were awake) and recovery from transient brain stem dysfunction. The ORANGE study confirmed that an abnormal NPi (< 3) is strongly associated with long-term mortality and poor neurological outcomes in acute brain injury patients [18]. While ORANGE included non-anoxic acute brain injury of a more heterogeneous composition, our results confirm the prognostic value of NPi < 3 in hypoxic–ischemic brain injury. At 48–72 h post-OHCA, NPi < 3 predicted unfavorable outcomes with 100% specificity. The pathology of non-anoxic brain injuries warrants repeated measurements for predicting outcomes. Dynamic changes in our results indicate that this could be beneficial for OHCA survivors as well and should be further investigated.

As pupillometry is still dichotomized into “present” versus “absent,” it was not included in the recent systematic review of predictors of a good outcome by Sandroni et al. [46]. Our prognostic performance results (Table 3) indicate that qPLR/NPi also has the potential to predict favorable outcomes. If specific thresholds are accepted in prognostication guidelines, future studies could confirm this prognostic ability.

The ERC/ESICM neuroprognostication strategy incorporates a multimodal approach with at least two predictors to enhance sensitivity without increasing false predictions. Adding either qPLR < 4% or NPi ≤ 2 to NSE > 60 μg/L significantly increased performance (area under the ROC curve) and sensitivity to 50% (NPi) and 55% (qPLR) while maintaining a 0% (0–0%) FPR. The results of NSE combined with pupillometry at values above the proposed thresholds indicate that, when used in a multimodal approach, even more liberal cut-off values could achieve similar specificity with increased sensitivity.

Combining 48-h NSE and pupillometry with validated thresholds for evaluating persistently comatose OHCA survivors with M ≤ 3 at ≥ 72 h provides a highly reliable (specificity) and useful (sensitivity) multimodal neuroprognostication strategy, especially for centers where neurophysiology is challenging to obtain.

Study limitations

The protocolized aim of this study was to validate and investigate the qPLR and NPi parameters; therefore, the remaining quantitative parameters (CV, MCV, DV, and LAT) have not been included in this study. Due to the clinical design of the main trial, it was not possible to standardize ambient light conditions, deviate from the use of sedation or vasopressors in standard care, or test measurements in different conditions. Thus, the results of this study reflect the clinical setting for neuroprognostication.

Even though automated quantitative pupillometry was unavailable for personnel involved in outcome assessments and decision-making, clinicians involved in patient care were not blinded to the standard manual pupillometry performed. As there was no set order for the manual and quantitative pupillometry assessments, it is possible that manual measurements could have been recorded by the same personnel also performing automated assessments. However, no formal training was provided for interpreting qPLR and NPi, and WLST decisions were never based on a single predictor.

Some patients were excluded because quantitative pupillometry data were missing. Although these patients differed in some baseline characteristics, their outcomes did not differ from those of patients with available data, and multiple imputation yielded similar prognostic results. We therefore consider that missingness was at random and did not affect the generalizability of the results.

The individual components and calculation of the NPi algorithm, developed by NeurOptics®, have never been fully published. However, the prognostic value of the index has been thoroughly verified through numerous studies [9, 10, 13, 15, 16, 18].

Though all analyses of the primary and secondary outcomes (Table 3) met the Bonferroni corrected α (0.05/7 = 0.0071), this was not protocolized and thus performed post hoc.

Conclusion

We successfully validated the robust prognostic value of quantitative pupillometry after OHCA, with specific thresholds of qPLR < 4% and NPi ≤ 2 predicting unfavorable neurological outcomes with zero-percent FPR. Combining the thresholds with NSE > 60 μg/L increased sensitivity without raising false positives, suggesting the potential of quantitative pupillometry as a substitute for absent pupil reflex in multimodal neuroprognostication.