Death by neurologic criteria occurs when a catastrophic brain injury causes the permanent loss of all cerebral functions essential to life. Accurate death determination by neurologic criteria (DNC) is essential to providing closure to relatives and to ceasing somatic mechanical support in the deceased individual. Patients who are diagnosed with death by neurologic criteria often become organ donors; they are in fact the major source of transplantable organs for individuals with terminal heart, lung, liver, and kidney disease.1 The cornerstone of DNC is a reliable clinical neurologic examination showing permanent cessation of consciousness and loss of brainstem reflexes, including central apnea as shown by an apnea test.2 Perfect specificity in DNC (i.e., absence of false positives) is of paramount importance to ensure that the dead donor rule, which states that organs can only be retrieved from a dead person, is respected.3 In practice, numerous factors commonly known as “clinical confounders” may render the clinical examination unreliable, such as drug intoxication or cervical spinal cord injury. Furthermore, a complete neurologic examination is not always feasible, for instance when apnea testing is not safe because of cardiopulmonary instability. In these scenarios, clinicians often use ancillary tests to assess surrogates of brain function, namely cerebral blood flow (e.g., cerebral four-vessel angiography, computed tomography [CT] angiography), perfusion (e.g., CT perfusion scan), or neurophysiologic function (e.g., electroencephalogram [EEG], evoked potentials).4 In certain jurisdictions, ancillary tests are also compulsory to confirm DNC, even in patients with reliable clinical examinations.5

Both guidance on the use of ancillary tests for DNC and clinical practice are heterogeneous between and within jurisdictions.6,7,8 This may reflect the absence of a comprehensive analysis of the diagnostic validity of ancillary tests. The objective of this study was thus to assess the diagnostic accuracy of commonly used ancillary tests for DNC.

Methods

This study is a systematic review and meta-analysis of diagnostic test accuracy, for which the detailed protocol was published previously.9 The review follows strict methodological standards based on the Cochrane Collaboration Diagnostic Accuracy Working Group’s recommendations. Reporting follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.10

Study selection criteria

The target condition of this review was death by neurologic criteria. Since ancillary tests are applied to both clinically diagnosed DNC patients (in a confirmatory role) and to comatose patients suspected of death by neurologic criteria (in a diagnostic role), studied populations included either 1) patients with clinically diagnosed death by neurologic criteria who underwent confirmatory ancillary testing (only patients with DNC), or 2) comatose patients suspected of death by neurologic criteria who underwent reference standard and ancillary testing for DNC (patients with and without death by neurologic criteria). We included cohort and case–control studies, as well as case series, without restriction by language of publication. As this review principally concerns adult patients, we included study samples composed of at least 80% of adults (18 yr or older). We excluded studies from which we could not obtain or calculate the true and false positive and negative rates from the text, appendices or after contacting the main authors. Studies for which the objective was to determine diagnostic criteria of a specific ancillary test with no a priori definition of the diagnostic criteria for death by neurologic criteria were also excluded. Finally, we excluded studies without a valid reference standard, studies conducted on pediatric patients only, case reports (2 or fewer patients), and duplicates or subcohorts of already published cohorts.

Reference standards and index tests

We considered studies that used one of three reference gold standards for DNC: clinical diagnosis (an established cause of brain injury, irreversible coma, absence of brainstem reflexes, and central apnea), conventional four-vessel angiography (no intracranial blood flow), and radionuclide imaging (hollow skull phenomenon).11 In studies where authors included an ancillary test in the reference clinical diagnosis, we considered the combination of the clinical evaluation and this ancillary test as the clinical diagnosis reference standard, with plans to perform subgroup analyses pertaining to this factor. In studies where multiple reference standards were applied to patients, clinical diagnosis was chosen as the preferred reference to allow four-vessel angiography and/or radionuclide imaging to be included in the analysis as index ancillary tests.

We investigated the following ancillary tests: four-vessel angiography, radionuclide imaging (including 99mTc-pertechnetate angiography, 99mTc-diethylenetriamine pentaacetate [DTPA] angiography, 99mTc-hexamethylpropyleneamine oxime [HMPAO] angiography, 99mTc-HMPAO perfusion with and without single-photon emission computed tomography [SPECT], or other radionuclide testing), transcranial Doppler ultrasonography (TCD), electroencephalography (EEG; cortical or nasopharyngeal), evoked potentials (brainstem auditory, visual, or somatosensory), CT angiography (CTA; 4-point scale,12 7-point scale,13 10-point scale,14 no intracranial flow criteria, or other criteria), CT perfusion imaging (CTP), magnetic resonance imaging (MRI; time-of-flight angiography, diffusion weighted imaging and apparent diffusion coefficient, arterial spin labeling, or other criteria), magnetic resonance venography, magnetic resonance perfusion imaging, and xenon CT.

Search strategy and study screening

We searched MEDLINE, EMBASE, Cochrane databases, and CINAHL Ebsco from their inception to 4 February 2022, using a comprehensive search strategy developed with an information specialist trained in the conduct of systematic reviews (Electronic Supplementary Material [ESM] eAppendix 1). We also reviewed the reference lists of all published narrative reviews, systematic reviews, and eligible studies for additional references. Two blinded reviewers independently performed study screening at the title/abstract level and then at the full-text level using the same inclusion and exclusion criteria. At each level of the study selection process, disagreements were solved by consensus or by consultation with a third reviewer as needed.

Data collection and methodological quality assessment

Two blinded reviewers independently collected data on study characteristics (study design, location, studied population, inclusion and exclusion criteria, patient characteristics and flow, reference standard and ancillary testing definitions) and results (number of true positives, false positives, true negatives, false negatives, inconclusive results, and patients with missing data). Reviewers used the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool to independently assess the methodological quality of each included study.15 Disagreements in data collection and methodological quality assessment were also solved by consensus or consultation with a third reviewer as needed. When required, reviewers attempted to contact investigators of included studies to clarify extracted data.

Data analysis

Since some studies assessed multiple ancillary test types, descriptive statistics are presented at both the study and assessment levels. Dichotomous variables are reported as counts and proportions. To estimate ancillary test diagnostic accuracy, we performed two meta-analyses. We conducted the first meta-analysis among studies involving only clinically diagnosed death by neurologic criteria patients (where ancillary tests are used in a confirmatory role). For these studies, only sensitivity could be calculated, since all are either true positives or false negatives. We conducted the second meta-analysis among studies involving comatose patients clinically suspected of death by neurologic criteria (where ancillary tests are used in a diagnostic role). For these studies both sensitivity and specificity could be calculated.
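The distinction between the two study populations can be illustrated with a minimal sketch of how accuracy measures follow from the extracted 2×2 counts. The class and function names below are hypothetical and the counts are invented for illustration; they are not data from any included study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Counts:
    """2x2 counts for one ancillary test assessment (illustrative only)."""
    tp: int  # ancillary test positive, reference standard positive
    fp: int  # ancillary test positive, reference standard negative
    tn: int  # ancillary test negative, reference standard negative
    fn: int  # ancillary test negative, reference standard positive


def sensitivity(c: Counts) -> Optional[float]:
    """TP / (TP + FN); None when no reference-positive patients exist."""
    denom = c.tp + c.fn
    return None if denom == 0 else c.tp / denom


def specificity(c: Counts) -> Optional[float]:
    """TN / (TN + FP); None when no reference-negative patients exist."""
    denom = c.tn + c.fp
    return None if denom == 0 else c.tn / denom


# Confirmatory role: every patient is clinically diagnosed, so only TP and FN occur.
confirmatory = Counts(tp=45, fp=0, tn=0, fn=5)
# Diagnostic role: patients with and without the target condition are included.
diagnostic = Counts(tp=40, fp=2, tn=48, fn=10)

print(sensitivity(confirmatory), specificity(confirmatory))
print(sensitivity(diagnostic), specificity(diagnostic))
```

In the confirmatory example the specificity is undefined because no reference-negative patients exist, which is why only sensitivity is pooled for that population.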

Meta-analysis of clinically diagnosed death by neurologic criteria patients

We estimated partially pooled test sensitivities using a hierarchical Bayesian model in which studies were nested within ancillary test types.16 The three-level (beta-binomial) model was specified as in Kruschke and Vanpaemel, except that we only included one concentration parameter common to each ancillary test type and used a diffuse half-Cauchy prior (with scale parameter of 150) for the concentration parameters rather than a Gamma prior.17 Results are reported as partially pooled sensitivities and random-effect standard deviations (reported as posterior modes and 95% highest density intervals [HDI]). Partial pooling, achieved through the hierarchical structure of the model, took into account both between-test and between-study variance, ensuring that extreme estimates for assessments with few patients and for ancillary test types with few assessments were moderated.
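The moderating effect of partial pooling can be sketched with a simplified beta-binomial calculation. This is not the model fitted in this review: here the prior mean and concentration are fixed, hypothetical values, whereas the hierarchical model estimates them from the data at the test-type level. The function name is also an assumption made for illustration.

```python
def shrunk_sensitivity(tp: int, n: int, prior_mean: float, concentration: float) -> float:
    """Posterior mean sensitivity for one study under a Beta prior.

    The prior is Beta(prior_mean * concentration, (1 - prior_mean) * concentration).
    With tp true positives among n reference-positive patients, the posterior is
    Beta(prior_mean * concentration + tp, (1 - prior_mean) * concentration + n - tp),
    whose mean is returned. Hyperparameters are fixed here (an assumption);
    in a full hierarchical model they would be estimated.
    """
    if n < 0 or not 0 <= tp <= n:
        raise ValueError("require 0 <= tp <= n")
    if not 0.0 < prior_mean < 1.0 or concentration <= 0.0:
        raise ValueError("require 0 < prior_mean < 1 and concentration > 0")
    return (tp + prior_mean * concentration) / (n + concentration)


# Two hypothetical studies with the same raw sensitivity (100%) but different sizes.
small_study = shrunk_sensitivity(tp=5, n=5, prior_mean=0.9, concentration=20.0)
large_study = shrunk_sensitivity(tp=500, n=500, prior_mean=0.9, concentration=20.0)

print(f"small study: {small_study:.3f}")
print(f"large study: {large_study:.3f}")
```

The small study's estimate is pulled strongly toward the test-type mean, while the large study's estimate stays close to its raw value, which is the behavior described above for assessments with few patients.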

Meta-analysis of clinically suspected death by neurologic criteria patients

We estimated partially pooled sensitivities and specificities using a different hierarchical Bayesian model in which studies were nested within ancillary test types, as before. The three-level (hierarchical summary receiver operating characteristics [HSROC] curve) model was specified as in Rutter and Gatsonis, except that our model had three levels instead of two, and our priors were slightly more informative (for details, see ESM eAppendix 2).18 Results are reported as summary receiver operating characteristics (SROC) curves, summary operating points, and partially pooled sensitivities and specificities with random-effect standard deviations (reported as posterior modes and 95% HDI). Detailed model descriptions and Stan codes are provided in the ESM (eAppendix 2).
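As a rough illustration of how an HSROC parameterization links sensitivity and specificity, the sketch below uses the standard two-level form described by Rutter and Gatsonis, in which the logit of the probability of a positive test is (θ + αX)·exp(−βX), with X = 0.5 for patients with the target condition and X = −0.5 for those without. The three-level extension and priors used in this review are described in the ESM and are not reproduced here; the function names and parameter values are hypothetical.

```python
import math
from typing import Tuple


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def hsroc_operating_point(theta: float, alpha: float, beta: float = 0.0) -> Tuple[float, float]:
    """Return (sensitivity, specificity) implied by HSROC parameters.

    theta: positivity threshold parameter
    alpha: accuracy parameter (separation between the two patient groups)
    beta:  shape parameter (asymmetry of the SROC curve); 0 gives a symmetric curve
    """
    sens = expit((theta + 0.5 * alpha) * math.exp(-0.5 * beta))
    false_positive_rate = expit((theta - 0.5 * alpha) * math.exp(0.5 * beta))
    return sens, 1.0 - false_positive_rate


# Moving the threshold trades sensitivity against specificity at fixed accuracy.
for theta in (-1.0, 0.0, 1.0):
    sens, spec = hsroc_operating_point(theta, alpha=4.0)
    print(f"theta={theta:+.1f}  sensitivity={sens:.3f}  specificity={spec:.3f}")
```

Sweeping the threshold at a fixed accuracy parameter traces out one SROC curve, which is why studies restricted to clinically diagnosed patients, where specificity cannot be observed, cannot inform this trade-off.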

To explore clinical and statistical heterogeneity, we planned a priori to fit separate SROC curves for the following subgroups, all based on study-level characteristics: 1) demographic group (adult patients only versus mixed children/adult patients), 2) inclusion versus exclusion of an ancillary test in the clinical diagnosis reference standard, 3) delay between clinical DNC and ancillary testing < 24 hr vs ≥ 24 hr, and 4) presence or absence of clinical examination confounders. Among these planned subgroup analyses, the following three were deemed feasible because of the low proportions of missing values and sufficient heterogeneity in subgroup composition: 1) patient demographic group, 2) inclusion of an ancillary test in the clinical diagnosis reference standard, and 3) presence of clinical examination confounders. For the latter subgroup analysis, we excluded three studies that used a reference standard other than the clinical examination (namely, four-vessel angiography), as clinical confounders do not apply to this reference standard. For the subgroup analysis, we followed the methodology as presented in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (section 10.5.3.3).19

Although we planned to conduct sensitivity analyses pertaining to the risk of selection bias, risk of interpretation bias for the studied ancillary test, and risk of bias introduced by the interpretation of the reference test, we were unable to pursue these analyses because of the small proportion of studies with a low risk of bias. Nevertheless, for the meta-analysis of clinically confirmed death by neurologic criteria patients, we performed two sensitivity analyses pertaining to our Bayesian models to assess the degree to which partial pooling influenced the meta-analysis results. First, we modified the data in such a way that all assessments had the average sample size, reducing the role of partial pooling. Second, we loosened the priors of the scale parameters. Since estimates from these sensitivity analyses were largely consistent with the main analysis, we consider our results to be robust (sensitivity analysis results not reported). All analyses were performed with R version 4.1.3 (R Foundation for Statistical Computing, Vienna, Austria) and Stan version 2.21.0 via the rstan package in R.

Role of the funding source

The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Results

Overall, 137 records met the selection criteria (Fig. 1). Justifications for study exclusion at full-text screening are provided in the ESM (eAppendix 3).

Fig. 1 Flowchart diagram

Descriptive analyses

We included 137 studies (Table 1). Ninety-six studies (70%) had been conducted solely on patients clinically diagnosed with death by neurologic criteria and the remaining 41 (30%) included patients with clinically suspected death by neurologic criteria. Studies reported data on a variety of brain injury etiologies: 111 (81%) with traumatic brain injury, 107 (78%) with intracranial hemorrhage, 83 (61%) with hypoxemic-ischemic brain injury/cerebral anoxia, 77 (56%) with ischemic stroke, and 95 (69%) with other causes. Fifty-five studies (40%) had been performed without clinical confounders, such as drug intoxication, facial or cervical trauma, hypothermia, and hypotension, whereas 73 (53%) included patients with clinical confounders and 9 (7%) did not specify. Ninety-five studies (69%) did not report the delay between the reference standard and index ancillary test(s), with 28 (20%) clearly indicating a delay < 24 hr and the remaining 14 (10%) with delays ≥ 24 hr.

Table 1 Descriptive analyses at the study level

From the included studies, 230 assessments of ancillary tests were made (Table 2). Reference standards were clinical diagnosis and four-vessel angiography in 99% and 1% of assessments, respectively. Ancillary test types most frequently assessed were TCD (25%), CTA (18%), radionuclide imaging (16%), and EEG (16%). Characteristics of each individual study are detailed in the ESM (eAppendix 4).

Table 2 Descriptive analyses at the assessment level

Meta-analysis of sensitivity among clinically diagnosed death by neurologic criteria patients

Partially pooled ancillary test sensitivities estimated from the 94 studies comprising only clinically diagnosed death by neurologic criteria patients (n = 8,891 ancillary tests applied) were overall similar for all ancillary test types (Fig. 2), and ranged from 0.82 (CTA, 7-point and 10-point scales) to 0.93 (four-vessel angiography). Tests with the highest sensitivity estimates were four-vessel angiography (0.93), 99mTc-HMPAO perfusion with SPECT (0.90), CTP (0.90), TCD (0.89), MRI using time-of-flight angiography (0.89), MRI using arterial spin labeling (0.89), visual evoked potentials (0.89), and somatosensory evoked potentials (0.89). The standard deviations of the partially pooled sensitivity estimates were larger within each ancillary test type (σ = 0.10–0.15) than the standard deviation between the partially pooled ancillary test sensitivities (σ = 0.04), suggesting heterogeneity within each ancillary test was considerably higher than it was between tests. Data were most abundant for TCD (24% of applied ancillary tests), EEG (30%), and four-vessel angiography (11%) (Table 3). Forest plots for partially pooled sensitivity estimates by ancillary test are provided in detail in the ESM (eAppendix 5; eFigs 1–19).

Fig. 2 Ancillary test sensitivities obtained from the studies comprising clinically diagnosed death by neurologic criteria patients. [%] represents the proportion of ancillary tests performed in the test type among n = 8,891 ancillary tests applied. HMPAO = hexamethylpropyleneamine oxime; SPECT = single-photon emission computed tomography

Table 3 Number of patients pooled by study population and ancillary test type

Meta-analysis of sensitivity and specificity among patients with clinically suspected death by neurologic criteria

Partially pooled ancillary test sensitivities and specificities obtained from 40 studies including patients with clinically suspected death by neurologic criteria (n = 2,732 ancillary tests applied) are provided in Fig. 3. Results were mostly driven by TCD data, which represented 41% of applied ancillary tests (Table 3). There were no data for four-vessel angiography. Overall, ancillary test types showed variable partially pooled sensitivities (0.81–1.00) and specificities (0.87–1.00). The interval estimates (HDI) were considerably wide for the following ancillary test partially pooled specificities: evoked potentials, MRI, 99mTc-pertechnetate angiography, and 99mTc-HMPAO angiography. The following ancillary tests had both acceptable partially pooled sensitivity estimates and excellent partially pooled specificity estimates (in terms of both accuracy and precision): CTA (all scales), CTP, 99mTc-DTPA angiography, 99mTc-HMPAO perfusion (with or without SPECT), TCD, and EEG. Forest plots for partially pooled sensitivities and specificities by ancillary test, as well as respective SROC curves, are provided in the ESM (eAppendix 5; eFigs 20–53).

Fig. 3 Ancillary test sensitivities and specificities obtained from the studies comprising clinically suspected death by neurologic criteria patients. [%] represents the proportion of ancillary tests performed in the test type among n = 2,732 ancillary tests applied. ADC = apparent diffusion coefficient; DTPA = diethylenetriamine pentaacetate; DWI = diffusion weighted imaging; HMPAO = hexamethylpropyleneamine oxime; SPECT = single-photon emission computed tomography

Risk of bias assessment

Study risk of bias assessment according to the four QUADAS-2 domains is summarized in Fig. 4. Overall, the proportions of studies with a low risk of bias were 7% for patient selection and 12% for patient flow, whereas the proportion of assessments with a low risk of bias was 44% for the interpretation of the reference standard and 12% for the interpretation of the ancillary test. One study (0.7%) had low risks of bias on all QUADAS-2 elements.

Fig. 4 Quality Assessment of Diagnostic Accuracy Studies-2 study risk of bias summary

Subgroup analyses

Although data were too sparse to allow reliable subgroup analyses, these did not show significant differences between diagnostic accuracy estimates according to patient demographic group, inclusion of an ancillary test in the clinical diagnosis reference standard, or presence of clinical examination confounders (ESM eFig. 54).

Discussion

In this systematic review and meta-analysis, we assessed a wide variety of ancillary tests currently used in practice and found that 136/137 eligible studies (99%) had an unclear or high risk of bias on at least one QUADAS-2 domain. Study characteristics support this assessment as most studies were conducted in the presence of clinical confounders that alter the reference standard’s validity (53%) or did not specify the delay between reference standards and index tests (69%). Furthermore, most studies only included patients with clinically diagnosed death by neurologic criteria (70%); results from these studies, where the ancillary test is used in a confirmatory role, are not translatable to situations where ancillary tests have a diagnostic role, since they apply to a different patient population and do not assess the trade-off between test sensitivity and specificity. Finally, we observed significant heterogeneity in sensitivity and specificity estimates across studies. In fact, there was greater heterogeneity in ancillary test accuracy within each ancillary test type than between ancillary test types. These findings likely reflect high variability in the methodological quality of included studies. Since these concerns challenge the internal validity of studies included in this systematic review, caution is mandated in the choice and use of ancillary tests for DNC, as their diagnostic accuracy has not yet been extensively validated in high-quality, rigorous studies.

The methodological shortcomings of the studies included in our meta-analyses call into question the validity of our sensitivity and specificity estimates; however, some general findings remain of interest. First, current data suggest that 99mTc-HMPAO perfusion (both with and without SPECT), EEG, and TCD have reasonable diagnostic accuracy. A recent review of national DNC protocols found that these modalities are the most frequently recommended ancillary tests worldwide in addition to four-vessel angiography, for which we did not find data on specificity.8 Nevertheless, these tests all have specific limitations in clinical practice. For instance, TCD is not applicable in 10–20% of patients that have a poor acoustic window or in patients with significant structural damage to the cranium; EEG is not appropriate in cases of deep chemical sedation; and nuclear imaging is not universally accessible. Second, there is large uncertainty in the specificity estimates of evoked potentials, MRI, 99mTc-pertechnetate angiography, and 99mTc-HMPAO angiography, suggesting that these tests are inappropriate for DNC. Third, in the context of clinically diagnosed death by neurologic criteria, where ancillary tests are used in a confirmatory role, tests have similar sensitivities overall. Some tests have, however, been subject to less investigation, such as CTP and MRI.

The recent World Brain Death Project offers guidance on ancillary testing for DNC, some of which is supported by our findings.2 First, the project recommendations suggest that four-vessel angiography, TCD, and radionuclide imaging combining diffusible radiopharmaceuticals and SPECT are the three ancillary tests deemed most appropriate for DNC. Our analysis indeed shows that these tests have the most robust diagnostic accuracy based on currently available data, which we reiterate is subject to significant bias. Nevertheless, we did not find data on the accuracy of four-vessel angiography among comatose patients, so it was not possible to estimate this ancillary test’s specificity, despite it being historically considered the gold-standard ancillary test for DNC. The World Brain Death Project recommendations also caution against the use of CTA, CTP, and MRA, as they have not been sufficiently studied, which is consistent with our findings. Nevertheless, our results do not provide evidence to support other suggestions made in these guidelines. For instance, there are no data supporting the adjunct role of evoked potentials in patients initially evaluated with EEG. Although there is a physiologic argument for combining evoked potentials, which assess neuronal integrity of the brainstem and cortico-subcortical structures, with electroencephalographic evaluation of supratentorial activity, available data on evoked potential diagnostic accuracy yield specificity estimates with high statistical uncertainty.

Our study has several strengths. Although prior studies have assessed the diagnostic validity of TCD20,21 and CTA,22,23 our work has assessed the diagnostic accuracy of a wide arsenal of ancillary tests currently used in clinical practice using an exhaustive search strategy. We also excluded studies without an a priori definition of ancillary test diagnostic criteria to adhere to strict methodological standards.24 Our findings were robust to statistical model sensitivity analyses, which did not significantly alter estimates. Despite the paucity of data available for several ancillary test types, our analytical model was able to provide clinically useful parameter estimates and heterogeneity estimates for all included ancillary tests. Finally, subgroup analyses allowed us to model and inspect sources of heterogeneity including the presence of clinical confounders to the neurologic examination and the inclusion of an ancillary test in the reference standard. The major limitation of our work is the reliance on data provided by studies with unclear or high risk of bias, which calls into question the validity of our meta-analysis estimates. Importantly, the reference standard, which was a clinical neurologic examination in most included studies, may have been inaccurate in studies where confounders to the examination had not been clearly excluded. The degree to which this influenced our ancillary test diagnostic accuracy estimates is uncertain. Nevertheless, our subgroup analyses comparing the presence or absence of confounders to the clinical examination, and the addition of an ancillary test to the reference standard, did not disclose any significant differences in our point estimates, which suggests that this may not have been a significant source of clinical heterogeneity in our meta-analysis. 
Studies were also heterogeneous with respect to many other characteristics, such as patient demographics, brain injury etiologies, choice of ancillary tests applied, and ancillary test technology. Moreover, our statistical modeling did not allow us to quantify ancillary tests’ areas under the SROC curve, but use of this measure to represent diagnostic accuracy is controversial.25 Our analytic approach also assumed that ancillary tests were exchangeable, although in reality some studies used the same material to interpret different ancillary tests (for instance, some studies examined different CTA scales using the same images), which calls into question whether these tests may have some dependency unaccounted for by the models. Finally, we did not consider other factors pertaining to the use of ancillary tests, such as cost, reliability, and availability, which are beyond the scope of our study.

In light of our study’s findings, we believe high-quality research is warranted to provide accurate and valid measures of DNC ancillary tests’ diagnostic accuracy. As technological innovation advances, a growing number of diagnostic tests are being developed in neuroradiology and neurophysiology. Prior to being transposed into clinical practice for DNC, these modalities should undergo thorough accuracy evaluation in rigorous studies applied to appropriate study populations. Hopefully, ancillary tests will eventually evolve to reliably assess cerebral function even in the presence of clinical confounders, such as chemical sedation, instead of relying on other related, but not equivalent, brain physiology parameters (cerebral blood flow or perfusion). As current ancillary tests all assess surrogates for clinical brain function, it is not surprising that test sensitivities are imperfect, as these tests may often show persistent blood flow, perfusion, or neurophysiologic function that is insufficient to produce pertinent clinical cerebral function, particularly among patients suspected of death by neurologic criteria following a primary infratentorial brain injury.26,27 Nevertheless, until significant diagnostic advances in ancillary testing are made, clinical examination should remain the cornerstone of DNC and ancillary testing should retain its role in providing further assurance of the presence of death by neurologic criteria in situations where the clinical examination may be unreliable or impossible to complete.

In conclusion, clinicians employing ancillary tests in DNC should be aware that the studies assessing their diagnostic accuracy have modest methodological quality and are subject to significant risk of bias. Our findings have implications for clinical practice since patients who require ancillary testing in the process of DNC should be subjected to tests with near-perfect specificity and robustly studied diagnostic accuracy. Further high-quality studies are required to thoroughly validate ancillary tests for DNC.