Introduction

Routine electrophysiological assessment of optic nerve and macular retinal ganglion cell (RGC) function often involves cortical visual evoked potential (VEP) and pattern electroretinogram (PERG) methods, performed according to well-established standardised protocols [1,2,3]. There is increasing interest in use of the photopic negative response (PhNR) to assess global [4,5,6,7] and focal [8,9,10,11] RGC function. The PhNR can be evoked with many different flash strength and wavelength combinations including white flashes on a white background (WW PhNR) and red flashes on blue background (RB PhNR). The International Society for Clinical Electrophysiology of Vision (ISCEV) extended protocol for the PhNR recommends the use of RB stimuli as studies have reported that this yields a larger amplitude response than broadband (WW) stimuli [12,13,14], with a few exceptions [15,16,17]. Diagnostic accuracy relates to the ability of a test to discriminate between health and a target condition [18], and previous studies have compared the diagnostic accuracy of different PhNR stimuli using a case–control methodology [14,15,16, 19,20,21,22], with most focussed on the diagnostic and prognostic potential in glaucoma [23,24,25,26,27,28]. There is a lack of published studies that compare the relative diagnostic accuracy of RB and WW PhNR stimuli without prior knowledge of the diagnosis in a heterogeneous clinical population.

This study investigates the diagnostic accuracy of WW and RB PhNRs, compared with a test battery of clinical tests routinely used in the diagnosis of retinal ganglion cell disease (our target condition). The aim was to test whether WW PhNR stimuli, used routinely to record the ISCEV standard light-adapted full-field (LA 3) ERG [29], are a suitable alternative to RB stimuli for the detection of retinal ganglion cell dysfunction.

Methods

This was a prospective, paired diagnostic accuracy study conducted at Moorfields Eye Hospital, London, UK. Ethics committee approval was granted by UK National Research Ethics Committee Wales 6 (reference: 20/WA/0300). All eligible adult patients (aged 18 years or over) attending the Department of Electrophysiology within the recruitment window were identified through the triage of outpatient referrals and invited to participate in the study.

Each participant underwent examinations and electrophysiology according to routine clinical management with an additional PhNR protocol added at the end of electrophysiological testing. Participants were excluded from the study if they met any of the following criteria:

  1. Paediatric patients (< 18 years).

  2. Declined or were unable to provide consent.

  3. Reference tests were unavailable or unrecordable (e.g. undetectable responses due to severe photoreceptor disease).

  4. Poor quality test results, e.g. excessive eye movement/blink artefact, muscle tension, mains artefact.

Procedures

Photopic negative responses (index tests)

PhNRs were recorded binocularly using gold foil corneal electrodes with ipsilateral outer canthus reference electrodes, with a ground electrode situated on the forehead. Pupils were pharmacologically dilated using 1% tropicamide (in many cases supplemented with 2.5% phenylephrine hydrochloride). Responses were recorded using an Espion-E3 system (Diagnosys LLC, Lowell, USA). RB PhNR stimuli consisted of red (640 nm) flashes of ≤ 4 ms duration and 1.5 phot cd·s·m−2 stimulus strength presented on a blue background (450 nm; 10 phot cd·m−2) as specified in the ISCEV extended protocol for the PhNR [12]. The final result for every acquisition was an average of ≥ 15 responses. Amplifier bandwidth was 0.125–300 Hz. Traces exceeding ± 200 µV were automatically rejected as artefactual. WW PhNRs were recorded adhering to the ISCEV ERG standard [29] and consisted of white 3.0 cd·s·m−2 flashes on a white 30 cd·m−2 background. Amplifier bandwidth for the WW recordings was 0.31–500 Hz. All other WW parameters were the same as for the RB PhNR. One eye from each participant was chosen for analysis: either the affected eye in uniocular disease or the left eye when symptoms or pathology were bilateral. The test order was the same for all participants (WW then RB) with approximately the same amount of time between the two tests.

The top left panel in Fig. 1 highlights the main waveform components of the photopic negative response. The PhNR was measured from the baseline to the deepest trough of the negativity that followed the b-wave, either before or after the i-wave. Additionally, the PhNR-to-b-wave ratio (PhNR:b) was calculated after measuring the PhNR from the peak of the b-wave to the deepest trough that followed.
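The two measurements described above can be illustrated with a minimal sketch. It assumes a baseline-corrected trace (baseline at 0 µV), a hypothetical b-wave search window, and a b-wave amplitude measured from the a-wave trough to the b-wave peak; the function name, the window limits and the toy trace are illustrative assumptions and are not taken from the study software.

```python
def measure_phnr(times_ms, trace_uv, b_window=(20.0, 50.0), end_ms=120.0):
    """Illustrative PhNR measurements on a baseline-corrected trace (0 uV = baseline).

    b_window and end_ms are hypothetical search limits, not study parameters.
    """
    n = len(times_ms)
    # b-wave peak: most positive sample inside the assumed b-wave window
    b_idx = max((i for i in range(n) if b_window[0] <= times_ms[i] <= b_window[1]),
                key=lambda i: trace_uv[i])
    # a-wave trough: most negative sample between flash onset and the b-wave peak
    a_idx = min((i for i in range(b_idx) if times_ms[i] >= 0.0),
                key=lambda i: trace_uv[i])
    # PhNR trough: deepest negativity following the b-wave peak
    p_idx = min((i for i in range(b_idx + 1, n) if times_ms[i] <= end_ms),
                key=lambda i: trace_uv[i])

    b_amplitude = trace_uv[b_idx] - trace_uv[a_idx]      # a-wave trough to b-wave peak
    baseline_to_trough = 0.0 - trace_uv[p_idx]           # PhNR amplitude from baseline
    peak_to_trough = trace_uv[b_idx] - trace_uv[p_idx]   # PhNR from b-wave peak
    return {
        "b_amplitude_uv": b_amplitude,
        "baseline_to_trough_uv": baseline_to_trough,
        "peak_to_trough_uv": peak_to_trough,
        # Assumed ratio definition: peak-to-trough PhNR divided by b-wave amplitude
        "phnr_b_ratio": peak_to_trough / b_amplitude,
    }


# Toy trace (hypothetical values in microvolts), sampled at a few time points
times_ms = [0.0, 5.0, 15.0, 30.0, 45.0, 65.0, 80.0, 100.0]
trace_uv = [0.0, -5.0, -30.0, 90.0, 20.0, -25.0, -10.0, 0.0]
result = measure_phnr(times_ms, trace_uv)
```

In clinical recordings, cursor placement is normally checked by an experienced observer, so an automated search of this kind would be a starting point only.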

Fig. 1

Single eye recordings from two of the study participants. White-white (top row) and red-blue (second row) PhNR recordings and PERGs (third row) and PVEPs (bottom row). Patient 1 (left column), a 68-year-old female, demonstrated no abnormality on any test (normal findings). Patient 2 (right column), a 68-year-old female, was referred with a clinical diagnosis of glaucoma. Arrows highlight the elevation of the PhNR trough in patient 2, in keeping with generalised retinal ganglion cell dysfunction. The PERG N95:P50 ratio was reduced and the pattern VEP P100 component was delayed and of subnormal amplitude. The main PhNR waveform components are highlighted in the top left panel. Vertical black lines show a- and b-wave amplitude measurements, and PhNR amplitude measurements from baseline. The dashed black line represents the PhNR measured from the peak of the b-wave as used to calculate the PhNR-to-b-wave ratio

Reference tests

As no single gold standard test of RGC function exists to directly compare against the PhNRs, a battery of reference tests was used. This consisted of the pattern electroretinogram (PERG) [1] and pattern visual evoked potentials (PVEP) [2], often performed together as part of routine test protocols in the electrophysiology clinic. Fundus photography (Optos plc, Dunfermline, UK) and optical coherence tomography (OCT) (Spectralis Heidelberg Engineering Ltd, Heidelberg, Germany) measures of retinal nerve fibre layer thickness (RNFL) and mean ganglion cell layer volume were assessed as part of routine clinical assessment. Additionally, relevant clinical and family history was recorded from all participants during their visit, as part of routine clinical care. All reference tests were performed according to current clinical standards.

Due to the nature of the study population, clinical judgement was required to assign participants to groups according to evidence of RGC pathology. All participants with a reduced PERG N95:P50 ratio were included in the ‘evidence of RGC pathology’ group, as were those with OCT evidence of RNFL thinning. In all other cases, at least two abnormal reference tests were required. In cases where only one of the reference tests was abnormal, the clinical notes were reviewed and those with an established diagnosis of optic neuropathy were included, e.g. glaucoma and abnormal VEP.
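The grouping rule described above can be summarised as a simple decision function. This is a sketch only: the field names are hypothetical, and in the study the final assignment relied on clinical judgement rather than an automated rule.

```python
from dataclasses import dataclass


@dataclass
class ReferenceFindings:
    """Hypothetical summary of one participant's reference test results."""
    perg_n95_p50_reduced: bool          # reduced PERG N95:P50 ratio
    oct_rnfl_thinning: bool             # OCT evidence of RNFL thinning
    other_abnormal_tests: int           # number of other abnormal reference tests
    established_optic_neuropathy: bool  # documented diagnosis in the clinical notes


def evidence_of_rgc_pathology(f: ReferenceFindings) -> bool:
    # A reduced PERG N95:P50 ratio or RNFL thinning is sufficient on its own
    if f.perg_n95_p50_reduced or f.oct_rnfl_thinning:
        return True
    # Otherwise at least two abnormal reference tests are required
    if f.other_abnormal_tests >= 2:
        return True
    # A single abnormal test counts only with an established optic neuropathy
    if f.other_abnormal_tests == 1:
        return f.established_optic_neuropathy
    return False


# Example: abnormal VEP only, with an established diagnosis of glaucoma
example = ReferenceFindings(False, False, 1, True)
example_group = evidence_of_rgc_pathology(example)
```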

The investigator interpreting the index tests (SL) was masked to the result of the reference tests. Conversely, investigators (AR and MN) interpreting the reference tests were masked to the results of the index tests.

Definition of clinically significant retinal ganglion cell dysfunction

Significant RGC dysfunction was defined using local reference ranges from the control PhNR dataset. The lower limit (5th centile) of the reference ranges for amplitudes of the RB and WW PhNRs was 18.4 µV and 12.8 µV, respectively. Reference test results (including PERG and VEP) were compared with local reference ranges and were analysed by experienced electrophysiologists (MN; AGR). Participants were then categorised into either the ‘no evidence of RGC pathology’ group or the ‘evidence of RGC pathology’ group.

The primary outcome was the difference between the sensitivities and specificities of WW and RB PhNRs, derived using paired contingency tables. Secondary outcomes were the difference between the group amplitudes of the two PhNR types, PhNR positive and negative predictive values and area under the receiver operating characteristic (ROC) curves.

Statistical methods

Descriptive statistics were performed on PhNR amplitudes. The distribution of the data was evaluated using the Shapiro–Wilk test, and the RB and WW groups were planned to be compared using either a two-tailed t test or the nonparametric Mann–Whitney U test if the distribution of the data was not Gaussian. Results were considered statistically significant if p < 0.05. Sensitivities and specificities were calculated using paired contingency tables [30]. McNemar tests were used to compare the estimated sensitivities, specificities, positive predictive values (PPVs) and negative predictive values (NPVs) of the WW and RB PhNRs [31, 32]. The relationship between true positive rates (sensitivity) and false positive rates (1 − specificity) across a range of cut-off points was investigated using ROC curves, and test performance was measured using the area under the ROC curve (AUC) [33]. Where applicable, 95% confidence intervals (CI) were applied to the above results. Analyses were conducted using R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria) and OriginPro 2019 (OriginLab, Northampton, USA).
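As an illustration of the paired comparison, the sketch below implements an uncorrected McNemar chi-square test on the discordant pairs of a paired contingency table. The analyses in this study were run in R; the Python function, the choice of the uncorrected form and the example counts are assumptions made for illustration only.

```python
import math


def mcnemar_test(only_first_positive, only_second_positive):
    """Uncorrected McNemar test on discordant pair counts.

    Returns (chi-square statistic, two-sided p value) with 1 degree of freedom.
    """
    b, c = only_first_positive, only_second_positive
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical discordant counts among reference-positive eyes:
# 0 detected by WW only, 4 detected by RB only
chi2, p_value = mcnemar_test(0, 4)
```

Only the discordant pairs (eyes classified differently by the two stimuli) contribute to the statistic. With small discordant counts, an exact binomial version of the test may be preferable.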

Sample size calculation

Sample size was calculated following the methods of McCray et al. [34]. Prevalence of RGC dysfunction within the study population was estimated to be 50%. A power of 80% at an alpha level of 0.05 was used for the calculation, giving an estimated minimum sample size of 152 participants.

Results

Baseline demographics

Recruitment was between March 2021 and February 2022 and included 243 consecutive patients who provided consent to take part in the study. The flow of participants through the study is outlined in Fig. 2. Twenty-nine participants had undetectable or residual full-field ERGs due to severe generalised retinal dysfunction and were excluded from the analysis. RB PhNR recordings from 14 participants had excessive levels of blink/eye movement artefact precluding reliable quantification and were excluded from the analysis. Results from the remaining 200 participants were analysed (completion rate 82%). The median age of participants was 54 years (range 18–95), and 129 participants (65%) were female. Table 1 summarises the characteristics of the participants who were categorised as either having evidence of RGC dysfunction or no evidence of RGC dysfunction. Clinical findings and history, VEP and PERG results were available for all participants. OCT RNFL was performed on 59% of participants, and mGCL volume results were available for 90%. Table 2 summarises the results of the reference tests in those in the ‘evidence of RGC pathology’ group. There were no significant time delays in conducting any of the investigations, and no adverse events occurred as a result of any of the tests.

Fig. 2

Flow of all participants through the study categorised with the baseline-to-trough PhNR

Table 1 Baseline demographics and characteristics of all participants recruited to the study
Table 2 Abnormal reference test results in those in the ‘reference positive’ group

Photopic negative response amplitudes

Shapiro–Wilk tests determined that PhNR amplitudes were not drawn from a normally distributed population, and therefore the nonparametric Mann–Whitney U test was used. Amplitude findings are presented in Fig. 3. The median baseline-to-trough amplitudes of RB and WW PhNRs were 27.3 µV and 22.6 µV, respectively. The mean baseline-to-trough amplitudes of RB and WW PhNRs were 28.2 µV and 23.7 µV, respectively. The minimum and maximum, median and 5th and 95th centiles of RB PhNR amplitudes were larger than those of the WW PhNR, and there was a statistically significant difference between the amplitudes of RB and WW PhNRs in the participants without any RGC pathology (p = 0.02). There was no significant difference between the amplitudes of RGC pathology positive RB and RGC pathology positive WW PhNRs (p = 0.40). There was a highly significant difference between the amplitudes of all participants with RGC pathology grouped together versus the participants with no evidence of RGC pathology (p < 0.0001).

Fig. 3

Half box plots showing the distribution of PhNR amplitudes for all patients grouped according to the reference test result. Individual data points from each participant are shown to the right of each box. Whiskers show 5th and 95th centiles. Boxes show the 25th centile, median, and 75th centile

Estimates of diagnostic accuracy

The diagnostic performance of WW PhNRs and RB PhNRs was compared using contingency tables. A summary of the results is displayed in Table 3. Forty-five patients had evidence of RGC dysfunction, giving an overall prevalence of 23% in the study cohort. An example of the electrophysiological findings from two participants is shown in Fig. 1.

Table 3 Diagnostic accuracy measures for all participants (N = 200)

Baseline-to-trough

The sensitivities of WW and RB PhNRs were 53% (95% CI 39% to 68%) and 62% (95% CI 48% to 76%), respectively. The difference between the sensitivities was − 9% (95% CI − 17% to − 1%). Specificities were 80% (95% CI 74% to 86%) and 78% (95% CI 72% to 85%), for WW and RB PhNRs, respectively. The difference between the specificities was 2% (95% CI 0% to 4%). Positive predictive values of WW and RB PhNRs were 44% (95% CI 31% to 57%) and 45% (95% CI 33% to 58%), respectively. Negative predictive values were 86% (95% CI 80% to 91%) and 88% (95% CI 82% to 93%), for WW and RB PhNRs, respectively. McNemar’s test found a statistically significant difference between the sensitivities of WW and RB PhNRs (p = 0.046). There was no statistically significant difference between the specificities of WW and RB PhNRs (p = 0.08). There were no statistically significant differences between the WW and RB positive predictive values (p = 0.52) or the WW and RB negative predictive values (p = 0.08).
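The arithmetic behind these measures can be sketched as follows. The 2 × 2 counts for the WW PhNR are not quoted from Table 3; they are back-calculated from the reported percentages with 45 reference-positive and 155 reference-negative participants, and the Wald (normal approximation) interval is an assumed method that happens to agree with the reported intervals.

```python
import math


def proportion_with_wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def accuracy_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 contingency table."""
    return {
        "sensitivity": proportion_with_wald_ci(tp, tp + fn),
        "specificity": proportion_with_wald_ci(tn, tn + fp),
        "ppv": proportion_with_wald_ci(tp, tp + fp),
        "npv": proportion_with_wald_ci(tn, tn + fn),
    }


# Back-calculated (assumed) WW PhNR counts: 45 reference positive, 155 negative
ww = accuracy_measures(tp=24, fp=31, fn=21, tn=124)
ww_percent = {name: tuple(round(100 * v) for v in values)
              for name, values in ww.items()}
```

With these assumed counts, `ww_percent["sensitivity"]` evaluates to `(53, 39, 68)`, in line with the WW sensitivity and interval reported above.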

PhNR:b-wave ratio

PhNR:b-wave ratios were also analysed, and detailed results are given in Table 3. Compared with PhNR amplitudes, ratio values reduced the sensitivity to RGC dysfunction from 53 to 40% and from 62 to 49%, for the WW and RB PhNR, respectively. Use of the PhNR:b-wave ratio increased specificity from 80 to 89% and from 78 to 88% for the WW and RB responses, respectively.

For the WW PhNR, the amplitude was abnormal but the PhNR:b-wave ratio was normal in nine of 45 cases, and the ratio alone was abnormal in three cases. For the RB PhNR, the amplitude was abnormal but the ratio was normal in seven of 45 cases, and the ratio alone was abnormal in one case.

‘Normal’ LA 3 a- and b-waves

In this sub-analysis, participants with an LA 3 a- or b-wave outside of the control reference range (amplitude and peak time) were excluded. This was to examine whether the diagnostic accuracies (outlined above) were influenced by other, non-RGC pathologies. Fifty-one participants met these criteria and were excluded leaving a total of 149 (38 with RGC pathology). The results are summarised in Table 4. The sensitivities of WW and RB PhNRs were 50% and 61%, respectively. The difference between the sensitivities was − 11%. Specificities were 94% and 90%. The difference between the specificities was 4%. There was a statistically significant difference between the sensitivities and the specificities of WW and RB PhNRs (both p = 0.046).

Table 4 Diagnostic accuracy measures for patients with normal LA 3 a- and b-waves (N = 149)

ROC curves

Figures 4 and 5 display ROC curves for the PhNRs with sensitivity plotted against 1-specificity. The output values from the ROC curves are summarised in Table 5. The AUC value for the WW PhNR was 0.73 (95% CI 0.65 to 0.82; p < 0.001). The RB PhNR AUC was 0.74 (95% CI 0.66 to 0.82; p < 0.001). The criterion values (optimal cut-offs where sensitivity and specificity values are closest to the AUC value and have a minimal difference between them [35]) for the WW and RB PhNR amplitudes were 17.7 µV and 23.5 µV, respectively. In participants with normal LA 3 a- and b-waves, ROC AUC values increased to 0.81 for both WW and RB PhNRs.
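For readers less familiar with ROC analysis, the AUC can be read as the probability that a randomly chosen eye with RGC pathology has a smaller PhNR amplitude than a randomly chosen eye without it. The sketch below computes this directly and evaluates one cut-off; the amplitudes are hypothetical, and treating values strictly below the cut-off as abnormal is an assumption.

```python
def roc_auc_lower_is_abnormal(affected, unaffected):
    """AUC as the proportion of (affected, unaffected) pairs ranked correctly.

    Smaller amplitudes are treated as more abnormal; ties count as half.
    """
    score = 0.0
    for a in affected:
        for u in unaffected:
            if a < u:
                score += 1.0
            elif a == u:
                score += 0.5
    return score / (len(affected) * len(unaffected))


def sensitivity_specificity(affected, unaffected, cutoff):
    """Sensitivity and specificity when amplitudes below the cut-off are abnormal."""
    sensitivity = sum(a < cutoff for a in affected) / len(affected)
    specificity = sum(u >= cutoff for u in unaffected) / len(unaffected)
    return sensitivity, specificity


# Hypothetical PhNR amplitudes in microvolts (not study data)
affected = [10.0, 15.0, 20.0, 30.0]
unaffected = [18.0, 25.0, 28.0, 35.0, 40.0]

auc = roc_auc_lower_is_abnormal(affected, unaffected)
# 18.4 uV is the RB lower reference limit quoted in the Methods
sens, spec = sensitivity_specificity(affected, unaffected, cutoff=18.4)
```

Sweeping the cut-off across the observed amplitudes and plotting sensitivity against 1 − specificity traces out the ROC curve itself.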

Fig. 4

Receiver operating characteristic curves for WW and RB PhNRs from all patients (N = 200)

Fig. 5

Receiver operating characteristic curves for WW and RB PhNRs from patients with normal a- and b-waves (N = 149)

Table 5 Receiver operating characteristic curve summary for all participants

Discussion

This study examines the diagnostic accuracy of PhNRs in the largest patient cohort to date, by comparison with multi-modal assessments of optic nerve structure and standardised electrophysiological tests of function. Uniquely, the diagnostic accuracy of PhNRs evoked by chromatic and broadband stimuli is compared in a heterogeneous rather than a case–control clinical population, providing a robust estimation of sensitivity and specificity more applicable to the general patient population. The potential of using a widely available ISCEV standard full-field ERG protocol to assess retinal ganglion cell function is examined.

This study showed that PhNR amplitudes are larger when elicited by the ISCEV-recommended RB [12] rather than ISCEV standard (LA 3 ERG) WW [29] stimuli, consistent with several previous comparisons of chromatic and broadband stimuli [14, 19]. It has been suggested that a chromatic stimulus may preferentially stimulate a single subtype of cone, reducing the amount of spectral antagonism in the receptive fields of the RGCs [18]. It is noted, however, that PhNR amplitudes are only minimally influenced by the chromaticities of the flash and background when stimuli are expressed in photopic photometric terms [15, 36] and photopically matched, with one report that RB are larger than WW PhNRs only at higher flash strengths (3 phot cd·s·m−2). In the present study, the statistically significant amplitude difference between PhNRs seen in those without RGC pathology was not apparent in patients with RGC dysfunction. The possibility that dysfunction disproportionately attenuates chromatic-evoked responses and that this may be related to disease type and severity warrants further investigation.

The study shows that the RB PhNR has a higher sensitivity than the WW PhNR for the detection of RGC dysfunction (62% vs 53%) and that this difference is statistically significant. The confidence intervals of the difference between the sensitivities (95% CI − 17% to − 1%) further support this finding as the range does not encompass zero [37]. These findings suggest that the RB PhNR stimulus is better able to detect RGC pathology than the WW stimulus in a heterogeneous clinical population. Estimations of specificity did not significantly differ for the two stimuli (78% and 80% for RB and WW), suggesting that both methods can identify unaffected individuals to a similar degree. The findings suggest that the RB PhNR is more likely to detect RGC disease in affected individuals, but as the specificities of the PhNR stimuli are equivalent, a positive finding in the WW PhNR is just as likely to be a true positive as a positive result from the RB PhNR. This is an important finding in the context of clinical practice, as RB PhNR protocols are not fully standardised and are less widely available than WW PhNRs, as the latter form part of the ISCEV standard full-field LA 3 ERG, used routinely to assess retinal function.

In our cohort, the PhNR:b-wave ratio lowered the sensitivity of the test when compared with the PhNR amplitude measure. Perhaps not surprisingly, specificity increased as the ratio takes into account ERG amplitude variability and the possibility of ERG b-wave attenuation, e.g. due to retinal (non-RGC) pathology. In support of this, the specificity of both measurements was equivalent when cases with an abnormal LA 3 ERG a- and b-wave were excluded from the comparison, highlighting the importance of considering retinal function and light-adapted ERG a- and b-waves when interpreting the PhNR clinically.

There were no significant differences in the ROC AUCs of WW and RB PhNRs measured with the amplitude or ratio methods, consistent with some previous studies on patients with glaucoma [10, 38]. However, differences have also been reported; Cvenkel et al. [23] found that the PhNR amplitude provided significantly larger AUCs than the ratio for both suspect and early glaucoma, and Preiser et al. [39] found that the PhNR:b-wave ratio yielded higher AUC values than amplitude measures in pre-perimetric but not manifest glaucoma. These conflicting reports may relate to different methods and individual differences, and it may be prudent to consider both measurements (of the same waveform) for diagnostic or monitoring purposes.

In this study, the estimated area under the ROC curves for RB and WW PhNRs suggests only a modest level of diagnostic accuracy, as defined by ROC reporting guidelines [33]. A contributory factor may be the heterogeneity and diversity of the clinical patients examined, with different disorders and at different stages of disease severity. The overall diagnostic accuracies of RB and WW PhNRs estimated by AUCs were equivalent (0.74 vs 0.73 for RB and WW), a finding in contrast with some prior reports. Sustar et al. [14] reported AUC values of 0.97 and 0.74 for RB and WW stimuli, respectively, and Banerjee et al. [21] also reported higher RB AUC values of 0.90 compared with 0.76 for WW stimuli. Hara et al. [16] calculated AUCs for a range of photopically matched RB and WW stimuli and found that the 3.0 cd·s·m−2 RB stimulus provided the best diagnostic accuracy overall (AUC = 0.94); the best WW PhNR stimulus was obtained with a 2.0 cd·s·m−2 flash (AUC = 0.88). In these studies, examinations of pathology were restricted to patients with glaucoma, which may account for some of the divergence with our findings; nonetheless, the AUC evidence suggests that RB stimuli may have better overall diagnostic accuracy than WW stimuli for the detection of glaucoma.

Our heterogeneous group of patients with RGC dysfunction included seven with mitochondrial optic neuropathies (Leber hereditary optic neuropathy (LHON) and autosomal dominant optic atrophy (DOA)). These disorders primarily affect the papillomacular bundle, with relative sparing of peripheral RGC axons [40], particularly in the early stages of the disease process. As the full-field PhNR is a global measure of RGC function it is likely to be less sensitive to focal/central RGC dysfunction than the PERG or the focal PhNR. Majander et al. [41] reported that the majority of full-field PhNR responses from a cohort of patients with LHON were normal or near the lower limit of normal. Likewise, Miyata et al. [42] found only mildly abnormal/borderline reductions in the full-field PhNR in patients with DOA. Tamada et al. [43] investigated the ability of focal and full-field PhNRs to detect optic nerve atrophy and found that focal PhNRs were more sensitive to damage in eyes with a central visual field defect. The inclusion of patients with both central and diffuse RGC damage in the present study is likely another reason that the estimations of sensitivity and specificity are lower than previous reports.

There are some limitations to this study. In our preliminary experiments, a stimulus–response series was used to determine the optimal RB PhNR stimuli to compare against the broadband PhNR. An aim was to assess the diagnostic accuracy of the ISCEV LA 3 ERG PhNR component, which may be of lower diagnostic accuracy compared with optimised WW stimuli. Our aim to compare the LA 3 WW PhNR against the RB PhNR from the ISCEV extended protocol also meant that there was a slight difference between the amplifier bandwidths of the two protocols. The high-pass filter of the WW PhNR was 0.31 Hz, while that of the RB PhNR was 0.125 Hz. As the PhNR is a relatively low-frequency response, this difference may have contributed to the smaller PhNR amplitude that we reported for the WW PhNR. Another limitation is that there is no single gold standard test of retinal ganglion cell dysfunction and therefore multiple tests combined into a test battery were needed to establish the diagnostic status of the participants. Not all participants performed all of the reference tests; this means that not all participants were compared to an identical reference standard, raising the possibility of some verification bias [44]. The group comparison showed that sensitivity to RGC dysfunction was higher for PhNR amplitude than for the PhNR:b-wave ratio, but it is acknowledged that the ratio is likely to be informative in cases of dual pathology, e.g. to judge the severity of optic nerve/RGC dysfunction in the presence of retinopathy (manifest as ERG b-wave and ‘downstream’ PhNR reduction). Strengths of the study include that it was appropriately powered and based on an ethnically diverse and relatively large number of patients, from a consecutive sample referred to an electrophysiology department rather than a case–control population. This enabled a robust estimation of sensitivity and specificity generalisable to the general population.

The PhNR yields moderate levels of diagnostic accuracy for the detection of retinal ganglion cell dysfunction in a heterogeneous clinical cohort. PhNRs evoked by red flashes on a blue background are more sensitive to dysfunction than white-on-white stimuli, but there is no significant difference between the relative specificities of the two PhNR methods. The study highlights the value and potential convenience of using the WW stimulus, already used widely for routine ERG assessment of retinal function.