Introduction

The objective assessment of visual acuity has become increasingly important over the past few years. One way to achieve this is based on visual evoked potentials (for reviews see [1, 2]), often termed “sweep VEP”, “stepwise sweep VEP” or “acuity VEP”. Bach and coworkers [1] described a method that combined:

  • Brief onset checkerboard presentation, yielding relatively high amplitudes,

  • Stimulation in the temporal steady-state range, allowing Fourier transform-based analysis,

  • Laplace montage, yielding high noise rejection,

  • Application of the Meigen/Bach statistic [3], yielding noise-corrected response and significance,

  • An automated “heuristic algorithm” for regression point selection, enabling an acuity estimate, or the outcome of “no result” even when a “notch” [4] is present.

This approach was used in the Freiburg laboratory for a decade with high testability (i.e., a “no result” outcome occurs in only 5–10% of cases). Problems in amblyopia have been described [5, 6], and the algorithm was successfully extended to very low acuities (≈ 2 LogMAR) [7]. The stimulation and recording system was also used in other laboratories and is available free of charge [8]. However, the platform hardware and software are outdated (e.g., MacOS 9), so a re-implementation was needed to enable a broader set of clinical and research users to operate the method in their clinics. Diagnosys expressed an interest in the method and implemented it following the previously reported procedure [1]; we herein report the outcome of that implementation.

Methods

Equipment and Stimuli

Steady-state VEPs were recorded using a Diagnosys Espion E3 System (Lowell, MA, USA). Checkerboard stimuli were presented in brief onset mode, two frames (33.3 ms) on at 100 cd/m² and six frames (100 ms) off, corresponding to 7.5 Hz, with a stimulus distance of 180 cm and a contrast of 40%. Three sets of check sizes were used: one for the highest VA range (“Range A”), one for the medium VA range (“Range B”) and one for the lowest VA range (“Range C”). For Range A, the check sizes were 0.37°–0.05°, for Range B they were 1.19°–0.17° and for Range C they were 4.0°–0.57°. Six check sizes were used in each Range, and cumulatively across all three Ranges there were twelve unique check sizes (i.e., the check sizes overlap between the Ranges). In this study, Range A and Range B were always used among the three acuity conditions of each participant; depending on the participant's acuity with the strongest Bangerter foil, either Range B or Range C was used for the lowest acuity recording. The benefit of having three check size ranges available is that, in clinical use, this approach will usually keep the test time shorter for a patient. Typically, the clinic has a general understanding of the acuity range the patient is likely to fall within, which enables it to choose one of the three shorter protocols. Where the clinic does not have that knowledge, it can run the full set of twelve check sizes on a patient.
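The timing arithmetic above can be checked with a minimal sketch. The 60 Hz frame rate is inferred from "two frames (33.3 ms)" rather than stated in the text. The conversion from check size to dominant spatial frequency, 1/(√2 · check size), is the usual convention for checkerboards and is assumed here; it is not quoted from [11]. Both function names are hypothetical.

```python
import math

# Assumption: 2 frames = 33.3 ms implies a 60 Hz display refresh rate.
FRAME_RATE_HZ = 60.0


def stimulus_frequency(frames_on, frames_off, frame_rate_hz=FRAME_RATE_HZ):
    """Onset repetition rate (Hz) of an on/off cycle given in display frames."""
    return frame_rate_hz / (frames_on + frames_off)


def dominant_spatial_frequency(check_size_deg):
    """Dominant spatial frequency (cycles/degree) of a checkerboard.

    Assumption: the fundamental lies along the check diagonal, so that
    SF = 1 / (sqrt(2) * check size).
    """
    return 1.0 / (math.sqrt(2.0) * check_size_deg)


if __name__ == "__main__":
    print(stimulus_frequency(2, 6))  # two frames on, six frames off
    for check in (0.37, 0.05):       # the two ends of Range A
        print(check, round(dominant_spatial_frequency(check), 2))
```

Under these assumptions, the smallest check of Range A (0.05°) corresponds to roughly 14 cpd.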

Freiburg Acuity Test (FrACT) [9] measurements were taken on a standard PC with a screen size of 58.5 cm (diagonal) using the same Bangerter foils used for the steady-state VEP recordings, also at a distance of 180 cm from the computer.

Recording

The VEP was recorded using gold cup electrodes at Oz, O1 and O2, referenced to Fz. In accordance with the ISCEV VEP standard [10], Oz was placed on the midline at 10% above the inion. O1 and O2 were placed laterally to Oz, at a distance of 10% of the ½ head circumference on either side of the Oz electrode. Signals were amplified by a factor of 8, digitized at a rate of 1 kHz with 32-bit resolution and digitally filtered in the range of 5–50 Hz. Averaging was arranged to capture exactly eight on-/offset periods in 1066-ms epochs. Forty sweeps were taken for each step within an artifact rejection window of ± 100 µV. For each step, the Laplacian derivation is calculated from the signals obtained at the Oz, O1 and O2 electrode locations (VEPLaplace = 2Oz − (O1 + O2)). The software then calculates the Fourier transform of the resulting signal at each step and plots the six resulting amplitudes against log spatial frequency (the dominant spatial frequency of each check size [11]). Finally, the software calculates a visual acuity estimate based on methods described below. A simplified recording setup is depicted in Fig. 1. We define a set of six traces (along the chosen checkerboard set) to be one “recording.”
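As an illustration of the derivation 2Oz − (O1 + O2) and of sweep averaging within the ± 100 µV window, here is a minimal sketch. It is not the Espion implementation. The function name, the array layout and the choice to apply the rejection window to the raw channels (rather than to the derived signal) are assumptions.

```python
import numpy as np


def average_laplacian(oz, o1, o2, reject_uv=100.0):
    """Average the derivation 2*Oz - (O1 + O2) over the accepted sweeps.

    oz, o1, o2: arrays of shape (n_sweeps, n_samples), in microvolts,
    each referenced to Fz. A sweep is dropped when any of its three
    channels leaves the +/- reject_uv window (assumed rejection rule).
    Returns the averaged trace and the number of sweeps kept.
    """
    oz, o1, o2 = (np.asarray(ch, dtype=float) for ch in (oz, o1, o2))
    # Largest absolute value per sweep, across channels and samples.
    peak = np.max(np.abs(np.stack([oz, o1, o2])), axis=(0, 2))
    keep = peak <= reject_uv
    if not keep.any():
        raise ValueError("all sweeps were rejected")
    laplacian = 2.0 * oz[keep] - (o1[keep] + o2[keep])
    return laplacian.mean(axis=0), int(keep.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sweeps = rng.normal(0.0, 5.0, size=(3, 40, 1066))  # synthetic data only
    trace, n_kept = average_laplacian(*sweeps)
    print(trace.shape, n_kept)
```

Activity that is common to all three occipital electrodes cancels in this derivation, which is the basis of the noise rejection mentioned in the Introduction.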

Fig. 1

Basic setup of the stepwise sweep VEP test methodology. The participant is positioned in front of a monitor that presents on–off checkerboard patterns. Signals are measured by the system from three active electrodes on the participant (Oz, O1, O2), referenced to an electrode on the participant’s forehead. As the signals are recorded, the system first computes the Laplacian derivation of the data (2Oz − O1 − O2) and then a discrete Fourier transform to determine one VEP magnitude for each check size. This particular example also exhibits a strong 2nd harmonic.

Participants

There were twenty-four participants in the study (14 male and 10 female), with an age range of 19 to 74 years (mean age 46.5 years), and in each case both eyes were tested. Participants were given the choice of either using their habitual eyeglasses during the test or not, and that same condition was used for all tests for that participant. Under these conditions the participants’ visual acuity ranged from approximately 0.6 to better than −0.15 LogMAR. Each participant completed one set of tests under full vision conditions (defined as the participant's chosen correction and no Bangerter foil), in which acuity was typically 0.30 LogMAR or better. An additional set of tests was then conducted using a Bangerter foil intended to reduce the participant's acuity to approximately 0.4 LogMAR, followed by a final set with a foil intended to reduce acuity to approximately 1.0 LogMAR. Since every participant was recorded under several conditions, one might think that the “eyes or patients” problem [12] arises. Given that we are using descriptive, not inferential, statistics, this is not a problem.

Analysis

Response traces were de-trended and subjected to a discrete Fourier transform (DFT). Because care was taken to choose the analysis interval (1066 ms) to be an integer multiple of the stimulation period, there is no overspill in the spectrum [13] and the noise can be estimated by averaging the magnitudes recorded at the two neighboring frequencies (6.5 and 8.5 Hz). The ‘true’ response magnitude at 7.5 Hz was calculated by non-linearly subtracting the noise from the magnitude measured at 7.5 Hz, and finally a significance for the response at 7.5 Hz was also calculated [3].
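The spectral step can be sketched as follows. This is not the published Meigen/Bach statistic [3]: the noise correction shown is a simple quadrature subtraction used as a stand-in for the non-linear subtraction, and the significance calculation is omitted. The function name is hypothetical, and the trace is assumed to span exactly eight stimulus periods, so that the response falls on DFT bin 8 and the noise estimators on bins 7 and 9.

```python
import numpy as np


def response_and_noise(trace, n_periods=8):
    """Magnitude at the stimulus frequency plus a neighbor-based noise estimate.

    Assumes the trace covers exactly n_periods stimulus periods, so the
    response sits on DFT bin n_periods with no leakage into other bins.
    """
    x = np.asarray(trace, dtype=float)
    t = np.arange(x.size)
    x = x - np.polyval(np.polyfit(t, x, 1), t)   # remove linear trend
    mag = 2.0 * np.abs(np.fft.rfft(x)) / x.size  # single-sided amplitude
    signal = mag[n_periods]
    noise = 0.5 * (mag[n_periods - 1] + mag[n_periods + 1])
    # Stand-in correction (assumption): subtract noise power, clip at zero.
    corrected = np.sqrt(max(signal**2 - noise**2, 0.0))
    return signal, noise, corrected


if __name__ == "__main__":
    n = 1024                                     # hypothetical sample count
    t = np.arange(n)
    synthetic = 5.0 * np.cos(2 * np.pi * 8 * t / n)
    print(response_and_noise(synthetic))
```

Because the epoch holds an integer number of periods, a pure response at the stimulus frequency contributes essentially nothing to the two neighboring bins, which is why they can serve as noise estimators.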

Responses are recorded over six check sizes. Ideally, the stimulation of the various check sizes would be interleaved, but this was not yet implemented on the system in this study. The six check sizes were selected from a range of 0.05° to 4.0°, as appropriate for the expected VA. The responses were processed as described above, resulting in six values for the response magnitude plus the associated significances. From these, the heuristic algorithm, starting at small check sizes, selects as many points as possible up to the peak response while avoiding a notch [4] if present. The selected points are regressed to zero magnitude on a log(spatial frequency) scale, resulting in the value SF0. SF0 is divided by 17.6 (calibration factor, [1]), yielding a decimal acuity estimate VAdec(VEP). This is converted to LogMAR using the standard formula: VALogMAR = − log10(VAdec). When insufficient points are found or other irregularities occur, a “no result” outcome is flagged, reducing “testability.”
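The regression and conversion steps can be sketched as below. The heuristic point selection itself is not reproduced: the function simply takes the points that were already selected. The function name and the rejection of non-descending fits are assumptions; the calibration factor 17.6 and the LogMAR formula are taken from the text.

```python
import math
import numpy as np

CALIBRATION_FACTOR = 17.6  # SF0 (cpd) per unit of decimal acuity, from [1]


def acuity_from_selected_points(sf_cpd, magnitudes):
    """Extrapolate magnitude vs. log10(spatial frequency) to zero magnitude.

    sf_cpd, magnitudes: the points already chosen by the point selection
    (not implemented here). Returns (SF0, decimal acuity, LogMAR).
    """
    slope, intercept = np.polyfit(np.log10(sf_cpd), magnitudes, 1)
    if slope >= 0:
        # Assumption: a rising or flat fit cannot be extrapolated to zero.
        raise ValueError("magnitude does not fall with spatial frequency")
    sf0 = 10.0 ** (-intercept / slope)
    va_dec = sf0 / CALIBRATION_FACTOR
    return sf0, va_dec, -math.log10(va_dec)


if __name__ == "__main__":
    # Synthetic points lying on a line that reaches zero at 14.6 cpd.
    sf = np.array([5.0, 7.0, 10.0])
    mags = 10.0 * (np.log10(14.6) - np.log10(sf))
    sf0, va_dec, logmar = acuity_from_selected_points(sf, mags)
    print(round(sf0, 1), round(va_dec, 2), round(logmar, 3))
```

With SF0 = 14.6 cpd this gives a decimal acuity of about 0.83, i.e., about 0.081 LogMAR.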

The relationship between behavioral acuity and the VEP-based acuity estimate is quantified in terms of the Bland-Altman limits of agreement (LoA) [14]. Frequently for such a task the correlation coefficient is computed; that is, however, an inappropriate measure because it is normalized by range [12, 15].
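For reference, the limits of agreement can be computed as in the following sketch, which uses the conventional definition (mean difference ± 1.96 standard deviations of the differences). The function name is hypothetical, and the values in the usage example are made up for illustration; they are not study data.

```python
import numpy as np


def limits_of_agreement(method_a, method_b):
    """Bland-Altman bias and 95% limits of agreement for paired measurements."""
    diff = np.asarray(method_a, dtype=float) - np.asarray(method_b, dtype=float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # sample standard deviation
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    # Illustrative LogMAR values only, not taken from the study.
    vep = [0.1, 0.3, 0.5, 0.7]
    behavioral = [0.0, 0.4, 0.4, 0.8]
    print(limits_of_agreement(vep, behavioral))
```

Unlike the correlation coefficient, these limits are expressed in the measurement unit (LogMAR) and do not grow simply because a wider range of acuities was sampled.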

Results

Altogether we recorded 864 traces (24 participants · 2 eyes · 3 VA conditions · 6 check sizes). In Fig. 2, one recording (representing one set of traces across the six selected check sizes) is depicted for the full vision condition (participant id 4, right eye). Its heuristic analysis outcome is seen in Fig. 3.

Fig. 2

Left: VEP traces (after calculating the Laplacian derivation) across a series of check sizes (0.05° bottom, 0.37° top). There are eight responses per sweep, with marked frequency doubling at intermediate check sizes. Right: Magnitude spectra of these traces after discrete Fourier transform (DFT). The first harmonic response is at 7.5 Hz; a marked second harmonic is also obvious here (it is rarely as strong). No evidence of overspill is seen in the spectra. The magnitudes at the stimulus frequency (7.5 Hz) and their immediate neighbors form the basis for further analysis (Fig. 3).

Fig. 3

Check size tuning curve based on Fig. 2 data. The abscissa depicts the dominant spatial frequency of the pertinent checkerboards. The ordinate depicts the spectral magnitude at 7.5 Hz after noise removal. The signal-to-noise ratio enables the calculation of individual significance, here indicated by stars versus circles. The dashed line represents the regression of the three points selected by the heuristic algorithm and its extrapolation to zero magnitude, which results in SF0, the spatial frequency for zero magnitude. In this example, SF0 is 14.6 cpd, corresponding to a decimal acuity of 14.6/17.6 = 0.83 or 0.081 LogMAR.

Across all 144 recordings, the heuristic algorithm reported “success” in 136 cases, and “no result” in 8 cases. This corresponds to a testability of 94%. In Fig. 4, the VEP-based acuity estimate of the 136 success cases is plotted versus their behavioral visual acuity. Since the unit “LogMAR” quantifies visual loss, not visual acuity, the LogMAR scale is inverted, showing good acuity at top right. The Bland-Altman limits of agreement were calculated to be ± 0.31 LogMAR and there was a tendency of the VEP-based acuity method to underestimate acuity in the lower acuity conditions, compared to the behavioral visual acuity measurements. One particular outlier disagrees by 0.5 LogMAR; inspecting its original data (and all others) showed no independent reason to exclude it.

Fig. 4

Relation of the VEP-based visual acuity estimate (ordinate) and behavioral visual acuity (abscissa) using inverted LogMAR scales (good acuity top right). Blue stars indicate acuity in the “full vision” condition. Acuity was artificially reduced with foils to (nominally) decimal 0.4 (green squares) and decimal 0.1 (red diamonds). The Bland–Altman limits of agreement (LoA, grey lines parallel to the identity line) were calculated to be ± 0.31 LogMAR.

Discussion

Using the Acuity VEP method from [1], we found a high testability (94%) and a reasonably close agreement of behavioral and VEP acuity estimates (95% limits of agreement of ≈ 3 lines). Behavioral test-retest LoA can be as low as 0.1 LogMAR [16, 17] but will be markedly higher in a clinical population. Thus, the LoA of ± 0.31 for the Acuity VEP seems acceptable and is very close to the one reported earlier [1]. The possibility of outliers should, of course, always be considered when analyzing patient data in clinic.

The current implementation of the Acuity VEP method steps through the six check sizes sequentially starting from the largest one, recording 40 sweeps at each check size. Each step takes approximately 40 s, resulting in a total of about 4 min of recording per eye. In principle, this allows the technician to stop adding steps with finer checks when the amplitude drops. However, this should never be done: too little data may be recorded for a proper regression, and the test could miss a notch region (e.g., the amplitude might rise again for smaller checks). In the present data set, two recordings had “no result” for this reason. In practice, the protocol is created in a stepwise fashion to allow the patient a brief time to rest at a few points during the test.

The VEP protocol had been set up with digital filtering in the range of 5 to 50 Hz. In hindsight, this seems unnecessarily narrow. However, this filtering does not affect the results because the heuristic algorithm is solely based on the 7.5 Hz spectral line (response) and its immediate neighbors (noise estimators). Even some mains intrusion (at 50 or 60 Hz, depending on locality) would not have a detrimental effect. In the future, the Acuity VEP protocol will use standard VEP filtering settings.

We have not addressed test-retest agreement here, since this is implicitly covered by the analysis of agreement between the behavioral visual acuity and the VEP acuity outcome. In future work, it may be of interest to specifically assess the test-retest agreement to analyze the relative variance contribution from interindividual vs. intraindividual sources.

The Freiburg Acuity VEP has previously been found to be of substantial aid in the management of patients with non-organic visual loss. The present study shows that the method has been implemented effectively in a commercial system, enabling its use in a broader set of clinical sites; the method is also being validated for pediatric applications. We would also welcome critical third-party assessments without our own conflicts of interest. Finally, if the machine learning approach lives up to its promise [18], it could be applied to the Freiburg Acuity VEP method, potentially extending its value to researchers and clinicians.