1 Introduction

During human locomotion, vision is disturbed by head perturbations [1]. The vestibulo-ocular reflex (VOR) keeps the eyes fixed on a target ahead by rotating them in the direction opposite to the head movement [2]. This compensatory eye movement is triggered when the semicircular canals of the vestibular system sense angular acceleration [3]. With vestibular dysfunction, however, the target slips from the fovea and vision blurs, producing an eye movement pattern called nystagmus [4]. Visual acuity can decrease by 50% when an object is only 2° from the center of the fovea, a displacement that nystagmus can cause [5]. In addition, 35.4% of US adults aged 40 or older had some degree of vestibular dysfunction according to a survey conducted from 2001 to 2004 [6]. To diagnose nystagmus and evaluate the semicircular canals, rotatory chair tests are conducted on suspected patients; two of them are regarded as gold standard methods [7]: the sinusoidal harmonic acceleration (SHA) test and the velocity step test. The SHA test measures factors from which a degree of dizziness can be inferred by rotating a chair with sinusoidal velocity at frequencies of 0.01–0.64 Hz, doubled at each step, while measuring eye movement [7]. To analyze eye movement signals from the SHA test, three features are calculated: gain, symmetry, and phase. They represent, respectively, the overall responsiveness, the degree of symmetry between responses to left and right stimuli, and the time difference between head stimulus and eye movement. The velocity step test measures the decay of the vestibular response after the chair velocity is suddenly changed and then held constant [7]. However, the SHA test is generally preferred because the velocity step test requires a powerful chair, is less reliable, and places more stress on the patient [7]. In this study, only SHA tests were conducted.

Electronystagmography (ENG) and scleral search coil systems (SSCSs) have been used to collect eye movement signals [8, 9]. ENG measures movements of the eye with electrodes attached around the nose. It can make measurements while the eyes are closed, but it suffers from artifacts caused by eye blinking, perspiration, or light, and it has difficulty measuring vertical eye movement. An SSCS requires subjects to wear an annular contact lens on the eye; although it offers high accuracy and a high sampling rate, it can be worn for only 40 min and tends to slip [10, 11]. With advances in computer vision technology, videonystagmography (VNG) has come into general use, although ENG data are still valuable [12]. VNG can test whether dizziness is caused by inner ear disease; it uses an infrared camera to track the pupil in the dark. Its high accuracy and non-invasive nature have made it common, despite its high cost.

Various studies using VNG have been conducted [13,14,15,16,17]. In [13], researchers proposed a new method to solve the problem of estimating the eye position in VNG analysis that arises with deformable contour methods. Their method, based on position, amplitude, and duration, could track saccade movement with high accuracy. In [14], a method of vestibular disease analysis for VNG applications was proposed, and new features were suggested based on Fisher’s criteria for the diagnosis of nystagmus. In [15], researchers proposed a method for medical characteristic analysis with the displacement vectors of nystagmus using Gaussian mixture models (GMM). Video data were captured using infrared cameras, and two-dimensional displacement vectors were calculated by comparing two adjacent frames. Then, GMM was applied to analyze the vectors. In [16], researchers compared features extracted from the diameter and position of the eye recorded using VNG in patients and normal subjects. In [17], researchers analyzed a VNG dataset of fundamental measurements from normal subjects and patients with vestibular disorders, using a neural network to support the assessment of vertigo symptoms.

Various commercial video-oculography (VOG) products have been developed [18, 19]. They conduct the SHA test, velocity step test, and optokinetic nystagmus test and analyze the results by measuring pupil movements with a rotatory chair and VOG to monitor vestibular loss, dizziness, and chronic imbalance. They are highly accurate systems, but they are expensive and used only in hospitals.

In this paper, a pair of head-mounted goggles was built with an infrared camera and used with a rotatory chair to conduct SHA tests. Pupil coordinates were then extracted from the video data. From the pupil data, gain, phase, and symmetry were calculated for measuring rapid eye movements. Finally, these features were compared with the results of the System 2000 [18], a highly accurate but expensive system that measures pupil movements with a rotatory chair and infrared cameras and analyzes the vestibular system.

2 Methods

2.1 Prototype

Head-mounted goggles were designed to obtain eye movement signals. Figure 1 shows a pair of head-mounted goggles with an infrared camera and a gyroscope sensor. The goggles can record infrared videos of the right eye of the subject and measure rotatory velocity. For calibration, the participant can see straight ahead through the left side of the goggles. The measured field of view is 32°. Three 850-nm light-emitting diodes (LEDs) were attached to the goggles. To avoid noise caused by light reflections, the LEDs were located at the bottom. The distances from the center were approximately 1.5, 1.9, and 2.9 cm from the left to the right LED. The prototype complies with International Standard IEC 60825-1 [20].

Fig. 1 Head-mounted goggles with an infrared camera. a Goggles inside. b Raspberry Pi 3. c IR LED

The prototype consists of an infrared camera, a gyroscope sensor, and infrared LEDs. 3D virtual reality (VR) glasses (Maruneuru, Korea) were used as the frame for the goggles. Figure 2 shows a printed circuit board (PCB) for the goggles. The Raspberry Pi 3 Model B (Raspberry Pi Foundation, UK) [21] was used with the goggles for image processing and feature calculation because of its low cost, appropriate size, and fair image processing performance. Additionally, a Pi Camera (v2) without an IR filter, equipped with an IMX219 8-megapixel sensor (Sony, Japan) [22], was used to capture infrared images of the eyes. An MPU-6050 gyroscope (InvenSense, USA) [23] was used to measure the angular velocity of the goggles.

Fig. 2 PCB circuit of the goggles

2.2 Pupil extraction

To obtain horizontal eye movement, the pupil was extracted from video data obtained with the goggles and the System 2000; Fig. 3 shows screenshots of a healthy subject. Because pupils have a generally circular shape, the same method was applied to the video data from both the goggles and the System 2000, with different threshold values for brightness and pupil size. The method includes noise removal and a circle-detection algorithm.

Fig. 3 Infrared images of a pupil. a System 2000. b Developed goggles

To decrease the miss rate, a region of interest (ROI) was selected from the raw images. The sizes of the ROIs for the developed goggles and the System 2000 were set at 200 × 210 and 400 × 200, respectively. Figure 4 shows a gray image and the result of image binarization. Because video data were obtained as RGB images, the gray channel was extracted from the infrared video data. The intensity of the pupil area is higher than that of the rest; thus, binary images were obtained from the extracted gray channel with threshold values set manually for each subject. Binary images may contain noise caused by occlusion by the eyelids and eyelashes.

Fig. 4 Binarization. a Gray image. b Binary image
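The ROI selection and thresholding described above can be sketched in a few lines. This is a minimal illustration on nested lists rather than the authors' implementation; the function names `crop_roi` and `binarize`, the `pupil_is_brighter` flag, and the toy frame are assumptions introduced here.

```python
def crop_roi(gray, top, left, height, width):
    """Return the sub-image gray[top:top+height][left:left+width]."""
    return [row[left:left + width] for row in gray[top:top + height]]


def binarize(gray, threshold, pupil_is_brighter=True):
    """Map each pixel to 1 (pupil candidate) or 0 (background).

    The text states that the pupil area has the higher intensity, so that is
    the default; the flag is a hypothetical convenience for other polarities.
    """
    if pupil_is_brighter:
        return [[1 if px > threshold else 0 for px in row] for row in gray]
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Toy 4 x 5 grayscale frame with a bright 2 x 2 blob standing in for the pupil.
frame = [
    [10, 12, 11, 10, 9],
    [10, 200, 210, 12, 9],
    [11, 205, 220, 13, 9],
    [10, 11, 12, 10, 9],
]
roi = crop_roi(frame, top=1, left=1, height=2, width=3)
mask = binarize(roi, threshold=128)  # threshold is chosen manually per subject
```

In practice the threshold would be tuned per subject, as the text notes, and the ROI would be 200 × 210 or 400 × 200 pixels depending on the device.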

To solve the noise issue, a morphology operation was implemented (Fig. 5). Its kernel was a 3 × 3 rectangular structuring element, and the number of iterations was set manually for each subject. Table 1 shows the pupil miss rate estimated from images with and without the morphology operation. The pupil miss rate is given by the following:

$$ Miss\ rate\ \left(\%\right)=\frac{\sum False\ negative}{\sum Condition\ positive}\times 100, $$
(1)

Fig. 5 Morphology operation. a Erode. b Dilation. c Erode and dilation
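To make the erode and dilate steps of Fig. 5 concrete, the following sketch applies a 3 × 3 rectangular structuring element to a binary image stored as nested lists. It is an illustration under assumptions, not the authors' code: pixels outside the image are treated as 0, and erosion is followed by dilation with the same iteration count.

```python
def _neighbourhood(img, r, c):
    """Yield the 3 x 3 neighbourhood of (r, c); outside pixels count as 0 (assumption)."""
    rows, cols = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def erode(img):
    """Keep a pixel only if its whole 3 x 3 neighbourhood is set."""
    return [[1 if all(_neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def dilate(img):
    """Set a pixel if any pixel in its 3 x 3 neighbourhood is set."""
    return [[1 if any(_neighbourhood(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def erode_then_dilate(img, iterations=1):
    """Erode `iterations` times, then dilate `iterations` times."""
    out = img
    for _ in range(iterations):
        out = erode(out)
    for _ in range(iterations):
        out = dilate(out)
    return out


noisy = [
    [0, 0, 0, 0, 1],  # isolated noise pixel in the corner
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cleaned = erode_then_dilate(noisy)
```

Here the isolated corner pixel disappears while the 3 × 3 blob is restored, which is the behaviour the morphology step relies on to suppress eyelash and eyelid noise.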

Table 1 Miss rate with morphology and no morphology

The average miss rate with the morphology operation applied is approximately 18.54%.
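As a worked instance of the miss rate, the sketch below counts missed detections over frames in which a pupil was actually present. The function name and the per-frame boolean representation are assumptions made for illustration.

```python
def miss_rate_percent(detected_flags):
    """Miss rate in percent.

    detected_flags holds one boolean per frame that truly contains a pupil
    (the condition-positive frames); False marks a false negative.
    """
    if not detected_flags:
        raise ValueError("at least one condition-positive frame is required")
    false_negatives = sum(1 for detected in detected_flags if not detected)
    return 100.0 * false_negatives / len(detected_flags)


example = miss_rate_percent([True, True, False, True])  # one miss in four frames
```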

Finally, the findContours function in the OpenCV library was applied to detect a circular region. The contour candidates were selected by the following conditions:

$$ \mathrm{abs}\left(1-\frac{w}{h}\right)<k, $$
(2)
$$ \mathrm{abs}\left(1-\frac{\mathrm{s}}{\pi \bullet {\left(\frac{w}{2}\right)}^2}\right)<k, $$
(3)

where w, h, and s represent width, height, and area of detected contours, respectively, and k represents a constant value which was set to 0.6 or more. The minimum radius and area of the detected circle were set to 20 and 30, respectively. A contour that has the minimum radius was chosen among the candidates. Figure 6 shows an example of an image with the method applied.
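The candidate test of Eqs. (2) and (3) and the final selection can be sketched as follows. This is a hedged illustration: contours are represented only by hypothetical `(w, h, s)` tuples, the radius is assumed to be `w / 2`, and the function names are introduced here rather than taken from the paper.

```python
import math


def is_circular(w, h, s, k):
    """Apply Eq. (2) (aspect ratio) and Eq. (3) (filled-circle area ratio)."""
    if w <= 0 or h <= 0:
        return False
    aspect_ok = abs(1 - w / h) < k
    area_ok = abs(1 - s / (math.pi * (w / 2) ** 2)) < k
    return aspect_ok and area_ok


def pick_pupil(contours, k=0.6, min_radius=20, min_area=30):
    """Return the (w, h, s) candidate with the smallest radius, or None.

    The radius is taken as w / 2, which is an assumption of this sketch.
    """
    candidates = [
        (w, h, s) for (w, h, s) in contours
        if w / 2 >= min_radius and s >= min_area and is_circular(w, h, s, k)
    ]
    return min(candidates, key=lambda c: c[0] / 2, default=None)


contours = [
    (50, 48, 1963.0),   # nearly circular, radius 25
    (100, 30, 2000.0),  # elongated, fails Eq. (2)
    (44, 44, 1500.0),   # nearly circular, radius 22
    (10, 10, 78.0),     # below the minimum radius
]
best = pick_pupil(contours)
```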

Fig. 6 Example of detected pupil

2.3 Slow-phase eye movement

Figure 7 shows a part of the eye movement signal obtained from the goggles. The dashed rectangles and the solid rectangles indicate fast-phase and slow-phase eye movement, respectively. To extract features from eye-movement signals, the slow-phase velocity of the pupil should be calculated [24].

Fig. 7 Pupil movement signal (System 2000)

A degree per pixel value was assumed to determine the velocity of horizontal eye movement in degrees. Figure 8 shows a calibration method for obtaining degrees per pixel. The Ɵ value was calculated using a trigonometric function. Also, degrees per pixel for the goggles and the System 2000 were calculated by dividing Ɵ by \( \overline{\mathrm{AB}} \). Finally, gain, symmetry, and phase were calculated from the signal.

Fig. 8 Calibration. The distance between a and b is the actual diameter
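The calibration arithmetic can be illustrated with a short sketch. The text says only that Ɵ comes from a trigonometric function and is divided by the pixel length of AB; the specific geometry below (an angle subtended by a known separation at a known viewing distance) is an assumption, as are all function and parameter names.

```python
import math


def visual_angle_deg(separation, viewing_distance):
    """Angle (degrees) subtended by `separation` at `viewing_distance`.

    Assumed geometry: the two calibration points are symmetric about the
    line of sight. Both arguments must use the same length unit.
    """
    return math.degrees(2 * math.atan(separation / (2 * viewing_distance)))


def degrees_per_pixel(theta_deg, ab_pixels):
    """Divide the calibration angle by the pixel length of segment AB."""
    if ab_pixels <= 0:
        raise ValueError("AB must span at least one pixel")
    return theta_deg / ab_pixels


def pixels_to_degrees(x_pixels, x_center_pixels, deg_per_px):
    """Convert a horizontal pupil coordinate to an eye angle in degrees."""
    return (x_pixels - x_center_pixels) * deg_per_px


scale = degrees_per_pixel(32.0, 160)          # hypothetical: 32 degrees over 160 px
angle = pixels_to_degrees(110, 100, scale)    # pupil 10 px right of centre
```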

2.4 System 2000

Slow-phase velocity was derived from the raw signal as shown in Fig. 9. First, the horizontal pupil movement signal was extracted from the infrared video data using the pupil extraction method. Frames missed because of eye blinking were marked with a negative number, and a linear spline algorithm was used to fill them in. Moreover, because the sampling rate was irregular (27–29 fps), a cubic spline algorithm was used to interpolate the signal to 30 Hz. Pupil velocity was then derived from the interpolated signal.

Fig. 9 Pupil movement signal (System 2000). a Raw signal. b Interpolated signal. c Derived pupil velocity
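The blink compensation and velocity derivation can be sketched as below. This is a simplified illustration: it covers only the linear filling of frames marked with a negative number and a first-difference velocity at an assumed uniform 30 Hz, and it does not reproduce the cubic-spline resampling. Function names are hypothetical.

```python
def fill_missing(samples):
    """Linearly interpolate samples marked as missing (negative values).

    Gaps at either end are held at the nearest valid value (assumption).
    """
    valid = [i for i, v in enumerate(samples) if v >= 0]
    if not valid:
        raise ValueError("no valid samples to interpolate from")
    out = [float(v) for v in samples]
    for i, v in enumerate(samples):
        if v >= 0:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = float(samples[right])
        elif right is None:
            out[i] = float(samples[left])
        else:
            t = (i - left) / (right - left)
            out[i] = samples[left] + t * (samples[right] - samples[left])
    return out


def velocity(positions, fs=30.0):
    """First difference scaled by the sampling rate (units per second)."""
    return [(b - a) * fs for a, b in zip(positions, positions[1:])]


filled = fill_missing([10, -1, -1, 16, -1])  # -1 marks a missed frame
speed = velocity([0.0, 1.0, 3.0])
```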

The fast phase of the signal was removed as follows. Because the fast phase has a larger slope than the slow phase, it was identified from the slope of the signal. First, the obtained signal was normalized to a mean of 0 and a standard deviation of 1. Then, the eye movement signal was differentiated and squared to obtain the magnitude of its slope. Peaks in the squared signal were detected with a threshold value, which was set manually according to each chair velocity. In Fig. 10, the red line represents the detected fast phase. Finally, the fast phase was removed from the differentiated signal.

Fig. 10 Pupil movement signal (System 2000). a Normalized signal. b Derived signal. c Fast phase
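The fast-phase detection steps (normalize, differentiate, square, threshold) can be sketched as follows. This is a minimal illustration with hypothetical function names; the threshold value and the choice to mark removed samples with `None` are assumptions.

```python
import statistics


def fast_phase_flags(position, threshold):
    """Flag sample-to-sample steps whose squared slope exceeds `threshold`.

    The signal is first normalized to zero mean and unit standard deviation.
    """
    mean = statistics.fmean(position)
    std = statistics.pstdev(position)
    if std == 0:
        return [False] * max(len(position) - 1, 0)
    z = [(p - mean) / std for p in position]
    slope_sq = [(b - a) ** 2 for a, b in zip(z, z[1:])]
    return [s > threshold for s in slope_sq]


def drop_fast_phase(velocity, flags):
    """Replace flagged velocity samples with None so they can be refilled later."""
    return [None if flagged else v for v, flagged in zip(velocity, flags)]


# Slow drift of +1 per frame interrupted by one rapid reset of -10.
position = [0, 1, 2, 3, 4, -6, -5, -4, -3, -2]
flags = fast_phase_flags(position, threshold=1.0)
slow_only = drop_fast_phase([1, 1, 1, 1, -10, 1, 1, 1, 1], flags)
```

The gaps left by the removed samples correspond to the segments that the text then refills with a linear spline.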

After the fast-phase signal was eliminated, a linear spline algorithm was used to fill the resulting gaps. As shown in Fig. 11, a low-pass filter was applied to remove noise from the velocity signals. The cut-off frequency of the low-pass filter was set as follows:

$$ {f}_{\mathrm{low}}=k\bullet \frac{n}{\Delta t}. $$
(4)

where k, n, and ∆t represent a constant, the number of cycles, and the period, respectively; n/∆t corresponds to the frequency of head movement, which is multiplied by the constant. The constant was manually set between 1 and 2 for each subject.
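A worked instance of Eq. (4) may help. The sketch below reads ∆t as the time span that contains the n cycles, so that n/∆t is the head-movement frequency; that reading and the example numbers are assumptions.

```python
def cutoff_frequency(k, n_cycles, duration_s):
    """Eq. (4): f_low = k * n / delta_t, in Hz."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return k * n_cycles / duration_s


# Hypothetical run: 4 cycles in 100 s (0.04 Hz stimulus) with k = 1.5.
f_low = cutoff_frequency(k=1.5, n_cycles=4, duration_s=100.0)
```

With these numbers the cut-off sits at 0.06 Hz, 1.5 times the stimulus frequency, so the stimulus component passes while faster noise is attenuated.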

Fig. 11 Obtained slow phase and low-pass filter (System 2000)

2.5 Designed goggles

The infrared camera used in the goggle design had a lower resolution and sampling rate than that of the System 2000. Thus, a different method was used to obtain the slow-phase signal and the features from this camera. Slow-phase velocity was derived from the raw signal as shown in Fig. 12. A negative number indicates that the pupil was not detected because of eye blinking or the low sampling rate. Raw signals were then interpolated using a linear spline algorithm to fill in the missing frames. Because the sampling rate of the signals obtained from the goggles was irregular (25–30 fps), the signals were also interpolated using a cubic spline algorithm.

Fig. 12 Pupil movement signal (designed goggles). a Raw signal. b Interpolated signal. c Derived pupil velocity

Figure 13 shows an example of a peak detection result from the pupil movement velocity signal. Since the signals obtained from the designed goggles and the System 2000 contain different levels of noise and have different frequencies, another method was used to find peaks. The findpeaks function in MATLAB was used to locate each peak and valley. Yellow and red circles represent the valleys and peaks of the signal, respectively. The velocity differences between adjacent peaks and valleys were calculated. Velocities that were less than the rotatory chair velocity were removed to eliminate fast-phase eye movement.

Fig. 13 Peak detection (designed goggles)
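The peak and valley step can be illustrated without MATLAB. The sketch below is a plain local-extremum search, not a reimplementation of findpeaks, and it ignores flat plateaus; the function names are hypothetical. The comparison of each difference against the chair velocity is left to the caller.

```python
def find_extrema(signal):
    """Return (peaks, valleys) as index lists of strict local extrema."""
    peaks, valleys = [], []
    for i in range(1, len(signal) - 1):
        if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]:
            peaks.append(i)
        elif signal[i] < signal[i - 1] and signal[i] < signal[i + 1]:
            valleys.append(i)
    return peaks, valleys


def adjacent_differences(signal):
    """Absolute difference between each pair of neighbouring extrema.

    Returns (index_a, index_b, difference) tuples in time order.
    """
    peaks, valleys = find_extrema(signal)
    extrema = sorted(peaks + valleys)
    return [(a, b, abs(signal[b] - signal[a])) for a, b in zip(extrema, extrema[1:])]


velocity_signal = [0, 3, 1, 4, -2, 0]
peaks, valleys = find_extrema(velocity_signal)
diffs = adjacent_differences(velocity_signal)
```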

Moving-average and low-pass filters were then applied to remove noise. Figure 14 shows filtered signals with moving-average and low-pass filters. The same equation of cutoff frequency was applied, which is based on the frequency of head movement multiplied by a constant value.

Fig. 14 Pupil movement signal (designed goggles). a Moving average. b Low-pass filter
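A moving-average filter of the kind mentioned above can be sketched as follows. The window length is not given in the text, so the centred window and its shrinking behaviour at the signal edges are assumptions of this illustration.

```python
def moving_average(signal, window=5):
    """Centred moving average; the window shrinks at the edges of the signal."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


smoothed = moving_average([0, 3, 6, 3, 0], window=3)
```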

2.6 Gain, asymmetry, and phase

Gain is the ratio of the amplitude of the eye movement to the amplitude of head stimulus [25]. Symmetry is indicated by comparable left and right responses when the same stimulus is applied [25]. Phase is the timing relationship between the initiation of head movement and the reflexive eye response [24]. Gain, symmetry, and phase are calculated as follows:

$$ \mathrm{Gain}=\frac{\mathrm{Amplitude}\ \mathrm{of}\ \mathrm{the}\ \mathrm{maximum}\ \mathrm{slow}\ \mathrm{phase}\ \mathrm{eye}\ \mathrm{velocity}}{\mathrm{Amplitude}\ \mathrm{of}\ \mathrm{the}\ \mathrm{maximum}\ \mathrm{stimulus}\ \mathrm{velocity}}, $$
(5)
$$ \mathrm{Symmetry}=\frac{b_2-{b}_1}{b_2+{b}_1}\times 100, $$
(6)
$$ \mathrm{Phase}\ \left({\varphi}^{{}^{\circ}}\right)={360}^{{}^{\circ}}\bullet f\bullet \Delta t, $$
(7)

where b1 and b2 represent the maximum velocity of slow-phase eye movement in rotating to the left and right, respectively, f represents the frequency of the eye movement signal, and ∆t represents the time difference between the maximum velocity of slow-phase eye movement and head movement.
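Equations (5) to (7) translate directly into code. The sketch below is a worked illustration; the function names and example numbers are hypothetical, and b1 and b2 are assumed to be passed as positive magnitudes.

```python
def gain(max_slow_phase_velocity, max_stimulus_velocity):
    """Eq. (5): ratio of peak slow-phase eye velocity to peak stimulus velocity."""
    return abs(max_slow_phase_velocity) / abs(max_stimulus_velocity)


def symmetry_percent(b1, b2):
    """Eq. (6): b1 is the leftward and b2 the rightward peak slow-phase velocity."""
    return (b2 - b1) / (b2 + b1) * 100


def phase_deg(frequency_hz, delta_t_s):
    """Eq. (7): phase in degrees from the frequency and the peak-to-peak time lag."""
    return 360.0 * frequency_hz * delta_t_s


g = gain(30.0, 60.0)                # eye peaks at 30 deg/s for a 60 deg/s stimulus
sym = symmetry_percent(28.0, 32.0)  # slightly stronger rightward response
ph = phase_deg(0.16, 0.25)          # 0.25 s lag at 0.16 Hz
```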

3 Results

3.1 Acquisition

Data were collected from eight healthy non-smoking men and two women aged 20 to 28 years. All subjects wore the developed goggles and the System 2000 goggles while seated on a rotatory chair (System 2000) (Fig. 15). Each experiment was performed seven times, once at each frequency from 0.01 to 0.64 Hz. The chair was rotated with a maximum velocity from 80° to 20° per second for 2 to 8 cycles (Table 2). The videos from the goggles were recorded at 600 × 420 resolution. The videos from the System 2000 were obtained at 1920 × 1080 resolution by pointing the built-in camera of a Galaxy S6 (Samsung, Korea) at the computer monitor, because the System 2000 does not ordinarily provide video data and access to its computer was limited. Both video data sets were recorded at ~ 30 Hz.

Fig. 15 Experimental setup. a Developed goggles. b System 2000

Table 2 Experimental setup

3.2 System 2000

Reference results were obtained from the System 2000 software itself. To evaluate the results, we calculated the error ε of gain, phase, and symmetry at each frequency, as follows:

$$ \upvarepsilon =\frac{\mathrm{mean}{\left(\mathrm{abs}\left(x-{x}_{\mathrm{est}}\right)\right)}^2}{\mathrm{mean}{\left(\mathrm{abs}(x)\right)}^2}, $$
(8)

where x and x_est represent the reference and estimated feature, respectively. The errors for gain, phase, and symmetry were measured over all subjects at each frequency. Figure 16 shows the median and interquartile range (IQR) of the errors measured from the results at each frequency. The mean errors for gain, symmetry, and phase were 0.81, 17.35, and 2.74, respectively.
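Equation (8) can be read in two ways as typeset: the square may apply to each absolute value before averaging, or to the mean itself. The sketch below implements both readings behind a flag so the difference is visible; which one the authors used is not stated, and the function name and flag are assumptions.

```python
from statistics import fmean


def relative_error(reference, estimate, square_inside=True):
    """Error of Eq. (8) between reference and estimated feature values.

    square_inside=True  -> mean(|x - x_est|^2) / mean(|x|^2)
    square_inside=False -> mean(|x - x_est|)^2 / mean(|x|)^2
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [abs(x - e) for x, e in zip(reference, estimate)]
    refs = [abs(x) for x in reference]
    if square_inside:
        return fmean([d ** 2 for d in diffs]) / fmean([r ** 2 for r in refs])
    return fmean(diffs) ** 2 / fmean(refs) ** 2


err_a = relative_error([1.0, 2.0], [1.0, 1.0])
err_b = relative_error([1.0, 2.0], [1.0, 1.0], square_inside=False)
```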

Fig. 16 Median and IQR errors measured from the gain, asymmetry, and phase at each frequency (all subjects). a Gain. b Asymmetry. c Phase

Figure 17 shows Bland–Altman and correlation plots of gain, symmetry, and phase with mean differences of 0.047, 0.16, and 0.029, respectively; the solid lines and the dashed lines in part (b) of each figure represent the regression line and a Pearson correlation coefficient of 1.

Fig. 17 Bland-Altman and correlation plots of gain, asymmetry, and phase. a Gain. b Asymmetry. c Phase
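The mean difference reported for each Bland-Altman plot can be computed as sketched below. The 95% limits of agreement (mean ± 1.96 standard deviations) are the conventional companion statistic and are included as an assumption, since the text reports only the mean differences; the data are made up.

```python
from statistics import fmean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


bias, lower, upper = bland_altman([1.0, 2.0, 3.0], [0.5, 2.0, 2.5])
```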

3.3 Designed goggles

It is hard to wear two goggles simultaneously without creating noise. Thus, we calculated the features and the coverage rate that measures how many results of the features are in the range of the reference data. The experimental setup was set to the same values used with the System 2000. The coverage rates for gain, symmetry, and phase were 66, 87, and 70%, respectively.
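The coverage rate can be illustrated with a small sketch. It assumes that "the range of the reference data" means the interval between the smallest and largest reference values; that reading, the function name, and the example values are assumptions.

```python
def coverage_rate_percent(values, reference_values):
    """Percentage of `values` that fall within [min, max] of the reference data."""
    if not values or not reference_values:
        raise ValueError("both inputs must be non-empty")
    low, high = min(reference_values), max(reference_values)
    inside = sum(1 for v in values if low <= v <= high)
    return 100.0 * inside / len(values)


# Hypothetical gains from the goggles against hypothetical reference gains.
coverage = coverage_rate_percent([0.4, 0.6, 0.9, 1.2], [0.5, 0.7, 1.0])
```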

4 Conclusions

This paper has presented a low-cost VNG system that uses an infrared camera to obtain eye movement features for the diagnosis of nystagmus. The device was developed using a pair of head-mounted goggles, a Raspberry Pi board, an infrared camera, and LEDs. To extract and track the pupil from video data obtained from the device, a morphology operation and a contour detection method were used. The miss rate was 18.54% with the morphology operation and 66.9% without it. The horizontal eye movement signal was obtained, and its slow-phase velocity was calculated. Gain, symmetry, and phase were calculated from the velocity of horizontal eye movement and evaluated in 10 healthy subjects. Interpolation algorithms were used to compensate for missing signals caused by eye blinking. In addition, the video data for the System 2000 were slightly tilted because they were captured by pointing a camera at the computer monitor, and they contained minor oscillation noise caused by movements of the subjects. Accordingly, the average IQR errors of gain and phase were 0.81 and 2.74, while that of symmetry was 17.35.

Because some images were misdetected owing to noise caused by the low frame rate of the IR camera and occlusion by eyelids and eyelashes, the miss rate of pupil detection and the IQR errors were relatively high. Moreover, the thresholds used to detect the pupil, the calculation of degrees per second, and the frequency filtering factors were selected manually for each subject. Additionally, the number, location, and wavelength of the LEDs were not verified in the experiments. Interpolating over relatively long runs of missing frames may also distort the signal slightly and contribute to high IQR errors. Furthermore, the developed goggles were not evaluated against a clinical device, because it is impossible to use two sets of goggles simultaneously without generating noise. In the future, more experiments will be conducted on more subjects, including patients with vestibular diseases, for clinical tests. We will also evaluate the goggles using other tests, such as the velocity step test, to compare the developed goggles with other clinical equipment. In addition, we will develop a VOG system that records pupil and angular velocity using an infrared camera and a gyroscope with any swivel chair at home.