# A robust method for calibration of eye tracking data recorded during nystagmus

## Abstract

Eye tracking is a useful tool when studying the oscillatory eye movements associated with nystagmus. However, the oscillatory nature of nystagmus is problematic during calibration, since it introduces uncertainty about where the person is actually looking. This renders comparisons between separate recordings unreliable. Still, the influence of the calibration protocol on eye movement data from people with nystagmus has not been thoroughly investigated. In this work, we propose a calibration method using Procrustes analysis in combination with an outlier correction algorithm, which is based on a model of the calibration data and on the geometry of the experimental setup. The proposed method is compared to previously used calibration polynomials in terms of accuracy, calibration plane distortion and waveform robustness. Six recordings of calibration data, validation data and optokinetic nystagmus data from people with nystagmus and seven recordings from a control group were included in the study. Fixation errors were introduced into the calibration data from the healthy participants, simulating fixation errors caused by the oscillatory movements found in nystagmus data. The outlier correction algorithm improved the accuracy for all tested calibration methods. The accuracy and calibration plane distortion performance of the Procrustes analysis calibration method were similar to those of the top-performing mapping functions for the simulated fixation errors. The waveform robustness of the Procrustes analysis calibration was superior to that of the other calibration methods. The overall performance of the Procrustes calibration method was best for the datasets containing errors during the calibration.

## Keywords

Eye tracking, Nystagmus, Calibration

## Introduction

Eye tracking is a useful tool to record and study eye movements. However, nystagmus eye movements disturb the calibration procedure for individual recordings, making comparisons of waveforms between recordings unreliable. For example, the calibration protocol assumes an ability to fixate, which is limited in people with nystagmus. Using the default calibration protocol may therefore lead to unreliable eye tracker data, which in turn may misrepresent or even invalidate the data analysis. In this paper, we explore the problems associated with calibration and propose a method that provides a repeatable and reliable gaze estimate, referred to as the *point-of-regard* (PoR), which is crucial for detailed computer-based nystagmus diagnostics and for objective evaluation of treatment effects between recordings.

### Description of nystagmus

Nystagmus, which causes involuntary movements of the eye(s), can be a symptom of an underlying oculomotor disorder, and the condition may lead to decreased visual acuity (Hertle, 2010; Hussain, 2016). There are two broad types of nystagmus, *early-onset nystagmus* and *acquired nystagmus* (Hussain, 2016; McLean, Proudlock, Thomas, Degg, & Gottlob, 2007); the former develops in the months after birth and the latter later in life (Dunn, 2014). The eye movement pattern, sometimes referred to as a *waveform*, can be classified into different categories; according to a classification study, there are at least 12 different types of nystagmus waveforms (Hussain, 2016; Theodorou & Clement, 2016; Dell’Osso & Daroff, 1975).

Different treatment strategies, for instance drug treatment (McLean et al., 2007) and surgery (Kumar, Shetty, Vijayalakshmi, & Hertle, 2011), have been suggested to improve the visual acuity in people with nystagmus. To evaluate these strategies, eye movements before and after the treatment can be studied. Treatment effects are difficult to assess in detail without an objective evaluation of the eye movements, since people with nystagmus are often considered hard to diagnose by clinicians (Hussain, 2016).

Nystagmus can also be found in visually healthy subjects. *Optokinetic nystagmus* (OKN) is a reflex found in humans (Naegele & Held, 1982), which causes oscillatory eye movements similar to the oscillations found in some forms of nystagmus such as pure jerk nystagmus. It can easily be elicited by keeping the head still in a moving environment (Naegele & Held, 1982).

### Calibration of a camera based eye tracker

Nystagmus eye movements can be studied in detail with an *eye tracker*. The video-based eye tracker, referred to as video-oculography (VOG) (Holmqvist et al., 2011), records eye movements using eye images captured by an infrared camera. In this work, the VOG data are obtained by locating the pupil center (PC) and the reflection of an infrared illuminator off the cornea, called the *corneal reflection* (CR). The vector between the PC and CR positions is called the *pupil-corneal reflection vector* (PCRV). This vector is unique for each eye orientation and can therefore be used to estimate the PoR. To make this estimate from the PCRV, a relationship between the PCRV data and the corresponding PoR data is needed. The process of identifying this relationship is referred to as *calibration*, and it depends on the geometry of the experiment as well as on the individual eye anatomy of each participant (Holmqvist et al., 2011).
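As a minimal sketch (not the eye tracker's internal implementation), the PCRV can be computed per sample by subtracting the CR position from the PC position. The sign convention PC minus CR and the function name `pcrv` are assumptions for illustration. The idealised example also shows why the difference is useful: if a small head translation shifts both image features by the same amount, the PCRV is unchanged, whereas an eye rotation moves the pupil relative to the CR.

```python
import numpy as np

def pcrv(pupil_xy, cr_xy):
    """Pupil-corneal reflection vector per sample (PC minus CR, assumed convention).

    pupil_xy, cr_xy: arrays of shape (L, 2) with image coordinates in pixels.
    """
    return np.asarray(pupil_xy, dtype=float) - np.asarray(cr_xy, dtype=float)

# Idealised example: two samples, then the same samples after a small head translation.
pc = np.array([[310.0, 240.0], [318.0, 236.0]])
cr = np.array([[300.0, 245.0], [301.0, 245.0]])
shift = np.array([4.0, -2.0])           # both features move by the same amount
print(pcrv(pc, cr))                     # rows (10, -5) and (17, -9)
print(pcrv(pc + shift, cr + shift))     # identical vectors
```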

The goal of the calibration is to find a *mapping function* (MF), e.g. a polynomial, which describes the relationship between the PCRV data and the PoR data. By presenting targets at known positions during an experiment, referred to as *calibration targets*, and simultaneously recording the corresponding PCRV data, it is possible to estimate the mapping function parameters. The number of calibration targets can vary, but common choices are 2, 5, 9, 13 and 16 targets (Holmqvist et al., 2011).

### Previous work

Several calibration polynomials for video-based eye tracking have previously been studied. One study investigated more than 400,000 polynomials and evaluated their performance based on the *average error* (accuracy), *maximum error*, *standard deviation* of the estimated PoR, *number of polynomial parameters* and *head movement tolerance* (Cerrolaza, Villanueva, & Cabeza, 2008). Another study tested polynomial structures based on accuracy and the number of calibration targets (Blignaut & Wium, 2013). Both studies used simulated data or data from participants with no visual impairments, and both Cerrolaza et al., (2008) and Blignaut and Wium (2013) used *accuracy* to evaluate the calibration MFs. As pointed out by Blignaut and Wium (2013), perfect accuracy, or goodness of fit, can be achieved by using a polynomial with as many parameters as there are calibration targets. However, the calibration polynomial is also used for other gaze positions and should therefore be tested on these as well (Blignaut & Wium, 2013).
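The point about goodness of fit can be illustrated with a one-dimensional toy example; all numbers are hypothetical. Five calibration targets are paired with PCRV values that contain small fixation errors, and the true mapping is assumed to be gaze = 10 × PCRV. A fourth-order polynomial has five parameters, so it passes exactly through all five calibration points, yet its error at other gaze positions is larger than that of a straight line.

```python
import numpy as np

# Hypothetical 1-D calibration: targets in degrees, PCRV in arbitrary units.
targets = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
# Recorded PCRV; the three inner values contain small fixation errors
# (the true relation is assumed to be gaze = 10 * PCRV).
pcrv_cal = np.array([-2.0, -0.95, -0.05, 1.05, 2.0])

def fit_and_test(degree):
    """Fit a polynomial of the given degree and return
    (max error at the calibration points,
     max error against the assumed true mapping on a dense grid of other positions)."""
    coeffs = np.polyfit(pcrv_cal, targets, degree)
    fit_err = np.max(np.abs(np.polyval(coeffs, pcrv_cal) - targets))
    pcrv_test = np.linspace(-2.0, 2.0, 401)
    test_err = np.max(np.abs(np.polyval(coeffs, pcrv_test) - 10.0 * pcrv_test))
    return fit_err, test_err

for degree in (1, 4):
    print(degree, fit_and_test(degree))
```

The fourth-order fit has an essentially zero error at the calibration points but a larger maximum error on the test grid than the linear fit, which is why accuracy on the calibration targets alone says little about an MF.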

#### Previous work on nystagmus calibration

Different approaches to *calibration data selection* for nystagmus applications have previously been published. This is an important part of the calibration, since the selected calibration data should represent periods when the participant was looking at the displayed calibration target. If the selected calibration data do not represent the “correct” fixation, there is a risk of misrepresenting the eye movement data.

A method to find saccades in eye movement data based on adaptive acceleration thresholds was presented by Behrens, Mackeben, and Schröder-Preikschat (2010). The method was not intended for calibration of nystagmus data, but it served as the basis for a method designed for the nystagmus case, which identifies the periods with the slowest eye movement velocities, referred to as *foveation periods* (Dunn, 2014). This method is based on an algorithm for saccade detection in uncalibrated data, which is used to divide the waveform into fast and slow eye movements; the foveations are found in the slow phases. Another approach to finding foveations was presented by Dell’Osso (2005), where the start and end times of the foveations were annotated manually. While there has been some work on how calibration data are selected, the literature on the suitability of various polynomials for nystagmus recordings is sparse.

Summary of nine different studies, their calibration and validation protocols, the calibration methods and the calibration method performance

Study | System | Calibration positions | Data selection method | Calibration polynomial | Validation | Reported data quality |
---|---|---|---|---|---|---|
McLean et al., (2007) (101) | SMI Eye Link, 250 Hz | 1: 3 × 3 grid, 0° and ± 20° horizontal, ± 15° vertical; 2: 3° steps from − 24° to 24°, start point (− 24°, − 24°), stop point (24°, 24°) | 1: Information missing (U); 2: Fixation (U) | 1: Information missing; 2: Fourth-order polynomial | Information missing | Information missing |
Tai et al., (2010) (6) | EyeLink 1000, 500 Hz | 0° and ± 10° horizontal and vertical | Not explicitly specified (U) | Information missing | Information missing | Information missing |
Abel et al., (2008) (11) | EyeLink II | Information missing | Foveation periods (U) | Information missing | Information missing | 0.5°–1.0° (manufacturer numbers) |
Barot et al., (2013) (16) | EyeLink II | 30° left to 30° right in steps of 3° | Foveation periods (A) | Best line of fit | Information missing | Information missing |
Dell’Osso et al., (2011) (24) | EyeLink II, 500 Hz | Information missing | Foveation periods (U) | Information missing | Information missing | 0.5°–1.0° (manufacturer numbers) |
Hertle et al., (2011) (19) | Ober 2 or EyeLink, 500 Hz or 1000 Hz | 1° targets or 3° pictures | End of fast phase (U) | Information missing | Information missing | Information missing |
Taibbi et al., (2008) (28) | EyeLink II, 500 Hz | Information missing | Foveation periods (U) | Information missing | Information missing | Information missing |
Thomas et al., (2008) (56) | EyeLink, 250 Hz | 0° and ± 15° horizontal and vertical | Foveation periods (U) | Information missing | Information missing | Information missing |
Dunn (2014) (1) | EyeLink 1000, (include sampling frequency) | ± 5° horizontally, ± 3° and (0°, 0°) | Automatic foveation algorithm (Dunn, 2014) (A) | Regression with cross term, degree unspecified | Self-validation | Mean and standard deviation for horizontal and vertical values |

#### Calibration polynomials

Summary of the calibration polynomials found in eye tracking and nystagmus related studies

Study | Polynomial (Eq.) | Eye tracking data vector | Property |
---|---|---|---|
\*\*Barot et al., (2013) | \( \mathcal {A}_{1}\) (5) | [1 | Linear mapping (linear) |
\*\*Dunn (2014) | \(\mathcal {B}\) (6) | [1 | Linear mapping + rotation (non-linear) |
\*Stampe (1993) | \(\mathcal {G}\) (7) | \([1 \quad x_{PC} \quad y_{PC} \quad x_{PC}^{2} \quad y_{PC}^{2} \quad x_{PC}y_{PC}]^{T}\) | Quadratic mapping + rotation (non-linear) |
\*\*McLean et al., (2007) | \(\mathcal {A}_{4}\) (8) | \([1 \quad x_{PC} \quad x_{PC}^{2} \quad x_{PC}^{3} \quad x_{PC}^{4} \quad y_{PC} \quad y_{PC}^{2} \quad y_{PC}^{3} \quad y_{PC}^{4}]^{T}\) | Fourth order (non-linear) |

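The PoR estimate, **p**_{PoR} = [*x*_{PoR} *y*_{PoR}]^{T}, is computed from the eye tracker data, *x*_{PC} and *y*_{PC}, using a polynomial, **P**, as

$$ \boldsymbol{p}_{PoR} = \boldsymbol{P}\boldsymbol{u}_{PC}, $$(1)

where **u**_{PC} is the eye tracking data vector formed from *x*_{PC} and *y*_{PC}. The selected structure of **P** determines the structure of **u**_{PC} (see Table 2). The purpose of the calibration is to estimate the coefficients of the polynomial

$$ \boldsymbol{P} = \left[\begin{array}{ll}\boldsymbol{p}_{h} & \boldsymbol{p}_{v}\end{array}\right]^{T}, $$(2)

where **p**_{h} and **p**_{v} are the coefficient vectors of the horizontal and vertical polynomials, respectively. The coefficients are estimated using a least-squares solution according to

$$ \boldsymbol{p}_{d} = \left(\boldsymbol{U}_{PC}^{T}\boldsymbol{U}_{PC}\right)^{-1}\boldsymbol{U}_{PC}^{T}\boldsymbol{t}_{d}, $$(3)

$$ \boldsymbol{U}_{PC} = \left[\begin{array}{lll}\boldsymbol{u}_{PC}(1) & \cdots & \boldsymbol{u}_{PC}(n)\end{array}\right]^{T}, $$(4)

where *d* is either the horizontal or the vertical direction, **U**_{PC} is a matrix containing the calibration data vectors for each calibration target, **t**_{d} is a vector with the calibration targets of direction *d*, and *n* is the number of calibration targets. The different polynomials evaluated in this work, Eqs. 5–8, are listed in Table 2.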
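A minimal NumPy sketch of the least-squares polynomial calibration, using the quadratic structure of \(\mathcal {G}\) from Table 2 as the example data vector. The helper names are hypothetical. `numpy.linalg.lstsq` is used instead of forming \((\boldsymbol{U}^{T}\boldsymbol{U})^{-1}\) explicitly; for a full-rank design matrix it gives the same solution but is numerically more stable.

```python
import numpy as np

def design_matrix(x_pc, y_pc):
    """One row u_PC^T per sample, quadratic structure of G: [1, x, y, x^2, y^2, x*y]."""
    x = np.asarray(x_pc, dtype=float)
    y = np.asarray(y_pc, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x ** 2, y ** 2, x * y])

def calibrate_polynomial(x_pc, y_pc, x_t, y_t):
    """Least-squares coefficients P (2 x m): first row p_h, second row p_v."""
    U = design_matrix(x_pc, y_pc)                                   # n x m matrix U_PC
    p_h = np.linalg.lstsq(U, np.asarray(x_t, dtype=float), rcond=None)[0]
    p_v = np.linalg.lstsq(U, np.asarray(y_t, dtype=float), rcond=None)[0]
    return np.vstack([p_h, p_v])

def polynomial_por(P, x_pc, y_pc):
    """Apply p_PoR = P u_PC to every sample; returns a 2 x L array."""
    return P @ design_matrix(x_pc, y_pc).T
```

With a 3 × 3 calibration grid, the six coefficients per direction are overdetermined by nine targets, so the fit is a genuine least-squares compromise rather than an interpolation.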

### Aim of this paper

The aims of this paper are to propose and evaluate a new calibration MF that generates consistent PoR estimates across recording sessions and participants, and to compare it to other calibration mapping functions previously used in nystagmus research. The main objective is to find an MF that can reliably be used to evaluate the effects of different nystagmus treatments, even when the participant fails to accurately fixate the calibration target.

## Proposed method

In this section a new calibration method is proposed. It is developed for video-based eye trackers using a nine-point calibration and a geometrical setup similar to that of an EyeLink 1000 Plus in desktop mode. The method consists of two parts: First, an outlier correction algorithm aimed at correcting inaccuracies in the recorded calibration data. Second, a linear mapping function based on *Procrustes analysis* is proposed. The method is based on 5 s of data recorded for each calibration target, as will be presented in more detail in “Calibration method evaluation”.

### The outlier correction algorithm

For the recommended setup of the eye tracker used in this work, the horizontal data typically have the following structure: the horizontal PoR data depend only on the horizontal PCRV data, and not on the vertical PCRV data. Thus, the horizontal PCRV for a given horizontal gaze position is approximately the same regardless of the vertical gaze position. This characteristic is used to create an algorithm that reduces errors in the calibration dataset. The algorithm assumes nine calibration targets distributed in a 3 × 3 grid, where the calibration data for each calibration target are reduced to one coordinate pair. This gives nine two-dimensional coordinates, one for each two-dimensional calibration target. The outlier correction algorithm consists of two stages.

### Stage I

1. Divide the data into six groups with three adjacent data points in each. Half of the groups share a horizontal calibration target value (see Fig. 2a) and the other half share a vertical calibration target value (see Fig. 2b).
2. Fit a line to the three data points in each of the six groups.
3. Compute the angle between each of the vertically fitted lines and each of the horizontally fitted lines (3 × 3 computations).
4. If the angle deviates more than 25° from the expected 90°, the vertical line is considered to contain an outlier. The value of 25° was chosen empirically.

If one or more outliers were found during Stage I, Stage II is initiated.

### Stage II

1. An outlier is detected by finding the data point with the largest horizontal deviation from the vertical line.
2. Corrected coordinates of the outlier are computed as the average of the other data points on each of the intersecting horizontal and vertical lines, i.e., the new horizontal value is the average of the horizontal values of the other data points on the vertical line, and the new vertical value is the average of the vertical values of the other data points on the horizontal line.
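A minimal NumPy sketch of the two stages, assuming the nine calibration points are stored as 3 × 3 arrays with rows indexed by vertical target and columns by horizontal target; the function names are hypothetical. One deliberate deviation: in Stage II the sketch measures each point's horizontal deviation from the median of its column rather than from the least-squares line through all three points. For three equally spaced points, least-squares residuals are always in the ratio 1 : −2 : 1, so the middle point would be selected even when the outlier is at an end of the line.

```python
import numpy as np

def _slope(u, v):
    """Least-squares slope of v as a function of u."""
    return np.polyfit(u, v, 1)[0]

def _angle_deg(d1, d2):
    """Angle in [0, 90] degrees between two line directions."""
    cos = abs(np.dot(d1, d2)) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    return np.degrees(np.arccos(min(cos, 1.0)))

def correct_outliers(X, Y, threshold_deg=25.0):
    """X, Y: 3 x 3 calibration data; row = vertical target, column = horizontal target."""
    X = np.array(X, dtype=float)   # copies, so the inputs are not modified
    Y = np.array(Y, dtype=float)
    # Stage I: vertical lines (columns) fitted as x = a*y + b, horizontal lines (rows) as y = m*x + q.
    col_dirs = [np.array([_slope(Y[:, c], X[:, c]), 1.0]) for c in range(3)]
    row_dirs = [np.array([1.0, _slope(X[r, :], Y[r, :])]) for r in range(3)]
    flagged = [c for c in range(3)
               if any(abs(_angle_deg(col_dirs[c], d) - 90.0) > threshold_deg for d in row_dirs)]
    # Stage II: replace the most deviating point in each flagged vertical line.
    for c in flagged:
        r = int(np.argmax(np.abs(X[:, c] - np.median(X[:, c]))))
        X[r, c] = np.delete(X[:, c], r).mean()   # other points on the vertical line
        Y[r, c] = np.delete(Y[r, :], c).mean()   # other points on the horizontal line
    return X, Y, flagged
```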

### Procrustes calibration

In the calibration process, a set of *n* (here *n* = 9) two-dimensional data points (calibration data) are fitted to another set of *n* two-dimensional data points (calibration targets). Both of these data sets can be viewed as two-dimensional shapes, and the objective of the calibration is to identify the best transformation from the calibration data shape to the calibration target shape. In this work, *Procrustes analysis* (Gower, 1975) is used to compare and align the two datasets. Three steps are involved in the Procrustes analysis: translation, scaling and rotation. Once they have been estimated, they can be used to compute the gaze positions from PCRV data.

- (a) Construct the calibration data matrix \(\boldsymbol {D} =\left [\begin {array}{ll}\boldsymbol {x}_{d} & \boldsymbol {y}_{d} \end {array}\right ]^{T} \) as a 2 × *n* matrix, where *n* is the number of calibration targets, and the calibration target matrix \(\boldsymbol {T} = \left [\begin {array}{ll}\boldsymbol {x}_{t} & \boldsymbol {y}_{t} \end {array}\right ]^{T}\), which contains the corresponding calibration targets.
- (b) Center both the calibration data and the calibration targets by subtracting their respective horizontal and vertical averages, creating **D**_{μ} and **T**_{μ}:

$$ \boldsymbol{D}_{\mu} = \left[\begin{array}{l}\boldsymbol{x}_{d} - \bar{x}_{d}\\ \boldsymbol{y}_{d} - \bar{y}_{d} \end{array}\right] = \left[\begin{array}{l}\boldsymbol{x}_{d, c}\\ \boldsymbol{y}_{d, c} \end{array}\right], $$(9)

$$ \boldsymbol{T}_{\mu} = \left[\begin{array}{l}\boldsymbol{x}_{t} - \bar{x}_{t}\\ \boldsymbol{y}_{t} - \bar{y}_{t} \end{array}\right] = \left[\begin{array}{l}\boldsymbol{x}_{t, c}\\ \boldsymbol{y}_{t, c} \end{array}\right], $$(10)

where \(\bar {x}_{d}\) is the average of **x**_{d}, \(\bar {y}_{d}\) is the average of **y**_{d}, \(\bar {x}_{t}\) is the average of **x**_{t} and \(\bar {y}_{t}\) is the average of **y**_{t}.
- (c) Compute the norms, *N*_{D} and *N*_{T}, using

$$ N_{D} = \sqrt{\sum\limits_{i = 1}^{n}x_{d, c}^{2}(i) + \sum\limits_{i = 1}^{n}y_{d, c}^{2}(i)}, $$(11)

where *x*_{d, c}(*i*) ∈ **x**_{d, c} and *y*_{d, c}(*i*) ∈ **y**_{d, c}, and

$$ N_{T} = \sqrt{\sum\limits_{i = 1}^{n}x_{t, c}^{2}(i) + \sum\limits_{i = 1}^{n}y_{t, c}^{2}(i)}, $$(12)

where *x*_{t, c}(*i*) ∈ **x**_{t, c} and *y*_{t, c}(*i*) ∈ **y**_{t, c}. The datasets are scaled according to:

$$ \boldsymbol{D}_{N} = \frac{\boldsymbol{D}_{\mu}}{N_{D}}, $$(13)

$$ \boldsymbol{T}_{N} = \frac{\boldsymbol{T}_{\mu}}{N_{T}}. $$(14)

- (d) Compute the rotation, **R**, using singular value decomposition (SVD). In general, the SVD decomposes a matrix **M** into two orthonormal matrices, **U** and **V**, and a diagonal matrix, **S**, that contains the singular values *σ*_{l}, *l* ∈ [1, *k*]. In Procrustes analysis, \(\boldsymbol {M} =\boldsymbol {D}_{N}\boldsymbol {T}^{T}_{N}\), a 2 × 2 matrix, so that

$$ \boldsymbol{D}_{N}\boldsymbol{T}^{T}_{N} = \boldsymbol{U}\boldsymbol{S}\boldsymbol{V}^{T}, $$(15)

$$ \boldsymbol{R} = \boldsymbol{V}\boldsymbol{U}^{T}, $$(16)

and

$$ \boldsymbol{S} = diag(\sigma_{1}, \ldots, \sigma_{k}), $$(17)

with *k* = 2 for two-dimensional data.
- (e) Once the translation, scaling and rotation parameters have been estimated, the PoR estimate, **p**_{PoR}, is computed as

$$ \boldsymbol{p}_{PoR} = \kappa \boldsymbol{R} \boldsymbol{p}_{PC} - \boldsymbol{L}, $$(18)

where

$$ \kappa = \frac{N_{T}} {N_{D}}\sum\limits_{i = 1}^{k}\sigma_{i}, $$(19)

$$ \boldsymbol{L} = \kappa\boldsymbol{R}\left[\begin{array}{l} \bar{x}_{d} \\ \bar{y}_{d} \end{array}\right] - \left[\begin{array}{l} \bar{x}_{t} \\ \bar{y}_{t} \end{array}\right], $$(20)

and

$$ \boldsymbol{p}_{PC} = \left[\begin{array}{l} x_{PC} \\ y_{PC} \end{array}\right]. $$(21)

This method is denoted as \(\mathcal {P}\).
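The procedure above can be sketched in NumPy as follows, with data and targets stored as 2 × n arrays (rows: horizontal, vertical). This is an illustration under the column-vector convention, not the authors' code: the SVD of the 2 × 2 matrix \(\boldsymbol{D}_{N}\boldsymbol{T}_{N}^{T} = \boldsymbol{U}\boldsymbol{S}\boldsymbol{V}^{T}\) gives the rotation \(\boldsymbol{V}\boldsymbol{U}^{T}\), which maps the centred, scaled data onto the centred, scaled targets. Like any plain SVD solution, the sketch does not prevent a reflection (det **R** = −1). The function names are hypothetical.

```python
import numpy as np

def procrustes_calibration(D, T):
    """Estimate scale kappa, rotation R and offset L mapping PCRV data onto targets.

    D, T: 2 x n arrays (rows: horizontal, vertical) of calibration data and targets.
    """
    D = np.asarray(D, dtype=float)
    T = np.asarray(T, dtype=float)
    mu_d = D.mean(axis=1, keepdims=True)       # (x_d bar, y_d bar) as a column
    mu_t = T.mean(axis=1, keepdims=True)       # (x_t bar, y_t bar) as a column
    D_mu, T_mu = D - mu_d, T - mu_t            # centring
    N_D = np.sqrt(np.sum(D_mu ** 2))           # norm of the centred data
    N_T = np.sqrt(np.sum(T_mu ** 2))           # norm of the centred targets
    D_N, T_N = D_mu / N_D, T_mu / N_T          # unit-norm scaling
    U, s, Vt = np.linalg.svd(D_N @ T_N.T)      # M = U S V^T, 2 x 2
    R = Vt.T @ U.T                             # rotation with R @ D_N close to T_N
    kappa = N_T / N_D * s.sum()                # overall scale factor
    L = kappa * R @ mu_d - mu_t                # offset term
    return kappa, R, L

def procrustes_por(kappa, R, L, P_pc):
    """PoR estimates for a 2 x M array of PCRV samples (Eq. 18)."""
    return kappa * R @ np.asarray(P_pc, dtype=float) - L
```

Because the mapping consists of one scale, one rotation and one offset, applying `procrustes_por` to a recording changes the position, size and orientation of a waveform but never its shape.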

## Calibration method evaluation

In this section, the evaluation strategy for the proposed method is presented. The section consists of three main parts: the recording of *nystagmus data* (“The nystagmus data experiment (NDE)”), the recording of *control data* (“The control data experiment (CDE)”) and the performance evaluation measures (“Comparing calibration methods”).

### Hardware and software

Binocular, raw pupil and CR data were recorded with an EyeLink 1000 Plus (desktop mode) with a sampling frequency of 1000 Hz using the host software v. 5.09 and the DevKit 1.11.571. The center of mass tracking mode was used. The eye tracker camera was placed in accordance with the recommendations of the manufacturer (SR-Research, 2010). PsychoPy (version 1.83) (Peirce, 2007) was used to present all stimuli. The stimuli were presented on an ASUS VG248QE monitor with a resolution of 1920 × 1080 pixels and dimensions of 53 cm × 30 cm. The participant-to-monitor distance was 80 cm.

A chin and forehead rest was used for all participants. The analysis software was written in Python (version 2.7).
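For orientation, the setup numbers above can be used to convert target eccentricities to screen positions. This sketch assumes the eye is straight in front of the screen centre and places targets with the exact flat-screen projection (tan θ); PsychoPy's default `deg` units instead use a small-angle approximation, and which variant the experiment used is not stated here. The constants come from the paragraph above; `deg_to_px` is a hypothetical helper.

```python
import math

SCREEN_PX = (1920, 1080)   # monitor resolution (width, height) in pixels
SCREEN_CM = (53.0, 30.0)   # monitor dimensions (width, height) in cm
VIEW_CM = 80.0             # participant-to-monitor distance in cm

def deg_to_px(theta_deg, axis=0):
    """Offset in pixels from the screen centre for a target at theta_deg.

    Assumes the eye is in front of the screen centre and an exact tan projection.
    axis=0: horizontal, axis=1: vertical.
    """
    offset_cm = VIEW_CM * math.tan(math.radians(theta_deg))
    return offset_cm * SCREEN_PX[axis] / SCREEN_CM[axis]

for theta, axis in [(18, 0), (10, 1), (5, 0)]:
    print(theta, axis, round(deg_to_px(theta, axis), 1))
```

With these numbers, the ±18° horizontal calibration targets lie about 942 px from the centre, just inside the 960 px half-width, and the ±10° vertical targets about 508 px from the centre, inside the 540 px half-height.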

### The nystagmus data experiment (NDE)

#### Participants

The nystagmus data experiment was performed with patients diagnosed with nystagmus. The diagnosis was made by Björn Hammar (MD), senior consultant at the neuro-ophthalmology unit at Skåne University Hospital in Lund, Sweden. This dataset is denoted **NDE data**. A total of eight patients with nystagmus were recorded, two of whom were recorded twice, giving ten separate recordings. Two of the participants were female and six were male. Of the ten recordings, four were excluded from the dataset: one due to lack of validation data, two due to loss of calibration data (too many blinks during the recording of calibration data) and one due to too small oscillations. For this last participant, only the data from one of the nine calibration targets contained oscillations with an amplitude larger than 1° and a frequency higher than 2 Hz. The six remaining recordings, from five different participants, were all from people diagnosed with infantile nystagmus (*M* = 35.3 [year], *SD* = 15.9 [year]).

#### Data recording

The experiment included calibration and validation data recordings. Both calibration and validation data were recorded monocularly for both eyes by covering one eye and recording the other eye. Nine calibration targets were presented to each patient in a randomised order. The calibration targets were placed in a 3 × 3 grid. The horizontal target positions were 0° and ± 18° and the vertical target positions were 0° and ± 10°. The validation targets were placed in a 2 × 2 grid where the horizontal and vertical validation target positions were (± 5°,± 5°) respectively. The calibration target was a black circle with radius of 0.6° with a red circle of radius 0.15° in the center. The targets were presented on a grey background. The presentation duration of each calibration target and validation target was decided manually. The goal duration for each target was 5 s (M = 5.02 [s], SD = 1.24 [s]). The experiment also included fixation, smooth pursuit, saccade and optokinetic nystagmus tasks which were not included in this work.

#### Calibration data selection

Calibration data were selected with the foveation-based approach described in “Previous work on nystagmus calibration” (Dunn, 2014), with the following modifications:

- (a) Instead of computing saccade velocity thresholds for the entire calibration dataset, the thresholds were computed for each calibration target.
- (b) The saccade acceleration threshold was not implemented, since it rejected too many saccades.
- (c) The adaptive filter used to find foveations was not implemented. Instead, each slow phase longer than 50 ms was considered a potential foveation, and the first 50 ms directly after the onset of the slow phase were considered the most likely foveation candidate.
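The modified selection can be sketched as below for the samples recorded during one calibration target. The speed threshold is a hypothetical stand-in (k times the median speed for that target), not the adaptive thresholds of Behrens et al. (2010) or Dunn (2014); the 50 ms minimum slow-phase duration and the 50 ms foveation window follow the text, and a 1000 Hz sampling rate is assumed. How the resulting candidates are combined into one calibration point is left to the caller.

```python
import numpy as np

FS = 1000  # assumed sampling frequency in Hz

def foveation_candidates(x, y, k=3.0, min_ms=50, win_ms=50):
    """Mean PCRV position of the first win_ms of every slow phase longer than min_ms.

    x, y: uncalibrated PCRV samples recorded while one calibration target was shown.
    k: hypothetical threshold factor; samples slower than k * median speed count as slow.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    speed = np.hypot(np.diff(x), np.diff(y)) * FS     # PCRV units per second
    slow = speed < k * np.median(speed)                # threshold computed per target
    # Runs of consecutive slow samples: a +1 step marks a start, a -1 step an (exclusive) end.
    edges = np.flatnonzero(np.diff(np.concatenate(([0], slow.astype(int), [0]))))
    starts, ends = edges[::2], edges[1::2]
    n_min = int(min_ms * FS / 1000)
    n_win = int(win_ms * FS / 1000)
    candidates = []
    for s, e in zip(starts, ends):
        if e - s > n_min:                              # slow phase longer than min_ms
            candidates.append((x[s:s + n_win].mean(), y[s:s + n_win].mean()))
    return candidates
```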

### The control data experiment (CDE)

#### Participants

A total of eight participants were included in the dataset, one female and seven male (*M* = 37.0 [year], *SD* = 7.7 [year]). This data set is denoted **CDE data** and was divided into two subsets, see “Two **CDE** subsets”. Data from one participant was excluded due to data loss (too many blinks during the recording of calibration data).

#### Data recording

**CDE data**, a different method for calibration data selection was needed; see “Calibration data selection”.

*calibration plane distortion* and the *waveform robustness*, described in “Calibration plane distortion & waveform robustness”.

#### Two **CDE** subsets

The **CDE data** datasets were divided into two subsets: one which contains only calibration targets with no offset, **CDE - NO**, and one which contains calibration targets with a random offset for each calibration target, **CDE - O**. The notations NO and O represent datasets with no introduced offsets and with introduced offsets, respectively. While the **CDE - NO** data correspond to data from participants without any visual impairment, the **CDE - O** data simulate potential fixation inaccuracies caused by the nystagmus oscillations for different angles during the calibration.

The **CDE - O** dataset was created by repeating the calibration data selection process 50 times, each time assigning a horizontal random error (including 0°) to each calibration target. Each repetition was independent of previous repetitions.

#### Calibration data selection

- (a) First, to avoid the influence of the time it takes to shift gaze after a new calibration target has appeared, the first 500 ms of the recorded data for each calibration target are removed.
- (b) Second, the 200 ms window of the remaining PCRV data with the smallest total variance is found. The total variance, \(s_{tot}^{2}\), is computed according to Eq. 22, where \({s_{x}^{2}}\) and \({s_{y}^{2}}\) are the horizontal and vertical variances, respectively.

$$ s_{tot}^{2} = {s_{x}^{2}} + {s_{y}^{2}} $$(22)

- (c) Finally, the horizontal and vertical calibration data position estimates are computed as the averages over the 200 ms window found in step (b).
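The three steps can be sketched with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). The function name is hypothetical, a 1000 Hz sampling rate is assumed, and the variances use NumPy's default population estimate (`ddof=0`).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def select_calibration_point(x, y, fs=1000, skip_ms=500, win_ms=200):
    """Calibration point for one target from its PCRV samples x, y."""
    skip = int(skip_ms * fs / 1000)
    n = int(win_ms * fs / 1000)
    x = np.asarray(x, dtype=float)[skip:]         # (a) drop the first 500 ms
    y = np.asarray(y, dtype=float)[skip:]
    wx = sliding_window_view(x, n)                # every 200 ms window, one per row
    wy = sliding_window_view(y, n)
    s2_tot = wx.var(axis=1) + wy.var(axis=1)      # (b) total variance, Eq. 22
    i = int(np.argmin(s2_tot))                    # window with the smallest variance
    return wx[i].mean(), wy[i].mean()             # (c) average position in that window
```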

### Comparing calibration methods

In this work, three different measures are used to compare the characteristics of the different mapping functions. These are accuracy, *α*, *calibration plane distortion*, *μ*_{d}, and *waveform robustness*, *ξ*. Accuracy is tested on a limited number of validation targets, which in this work is equal to four targets per participant. The calibration plane distortion is the distance between two PoR estimations from the same MF. Finally, the waveform robustness is computed as the difference between two PoR estimations after adjusting for the linear properties translation, rotation and scaling between the two PoR estimations.

#### Accuracy

The accuracy for validation target point *k*, *α*_{k}, is computed according to Eq. 23 where *x*_{PoR}(*k*) and *y*_{PoR}(*k*) are the mapping function estimates of the horizontal and vertical validation target positions, respectively, and *x*_{s}(*k*) and *y*_{s}(*k*) are their corresponding known validation target positions. The accuracy computation in Eq. 23 results in one single value for each validation target. A small accuracy value means good performance, while a large value means poor performance.
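Assuming Eq. 23 is the usual Euclidean offset between the estimated and known positions,

$$ \alpha_{k} = \sqrt{\left(x_{PoR}(k) - x_{s}(k)\right)^{2} + \left(y_{PoR}(k) - y_{s}(k)\right)^{2}}, $$

the per-target accuracy can be computed as in the sketch below. The validation positions match the (± 5°, ± 5°) grid used in this work, while the PoR estimates are made up for illustration.

```python
import numpy as np

def accuracy(x_por, y_por, x_s, y_s):
    """Offset alpha_k (deg) between estimated and known validation target positions."""
    dx = np.asarray(x_por, dtype=float) - np.asarray(x_s, dtype=float)
    dy = np.asarray(y_por, dtype=float) - np.asarray(y_s, dtype=float)
    return np.hypot(dx, dy)

# Four validation targets at (+-5 deg, +-5 deg) with hypothetical PoR estimates.
x_s = np.array([-5.0, 5.0, -5.0, 5.0])
y_s = np.array([-5.0, -5.0, 5.0, 5.0])
x_por = x_s + np.array([0.3, 0.0, -0.6, 0.0])
y_por = y_s + np.array([0.4, 0.0, 0.8, -0.5])
alpha = accuracy(x_por, y_por, x_s, y_s)
print(alpha, alpha.mean())
```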

The accuracy is presented as follows. For each mapping function, the average accuracy of each eye over all validation data in a dataset is computed. For example, the **CDE - O** dataset contains 7 participants × 4 validation targets × 50 iterations = 1400 accuracy samples.

The accuracy is calculated separately for all three datasets. In order to evaluate the performance of the outlier correction algorithm (see “The outlier correction algorithm”), the accuracy results for the **NDE** dataset without the outlier correction algorithm are also calculated.

#### Calibration plane distortion & waveform robustness

The calibration plane distortion computations were implemented in the following way. If \(\boldsymbol {P}_{PoR1} = \left [\begin {array}{ll} \boldsymbol {v}_{x} & \boldsymbol {v}_{y} \end {array}\right ]^{T}\) and \(\boldsymbol {P}_{PoR2} =\left [\begin {array}{ll} \boldsymbol {w}_{x} & \boldsymbol {w}_{y} \end {array}\right ]^{T}\) are two matrices of dimension 2 × *L* containing gaze estimations, the calibration plane distortion, *μ*, is the distance between the two gaze estimations.

The waveform robustness is measured with the Procrustes distance, *D*_{P}, defined as:

$$ D_{P} = 1 - \left(\sum\limits_{i = 1}^{k}\sigma_{i}\right)^{2}, $$

where **S** = *diag*(*σ*_{1}, …, *σ*_{k}) is computed according to Eq. 15, and *D*_{P} ∈ [0,1]. The **P**_{PoR1} and **P**_{PoR2} matrices correspond to the **D** and **T** matrices described in “Procrustes calibration”.

If **P**_{f{k}, NO} is a gaze estimation from mapping function *f*{*k*} for the **CDE - NO** dataset and **P**_{f{k}, O} is a gaze estimation from mapping function *f*{*k*} for the **CDE - O** dataset, where \(f = \{\mathcal {A}_{1}, \mathcal {B}, \mathcal {G}, \mathcal {A}_{4}, \mathcal {P} \}\) and *k* ∈ [0,4], the calibration plane distortion, *μ*_{k}, and the waveform robustness, *ξ*_{k}, for mapping function *k* are defined in Eqs. 28 and 29, respectively.
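A minimal sketch of the Procrustes distance between two gaze-estimate matrices, assuming the standard definition \(D_{P} = 1 - (\sum_{i}\sigma_{i})^{2}\) after centring and unit-norm scaling, with the singular values taken from the product of the two normalised matrices. Under that assumption the value should agree with the `disparity` returned by `scipy.spatial.procrustes` (which expects L × 2 arrays). The example shows why a similarity transform of the same waveform gives \(D_{P} \approx 0\), whereas a non-linear distortion does not; the function name and the test signals are hypothetical.

```python
import numpy as np

def procrustes_distance(P1, P2):
    """D_P = 1 - (sum of singular values)^2 for two 2 x L gaze-estimate matrices."""
    A = np.asarray(P1, dtype=float)
    B = np.asarray(P2, dtype=float)
    A = A - A.mean(axis=1, keepdims=True)          # remove translation
    B = B - B.mean(axis=1, keepdims=True)
    A = A / np.linalg.norm(A)                      # remove scale (Frobenius norm)
    B = B / np.linalg.norm(B)
    s = np.linalg.svd(A @ B.T, compute_uv=False)   # the optimal rotation is implicit in s
    return max(0.0, 1.0 - s.sum() ** 2)            # clip tiny negative round-off

rng = np.random.default_rng(2)
P1 = rng.normal(size=(2, 500))
th = 0.3
R0 = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
P2 = 1.7 * R0 @ P1 + np.array([[2.0], [-1.0]])     # same waveform, similarity transform
P3 = P1 + 0.3 * P1 ** 2                            # same waveform, non-linear distortion
print(procrustes_distance(P1, P2), procrustes_distance(P1, P3))
```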

The results for calibration plane distortion and waveform robustness are presented as empirical *cumulative distribution functions* (*CDFs*), as well as the area under each CDF curve, *A*_{CDF}. The area computations for the calibration plane distortion were bounded to 1°, as this is considered a good calibration accuracy (Hansen & Ji, 2010). The area computation for the waveform robustness was bounded to 0.2, as the results in “Waveform robustness and accuracy examples” showed that Prob(*D*_{P} > 0.2) ≈ 0.01 for the \(\mathcal {G}\) MF. The *A*_{CDF} was normalised such that *A*_{CDF} ∈ [0,1] by dividing the computed area by the maximum CDF value for the area computation. With this definition of the waveform robustness, the *A*_{CDF} for the Procrustes calibration method is 1.0 by definition.
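A sketch of the normalised area under the empirical CDF, assuming the normalisation divides by the bound *b* (the largest possible area, since a CDF never exceeds 1). For a single non-negative sample \(v_i\), the step function \(\mathbf{1}\{v_i \le x\}\) contributes \(\max(0, b - v_i)\) to the area on [0, *b*], which gives a closed form without numerical integration. If every value is (essentially) 0, as for the waveform robustness of the Procrustes method, the result is 1. The function name is hypothetical.

```python
import numpy as np

def normalised_cdf_area(values, bound):
    """Area under the empirical CDF of non-negative values on [0, bound], divided by bound."""
    v = np.asarray(values, dtype=float)
    # Each sample adds max(0, bound - v_i) to the area; dividing by bound gives clip(1 - v_i/bound).
    return float(np.mean(np.clip(1.0 - v / bound, 0.0, 1.0)))

print(normalised_cdf_area([0.0, 0.0, 0.0], 0.2))   # all distances zero
print(normalised_cdf_area([0.0, 0.1, 0.3], 0.2))   # one sample beyond the bound
```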

## Results

### Accuracy

Comparing the results for the **NDE** dataset with and without the outlier correction algorithm (OA), it can be seen that the OA improves the accuracy for at least one of the eyes for all mapping functions. The most prominent improvements are seen for the mapping functions with a higher degree of freedom, i.e., \(\mathcal {G}\) and \(\mathcal {A}_{4}\). As expected, the \(\mathcal {G}\) MF achieved the best accuracy for the **CDE - NO** dataset.

The average and standard deviation of accuracy for all datasets

For the **NDE data** and **CDE - O** data (both with OA), where fixation inaccuracies are present in the calibration data, the accuracies for the \(\mathcal {A}_{1}\), \(\mathcal {B}\), \(\mathcal {G}\) and \(\mathcal {P}\) mapping functions are approximately the same, while \(\mathcal {A}_{4}\) yields a considerably worse accuracy. The fact that the accuracies are worse for the **NDE** dataset than for the **CDE - O** dataset indicates that the true nystagmus calibration errors are more severe than the simulated ones. If good accuracy is defined as an error smaller than or equal to 0.5°, it is difficult to achieve good accuracy when the calibration data contain inaccuracies.

### Calibration plane distortion

The *A*_{CDF} results are listed in Table 4. The differences between the results for the vertical and horizontal OKN data within each MF are small. The performances of the \(\mathcal {A}_{1}\), \(\mathcal {B}\) and \(\mathcal {P}\) MFs are quite similar, while the results for the other two MFs are worse. This is confirmed by Fig. 7.

**A**_{CDF} Results

Dataset | \(\mathcal {A}_{1}\) | \(\mathcal {B}\) | \(\mathcal {G}\) | \(\mathcal {A}_{4}\) | \(\mathcal {P}\) |
---|---|---|---|---|---|
*Calibration plane distortion* | | | | | |
Vertical | 0.61 | 0.57 | 0.29 | 0.23 | 0.59 |
Horizontal | 0.59 | 0.57 | 0.31 | 0.18 | 0.58 |
*Waveform robustness* | | | | | |
Vertical | 0.91 | 0.82 | 0.64 | 0.36 | 1.00 |
Horizontal | 0.93 | 0.85 | 0.80 | 0.43 | 1.00 |

### Waveform robustness

The *A*_{CDF} results are presented in Table 4. The results in Fig. 8 show that the Procrustes calibration method performs best and \(\mathcal {A}_{4}\) performs worst for both the vertical and the horizontal OKN tasks, as quantified in Table 4. The waveform robustness appears to be linked to the non-linearity of the MF: a higher degree of non-linearity gives worse waveform robustness, and vice versa.

### Waveform robustness and accuracy examples

*D*_{P} = 0.05, with a relatively large accuracy value, 2.12°. On the other hand, Fig. 10 illustrates that a small accuracy value does not guarantee a small waveform robustness value. A *D*_{P} value larger than 0.2 is high, since only 1 % of the waveforms generate a higher value in the **CDE - O** dataset. All waveform estimations were made using the \(\mathcal {G}\) MF.

## Discussion

In this paper, we investigated the suitability of commonly used calibration mapping functions for data from people with nystagmus and proposed a new calibration approach for these participants. The new method uses an outlier correction algorithm based on the experiment geometry and calibrates the eye tracker using Procrustes analysis. Our method was compared to different calibration MFs previously used in nystagmus research. Accuracy and Procrustes distance were used to study the properties of the various MFs. Procrustes distance was used to study waveform robustness, i.e., how well waveform PoR data can be reproduced within the same participant despite fixation inaccuracies during the calibration, and calibration plane distortion, i.e., how close, in absolute terms, data with simulated fixation inaccuracies were to data without simulated fixation inaccuracies. Data from people with nystagmus (**NDE**), visually healthy participants (**CDE - NO**) and participants with simulated fixation inaccuracies (**CDE - O**) were included in the study.

The accuracy data show that there is little difference between the \(\mathcal {A}_{1}\), \(\mathcal {B}\), \(\mathcal {G}\) and \(\mathcal {P}\) MFs for the **NDE** and **CDE - O** when the outlier correction algorithm is used. However, the calibration plane distortion presented in Fig. 7 and Table 4 shows that the \(\mathcal {G}\) polynomial performs worse than the \(\mathcal {A}_{1}\), \(\mathcal {B}\) and \(\mathcal {P}\) MFs. This is likely explained by poor performance of the \(\mathcal {G}\) polynomial on interpolated data (the OKN dataset); the calibration plane distortion can be thought of as an accuracy measure for interpolated data, with the **CDE - NO** as reference. Finally, the waveform robustness results in Fig. 8 show that the \(\mathcal {P}\) MF has the best performance. Since the Procrustes calibration method is based on linear operations only, its waveform robustness is 1.0 by default. The other MFs are ordered by their non-linearity: the more non-linear the MF, the worse its performance. Overall, the results show that it is not beneficial to use non-linear mapping functions for difficult-to-calibrate participants. Therefore, Procrustes analysis is the best choice among the tested methods when repeatable calibrations are desirable.
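The claim that a purely linear (similarity) mapping yields perfect waveform robustness can be checked numerically. The sketch below assumes that \(D_{P}\) is the standard Procrustes disparity, i.e. the residual sum of squares after centring, scaling to unit norm and optimally rotating (and, as in SciPy's `scipy.spatial.procrustes`, optionally reflecting) one point set onto the other. The PCRV trace, the similarity parameters and the quadratic mapping are all hypothetical illustrations, not MFs or data from this study.

```python
import math


def procrustes_disparity(xy_a, xy_b):
    """Procrustes disparity between two 2-D point sequences (0 = same shape).

    Both sequences are centred and scaled to unit norm; the best rotation,
    uniform scaling and reflection of the second onto the first is found in
    closed form using complex arithmetic."""
    def normalise(points):
        z = [complex(x, y) for x, y in points]
        mean = sum(z) / len(z)
        zc = [v - mean for v in z]
        norm = math.sqrt(sum(abs(v) ** 2 for v in zc))
        return [v / norm for v in zc]

    a, b = normalise(xy_a), normalise(xy_b)
    proper = abs(sum(q.conjugate() * p for p, q in zip(a, b)))  # rotation only
    mirrored = abs(sum(q * p for p, q in zip(a, b)))            # with reflection
    return max(0.0, 1.0 - max(proper, mirrored) ** 2)


# Hypothetical PCRV trace: slow ramps with fast resets plus a small vertical drift.
pcrv = [((t % 50) / 50.0, 0.1 * math.sin(t / 20.0)) for t in range(300)]


def similarity(scale, angle, dx, dy):
    c, s = math.cos(angle), math.sin(angle)
    return lambda p: (scale * (c * p[0] - s * p[1]) + dx,
                      scale * (s * p[0] + c * p[1]) + dy)


def quadratic(k):
    # Hypothetical second-order mapping; k sets the strength of the non-linear terms.
    return lambda p: (10 * p[0] + k * p[0] ** 2, 10 * p[1] + k * p[0] * p[1])


# Two calibrations of the same recording, e.g. with different fixation errors.
sim_1 = [similarity(20.0, 0.02, 1.0, -2.0)(p) for p in pcrv]
sim_2 = [similarity(18.5, -0.03, -0.5, 1.5)(p) for p in pcrv]
poly_1 = [quadratic(2.0)(p) for p in pcrv]
poly_2 = [quadratic(6.0)(p) for p in pcrv]

print(procrustes_disparity(sim_1, sim_2))    # ~0: waveform shape preserved
print(procrustes_disparity(poly_1, poly_2))  # > 0: waveform shape changed
```

Two similarity calibrations differ only by another similarity transform, so the disparity vanishes up to rounding, whereas two polynomial calibrations with different non-linear coefficients deform the waveform differently.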

The outlier correction algorithm improved the validation accuracies in all cases. This suggests that there is potential value in modelling the experiment geometry. Even though our results show that accuracy alone is not a reliable measure for evaluating an MF, it is still desirable to improve the accuracy as long as this does not affect other properties, such as the waveform. It should be noted that if the distribution of the calibration targets differs from the one presented in this paper, the algorithm needs to be adapted to the specific target constellation. One could try to find the geometric relationship between data and targets for other calibration target distributions as well, but that would likely demand a more in-depth analysis of the geometry of the experimental setup. The threshold for detecting an outlier, described in Stage I of the outlier correction algorithm in “The outlier correction algorithm”, is an important parameter for the correction performance. This parameter reflects the maximum accepted deviation from the theoretical horizontal distribution of the calibration data. As can be seen in Fig. 4, the foveation position varies spontaneously for people with nystagmus. If the threshold is set too low, valid calibration points may be flagged as outliers, which risks altering the structure of the calibration data. If the threshold is set too high, outliers in the data may go undetected.
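The threshold trade-off can be illustrated with a deliberately simplified, hypothetical rule that is not the paper's Stage I: assume the calibration targets form a grid, so points in the same column should share roughly the same horizontal PCRV position, and flag any point whose horizontal position deviates from its column median by more than a threshold. The function name, the grid layout and the numbers are illustrative assumptions.

```python
from statistics import median


def flag_horizontal_outliers(points, threshold):
    """Hypothetical outlier rule for a grid of calibration points.

    points: dict mapping (row, col) -> (x, y) PCRV position.
    Returns the (row, col) keys whose horizontal position deviates from the
    median x of their column by more than threshold (in PCRV units)."""
    columns = {}
    for (row, col), (x, _y) in points.items():
        columns.setdefault(col, []).append(x)
    col_median = {col: median(xs) for col, xs in columns.items()}
    return {key for key, (x, _y) in points.items()
            if abs(x - col_median[key[1]]) > threshold}


# Illustrative 3 x 3 calibration: small natural jitter, one clear outlier at (2, 1).
calibration = {
    (0, 0): (-10.2, -8.0), (1, 0): (-9.9, 0.0), (2, 0): (-10.1, 8.0),
    (0, 1): (0.3, -8.0),   (1, 1): (0.0, 0.0),  (2, 1): (3.0, 8.0),
    (0, 2): (10.0, -8.0),  (1, 2): (9.8, 0.0),  (2, 2): (10.2, 8.0),
}

print(flag_horizontal_outliers(calibration, 0.25))  # too low: a valid point is flagged too
print(flag_horizontal_outliers(calibration, 0.5))   # only the outlier
print(flag_horizontal_outliers(calibration, 3.0))   # too high: outlier missed
```

The column median is used instead of the mean so that a single outlier in a three-point column does not drag the reference position towards itself.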

Accuracy is not considered a good indicator of calibration performance for people with nystagmus, for two reasons. 1) It is difficult to know whether the validation data were recorded while the participant looked at the corresponding validation target. The accuracy analysis is meaningless if the participant did not look at the presented target, since the entire point of the validation is to test how well the mapping function transforms PCRV data to a known position. Since gaze estimation depends on the calibration, it is not possible to know whether poor validation results originate from the calibration or from the validation. 2) Data distortion effects, as shown in Fig. 10a, may occur even if the accuracy is considered to be good. This is a problem because one may conclude that the calibration went well when, in reality, the gaze data do not correspond to the actual eye movements generated by the participant. However, accuracy is useful in the sense that it is expressed in a unit (degrees) that can be compared between recordings and systems.
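Expressing accuracy in degrees is what makes it comparable across recordings and systems. A minimal sketch of that conversion follows, assuming a flat screen, on-screen coordinates in millimetres measured from the point straight ahead of the eye, and a known eye-to-screen distance; the helper names and the 600 mm distance are hypothetical. Note that the result only measures the offset from the target and cannot tell whether the participant was actually looking at it.

```python
import math


def angular_error_deg(gaze_mm, target_mm, distance_mm):
    """Visual angle (degrees) between a gaze point and a target on a flat screen.

    Coordinates are in mm relative to the point straight ahead of the eye;
    distance_mm is the eye-to-screen distance (hypothetical setup values)."""
    g = (gaze_mm[0], gaze_mm[1], distance_mm)
    t = (target_mm[0], target_mm[1], distance_mm)
    dot = sum(a * b for a, b in zip(g, t))
    cross = (g[1] * t[2] - g[2] * t[1],
             g[2] * t[0] - g[0] * t[2],
             g[0] * t[1] - g[1] * t[0])
    # atan2(|g x t|, g . t) is numerically stable for small angles.
    return math.degrees(math.atan2(math.sqrt(sum(c * c for c in cross)), dot))


def accuracy_deg(gaze_points, target_points, distance_mm):
    """Mean angular offset between gaze estimates and their validation targets."""
    errors = [angular_error_deg(g, t, distance_mm)
              for g, t in zip(gaze_points, target_points)]
    return sum(errors) / len(errors)


# A gaze point displaced by distance * tan(1 deg) lies 1 deg from a central target.
d = 600.0
offset = d * math.tan(math.radians(1.0))
print(angular_error_deg((offset, 0.0), (0.0, 0.0), d))
print(accuracy_deg([(offset, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)], d))
```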

The Procrustes distance was included to complement the accuracy and to study how the waveform is affected by the calibration. A problem with the calibration plane distortion and waveform robustness measures is that their values may be difficult to interpret. In this paper, we computed them on the same PCRV dataset for each mapping function, which makes it possible to compare the distance values between the MFs. The results can only be used to establish *that* there are differences in the waveform, not the nature of these differences. For the nystagmus case, more specific characteristics such as foveation duration, amplitude, frequency and the nystagmus waveform are of interest, but they cannot be obtained from \(D_{P}\).

The **CDE - O** used in this work is likely not representative of fixation inaccuracies caused by nystagmus, which the results also indicate: the accuracy of the **CDE - NO** is better than that of the **NDE**. Introducing random errors of fixed magnitude has its limitations, and a continuous distribution may be a more realistic representation of the fixation errors for some participants. Signal (1) in Fig. 4 shows that the position after the fast phase can vary by as much as 4° between cycles. The fixation errors introduced in the **CDE - O** database are therefore considered reasonable.
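The contrast between fixed-magnitude and continuously distributed fixation errors can be made concrete with a small simulation sketch. Both generators are hypothetical: the uniformly random direction, the magnitudes and the isotropic Gaussian model are assumptions chosen for illustration, not the procedure used to build the **CDE - O** database.

```python
import math
import random


def fixed_magnitude_offsets(n, magnitude_deg, rng):
    """Offsets of constant size in a uniformly random direction (degrees)."""
    offsets = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        offsets.append((magnitude_deg * math.cos(theta),
                        magnitude_deg * math.sin(theta)))
    return offsets


def gaussian_offsets(n, sigma_deg, rng):
    """Offsets drawn from an isotropic 2-D normal distribution (degrees)."""
    return [(rng.gauss(0.0, sigma_deg), rng.gauss(0.0, sigma_deg))
            for _ in range(n)]


rng = random.Random(0)  # fixed seed for a reproducible illustration
fixed = fixed_magnitude_offsets(9, magnitude_deg=2.0, rng=rng)
continuous = gaussian_offsets(9, sigma_deg=1.0, rng=rng)

# Every fixed offset has the same size; the continuous ones vary in size.
print([round(math.hypot(dx, dy), 3) for dx, dy in fixed])
print([round(math.hypot(dx, dy), 3) for dx, dy in continuous])
```

Adding either set of offsets to the calibration targets' recorded positions would simulate a calibration made with imperfect fixation; the continuous version spreads errors over many sizes, as spontaneously varying foveation positions might.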

Creating the **CDE - NO** and **CDE - O** databases gave us two identical PCRV datasets calibrated with different estimates of the mapping functions. This allowed us to study differences between the tested calibration mapping functions. The same analysis cannot be carried out on nystagmus data: the nystagmus oscillations cannot be turned off for the affected patients, so there is no reference waveform against which to compare the estimations.

In this work, we tested the EyeLink 1000 Plus system, which is frequently cited in nystagmus research. The applicability of the proposed method for other eye trackers has not been studied.

Finally, the calibration data selection has not been central to the analysis in this paper. It is reasonable to assume that a poor calibration data selection method would have a negative impact on the PoR results, especially considering the results presented in this paper. The adjustments to Dunn’s method (Dunn, 2014) may have influenced the results in this paper; however, based on the data we recorded, the adjustments are considered reasonable. An updated version of the method has recently been developed (Dunn et al., 2018), which may further improve the accuracy. It should also be noted that the calibration data selection implemented in this work is designed for nystagmus with foveation periods, or at least waveforms with a distinct fast phase. As can be seen in Fig. 4, there were no pendular waveforms present in this dataset. The method can still be used for pendular nystagmus waveforms, since the outlier correction algorithm estimates missing data. It is, however, necessary to have at least three recorded data points, one in each row and one in each column, for the algorithm to work.
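The minimum-data requirement can be expressed as a simple coverage check. The sketch assumes a 3 × 3 target grid indexed by (row, column), which is what "three points, one in each row and one in each column" implies; the function name is hypothetical, and the check only tests this precondition, not how the outlier correction algorithm estimates the missing points.

```python
def has_required_coverage(recorded, n_rows=3, n_cols=3):
    """True if the usable calibration points cover every row and every column.

    recorded: iterable of (row, col) grid indices with usable data."""
    recorded = set(recorded)
    rows = {r for r, _ in recorded}
    cols = {c for _, c in recorded}
    return rows == set(range(n_rows)) and cols == set(range(n_cols))


print(has_required_coverage([(0, 0), (1, 1), (2, 2)]))  # diagonal: enough
print(has_required_coverage([(0, 0), (0, 1), (0, 2)]))  # a single row: not enough
```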

## Conclusion

The Procrustes analysis calibration method performed best among the tested methods when working with data from participants who have a decreased ability to fixate their gaze during the calibration. The principal difference between the Procrustes calibration method and the other investigated methods was its ability to generate repeatable waveform estimations regardless of the calibration recording condition. The choice of calibration mapping function may have a substantial impact on the resulting eye movement estimations, which in turn may decrease the reliability of subsequent data analysis.

## Notes

### Acknowledgments

This work has been funded by the Swedish Research Council [grant number VR 2015-05442]. We gratefully acknowledge the Lund University Humanities Laboratory. We would also like to thank all the participants.

## References

- Abel, L.A., Wang, Z.I., & Dell’Osso, L.F. (2008). Wavelet analysis in infantile nystagmus syndrome: limitations and abilities. *Investigative Ophthalmology & Visual Science*, *49*(8), 3413–3423.
- Barot, N., McLean, R.J., Gottlob, I., & Proudlock, F.A. (2013). Reading performance in infantile nystagmus. *Ophthalmology*, *120*(6), 1232–1238.
- Behrens, F., Mackeben, M., & Schröder-Preikschat, W. (2010). An improved algorithm for automatic detection of saccades in eye movement data and for calculating saccade parameters. *Behavior Research Methods*, *42*(3), 701–708.
- Blignaut, P., & Wium, D. (2013). The effect of mapping function on the accuracy of a video-based eye tracker. In *Proceedings of the 2013 conference on eye tracking South Africa* (pp. 39–46). ACM.
- Cerrolaza, J.J., Villanueva, A., & Cabeza, R. (2008). Taxonomic study of polynomial regressions applied to the calibration of video-oculographic systems. In *Proceedings of the 2008 symposium on eye tracking research & applications* (pp. 259–266). ACM.
- Dell’Osso, L.F. (2005). Recording and calibrating the eye movements of nystagmus subjects. OMLAB report 011105, 1–4.
- Dell’Osso, L.F., & Daroff, R.B. (1975). Congenital nystagmus waveforms and foveation strategy. *Documenta Ophthalmologica*, *39*(1), 155–182.
- Dell’Osso, L.F., Hertle, R.W., Leigh, R.J., Jacobs, J.B., King, S., & Yaniglos, S. (2011). Effects of topical brinzolamide on infantile nystagmus syndrome waveforms: Eyedrops for nystagmus. *Journal of Neuro-Ophthalmology*, *31*(3), 228–233.
- Dunn, M. (2014). Quantifying perception and oculomotor instability in infantile nystagmus. PhD thesis, Cardiff University.
- Dunn, M.J., Harris, C.M., Ennis, F.A., Margrain, T.H., Woodhouse, J.M., McIlreavy, L., & Erichsen, J.T. (2018). An automated segmentation approach to calibrating infantile nystagmus waveforms. In press.
- Gower, J.C. (1975). Generalized Procrustes analysis. *Psychometrika*, *40*(1), 33–51.
- Hansen, D.W., & Ji, Q. (2010). In the eye of the beholder: A survey of models for eyes and gaze. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *32*(3), 478–500.
- Hertle, R.W. (2010). Nystagmus in infancy and childhood: characteristics and evidence for treatment. *American Orthoptic Journal*, *60*(1), 48–58.
- Hertle, R.W., Yang, D., Adams, K., & Caterino, R. (2011). Surgery for the treatment of vertical head posturing associated with infantile nystagmus syndrome: Results in 24 patients. *Clinical & Experimental Ophthalmology*, *39*(1), 37–46.
- Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & Van de Weijer, J. (2011). *Eye tracking: A comprehensive guide to methods and measures*. Oxford: OUP Oxford.
- Hussain, N. (2016). Diagnosis, assessment and management of nystagmus in childhood. *Paediatrics and Child Health*, *26*(1), 31–36.
- Kumar, A., Shetty, S., Vijayalakshmi, P., & Hertle, R.W. (2011). Improvement in visual acuity following surgery for correction of head posture in infantile nystagmus syndrome. *Journal of Pediatric Ophthalmology and Strabismus*, *48*(6), 341–346.
- McLean, R., Proudlock, F., Thomas, S., Degg, C., & Gottlob, I. (2007). Congenital nystagmus: Randomized, controlled, double-masked trial of memantine/gabapentin. *Annals of Neurology*, *61*(2), 130–138.
- Naegele, J.R., & Held, R. (1982). The postnatal development of monocular optokinetic nystagmus in infants. *Vision Research*, *22*(3), 341–346.
- Peirce, J.W. (2007). PsychoPy: Psychophysics software in Python. *Journal of Neuroscience Methods*, *162*(1-2), 8–13.
- Sheena, D., & Borah, B. (1981). Compensation for second-order effects to improve eye position measurements. In *Eye movements: Cognition and visual perception* (pp. 257–268).
- SR Research (2010). EyeLink 1000 User Manual.
- Stampe, D.M. (1993). Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems. *Behavior Research Methods, Instruments, & Computers*, *25*(2), 137–142.
- Tai, Z., Hertle, R.W., Bilonick, R.A., & Yang, D. (2010). A new algorithm for automated nystagmus acuity function analysis. *British Journal of Ophthalmology*, bjo–2010.
- Taibbi, G., Wang, Z.I., & Dell’Osso, L.F. (2008). Infantile nystagmus syndrome: broadening the high-foveation-quality field with contact lenses. *Clinical Ophthalmology*, *2*(3), 585–589.
- Theodorou, M., & Clement, R. (2016). Classification of infantile nystagmus waveforms. *Vision Research*, *123*, 20–25.
- Thomas, S., Proudlock, F.A., Sarvananthan, N., Roberts, E.O., Awan, M., McLean, R., ..., et al. (2008). Phenotypical characteristics of idiopathic infantile nystagmus with and without mutations in FRMD7. *Brain: A Journal of Neurology*, *131*(5), 1259–1267.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.