In the following we present an in-depth evaluation of the Tobii EyeX device, of the Matlab Toolkit we developed, and of the data that can be obtained from simple eye movement experiments.
In order to reliably measure the accuracy and precision of the device, we designed the following experimental setup. Observers were positioned at ≈700mm from the computer monitor with the head stabilized by a chin and forehead rest. The EyeX Controller was mounted at the bottom of the screen. Fifteen subjects participated in the experiment; all had normal or corrected-to-normal vision (i.e., they wore their prescription contact lenses). Eleven observers were right eye dominant, four were left eye dominant. The subjects underwent the monocular and binocular calibration and test procedures described in “Calibration procedure”. Each procedure was repeated four times per subject in random order. The experiments were run on a PC with an Intel Core i7-4700 CPU @2.40GHz and 12GB of RAM, connected to a 17-inch LCD with 1920 × 1080 resolution at 60Hz, running the Windows 7 Professional OS.
Accuracy and precision vs eccentricity
The performance of eye tracking devices may vary as a function of gaze angle away from straight-ahead, central fixation. To evaluate the accuracy and precision of the EyeX device as a function of eccentricity away from central fixation, all raw data collected during the monocular and binocular test procedures were pooled together. The data were then separated with respect to the eccentricity of the visual target, computed as its angular distance from the center of the screen. This resulted in eight values of eccentricity, ranging from 0° to ≈12.2°. We observed that the angular error did not follow a Gaussian distribution, but was better described by a Poisson error distribution. Thus, rather than employing mean and standard deviation, we describe the performance metrics in terms of median and inter-quartile range. We therefore report accuracy as the distance between the median gaze estimate and the true target location. Precision is computed as the standard deviation of the estimates of angular gaze position while the eyes are steady and fixating a target.
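These two metrics are easy to compute from raw samples. The following Python sketch (an illustrative helper, not part of the Toolkit; the pooled 2D RMS form of precision is our assumption, since per-axis variants are also common) makes the definitions concrete:

```python
import numpy as np

def accuracy_and_precision(gaze_deg, target_deg):
    """Accuracy: distance between the median gaze estimate and the true
    target location. Precision: RMS deviation of the gaze samples around
    their mean position while the eye is steadily fixating.
    gaze_deg: (N, 2) samples in degrees; target_deg: (2,) target position."""
    gaze = np.asarray(gaze_deg, dtype=float)
    median_gaze = np.median(gaze, axis=0)
    accuracy = np.linalg.norm(median_gaze - np.asarray(target_deg, dtype=float))
    deviations = gaze - gaze.mean(axis=0)
    precision = np.sqrt(np.mean(np.sum(deviations**2, axis=1)))
    return accuracy, precision
```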
Figure 2 summarizes the results obtained regarding accuracy (A) and precision (B) of the Tobii EyeX as a function of visual angle. The device performs best at the center of the display. Accuracy worsens slightly at increasing eccentricities, whereas precision is approximately constant (cf. the linear regression lines). Accordingly, near the center of the monitor accuracy and precision can be considered to be < 0.4° and < 0.2°, respectively. At more than 5 degrees away from the center of the monitor, accuracy and precision worsen to < 0.6° and < 0.25°, respectively.
System latency and sampling frequency
To evaluate the device latency for gaze contingent applications, we employed a method similar to that described in Saunders and Woods (2014). We developed a simple gaze contingent display, consisting of two fixed targets and a cursor under the control of the user’s gaze. This simple gaze contingent display was implemented directly in C/C++ as well as with the Matlab Toolkit we developed (the Matlab Toolkit will be further evaluated in the following sections of this paper). We compared the C/C++ gaze contingent implementation against the Matlab gaze contingent implementation to assess whether the UDP server for data communication between Matlab and the Tobii EyeX Engine introduced any additional latency.
Observers were required to execute saccades back and forth between two targets presented on screen. Along with the saccade targets, the on-screen gaze position was displayed as a cursor in real time. While observers were performing the saccade task, we employed a high speed camera to record (at 240 fps) the PC screen and simultaneously the observer’s eye through a mirror. Two observers performed 20 saccades each while the camera simultaneously recorded both their eyes as well as the gaze contingent cursor on the screen.
After having acquired these video sequences, a video editing program (VSDC Free Video Editor) was used to perform a frame by frame inspection of the video sequences. The experimenter identified, for each saccade executed by the subjects, the movie frame in which the eye movement was initiated and the movie frame in which the gaze-controlled cursor began to move across the screen. The experimenter could then unambiguously count the number of frames between the actual eye movement onset and the corresponding response of the on-screen cursor. The total latency with which the system responded to the eye movement was measured by multiplying the number of elapsed frames by the duration of each camera frame (4.2 ms). The estimated latency thus resulted from the sum of the display latency (hardware) and the gaze computation latency (software). The latency estimated from the data collected on both subjects with the C/C++ implementation was 48 ± 3 ms (mean ± standard deviation). The latency observed with the Matlab Toolkit was 47 ± 4 ms. These data confirm the reliability of the proposed procedure to estimate latency, since the uncertainty on the latency estimates is primarily due to the temporal resolution of the camera. Although different total latencies may be possible with different display or PC configurations (Saunders and Woods 2014), these data show that the UDP communication link between the Tobii server and Matlab does not appear to influence the system latency.
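The latency computation itself is simple arithmetic; a minimal Python sketch (illustrative, with hypothetical frame indices) converts the counted frames of the 240-fps recording into milliseconds:

```python
CAMERA_FPS = 240
FRAME_MS = 1000.0 / CAMERA_FPS  # duration of one camera frame, ~4.2 ms

def latency_ms(eye_onset_frame, cursor_onset_frame):
    """System latency: elapsed camera frames times the frame duration."""
    return (cursor_onset_frame - eye_onset_frame) * FRAME_MS

# e.g. a cursor response 12 frames after eye movement onset
# corresponds to a total latency of 50 ms
```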
Because saccadic suppression (Volkmann 1962; Volkmann et al. 1968) or poor sensitivity to high speed retinal images (Dorr and Bex 2013) render a person visually insensitive for about 50 ms from the beginning and end of a saccade, the observed system latency is likely to go unnoticed by human users employing the system for gaze contingent applications.
The sampling rate and sampling variability were estimated from the data collected during the experiments performed to evaluate the calibration procedures, which provided a large quantity of samples. The observed sampling time of the system was 18.05 ± 2.49 ms (median ± inter-quartile range), resulting in a median sample frequency of ≈55 Hz, which is slightly lower than the nominal frequency of 60 Hz.
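A sketch of how such statistics can be derived from raw sample timestamps (Python, illustrative; the variable names are our own):

```python
import numpy as np

def sampling_stats(timestamps_ms):
    """Median and inter-quartile range of the inter-sample interval,
    plus the resulting median sampling frequency in Hz."""
    dt = np.diff(np.asarray(timestamps_ms, dtype=float))
    median_dt = np.median(dt)
    iqr_dt = np.percentile(dt, 75) - np.percentile(dt, 25)
    return median_dt, iqr_dt, 1000.0 / median_dt
```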
The measurements we have just presented regarding latency and sampling frequency are necessarily system dependent. Thus, as a final consideration, we note that the use of a high performance PC and low-latency monitor are likely to improve the overall performance of the eye tracking system.
Matlab toolkit evaluation
A detailed description of the implemented Matlab Toolkit is presented in Appendix A. Here we focus on evaluating the calibration procedures we propose and implement in the Toolkit with regard to the accuracy of the gaze measurements.
Comparison between proposed calibration procedure and native Tobii calibration procedure
In order to evaluate the influence of the proposed calibration procedures on the accuracy of the gaze measurements, we further analyzed the data collected as described in “Device evaluation”. We computed the angular error from the data obtained in the test procedure following the TNC, and on the same data corrected with the proposed 5PC, 9PC and 13PC. In Fig. 3 the accuracy for each calibration procedure is plotted as a function of angular distance from the screen center.
The 5PC (red) performs as well as or better than the TNC (blue). The 9PC and 13PC routines consistently outperform the TNC at every target eccentricity.
Moreover, these data were analyzed separately for the dominant and the non-dominant eye. Figure 4 shows the scatter plots of the angular error achieved by the TNC (x-coordinates) compared to the residual error (y-coordinates) after the 5PC (red), 9PC (green) and 13PC (pink). In order to evaluate the trend on the error, the data were fitted with a linear regression line. The horizontal inset represents the histogram of the error computed on the original data calibrated with the TNC, while the vertical inset is the same error computed on the data corrected by the 5PC, 9PC and 13PC. The histograms were computed via kernel density estimation (Botev et al. 2010). The median values are represented on the horizontal inset with a vertical bar, and on the vertical inset with horizontal ones. The figure provides an in-depth characterization of the effect of the different calibration procedures on the accuracy of the gaze measurements.
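The smoothed error histograms can be sketched as follows; note that this substitutes a plain fixed-bandwidth Gaussian kernel for the diffusion estimator of Botev et al. (2010) actually used for the figure:

```python
import numpy as np

def kde_density(errors_deg, bw=0.1, n_grid=256):
    """Gaussian kernel density estimate of the angular-error distribution,
    evaluated on a regular grid; also returns the median error."""
    errors = np.asarray(errors_deg, dtype=float)
    grid = np.linspace(errors.min() - 3 * bw, errors.max() + 3 * bw, n_grid)
    # One Gaussian kernel per sample, averaged across samples
    z = (grid[:, None] - errors[None, :]) / bw
    density = np.exp(-0.5 * z**2).mean(axis=1) / (bw * np.sqrt(2 * np.pi))
    return grid, density, np.median(errors)
```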
The histograms show that the error produced by the TNC (blue), as anticipated in “Device evaluation”, has a distribution which is skewed to the left, with a long right tail. These error distributions are well approximated by a Poisson distribution. As expected, the non-dominant eye (see Fig. 4, right column) is characterized by a larger mean gaze error and a wider error distribution with respect to the dominant eye (left column).
All three proposed calibration procedures reduce the mean error, especially at the right tail of the error distribution. This suggests that the proposed calibration procedures have the strongest effect on large errors. The linear regression highlights how the 5PC calibration, which relies on calibration points positioned away from the center of the monitor, reduces large errors at the borders of the monitor, but exacerbates small errors near the center of the display. Conversely, the 9PC (green) and 13PC (pink) procedures, which rely on a finer tiling of the workspace, are able to reduce both small and large errors. Accordingly, the histograms of the error distributions produced by the 9PC and 13PC are characterized by a narrower peak with respect to both the TNC and the 5PC, and by smaller median error values. These results are further confirmed by the regression lines passing through the data calibrated via the 9PC and 13PC. These regression lines fall below the diagonal throughout the error range, demonstrating that the errors are globally reduced.
These data have been further summarized in a table reporting the values of the angular error (mean and standard deviation) computed over the whole dataset (fifteen subjects, four repetitions, twelve test points, see Table 1). The statistical significance of the possible improvements has been assessed using a one-tailed paired-sample t-test, performed between the error produced by the TNC and the error produced by the proposed procedures. Consistent with what we have reported so far, the 5PC only occasionally significantly improved measurement accuracy. Conversely, the 9PC and 13PC always resulted in a statistically significant improvement of the gaze measurement accuracy. As expected, the 13PC, which relies on a larger number of calibration points, outperforms all the other procedures. Moreover, the proposed procedures proved equally effective in both the monocular and the binocular approaches.
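The significance test can be sketched as follows (Python with SciPy, illustrative; the halved two-tailed p-value implements the one-tailed hypothesis that the proposed procedure reduces the error):

```python
import numpy as np
from scipy import stats

def paired_one_tailed(err_tnc, err_proposed):
    """One-tailed paired-sample t-test: is the error after the proposed
    calibration smaller than the error after the native calibration?"""
    t, p_two = stats.ttest_rel(err_tnc, err_proposed)
    # One-tailed p-value for the hypothesis err_tnc > err_proposed
    p_one = p_two / 2.0 if t > 0 else 1.0 - p_two / 2.0
    return t, p_one
```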
As a final remark, it is worth noting that the gaze measurement for the non-dominant eye suffers from a larger measurement error than that for the dominant eye (p < 10−3). In agreement with a very recent study (Svede et al. 2015), this result strengthens the notion that careful choice of the appropriate calibration procedure is a mandatory step to increase the accuracy of binocular eye tracking data.
Comparison between single binocular calibration and two independent monocular calibrations for each eye
A further analysis was performed in order to highlight potential differences between monocular and binocular calibration procedures. Depending on the goal of an experiment or application, a binocular calibration might be better suited than a monocular one. For instance, when tracking the point of regard on a 2D screen, as in human computer interaction and gaming (Smith and Graham 2006; Dorr et al. 2007; Sundstedt 2012) or visual attention studies (Hoffman and Subramaniam 1995; Rayner 2009), a binocular calibration might be more appropriate than a monocular calibration. Conversely, if an experimental setup requires precise measurements of the position of each eye, which would be necessary when measuring vergence eye movements or the point of regard in three dimensional space, two separate monocular calibrations, one for each eye, are potentially preferable (Cornell et al. 2003; Gibaldi et al. 2015; Svede et al. 2015; Gibaldi et al. 2016).
In view of the above considerations, we evaluated the effect of performing two independent monocular calibrations and then performing a binocular test, as well as the effect of performing a single binocular calibration and then testing monocularly. The results have been summarized for the three calibration procedures in Table 1. The results show that mixing the couplings between monocular and binocular calibration and testing affects the accuracy of the gaze measurements. A careful inspection of Table 1 shows that data accuracy is best when the test is performed the same way as the calibration (i.e. a monocular test is used with a monocular calibration or a binocular test is used with a binocular calibration). In fact, in most of the measurements in which the monocular/binocular coupling between calibration and test was not preserved, accuracy was worse with respect to the corresponding “correct” coupling (p < 10−2 for 9PC and 13PC). Moreover, the mixed coupling results in a significant increase (p < 10−4) of the error variability in all the measurements.
The case of two monocular calibrations and subsequent binocular testing is particularly interesting: the loss in accuracy in this case is attributable to effects of eye dominance (as discussed above), so even though the accuracy might seem lower, the measurements might be closer to what the experimenter is truly interested in studying (e.g. fixation disparity (Svede et al. 2015)). Defining the appropriate calibration procedure is thus of paramount importance when designing an eye movement study. Within our Toolkit we thus provide the necessary tools to implement the appropriate procedure.
Repeatability of the calibration procedure
The repeatability of the calibration was evaluated from the data collected on the fifteen subjects by computing Pearson’s correlation index between the calibration functions obtained repeating the 13PC procedure four times. Each function was sampled over the screen area covered by the calibration procedure (see Fig. 1b), and the correlation index was computed between each possible coupling of the functions obtained from the four repetitions (i.e. 6 correlation estimates per subject). Table 2 reports mean and standard deviation of the correlation computed across the six estimates and fifteen subjects separately for the monocular/binocular calibration procedures and for the dominant/non-dominant eye. Whereas the calibration functions from different subjects were uncorrelated, the calibration functions from the same subject were consistently correlated independently of tested eye or monocular/binocular procedure (all ρ > 0.5), confirming the repeatability of the calibration procedures.
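The pairing logic can be sketched in Python (illustrative; `calibration_maps` stands for one subject's calibration function sampled on a common grid, once per repetition):

```python
import itertools
import numpy as np

def repeatability(calibration_maps):
    """Pearson correlation between the calibration functions of repeated
    runs; four repetitions yield 6 pairwise correlation estimates."""
    flat = [np.asarray(m, dtype=float).ravel() for m in calibration_maps]
    return [np.corrcoef(a, b)[0, 1]
            for a, b in itertools.combinations(flat, 2)]
```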
Eye movement data quality
We have so far shown that the EyeX controller can be successfully employed via the MATLAB framework, and that the device, accessed through the Toolkit we provide, can be calibrated and employed for simple gaze-contingent applications, given the reasonably short system latency. Next, we verify whether it is possible to successfully measure the most common types of eye movements that are typically studied in basic and clinical research settings.
To bring our high resolution fovea onto targets of interest preselected with our low resolution peripheral vision, our oculomotor system continuously makes fast, ballistic eye movements called saccades. Saccades are perhaps the most investigated type of eye movement, thus we devised a simple experiment to verify whether we could successfully measure simple saccadic eye movements.
Experimental setup The experiment was run on a standard PC equipped with Windows 7 Professional, with an Intel Core i7-4700MQ CPU @2.40GHz, and 12GB of RAM, with a 28 inch LCD with 1920 × 1080 resolution running at 60 Hz. Observers were positioned ≈500mm from the monitor, which subtended 70 × 40 degrees of visual angle. Observers were positioned in a chin and forehead rest to stabilize head movements. A 13 point calibration procedure was performed for each observer. The EyeX eye tracker was positioned below the monitor in front of the observers.
Stimulus presentation Observers were instructed to fixate a central red fixation dot presented on a uniformly black screen, and when ready, were required to initiate a trial by pressing a key on the keyboard in front of them. The fixation target would then turn white, and, after a 500 ms delay, the target would jump 10 degrees left. Observers were simply required to visually track the target as accurately as possible. The target would remain at the eccentric position for 750 ms, and then turn red once again and return to the center of the monitor. Each subject performed 50 eye movement trials.
Results Figure 5 shows the results of our measurements of saccade dynamics in three observers. The first subject was an experienced observer (author GM), while the second and third subjects were naive observers. Figure 5a-c present average horizontal eye position as a function of time from target step for the saccades measured in all three subjects. As can be seen from the shaded regions representing the variability in the measurements, the data collected on the first two subjects (Fig. 5a, b) were highly reliable and accurate, whereas the data collected on the third subject (Fig. 5c) were more variable and particularly less accurate for the subject’s right eye (red trace) than for the subject’s left eye (blue trace). The saccades in all three subjects were initiated between 200 and 250 ms after the onset of the eccentric target, which is consistent with typical saccade latencies observed in the literature (Saslow 1967; Cohen and Ross 1977). The duration of the saccades was ≈50ms, which is also highly consistent with the literature on similarly sized saccades (Baloh et al. 1975; Bahill et al. 1981; Behrens et al. 2010).
Saccade velocity and saccade acceleration profiles are eye movement characteristics often investigated in the literature. We measured velocity (Fig. 5d-f) and acceleration (Fig. 5g-i) by taking the first and second derivative of the data in Fig. 5a-c using a two-point differentiator. Qualitatively, reasonable velocity and acceleration profiles are observable in all subjects. Peak velocity was ≈400 deg/s, whereas peak acceleration and deceleration were ≈18000 deg/s², all values highly consistent with previous measurements of these parameters in normally sighted subjects (Bahill et al. 1981).
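The two-point (central-difference) differentiator is straightforward; a Python sketch, assuming the ≈55 Hz median sampling rate reported above:

```python
import numpy as np

FS = 55.0  # approximate median sampling frequency of the EyeX (Hz)

def two_point_diff(x, fs=FS):
    """Two-point differentiator: v[n] = (x[n+1] - x[n-1]) * fs / 2.
    Applied once to a position trace it yields velocity; applied twice,
    acceleration. The output is 2 samples shorter than the input."""
    x = np.asarray(x, dtype=float)
    return (x[2:] - x[:-2]) * fs / 2.0
```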
Smooth pursuit eye movements
Another commonly investigated class of eye movements are smooth pursuit eye movements, which allow us to closely track moving objects. We thus set out to verify whether we could reliably measure smooth pursuit eye movements with the Tobii EyeX in another simple experiment.
Experimental setup As in the previous experiment, we employed a standard PC, equipped with Windows 7 Professional, with an Intel Core i7-4700MQ CPU @2.40GHz, and 12GB of RAM, with a 28 inch LCD with 1920 × 1080 resolution running at 60 Hz. Observers were positioned ≈500mm from the monitor, which subtended 70 × 40 degrees of visual angle. Observers were positioned in a chin and forehead rest to stabilize head movements, and a 13 point calibration procedure was performed for each observer.
Stimulus presentation Observers were instructed to fixate a central red fixation dot presented on a uniformly black screen, and when ready, were required to initiate a trial by pressing a key on the keyboard in front of them. The fixation target would then turn white, and, after a 500 ms delay, the target would begin to move at a constant speed of 10 deg/s to the right. After one second, the direction of the target would reverse and the target would return to the center of the monitor. Observers were simply required to visually track the target as accurately as possible. Once the target had returned to the starting position, it would turn red and a new trial could be commenced. Each subject performed 50 eye movement trials.
Results Figure 6 shows the results of our measurements of smooth pursuit eye movements in the same three observers as the previous experiment. As in the saccade experiment, the data collected on the first two subjects (Fig. 6a, b) were highly reliable and accurate, whereas the data collected on the third subject (Fig. 6c) were more variable. The typical characteristics (Robinson 1965; Spering and Montagnini 2011) of smooth pursuit eye movements can nonetheless be clearly observed in the data from all three subjects. In the initial open-loop stage of the tracking eye movement, after a latency ranging from 100-300 ms, the eyes accelerate and perform catch up saccades to capture the target. Then, in the closed-loop phase of the tracking eye movement, the eyes of the observers match the position of the moving target quite closely by maintaining the same speed as the target. When the target abruptly changes direction of motion, once again the eyes of the observers catch up and then match the smoothly moving target.
Vergence eye movements
When looking at an object binocularly, our two eyes must rotate in opposite directions to be correctly pointed towards the object. These disconjugate rotatory movements are called vergence eye movements. Vergence eye movements correctly position the retinal areas with highest spatial resolution of both eyes (the foveae) onto the object of interest, and thus facilitate binocular fusion, resulting in a richer perceptual experience of the selected object. Vergence eye movements are another commonly investigated class of eye movements. Thus we designed an experiment to evaluate the usability of the Tobii EyeX in oculomotor research involving eye vergence.
Experimental Setup Observers were positioned in a chin and forehead rest to stabilize head movements, at a distance of ≈1000mm from the screen, i.e. at a vergence distance of ≈3°. Whereas the eye movement measurements described above could be performed using a conventional 2D monitor, the test of vergence eye movements required three-dimensional stimulus presentation. Accordingly, the experiment was conducted with a passive stereo LCD (LG 42LW450A) running at 100 Hz. Observers were required to wear stereoscopic polarized glasses, and a 13 point calibration procedure was run monocularly on each subject.
The size of the employed screen (42″) was larger than the screen size (24″) suggested by the manufacturer of the EyeX. However, eye tracking was still possible simply by placing the eye tracker on a stand at 600mm from the observers. To obtain reliable gaze data the device had to be positioned parallel to the screen, as if it were mounted at the bottom of the display.
The experiment was run from a standard PC with an Intel Core i5-2410M CPU @2.30GHz, and 8GB of RAM, equipped with Windows 8.1 OS.
Stimulus Presentation The visual stimulus employed to drive binocular fusion was a flat virtual plane positioned in the center of the screen. The stimulus subtended 10° of field of view to ensure full coverage of the area of the field of view that elicits vergence movements (Allison et al. 2004). The plane was textured with 1/f pink noise, which has the same frequency content of natural images (Kretzmer 1952; Bex and Makous 2002; Jansen et al. 2009). A white fixation cross was presented in the center of the stimulus.
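A 1/f texture of this kind can be generated by imposing a 1/f amplitude fall-off on the spectrum of white noise; the following Python sketch (illustrative, not the stimulus code used in the experiment) shows one common construction:

```python
import numpy as np

def pink_noise_texture(size, seed=0):
    """1/f ("pink") noise image: shape the Fourier spectrum of white
    noise by 1/f and invert the transform."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((size, size))
    # Radial spatial frequency of each FFT coefficient
    fx = np.fft.fftfreq(size)[:, None]
    fy = np.fft.fftfreq(size)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    f[0, 0] = 1.0  # avoid division by zero at the DC component
    img = np.real(np.fft.ifft2(np.fft.fft2(white) / f))
    return (img - img.min()) / (img.max() - img.min())  # normalize to [0, 1]
```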
The stimulus protocol was conceived to test both divergence and convergence eye movements. The plane was initially presented with 1° of positive disparity, thus requiring observers to fixate at a vergence distance of 4°. Once a subject was properly fixating (which took ≈2s), the stimulus disparity was set to zero, i.e. the plane would be rendered at the actual depth of the screen, thus inducing a divergence movement. This procedure was repeated 50 times, and alternated with a −1° disparity step, which required a convergence movement.
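The correspondence between viewing distance and vergence angle follows from simple trigonometry; in the sketch below the inter-pupillary distance is an assumed value (60 mm), as it is not specified in the text:

```python
import math

IPD_MM = 60.0  # assumed inter-pupillary distance (not given in the text)

def vergence_deg(distance_mm, ipd_mm=IPD_MM):
    """Vergence angle (degrees) required to binocularly fixate a point
    at the given viewing distance."""
    return math.degrees(2.0 * math.atan(ipd_mm / (2.0 * distance_mm)))
```

With these assumptions, a screen at ≈1000 mm corresponds to a vergence demand of roughly 3.4°, consistent with the ≈3° reported above, and a +1° disparity step raises the demand by one degree.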
Results Figure 7 shows the results of our measurements of vergence eye movements in three observers with normal stereo vision. The first subject was an experienced observer (author AG), while the second and third subjects were inexperienced naive observers. Qualitatively we can observe from Fig. 7a-c how the device provides a reliable characterization of the vergence trajectories. The eye movement response delay from stimulus onset was between 100 and 200 ms, whereas the time required to complete the movement was around 400-500 ms, all in good agreement with the literature (e.g., Hung et al. 1994; Collewijn et al. 1995; Alvarez et al. 2002). As with the saccadic eye movement data, we measured velocity (Fig. 7d-f) and acceleration (Fig. 7g-i) by taking the first and second derivative of the data in Fig. 7a-c using a two-point differentiator. Peak velocity was 3-5 deg/s, while time to peak velocity was between 400 and 550 ms. The acceleration measurements were noisy, but qualitatively the expected patterns were observed.
Fixation distributions in natural scenes
The distributions of fixations in natural viewing provide an interesting tool to study both (top-down) goal-directed (Schötz et al. 2011) and stimulus driven (bottom-up) mechanisms of attention allocation (Henderson 2003). Scan paths, the screen locations our eyes foveate while visually exploring a scene, are indeed often investigated both in neuroscience as well as marketing research.
The following simple experiment has the goal to verify whether the Tobii EyeX is able to provide metrics of eye movement patterns, as well as the distribution of fixations in an image exploration task.
Experimental Setup Observers were positioned with their head stabilized by a chin and forehead rest at a distance of ≈700mm from the screen. A stimulus image was displayed for 30 seconds, during which time subjects were instructed to freely explore the scene.
The experiment was performed on a standard PC running Windows 7 Professional, with an Intel Core i7-4700MQ CPU @2.40GHz, and 12GB of RAM, with a 17 inch LCD with 1920 × 1080 resolution running at 60 Hz.
Stimulus Presentation The stimuli used for the experiment were 2D rendered images of a 3D virtual workspace representing a kitchen and an office table (Chessa et al. 2009). The workspace was designed to investigate visual behavior in the peripersonal space, and consists of a table (1m × 1m) with ∼20 objects placed at random positions on top of the table (see Fig. 8). The 3D models of the rendered objects were created with a high precision Vivid 910 3D Range Laser Scanner produced by Konica Minolta. The range scanner provides highly accurate 3D meshes (spatial resolution < 1mm) and realistic, high resolution textures (Canessa et al. 2011; Sabatini et al. 2011) that yield a naturalistic perception of the virtual objects.
Results Figure 8 shows the results of our measurements of fixation distribution in three observers. The first subject was an experienced observer (author AG), while the second and third subjects were naive observers. Fixation maps of visual scene exploration have been computed as bidimensional histograms. These histograms are represented as contour lines for the left (blue) and right (red) eye, separately. The figure demonstrates how the device provides a sensible characterization of the distribution of fixations during the visual exploration task. Furthermore, the Tobii EyeX is able to provide other metrics of eye movement patterns, such as the mean fixation duration (1148 ± 780 ms, mean ± standard deviation), and the amplitude (8.9 ± 5.9 deg, left eye, and 8.26 ± 5.85 deg, right eye) and velocity (336.04 ± 234.82 deg/s, left eye, and 315.96 ± 200.71 deg/s, right eye) of saccades executed between fixations. The EyeX might thus be employable to study how multiple aspects of visual perception and action interact to determine gaze behavior (Schötz et al. 2011).
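The fixation maps can be sketched as normalized bidimensional histograms (Python, illustrative; the bin counts are arbitrary choices):

```python
import numpy as np

def fixation_map(fix_x, fix_y, screen=(1920, 1080), bins=(48, 27)):
    """Bidimensional histogram of fixation positions (in pixels),
    normalized to a probability map; its contour lines give the
    fixation distribution plotted per eye."""
    counts, _, _ = np.histogram2d(fix_x, fix_y, bins=bins,
                                  range=[[0, screen[0]], [0, screen[1]]])
    return counts / counts.sum()
```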
The Matlab code used for the proposed experiments is provided in the Appendix A.