# Mapping and correcting the influence of gaze position on pupil size measurements

DOI: 10.3758/s13428-015-0588-x

- Cite this article as:
- Hayes, T.R. & Petrov, A.A. Behav Res (2016) 48: 510. doi:10.3758/s13428-015-0588-x

## Abstract

Pupil size is correlated with a wide variety of important cognitive variables and is increasingly being used by cognitive scientists. Pupil data can be recorded inexpensively and non-invasively by many commonly used video-based eye-tracking cameras. Despite the relative ease of data collection and increasing prevalence of pupil data in the cognitive literature, researchers often underestimate the methodological challenges associated with controlling for confounds that can result in misinterpretation of their data. One serious confound that is often not properly controlled is *pupil foreshortening error (PFE)*—the foreshortening of the pupil image as the eye rotates away from the camera. Here we systematically map PFE using an artificial eye model and then apply a geometric model correction. Three artificial eyes with different fixed pupil sizes were used to systematically measure changes in pupil size as a function of gaze position with a desktop EyeLink 1000 tracker. A grid-based map of pupil measurements was recorded with each artificial eye across three experimental layouts of the eye-tracking camera and display. Large, systematic deviations in pupil size were observed across all nine maps. The measured PFE was corrected by a geometric model that expressed the foreshortening of the pupil area as a function of the cosine of the angle between the eye-to-camera axis and the eye-to-stimulus axis. The model reduced the root mean squared error of pupil measurements by 82.5 % when the model parameters were pre-set to the physical layout dimensions, and by 97.5 % when they were optimized to fit the empirical error surface.

### Keywords

Pupillometry, Pupil foreshortening error, Eye tracking, Artificial eye

The human pupillary response correlates with a wide range of important cognitive variables including mental workload (e.g., Hess & Polt, 1964; Kahneman, 1973; Klingner, Tversky, & Hanrahan, 2011), emotional valence (Partala and Surakka 2003), attention (Beatty 1982), working memory (Beatty and Kahneman 1966), arousal (Murphy, Robertson, Balsters, & O’Connell, 2011), decision-making (Einhäuser, Koch, & Carter, 2010), surprise (Preuschoff, Marius, & Einhäuser, 2011), and uncertainty (Nassar et al. 2012). While the underlying mechanisms that drive cognitive pupillary effects are still an active topic of inquiry, recent data suggest that the pupillary response may reflect noradrenergic activity in the brain (Murphy, O’Connell, O’Sullivan, Robertson, & Balsters, 2014; Aston-Jones & Cohen, 2005; Koss, 1986; Samuels & Szabadi, 2008). As a result of these recent developments, pupil size is increasingly being used by cognitive scientists as an important measure of cognitive processing.

Pupil data can be collected inexpensively and non-invasively by most modern image-based eye-tracking systems, but researchers often underestimate the methodological challenges in producing well-controlled studies that allow for cognitive interpretations of the pupillary response (Holmqvist et al. 2011). Most researchers are aware of the most serious confound—the pupillary light reflex. The pupil diameter depends first and foremost on the luminance of the stimulus and the ambient illuminance of the experimental room. Changes in either of these variables can produce strong light reflexes that modulate pupil size by as much as 50 % (Miller and Newman 2005; Loewenfeld 1993). Cognitive researchers control for this confound by using isoluminant stimuli and maintaining constant room illumination. A lesser-known yet equally problematic confound is *pupil foreshortening error* (*PFE*). Changes in gaze position produce foreshortening of the pupillary image because the eye-tracking camera is fixed but the eye rotates. Specifically, as the eye rotates away from the camera, the pupil image becomes more elliptical and shrinks in apparent area by as much as 10 %. This is a large margin compared to the magnitude of the cognitive effects, which rarely exceed 5 % change in pupil size. Despite its potential seriousness, the PFE is not corrected in many commonly used remote eye-tracking systems (e.g., Tobii, Tobii Technology AB, 2010, and EyeLink, SR Research, 2010). All too often, the PFE is not controlled by the experimental design either (Brisson et al. 2013). The combination of these factors has led to an increasing number of published studies with inadequate control for PFE.

There are currently two different methods aimed at circumventing PFE using experimental design. The most conservative method—and the one recommended by most commercial eye-tracking providers—is to have participants maintain constant fixation throughout each trial (e.g., SR Research, 2010, p.28). The constant fixation method is implemented by defining a small fixation boundary area and verifying that the eye position remains within this area throughout the pretrial fixation baseline and stimulus presentation periods. This minimizes PFE by keeping the optical axis of the eye at a fixed angle from the camera. Standard baseline normalization is typically used in conjunction with the constant fixation method, where the task-evoked pupillary response is measured as the absolute difference in millimeters or percent change between the stimulus presentation and pretrial baseline periods (Beatty & Lucero-Wagoner, 2000). However, the constant fixation method has two serious drawbacks. First, it limits the types of tasks that can be studied to those with a constant fixation location. Unfortunately, this rules out studies of important topics such as visual search or reading. Second, it is hard for participants to consistently maintain fixation for long periods of time. In addition, the pupillary response lags behind the stimulus by about 400 ms, peaking around 1-2 s post-stimulus, which necessitates longer baseline and stimulus durations where participants must maintain fixation (Partala and Surakka 2003; Murphy et al. 2011; Hayes and Petrov 2015). This results in a high number of invalid trials that increases as the stimulus duration increases. Another PFE-mitigation method that is sometimes recommended by eye-tracking providers (SR Research 2010, p. 98) is stimulus-position counterbalancing. In these designs, there are several positions where the stimulus can appear, but the number of presentations is counterbalanced across trials. 
This method still does not allow the study of tasks that require free viewing of the stimulus. Neither does it allow for comparisons of physical pupil size among different gaze locations because PFE varies across the visual field (Gagl, Hawelka, & Hutzler, 2011). In sum, though effective at mitigating the PFE, both methods have serious limitations.

A completely different approach to the problem is to measure PFE and then correct the pupillometric data prior to analysis (Gagl et al. 2011; Brisson et al. 2013). Gagl et al. (2011) recently pioneered this approach with the aid of an artificial eye model with a fixed pupil size. The artificial eye was moved across a single horizontal scan line, emulating a sentence reading task. The PFE could thus be measured explicitly and a correction formula was developed. The horizontal extent of the sentence stimuli spanned from −17° to +9° from the screen center. This resulted in systematic pupillometric error from +5 % to −13 %, respectively, relative to the true pupil area of the artificial eye. These data allowed the development of a mathematical model of the PFE. This model was used to correct the empirical data from a sentence-reading task and a “Z-string” reading task in which participants scanned words composed entirely of the letter Z. When the measurement error was corrected, previously reported discrepancies between the pupillary response during word recognition (Kuchinke, Võ, Hofmann, & Jacobs, 2007; Võ et al., 2008) and sentence-reading tasks (Just & Carpenter, 1993; Raisig, Welke, Hagendorf, & Meer, 2010) were revealed to be artifacts of PFE (Gagl et al. 2011). This pioneering study illustrates the importance of accounting for PFE and how it can lead to incorrect cognitive interpretations if not properly controlled or corrected. One major shortcoming of this study is that it mapped only a single horizontal scan line, so the resulting correction formula applies to this special case only.

Brisson and colleagues (2013) used an object pursuit task to characterize the PFE of three popular eye-tracking systems (Tobii X120, Tobii T120, and EyeLink 1000) across both horizontal and vertical changes in gaze position. Forty-four human participants tracked a circle moving counterclockwise across the display in an elliptical pattern that covered up to 22° of horizontal visual angle and up to 14° of vertical visual angle (depending on the system). The results showed substantial measurement errors in pupil diameter as a function of gaze position in all three systems. A linear regression model using X and Y gaze coordinates as predictors was able to explain between 9 % and 20 % of the pupil variance depending on the system and the maximum visual angle. While these data provided important new insights into the extent of PFE with human participants across multiple eye-tracking systems, the regression-based correction procedure only accounted for a small amount of the total PFE. Furthermore, even low-effort tasks such as object tracking require attentional mechanisms and cognitive effort that induce pupillary variability (Alnaes et al. 2014; Beatty 1982; 1982b; Kahneman 1973), which complicates the estimation of the foreshortening error.

## Experiment 1

The *EyeLink 1000 User Manual* (SR Research 2010, p. 98) states that, “The pupil size data is not calibrated, and the units of pupil measurement will vary with subject setup. Pupil size is an integer number, in arbitrary units. … Pupil size measurements are affected by up to 10 % by pupil position, due to the optical distortion of the cornea of the eye, and camera-related factors”.

This statement is very vague. It does not even specify the type of scale for these “arbitrary units”. A measurement procedure establishes a correspondence between a set of numbers and a set of objects with respect to some attribute of interest—pupil diameter in our case. It matters greatly what types of relationships are preserved by this mapping (Torgerson 1958). An *ordinal scale* only preserves comparative relations (<, =, >). An *interval scale* preserves distances but not ratios. Thus, it is meaningful to calculate means, differences, and standard deviations among interval-scale measurements, but they do not warrant inferences of the form, “the pupil diameter in condition A is 10 % greater than that in condition B”. The latter kind of statement asserts a multiplicative relationship (*A* = 1.1×*B*) that is warranted only for a *ratio scale*—that is, for an interval scale with a *true origin*. Ratio scales map the number zero to the (possibly hypothetical) object that altogether lacks the attribute in question.

Ideally, we want to measure pupil diameter on a ratio scale with standard units such as millimeters. The next best method is to measure it on a ratio scale whose units are proportional to millimeters, although the coefficient of proportionality is not specified. This is one possible interpretation of the phrase “arbitrary units”. The coefficient of proportionality may vary with experimental setup, as the *Manual* warns, but as long as all measurements within a given setup form a ratio scale, they still support multiplicative operations. This is important because, as we show below, the geometric foreshortening law is multiplicative: The foreshortened diameter equals the true diameter *times* the cosine of a certain angle.

The *Manual* provides very little information about how the pupillometric data are acquired and processed by the proprietary software on the host computer. This raises the disconcerting possibility that the “arbitrary units” may only form an interval scale. This could occur, for instance, if the software added an unspecified constant before writing the numbers to the data file. This would render invalid all multiplicative operations with these measurements. In particular, it would undermine the common practice in the literature to report the pupillometric data in terms of percent change from baseline.

Finally, the “arbitrary units” may form merely an ordinal scale. This could happen, for instance, due to “the optical distortion of the cornea … and camera-related factors”. In this case, the information content of the pupillometric data would be very low and they should be analyzed using ordinal statistical methods (Agresti 1984).

The purpose of this preliminary experiment is to collect calibration data to characterize the type of scale used for EyeLink 1000 pupillometry. The results indicate that the “arbitrary units” form a ratio scale with a layout-dependent coefficient of proportionality to millimeters.

### Method

We used an EyeLink 1000 desktop eye tracker (SR Research 2010) at a sampling rate of 1000 Hz. The tracking mode was set to “pupil only,” the pupil threshold parameter to 110, and the eye-to-track parameter to “Left”. Pupil area was measured using centroid mode throughout the study. Centroid mode computes pupil area using a center-of-mass algorithm that identifies the number of black pixels in the thresholded pupil and its center on the camera image. The *User Manual* recommends the center-of-mass algorithm over the ellipse fitting algorithm because “it has very low noise” (SR Research 2010, p. 71).

### Results and discussion

The slope coefficients of the linear regressions were *k*_{n} = 10.07 for the near, *k*_{m} = 9.65 for the medium, and *k*_{f} = 8.43 for the far layout in Fig. 3. Each of these linear regressions accounted for essentially all the variance (*R*^{2} > .999) of the 11 measurements within a given layout. Importantly, adding a free intercept parameter to the regression equation did not significantly improve the fits, and the intercept estimates were negligible (<0.5 AU). This indicates that 0 arbitrary units always correspond to 0 mm regardless of layout, thereby satisfying the true-origin requirement for ratio scales.
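The true-origin check described above can be sketched as a comparison between a regression forced through the origin and one with a free intercept. The following is a minimal sketch under stated assumptions: the pupil diameters and measurements are hypothetical values invented for illustration (they are not the calibration data), and the function names are not from the source.

```python
def fit_through_origin(d_mm, phi_au):
    """Least-squares slope k for the zero-intercept model phi = k * d."""
    num = sum(d * p for d, p in zip(d_mm, phi_au))
    den = sum(d * d for d in d_mm)
    return num / den


def fit_with_intercept(d_mm, phi_au):
    """Ordinary least squares for phi = slope * d + intercept."""
    n = len(d_mm)
    mean_d = sum(d_mm) / n
    mean_p = sum(phi_au) / n
    sxx = sum((d - mean_d) ** 2 for d in d_mm)
    sxy = sum((d - mean_d) * (p - mean_p) for d, p in zip(d_mm, phi_au))
    slope = sxy / sxx
    return slope, mean_p - slope * mean_d


# Hypothetical calibration set: diameters in mm, measurements in AU
# generated as 10 AU/mm plus small fixed perturbations.
diam_mm = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
noise = [0.02, -0.01, 0.015, -0.02, 0.01, -0.005]
phi_au = [10.0 * d + e for d, e in zip(diam_mm, noise)]

k = fit_through_origin(diam_mm, phi_au)
slope, intercept = fit_with_intercept(diam_mm, phi_au)
print(f"zero-intercept slope: {k:.4f} AU/mm")
print(f"free-intercept fit:   slope {slope:.4f} AU/mm, intercept {intercept:.4f} AU")
```

If the free intercept stays near zero and the two slopes agree, the data are compatible with a ratio scale in the sense used here.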

Furthermore, the slope coefficients *k* were themselves approximately inversely proportional to the eye-to-camera distances, which were *L*_{n} = 584, *L*_{m} = 610, and *L*_{f} = 698 mm for the three layouts.^{1} To verify this inverse proportionality, we calculated products of the form *p*_{i} = *α**k*_{i}*L*_{i}, where *i* ∈ {*n*, *m*, *f*} and *α* ≈1.70×10^{−4} is a re-scaling parameter determined in the next paragraphs. If each slope coefficient is inversely proportional to the eye-to-camera distance regardless of layout, then *p*_{i} should equal unity for any *i*. In our data, we obtain *p*_{n} = 0.9999, *p*_{m} = 1.0002, and *p*_{f} = 0.9998.
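The inverse-proportionality check can be reproduced approximately from the rounded values quoted in the text. This is only a sketch: the slope units (AU per mm) are implied rather than stated in the source, and because the inputs are rounded, the products agree with unity to within about 0.1 % but will not match the reported four-decimal values exactly.

```python
ALPHA = 1.70e-4  # re-scaling parameter, rad per arbitrary unit (rounded)

# Rounded slopes (assumed AU per mm) and eye-to-camera distances (mm).
slopes = {"near": 10.07, "medium": 9.65, "far": 8.43}
distances_mm = {"near": 584.0, "medium": 610.0, "far": 698.0}

# p_i = alpha * k_i * L_i should be close to 1 for every layout.
products = {name: ALPHA * slopes[name] * distances_mm[name] for name in slopes}

for name, p in products.items():
    print(f"{name:>6}: p = {p:.4f}")
```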

This suggests that the “arbitrary units” are not really units of length but of visual angle. Apparently, the EyeLink 1000 in centroid mode reports the *angular* area subtended by the pupil. A straightforward algorithm for estimating this angular area—and probably the algorithm implemented by the EyeLink software—is to count the “pupil” pixels in the image of the eye. This interpretation is consistent with the statement in the *User Manual* that the pupil area is recorded in “scaled image pixels” (p. 17). Taking the square root transforms the angular area into a linear visual angle *ϕ*, which is approximately ^{2} proportional to the ratio of the true pupil diameter *d* and the viewing distance: *ϕ*∝*d*/*L*.

\[ d = \alpha \, L \, \phi \qquad (1) \]

In this equation, the angle *ϕ* subtended by the pupil is reported by the EyeLink 1000 in “arbitrary units,” *L* denotes the eye-to-camera distance in millimeters, and the re-scaling parameter *α* is in radians per arbitrary unit. We estimate *α* from the relationship *d*_{j}/(*ϕ*_{ij}*L*_{i}) = *const* across the 11 pupils *j* and three layouts *i*. This ratio has a mean of *α* = 1.70 × 10^{−4} rad/AU ≈ 35.1 arcsec/AU in our data set (SD = 7.20 × 10^{−7} rad/AU ≈ 0.149 arcsec/AU). With just one free parameter, Eq. 1 accounts for 99.99 % of the variance of the 33 calibration measurements.
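As an illustration of how this relationship might be used in practice, the sketch below converts an EyeLink pupil-area sample to millimeters via *d* = *αLϕ*, with *ϕ* taken as the square root of the reported area, as in the procedure described in the text. The function name and the example area are hypothetical, and the value of *α* is the rounded estimate quoted above, which presumably applies only to the authors' equipment and settings.

```python
import math

ALPHA_RAD_PER_AU = 1.70e-4  # rounded estimate from the text


def pupil_area_to_diameter_mm(area_au, eye_to_camera_mm, alpha=ALPHA_RAD_PER_AU):
    """Convert a centroid-mode pupil area (AU) to a diameter in mm: d = alpha * L * phi."""
    phi_au = math.sqrt(area_au)  # linear "arbitrary units"
    return alpha * eye_to_camera_mm * phi_au


# The same alpha expressed in arcseconds per arbitrary unit.
alpha_arcsec = math.degrees(ALPHA_RAD_PER_AU) * 3600.0

# Hypothetical sample: a 4.76-mm pupil in the near layout (L = 584 mm),
# generated with the rounded near-layout slope of 10.07 AU/mm.
area_sample = (10.07 * 4.76) ** 2
diameter_mm = pupil_area_to_diameter_mm(area_sample, 584.0)
print(f"alpha = {alpha_arcsec:.1f} arcsec/AU, diameter = {diameter_mm:.2f} mm")
```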

In conclusion, experiment 1 established that EyeLink’s “arbitrary units” form a ratio scale with coefficient *α**L* depending on the eye-to-camera distance *L* for each fixed layout. Also, the pupil-size data seem to track the visual angle subtended by the pupil. Consequently, the units can be converted across layouts according to Eq. 1. These results were obtained with very simple artificial pupils printed on a flat surface. The next experiment investigates the pupil foreshortening error caused by the rotation of spherical artificial eyes relative to the camera.

## Experiment 2

When an eye is photographed from an oblique angle, the image of the pupil becomes elliptical and the apparent pupil area decreases (e.g., Jay, 1962; Spring & Stiles, 1948). The resulting pupil foreshortening error (PFE) depends on the viewing angle. In a typical eye-tracking setup, the camera is stationary relative to the screen but the eye rotates in its socket, thereby varying the angle between the optical axis of the camera and the plane of the pupil. We constructed spherical artificial eyes that could rotate in artificial sockets and could be pointed to “fixate” arbitrary points on the screen. The purpose of experiment 2 was to map the relationship between EyeLink’s pupillometric data and the screen coordinates of a grid of fixation points. The results indicated that the measurements were contaminated by substantial PFE, but the magnitude of the error depended systematically on the cosine of the angle between the eye-to-camera axis and the eye-to-stimulus axis. Because the PFE was systematic, it could be corrected very well by a simple geometric model.

### Method

### Apparatus

All EyeLink 1000 settings were the same as those in experiment 1, with the exception of the pupil threshold parameter. The latter was lowered from 110 to 60 as the spherical artificial eyes with pupillary wells required a lower threshold than the flat pupils printed on paper. These settings provided a reliable thresholded pupil signal across all measured conditions.

### Artificial eye model

Three artificial eyes were manufactured from three solid wooden spheres, each 31 mm in diameter. The production of each eye began by drilling a 0.79-mm pilot hole all the way through the center of the wooden ball using a Skil 3320 10” benchtop drill press and DeWalt titanium pilot drill bits. Then a well was drilled 20 mm deep into each eye to form the artificial pupil (see Fig. 2b, c). The three eyes had well diameters of 3.17, 4.76, and 7.14 mm, respectively. The inside of each well was painted black and the rest of the eye was painted white using tempera washable paint. Finally, a pressure-switch laser sight (<5 mW) was collimated within a 100-mm section of PVC pipe (25 mm in diameter) that was attached to a second 30-mm section of PVC pipe (15 mm in diameter) that was firmly cemented to each artificial eye using epoxy cement (Fig. 2c). The laser beam originated approximately 75 mm behind the eyeball and traveled to the screen through the 0.79-mm hole. The pressure switch allowed us to (de)activate the laser without disturbing the eye orientation in the socket.

The artificial eye was kept at a constant position within the headrest using a simple ocular socket mechanism (see Fig. 1). This mechanism was composed of four pieces of wood (sides 131 × 13 × 28 mm; top and bottom 13 x 63 x 38 mm), two steel mending plates (140 × 0.9 × 13 mm), two threaded steel rods (152 mm long), four wing nuts, and two rubber grommets. Two holes were drilled through the wooden sides for the threaded rods—one 14 mm from the top and one 26 mm from the bottom. Two circular recessed holes (25 mm in diameter) were drilled 4 mm deep into the inside of each wooden side piece (centered horizontally and vertically) to hold the rubber grommets firmly in place. The ocular socket was then assembled as shown in Fig. 2d. The threaded rods were placed through each side piece with a mending plate on each outside edge held in place by the four wing nuts. The wing nuts could then be tightened down to firmly hold the current artificial eye in place and loosened to remove and replace the eye when needed. The elasticity of the rubber grommets allowed us to manipulate the elevation and azimuth of the artificial eye within the socket, while at the same time holding firmly when the desired orientation was reached.

The standard EyeLink 1000 nine-point calibration and validation procedures were used to quantify the accuracy of the artificial eye and socket apparatus. The nine-point validation data indicated high gaze-position accuracy across the display (average mean error M =0.18, SD =0.04; maximum error M =0.26, SD =0.04 degrees of visual angle). The validation data verified that the laser pointers were collimated accurately and the ocular socket kept the artificial eyes in a stable orientation throughout the procedure.

### Gaze position map and procedure

Pupil size measurements were collected with each artificial eye on a rectangular grid of fixation targets for each experimental layout. For the near and far layouts, we used 16×12=192 targets spaced 64 pixels apart (see Fig. 2e top); for the medium layout, we used 8×6=48 targets spaced 128 pixels apart (at screen resolution 1024×768).

On each trial, we rotated the eye until the pointer beam illuminated the appropriate target on the screen. A box shielded the eye tracking camera as a precaution against accidental exposure to the beam. As in experiment 1, pupil-area measurements were recorded for 5 s at 1000 Hz, converted to linear units by taking square roots, and averaged across the 5000 samples. This produced one pupil-diameter datum per trial.
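The per-trial reduction described above (square root of each area sample, then the mean over the recording) can be sketched as follows. The sample values are hypothetical, and the function name is not from the source. Note that the square root is applied before averaging, following the order given in the text.

```python
import math


def pupil_diameter_datum(area_samples):
    """Convert area samples to linear units by square root, then average."""
    linear = [math.sqrt(a) for a in area_samples]
    return sum(linear) / len(linear)


# Hypothetical 5-s recording at 1000 Hz: 5000 area samples near 2500 AU.
samples = [2500.0 + (i % 5 - 2) for i in range(5000)]
datum = pupil_diameter_datum(samples)
print(f"{len(samples)} samples -> one datum of {datum:.3f} linear AU")
```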

### Geometric model

The results from experiment 1 suggested that the EyeLink 1000 (operating in centroid mode) reports the *angular* area subtended by the pupil. Based on simple geometric principles, Eq. 1 accounted extremely well for the pupillometric data when the distance between the camera and the eye was varied. Our working hypothesis is that the same principles can account for the pupillometric data when the *angle* between the camera axis and the eye axis is varied. In this section, we develop a geometric model that formalizes this hypothesis and provides explicit formulas for correcting the pupil foreshortening error.

The key term in the model is the *oblique angle* *θ* between the eye-to-camera axis and the eye-to-stimulus axis (Fig. 3d). To simplify the trigonometric calculations, the model assumes the camera is pointed directly at the eye and thus the eye-to-camera axis coincides with the optical axis of the camera. Let us consider first the *baseline configuration* in which the eye too is pointed directly at the camera. Then the eye axis coincides with the camera axis and *θ* = 0. Let *A*_{0} denote the angular area subtended by the pupil as viewed from the camera in the baseline configuration.

The angular area *A*(*x*, *y*) diminishes when the eye rotates away from the camera to fixate a target with coordinates *x* and *y* on the screen. The derivation proceeds in two steps: First, we argue that the foreshortening is multiplicative and depends on the oblique angle *θ* according to Eq. 2. Second, we express cos *θ* as a function of *x* and *y* using simple vector calculus.

\[ A(x, y) = A_0 \cos\theta(x, y) \qquad (2) \]

We consider a circular pupil for concreteness, although the mathematical derivation generalizes to arbitrary planar shapes. When a circular disk is viewed from an oblique angle, it is projected into an ellipse and its apparent area decreases (Gagl et al. 2011; Jay 1962; Mathur et al. 2013; Spring and Stiles 1948). Importantly, the projected pupil becomes narrower in the direction of view (the *tangential section*) but remains unchanged in the perpendicular direction (the *sagittal section*; see Eq. 3.2 in Atchison & Smith, 2000). This is why the multiplier in Eq. 2 above is cos *θ* rather than cos^{2} *θ*. To simplify the analysis, we have assumed that the eyeball diameter is negligible relative to the distance to the camera and, consequently, the rotational center of the eyeball lies approximately in the pupil plane. This seems a reasonable assumption for desktop trackers such as the one used in this article, but probably becomes very crude for head-mounted trackers. Taking the eyeball geometry into account does not change any of the principles here, but complicates the trigonometry. Another assumption that is implicit in Eq. 2 is that the camera has negligible optical aberrations near the center of its field of view.
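The claim that the area multiplier is cos *θ* (so that a linear, diameter-like measure scales with its square root) can be checked numerically. The sketch below tilts a circular disk by *θ*, projects it orthographically onto the image plane, and measures the area of the resulting ellipse with the shoelace formula. Orthographic projection is an assumption made here for illustration, in the spirit of the distant-camera simplification in the text. The radius and angle are hypothetical.

```python
import math


def projected_area(radius, theta, n=3600):
    """Area of a tilted circular disk after orthographic projection (shoelace formula)."""
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        x = radius * math.cos(t) * math.cos(theta)  # narrowed (tangential) direction
        y = radius * math.sin(t)                    # unchanged (sagittal) direction
        pts.append((x, y))
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


theta = math.radians(20.0)  # hypothetical oblique angle
area_ratio = projected_area(2.0, theta) / projected_area(2.0, 0.0)
diameter_ratio = math.sqrt(area_ratio)
print(f"area ratio {area_ratio:.4f}, cos(theta) {math.cos(theta):.4f}")
print(f"square-root (diameter-like) ratio {diameter_ratio:.4f}")
```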

The second step is to express the oblique angle *θ* as a function of the target coordinates *x* and *y*. It is convenient to work in a coordinate system with an origin O at the pupil center and axes as described in the caption of Fig. 3d. The camera lens is at point C with coordinates *C*_{x}, *C*_{y}, and *C*_{z}, which are parameters of the physical layout (Fig. 3). The other parameters of the layout are the coordinates *S*_{x}, *S*_{y}, and *S*_{z} of the upper left-hand corner of the screen. Then a fixation target T with screen coordinates *x* and *y* (in mm) has coordinates *T*_{x} = *x* − *S*_{x}, *T*_{y} = *y* − *S*_{y}, and *T*_{z} = *S*_{z} in the eye-centered system. In this notation, the oblique angle *θ* is the angle COT in Fig. 3d and its cosine can be calculated via the dot product of the vectors OC and OT:

\[ \cos\theta(x, y) = \frac{\vec{OC} \cdot \vec{OT}}{\lVert \vec{OC} \rVert \, \lVert \vec{OT} \rVert} = \frac{C_x T_x + C_y T_y + C_z T_z}{\sqrt{C_x^2 + C_y^2 + C_z^2} \, \sqrt{T_x^2 + T_y^2 + T_z^2}} \qquad (3) \]
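The dot-product calculation of cos *θ* described above can be sketched in a few lines. The target coordinates follow the expressions given in the text (*T*_{x} = *x* − *S*_{x}, *T*_{y} = *y* − *S*_{y}, *T*_{z} = *S*_{z}). The axis directions are defined in Fig. 3d, which is not reproduced here, so the layout numbers below are hypothetical values chosen only to exercise the function and should not be read as the physical layout.

```python
import math


def cos_oblique_angle(x_mm, y_mm, camera, screen_corner):
    """Cosine of the angle COT between the eye-to-camera and eye-to-target vectors."""
    cx, cy, cz = camera
    sx, sy, sz = screen_corner
    # Target in eye-centered coordinates, using the expressions from the text.
    tx, ty, tz = x_mm - sx, y_mm - sy, sz
    dot = cx * tx + cy * ty + cz * tz
    norm_c = math.sqrt(cx * cx + cy * cy + cz * cz)
    norm_t = math.sqrt(tx * tx + ty * ty + tz * tz)
    return dot / (norm_c * norm_t)


# Hypothetical layout (mm), not the measurements reported in the article.
camera = (0.0, -300.0, 500.0)
screen_corner = (200.0, 500.0, 700.0)

# A target lying on the eye-to-camera line gives theta = 0.
on_axis = cos_oblique_angle(200.0, 80.0, camera, screen_corner)
# Any other target gives a cosine below 1.
off_axis = cos_oblique_angle(350.0, 300.0, camera, screen_corner)
print(f"on-axis cos(theta) = {on_axis:.6f}, off-axis cos(theta) = {off_axis:.6f}")
```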

The angular area *A* is converted to arc length *ϕ* using the relationship \(\phi = 2 \sqrt {A / \pi }\). Taking square roots of both sides in Eq. 2 and rearranging leads to Eq. 4, in which *ϕ*_{0} denotes the angle subtended by the pupil diameter in the baseline configuration and *ϕ*(*x*, *y*) denotes the apparent angle when the artificial eye points at a target with screen coordinates *x* and *y*.

\[ \phi_0 = \frac{\phi(x, y)}{\sqrt{\cos\theta(x, y)}} \qquad (4) \]

The true pupil diameter *d* can be determined by pointing the artificial eye directly at the camera (or asking the human participant to look at the camera), obtaining EyeLink measurements of the subtended angle *ϕ*_{0} in “arbitrary units,” and converting them to millimeters according to Eq. 1. The pupil foreshortening error *PFE*(*x*, *y*) = *ϕ*(*x*, *y*)/*ϕ*_{0} at fixation point (*x*, *y*) is \(\sqrt {\cos \theta (x,y)}\). For an artificial eye with a constant pupil, the geometric model predicts that the ratios in Eq. 4 will be invariant for all points (*x*, *y*). This prediction is tested below.
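The correction implied by the relationship above, dividing each measured value by the square root of cos *θ* at its fixation point, can be sketched as follows. The baseline value and the cosines are hypothetical, and the function name is an assumption made for illustration.

```python
import math


def correct_pupil_diameter(phi_measured, cos_theta):
    """Undo foreshortening: phi_0 = phi(x, y) / sqrt(cos(theta(x, y)))."""
    return phi_measured / math.sqrt(cos_theta)


# Hypothetical constant pupil (50 linear AU at baseline) seen at three oblique angles.
phi_0 = 50.0
cosines = [1.0, 0.95, 0.90]
measured = [phi_0 * math.sqrt(c) for c in cosines]   # foreshortened readings
corrected = [correct_pupil_diameter(m, c) for m, c in zip(measured, cosines)]

for c, m, r in zip(cosines, measured, corrected):
    print(f"cos(theta) = {c:.2f}: measured {m:.3f} AU -> corrected {r:.3f} AU")
```

For a constant pupil, the corrected values should coincide across fixation points, which is the invariance the model predicts.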

Two versions of the geometric model were compared: a parameter-free model and an optimized model. The parameter-free model simply calculated cos *θ* according to Eq. 3 using the physical measurements taken directly from each experimental layout (Fig. 3). The relevant measurements are the camera lens coordinates (*C*_{x}, *C*_{y}, *C*_{z}) and the coordinates (*S*_{x}, *S*_{y}, *S*_{z}) of the upper left-hand corner of the screen for the respective layout. The parameter-free model is an idealization that does not take into account the error of the layout measurements or the optics of the camera lens itself, which can alter the effective geometric layout. In order to account for these additional sources of error, an optimized model was also considered. The optimized model fit five parameters *C*_{x}, *C*_{y}, *S*_{x}, *S*_{y}, and *S*_{z}; the *C*_{z} parameter was fixed to the respective physical layout measurement. The optimized model was fit to the pupillometric data separately for each layout using an unconstrained Nelder–Mead optimization routine that minimized the root mean squared error. The initial values for the iterative parameter search were set to the physical layout measurements.
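A fit of this kind could be set up with SciPy's Nelder–Mead implementation, as in the sketch below. This is not the authors' code: the grid, the "true" and starting parameter values, the normalization of the multipliers to unit geometric mean, and the clipping of the cosine are all assumptions made so that the example is self-contained. It requires NumPy and SciPy.

```python
import numpy as np
from scipy.optimize import minimize


def multipliers(params, cz, x_mm, y_mm):
    """sqrt(cos(theta)) at each target, normalized to unit geometric mean."""
    cx, cy, sx, sy, sz = params
    tx, ty, tz = x_mm - sx, y_mm - sy, sz
    dot = cx * tx + cy * ty + cz * tz
    norms = np.sqrt(cx**2 + cy**2 + cz**2) * np.sqrt(tx**2 + ty**2 + tz**2)
    # Clipping keeps the square root defined if the search wanders (an added safeguard).
    m = np.sqrt(np.clip(dot / norms, 1e-12, None))
    return m / np.exp(np.mean(np.log(m)))


def rmse(params, cz, x_mm, y_mm, measured):
    """Root mean squared error between model multipliers and the measured map."""
    diff = multipliers(params, cz, x_mm, y_mm) - measured
    return float(np.sqrt(np.mean(diff**2)))


# Hypothetical 8 x 6 grid of targets (mm) and a synthetic "measured" map.
gx, gy = np.meshgrid(np.linspace(0.0, 400.0, 8), np.linspace(0.0, 300.0, 6))
x_mm, y_mm = gx.ravel(), gy.ravel()
cz = 500.0                                                    # fixed, as in the text
true_params = np.array([30.0, -250.0, 200.0, 150.0, 700.0])   # Cx, Cy, Sx, Sy, Sz
measured = multipliers(true_params, cz, x_mm, y_mm)

# Start the search from (hypothetical) physical layout measurements.
start = np.array([10.0, -300.0, 180.0, 120.0, 720.0])
start_rmse = rmse(start, cz, x_mm, y_mm, measured)
fit = minimize(rmse, start, args=(cz, x_mm, y_mm, measured), method="Nelder-Mead")
print(f"RMSE before fit: {start_rmse:.5f}, after fit: {fit.fun:.5f}")
```

Because the surface is smooth and several parameter combinations can produce similar maps, the fitted values need not coincide with the generating ones even when the error is small.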

### Results and discussion

PFE was invariant across the three different artificial eye diameters within a given experimental layout. Principal-component analysis (PCA, Everitt & Dunn, 2001) was performed to estimate the intra-map similarity across the three pupil diameter maps within the near, medium, and far layouts. The first principal component accounted for nearly all the variance across the three pupil diameter maps (near 99.8 %; medium 98.9 %; far 98.9 %). The invariance across changes in pupil diameter is a critical finding as it establishes that the PFE surface is not affected by the changes in pupil diameter that occur when measuring a dynamic, biological eye. Due to the invariance across pupil diameter the maps were collapsed by taking the geometric mean across the three different pupil diameter maps and normalizing relative to the geometric mean within a given experimental layout, resulting in one aggregate map for each experimental layout. The remaining results focus on modeling and correcting PFE in the aggregate layout maps.
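The aggregation step can be sketched as a pointwise geometric mean across the three pupil-size maps followed by division by the geometric mean of the combined map. This is one reading of the description above, and the map values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def aggregate_maps(maps):
    """Collapse several maps (equal-length lists) into one map with unit geometric mean."""
    n_points = len(maps[0])
    combined = [geometric_mean([m[i] for m in maps]) for i in range(n_points)]
    overall = geometric_mean(combined)
    return [c / overall for c in combined]


# Hypothetical maps: the same foreshortening shape scaled by three pupil sizes.
shape = [0.95, 1.00, 1.05, 0.98]
maps = [[d * s for s in shape] for d in (3.17, 4.76, 7.14)]
aggregate = aggregate_maps(maps)
print([round(a, 4) for a in aggregate])
```

When the maps differ only by a multiplicative constant, as in this toy example, the aggregate reproduces their common shape, which is what the invariance finding licenses.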

Table 1. Descriptive statistics of the deviation in pupil diameter from the geometric mean for the nine empirical maps (3 experimental layouts × 3 pupil sizes). *RMSE*: root mean squared error

| | RMSE | Min | Max |
|---|---|---|---|
| *Near layout* | | | |
| Pupil diameter 3.17 mm | 0.034 | 0.916 | 1.058 |
| Pupil diameter 4.76 mm | 0.032 | 0.918 | 1.053 |
| Pupil diameter 7.14 mm | 0.038 | 0.905 | 1.060 |
| *Medium layout* | | | |
| Pupil diameter 3.17 mm | 0.025 | 0.940 | 1.044 |
| Pupil diameter 4.76 mm | 0.026 | 0.938 | 1.043 |
| Pupil diameter 7.14 mm | 0.024 | 0.944 | 1.037 |
| *Far layout* | | | |
| Pupil diameter 3.17 mm | 0.022 | 0.945 | 1.039 |
| Pupil diameter 4.76 mm | 0.019 | 0.951 | 1.030 |
| Pupil diameter 7.14 mm | 0.019 | 0.951 | 1.030 |

As a result of the normalization and aggregation described above, *ϕ*_{0} = 1 = *const* for each map and all residual variation is due to foreshortening error. The parameter-free model reduced the root mean squared error of the aggregated measurements by an average of 82.5 %. The correction is applied according to Eq. 4 by dividing the measured pupil diameter at each grid target location by the square root of the cosine of its corresponding angle. Figure 6 shows the empirical aggregate layout maps before correction, the geometric model corrective multipliers, and the corrected maps. There was a strong correspondence between the three aggregate maps and their respective multipliers (near *R*^{2} = .99, medium *R*^{2} = .98, and far *R*^{2} = .98). The parameter-free geometric correction greatly reduced the RMSE to 18.7, 15.8, and 18.2 % of its original value for the near, medium, and far layouts, respectively. Table 2 shows the descriptive statistics before and after applying the parameter-free correction.

Table 2. Descriptive statistics of the deviation in pupil diameter from the geometric mean for the aggregate layout maps, the corrective multipliers generated by the parameter-free (PF) model and the model with optimized (opt.) parameters, and the corrected data (parameter-free and optimized). *RMSE*: root mean squared error

| | RMSE | Min | Max |
|---|---|---|---|
| *Near layout* | | | |
| Aggregate data | 0.034 | 0.913 | 1.056 |
| Model multipliers (PF) | 0.029 | 0.925 | 1.046 |
| Corrected data (PF) | 0.006 | 0.987 | 1.014 |
| Model multipliers (opt.) | 0.034 | 0.912 | 1.056 |
| Corrected data (opt.) | <0.001 | 0.998 | 1.002 |
| *Medium layout* | | | |
| Aggregate data | 0.025 | 0.941 | 1.040 |
| Model multipliers (PF) | 0.025 | 0.944 | 1.038 |
| Corrected data (PF) | 0.004 | 0.993 | 1.009 |
| Model multipliers (opt.) | 0.025 | 0.941 | 1.039 |
| Corrected data (opt.) | <0.001 | 0.998 | 1.002 |
| *Far layout* | | | |
| Aggregate data | 0.020 | 0.949 | 1.033 |
| Model multipliers (PF) | 0.019 | 0.952 | 1.030 |
| Corrected data (PF) | 0.004 | 0.992 | 1.008 |
| Model multipliers (opt.) | 0.020 | 0.947 | 1.033 |
| Corrected data (opt.) | <0.001 | 0.998 | 1.002 |

While the parameter-free geometric model reduced the RMSE in pupil diameter substantially, it left a small amount of residual error that was systematic and thus could be corrected further. Specifically, close inspection of the third column in Fig. 6 suggests that the correction generated by the parameter-free model was not strong enough to fully correct the “tilt” of the uncorrected maps in the first column. Thus, even after the parameter-free correction, the pupil diameter remained slightly underestimated for fixation targets near the left edge of the screen and slightly overestimated for targets near the right edge.

*stationary* pupil across trials. This indicates that the PFE can be eliminated entirely for practical purposes, at least for the artificial eyes considered here, provided enough calibration measurements are available to constrain the parameters of the geometric model.

Table 3. Comparison of geometric models (parameter-free physical measurements vs. optimized parameters) for near, medium, and far layouts. All values are distances in mm. *C*_{x}, *C*_{y}, *C*_{z} are the coordinates of the camera; *S*_{x}, *S*_{y}, *S*_{z} are the coordinates of the upper left-hand corner of the screen (cf. Fig. 3). An *asterisk* indicates the parameter was fixed to the physical layout measurement

| Model parameters | Near | Medium | Far |
|---|---|---|---|
| *C*_{x} | | | |
| Physical | 92 | 92 | 92 |
| Optimized | 130 | 165 | 183 |
| *C*_{y} | | | |
| Physical | -310 | -310 | -310 |
| Optimized | -215 | -239 | -230 |
| *C*_{z} | | | |
| Physical | 495 | 525 | 625 |
| Optimized | 495* | 525* | 625* |
| *S*_{x} | | | |
| Physical | -163 | -163 | -163 |
| Optimized | -142 | -87 | -76 |
| *S*_{y} | | | |
| Physical | 58 | 58 | 58 |
| Optimized | 206 | 140 | 156 |
| *S*_{z} | | | |
| Physical | 740 | 835 | 935 |
| Optimized | 736 | 851 | 937 |

It is instructive to compare the optimized parameter values in Table 3 to their physical counterparts. The distance *S*_{z} from the eye to the screen was constrained well by the data, but there were surprisingly large deviations in the other four parameters. The optimized values of *C*_{x} and *S*_{x} would be veridical if the artificial eye were located ≈ 60 mm closer to the left edge of the screen (on average across the three layouts). The optimized values of *C*_{y} and *S*_{y} would be veridical if the eye were located ≈ 90 mm lower. The overall effect was to produce greater variation in the oblique angle *θ* in Eq. 3 across each map, which in turn produced stronger correction multipliers. The within-map RMSEs of the optimized model multipliers were larger than their parameter-free counterparts and better matched the RMSEs of the calibration data (Table 2). We speculate that the optimized parameters captured the magnification effect of the optics inside the eye-tracking camera. Equations 3 and 4 essentially model a pinhole camera, whereas the real device includes lenses on the optical path. This is a topic for further research. The theoretical significance of the optimized model fit is that the PFE surfaces were smooth and could be parameterized with a few numbers that could be estimated from calibration data.
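
The implied displacement of the eye can be checked with simple arithmetic on the Table 3 values. The sketch below assumes that the rows of Table 3 follow the order given in its caption (*C*_{x}, *C*_{y}, *C*_{z}, *S*_{x}, *S*_{y}, *S*_{z}) and pools the optimized-minus-physical differences of the camera and screen coordinates with equal weight. That pooling scheme is our assumption, so the result should only be expected to match the quoted ≈ 60 mm and ≈ 90 mm to within rounding and averaging choices.

```python
# Values copied from Table 3 (mm); tuples are ordered near, medium, far.
# Assumption: table rows follow the caption order Cx, Cy, Cz, Sx, Sy, Sz.
physical = {"Cx": 92, "Cy": -310, "Sx": -163, "Sy": 58}
optimized = {
    "Cx": (130, 165, 183),
    "Cy": (-215, -239, -230),
    "Sx": (-142, -87, -76),
    "Sy": (206, 140, 156),
}

def mean_shift(names):
    """Average optimized-minus-physical difference over the named parameters and layouts."""
    diffs = [opt - physical[name] for name in names for opt in optimized[name]]
    return sum(diffs) / len(diffs)

# A positive shift in both Cx and Sx corresponds to an eye located further to the left;
# a positive shift in both Cy and Sy corresponds to an eye located lower.
shift_x = mean_shift(["Cx", "Sx"])
shift_y = mean_shift(["Cy", "Sy"])
print(f"horizontal shift: {shift_x:.0f} mm, vertical shift: {shift_y:.0f} mm")
```

Both shifts come out positive and of the same order as the displacements quoted in the text, which is consistent with reading the optimized parameters as an effective relocation of the eye.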

*C*_{x} and *S*_{x} model parameters (Fig. 3). A recent survey of 3976 adults (Dodgson 2004) reports a median interpupillary distance of 63 mm. We thus assume *C*_{x, right} = *C*_{x, left} − 63 and *S*_{x, right} = *S*_{x, left} − 63. Table 4 summarizes the predicted PFE for the left and right monocular cases and their binocular average. Binocular pupil recording would have decreased the pupil foreshortening RMSE only by about 1 % relative to the monocular recording from the left eye. It should be noted that if we had used the right eye position, binocular averaging would have *increased* the RMSE of the PFE by 1 %. This is because there is less PFE for whichever eye is closer to the camera, all else being equal. In sum, the geometric model predicts that binocular data acquisition would have a negligible impact on the PFE.
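
The "about 1 %" figures can be verified directly from the RMSE column of Table 4. The short check below copies those values and assumes that the three blocks of the table correspond to the near, medium, and far layouts, in that order; the function name is ours.

```python
# Predicted RMSE values copied from Table 4 (parameter-free model).
# Assumption: the three blocks of the table are the near, medium, and far layouts.
rmse = {
    "near":   {"left": 0.0294, "right": 0.0289, "binocular": 0.0291},
    "medium": {"left": 0.0251, "right": 0.0246, "binocular": 0.0248},
    "far":    {"left": 0.0190, "right": 0.0186, "binocular": 0.0188},
}

def percent_change(binocular, monocular):
    """Signed change in RMSE (%) when moving from monocular to binocular recording."""
    return 100.0 * (binocular - monocular) / monocular

for layout, r in rmse.items():
    vs_left = percent_change(r["binocular"], r["left"])
    vs_right = percent_change(r["binocular"], r["right"])
    print(f"{layout}: {vs_left:+.1f} % vs left eye, {vs_right:+.1f} % vs right eye")
```

In every layout the change is negative relative to the left eye and positive relative to the right eye, and its magnitude stays near 1 %.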

Table 4. Predicted deviation in pupil diameter from the geometric mean for each experimental layout for monocular (either left or right eye only) and binocular recording. The corrective multipliers were generated by the parameter-free (PF) model assuming an interpupillary distance of 63 mm. *RMSE* root mean squared error

| | RMSE | Min | Max |
|---|---|---|---|
| Near layout | | | |
| Left eye (PF) | 0.0294 | 0.9247 | 1.0463 |
| Right eye (PF) | 0.0289 | 0.9305 | 1.0453 |
| Binocular (PF) | 0.0291 | 0.9276 | 1.0457 |
| Medium layout | | | |
| Left eye (PF) | 0.0251 | 0.9436 | 1.0381 |
| Right eye (PF) | 0.0246 | 0.9484 | 1.0368 |
| Binocular (PF) | 0.0248 | 0.9460 | 1.0374 |
| Far layout | | | |
| Left eye (PF) | 0.0190 | 0.9519 | 1.0303 |
| Right eye (PF) | 0.0186 | 0.9554 | 1.0294 |
| Binocular (PF) | 0.0188 | 0.9536 | 1.0298 |

## General discussion

This article reports two experiments that investigate and calibrate the pupillometric measurements of a desktop EyeLink 1000 eye tracker (SR Research 2010) using artificial eyes with known pupil diameter. Experiment 1 established that EyeLink’s “arbitrary units” form a ratio scale with a true zero. This result justifies the common practice in the field of expressing *relative* pupil diameters as dimensionless *ratios* (e.g., percent change from baseline). When absolute measurements are needed, they can be calculated for a given arrangement of the eye-tracking camera, headrest, and monitor: the “arbitrary units” are proportional to millimeters, but the coefficient of proportionality depends on the eye-to-camera distance (Eq. 1). Furthermore, the results of experiment 1 strongly suggest that EyeLink’s pupil-size data are expressed not in units of linear length but of the angle subtended by the pupil as viewed from the camera. These units can be converted across layouts according to Eq. 1.

Experiment 2 mapped the pupil foreshortening error (PFE) when the artificial eye rotated in its socket to fixate a grid of target points on the screen. Data were collected across three experimental layouts with spherical artificial eyes with three pupil diameters. The nine resulting maps showed large PFE that increased as a monotonic function of the *oblique angle* between the eye-to-camera axis and the eye-to-target axis. The results supported three major conclusions as follows: First, the *relative* PFE maps were highly replicable across different pupil sizes, as evident in the high (*R*^{2}>.98) correlation among maps collected with different artificial eyes in a given experimental layout. This suggests that the relative PFE varies only as a function of the orientation of the eye with respect to the camera but does not depend on the pupil diameter. In other words, the eye-tracking hardware and software operate linearly with respect to pupil diameter. Second, a parameter-free geometric model reduced the error to <18 % of its uncorrected value on average across the three layouts. This suggests that the PFE is largely due to geometric foreshortening of the type captured by Eqs. 3 and 4. Note that the foreshortening follows a multiplicative law (Eq. 4). The third major conclusion of experiment 2 was that model-based optimization could reduce the error to levels comparable to the measurement precision for a stationary pupil. In other words, the foreshortening error induced by the rotation of the eye could be eliminated almost entirely. Such accurate correction is possible because the PFE surface is smooth and can be described with five parameters that can be estimated from calibration data, at least for artificial eyes.

On the basis of these empirical results, we formulate three theoretical principles. They seem obvious in hindsight but were neglected in earlier PFE research (e.g., Brisson et al., 2013; Gagl et al., 2011). First, PFE analysis must be based on the three-way relationship between the eye, the fixation point on the screen, and the eye-tracking camera. Leaving the camera out of the equation leads to an ill-posed problem that does not permit a general solution. This is revealed when one compares data from different experimental layouts as was done here. Second, the foreshortening law is multiplicative in nature (Eq. 2). Therefore, division rather than subtraction is the appropriate corrective operation (Eq. 4), and the geometric rather than arithmetic mean is the appropriate measure of central tendency. Additive operations are justified only approximately for normalized data on the basis of the approximation log(1+*x*) ≈ *x* for small *x*. The third principle is the cosine relationship embedded in the geometric model (Eqs. 3 and 4) that grounds the PFE in the physics of the data acquisition process. With parameters that could have been fixed before any calibration data were collected, the geometric model reduced the RMSE in these data by over 82 %. In agreement with the first two principles, the model specifies a three-way multiplicative relationship between the coordinates of the eye, fixation target, and camera. The same principles are widely used in computer vision (Forsyth and Ponce 2011).
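
The second principle can be illustrated with a small numerical sketch. All numbers below are hypothetical and chosen only to show why division recovers the true diameter for any pupil size, whereas an additive offset calibrated on one pupil size fails on another. The multiplier stands in for the foreshortening factor of Eq. 4 at a single gaze position and is not derived from the model here.

```python
import math

def correct_by_division(measured, multiplier):
    """Undo a multiplicative foreshortening factor."""
    return measured / multiplier

true_diameter = 5.0   # hypothetical pupil diameter, mm
multiplier = 0.92     # hypothetical foreshortening factor at one gaze position
measured = true_diameter * multiplier

# Division recovers the true value regardless of the pupil size.
recovered = correct_by_division(measured, multiplier)

# An additive offset calibrated on a different (hypothetical) pupil size removes
# the wrong amount, because the error scales with the pupil itself.
calibration_diameter = 3.0
additive_offset = calibration_diameter * (1.0 - multiplier)
additive_estimate = measured + additive_offset

# log(1 + x) is close to x for small x, so in log units the multiplier
# acts approximately like an additive shift of size x.
x = multiplier - 1.0
print(recovered, additive_estimate, math.log1p(x), x)
```

The last line shows why additive corrections are tolerable only for small normalized deviations: the gap between log(1+*x*) and *x* grows as the foreshortening becomes stronger.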

### Comparison with previous research

Our PFE measurements are broadly consistent with published results that measured a single horizontal scan line using an artificial eye (Gagl et al. 2011), and a circular object pursuit task performed by human participants (Brisson et al. 2013). During the object pursuit task of Brisson et al. (2013), the pupil diameter systematically decreased as the object receded vertically and horizontally from the camera. The geometric model is consistent with this pattern of results. Gagl et al. (2011) reported a piecewise linear and quadratic function in their artificial eye measurements. Whereas our PFE maps exhibited a smooth quadratic trend, Gagl et al. (2011) reported that their artificial eye measurements showed a sharp linear increase near the left edge of the screen that then decreased quadratically as the gaze position moved rightward across the display. The quadratic pattern is consistent with the geometric model and implies that the camera lens was on the left side of their EyeLink 1000 since the right eye position was measured. We applied our Eqs. 3 and 4 to a reconstruction of the layout of Gagl et al. (2011) based on their Method description and the EyeLink specifications (SR Research 2010, p. 7). The geometric model reproduced the qualitative nonmonotonic shape of their data, although our reconstruction located the peak close to the center of the display rather than at the left edge. The abrupt drop in recorded pupil size near the left edge is anomalous and does not fit the geometric-model predictions.

One plausible explanation for the different PFE function obtained by Gagl et al. (2011) is that they used a tower mount EyeLink 1000, whereas we used a desktop EyeLink 1000. Unlike the latter, the tower mount system does not record a direct image of the eye. Instead, the camera is mounted above the observer and records a reflected image of the eye from an angled infrared hot mirror (SR Research 2010, p. 7). ^{3} The infrared mirror reflects infrared light but is transparent to visible light, allowing participants to perform tasks on screen that require them to reach in front of them without occluding the camera’s image of the eye. The data of Gagl et al. (2011) suggest that the addition of this angled infrared mirror alters the PFE surface in ways that are not well captured by our simple geometric model. The presence of an adjustable mirror violates the model’s assumption that the eye-to-camera axis coincides with the optical axis of the camera. The unknown optical properties of the mirror introduce additional complications. The geometric model is best suited to eye-tracking systems in which the camera records a direct image of the eye. Researchers that use eye-tracking systems with intermediary mirrors are advised to map the PFE surface of their mirrored system explicitly using an artificial eye. The model probably can be modified to account for the mirror by “unfolding” the optical path, but this should be tested on explicit calibration data.

### Recommendations

One key objective of the current article is to provide the research community with a technique to correct for PFE in a manner that does not sacrifice experimental flexibility. The present methodology allows for unprecedentedly accurate PFE correction while preserving the freedom to study tasks such as reading or visual search that involve free viewing of the display. Both the parameter-free and the optimized geometric model reduced PFE enough that cognitive effects >2 % should be detectable. One key advantage of the parameter-free geometric correction is that it does not require the researcher to collect any artificial-eye measurements. One weakness is that it does not account for the effects of the camera optics that alter the effective geometric layout. The optimized model virtually eliminates PFE by accounting for all sources of error, but requires calibration data to constrain the model parameters. For researchers that wish to use the optimized geometric model and use the EyeLink 1000, Table 3 lists the best-fitting parameters for the three layouts in our study. For researchers that use other eye trackers and/or layouts, the detailed description of our artificial eye and socket apparatus can help them calibrate their respective systems. Desktop or monitor mounted trackers (e.g., Tobii T60 and T120, Tobii Technology, 2011; Tobii X60 and X120, Tobii Technology, 2008; SMI RED series, SensoMotoric Instruments, 2009) that record direct images of the eye are more likely to correspond well to the geometric model. Eye-tracking systems that do not directly image the eye but rely on reflected infrared images of the eye (e.g., tower mount EyeLink 1000 and SMI iView X Hi-Speed, SensoMotoric Instruments, 2009) are more likely to deviate from the geometric model due to the additional optics of the intermediary infrared mirror (e.g., Gagl et al. 2011). The geometric relationships formalized in Eqs. 2–4 apply to all camera-based trackers, but the proprietary post-processing software of specific manufacturers may render these equations unsuitable for PFE correction purposes. ^{4} Finally, some of the simplifying assumptions used in the derivation of these equations will have to be re-examined for head-mounted trackers that put the camera much closer to the eye and thus the diameter of the eyeball is no longer negligible. These assumptions were made explicit in the Model section above.

The magnitude of the pupil foreshortening error depends greatly on the geometric configuration of the screen and camera relative to the eye (Fig. 3). The following general recommendations can be used to inform the choice of layout. Putting the display far from the participant tends to reduce the range of oblique angles *θ* and hence the PFE. On the other hand, it increases the error in estimating the fixation location. This creates a trade-off between pupillometric accuracy and eye-tracking accuracy because the latter deteriorates when the screen is far from the participant. An elegant solution is possible when the research question requires that the stimuli appear at multiple locations but does not constrain the spatial relationship among these locations. Our recommendation for such situations is to spread the stimuli along an arc that maintains a constant oblique angle *θ* with the camera-to-eye axis. The coordinates of these locations can be calculated from Eq. 3 under the constraint cos *θ*(*x*, *y*) = *k* = const for a suitably chosen *k*. The geometric model predicts identical PFE for all points along an arc of this type. Analogous circular contours of equal PFE are prominent in the calibration data (Fig. 5). Another general recommendation is to place the camera as close to the eye-to-stimulus line of sight as the camera dimensions allow. Such placement makes cos *θ* as close to 1 as possible. Recording monocularly from the eye closest to the camera is predicted to provide modest (≈2 %) reductions in PFE relative to the more distant eye.
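
The constant-angle recommendation can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the eye sits at the origin, the camera is at (*C*_{x}, *C*_{y}, *C*_{z}), the screen lies in the plane *z* = *S*_{z}, and cos *θ* is the normalized dot product of the eye-to-camera and eye-to-target vectors, in the spirit of Eq. 3. The axis convention, the function names, and the choice *k* = 0.9 are ours, and the returned points are in eye-centered millimeters rather than screen pixels.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cos_theta(camera, target):
    """Cosine of the angle between eye-to-camera and eye-to-target (eye at origin)."""
    dot = sum(c * t for c, t in zip(camera, target))
    return dot / (norm(camera) * norm(target))

def iso_theta_points(camera, screen_z, k, n=8):
    """Points on the plane z = screen_z whose direction from the eye has cos(theta) = k."""
    c = tuple(a / norm(camera) for a in camera)
    # Two unit vectors perpendicular to the eye-to-camera axis.
    helper = (1.0, 0.0, 0.0) if abs(c[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = cross(c, helper)
    u = tuple(a / norm(u) for a in u)
    v = cross(c, u)
    sin_t = math.sqrt(1.0 - k * k)
    points = []
    for i in range(n):
        phi = 2.0 * math.pi * i / n
        # Unit direction on the cone of half-angle theta around the camera axis.
        d = tuple(k * c[j] + sin_t * (math.cos(phi) * u[j] + math.sin(phi) * v[j])
                  for j in range(3))
        if d[2] <= 0.0:
            continue  # this ray never reaches the screen plane
        scale = screen_z / d[2]
        points.append(tuple(scale * dj for dj in d))
    return points

# Physical near-layout values from Table 3 (mm); k = 0.9 is an arbitrary choice.
camera = (92.0, -310.0, 495.0)
for p in iso_theta_points(camera, screen_z=740.0, k=0.9):
    print(tuple(round(a, 1) for a in p), round(cos_theta(camera, p), 6))
```

Only the subset of points that falls inside the physical screen area is usable, so in practice the candidate locations would still have to be clipped to the display bounds and converted to screen coordinates.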

To facilitate effective PFE correction, the following best practices are recommended for the data collection phase: First, we strongly discourage the participation of observers whose eyelids partially occlude their pupils as well as participants that wear glasses or contact lenses, as these can each affect the PFE in unpredictable ways. For desktop or screen-mounted eye-tracking systems, a chin-and-forehead rest should be used to keep the eye in a constant position. When possible, the rest height should be kept constant across participants. In our lab, we adjust the height of the chair to accommodate participants of different stature, leaving the chin-and-forehead rest fixed at all times. Thus, a common geometric model can be applied to the entire data set during post-processing. If the chin-and-forehead rest height must be adjusted on an individual basis, height measurements should be logged for each participant and an individualized geometric model should be applied. Finally, regardless of whether the proprietary image-based eye-tracking software uses pixel counting or ellipse fitting algorithms to estimate pupil size, the pupil threshold parameter should be set at the beginning of the experimental session and not altered during the course of the session. Changing the threshold parameter in mid-session will result in large changes in recorded pupil size.

Last but not least, we offer guidelines for evaluating published pupillometric data for possible PFE artifacts. The main danger is the possibility for *systematic* pupil foreshortening errors. Nonsystematic errors merely increase the measurement noise and cannot alter the substantive conclusions of an experiment, even though they are still undesirable because they can cause type II errors. Systematic PFE, however, is very dangerous because it can cause type I errors and cannot be mitigated through averaging. For example, a systematic PFE confound would arise if all emotional stimuli were presented at one location, whereas all neutral stimuli at another. Such obvious cases of bad experimental design would be screened out during the peer-review process, but there are subtler cases that are very easy to miss. Consider the following example based on an experiment from our lab. We presented all stimuli at a single location and analyzed only trials in which the participants maintained fixation on the stimulus, as instructed. Following the widespread practice in the literature, we believed that we had thereby eliminated any possibility for PFE. We were wrong, due to a subtle interaction between certain features of our experimental procedure and certain regularities in the participants’ behavior. It is instructive to consider this case in some detail. Our procedure included a free viewing period from the stimulus offset until the end of the trial and through the inter-trial interval. Such free viewing periods are very common in eye-tracking research because it is very uncomfortable to maintain constant fixation for an extended period of time. Also, mindful of the time lag between the stimulus and the task-evoked pupillary response (Partala and Surakka 2003; Murphy et al. 2011; Hayes and Petrov 2015), we analyzed the pupillary data from the entire trial, including the free period. This is also standard practice, but it allowed systematic PFE to creep in. 
The problem was that the participants’ pattern of free viewing depended systematically on the preceding stimulus. After an easy stimulus, their eyes roamed the screen more or less at random. After a difficult stimulus, however, the participants often checked the bonus score that was displayed at the top of the screen at all times. Apparently, they were less confident in their responses on difficult trials and tended to look at the accuracy-contingent score to reassure themselves. This behavior produced a preponderance of fixations near the top of the screen on difficult trials, which in turn introduced a systematic PFE masquerading as a cognitive-load effect in (apparent) pupil diameter a few hundred milliseconds after the stimulus offset. This example illustrates that *systematic* PFE confounds can arise through subtle interactions among seemingly unrelated features of the experimental setup. Free viewing periods are particularly vulnerable to PFE artifacts because the participants can introduce unexpected regularities through their patterns of free looking. Therefore, presenting the stimuli at a constant location (or at counterbalanced locations) is guaranteed to protect against PFE only if all subsequent pupillometric analyses are based entirely on data collected while fixation was maintained within the controlled location(s).
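
One way to enforce this safeguard at the analysis stage is to discard every pupil sample whose gaze position falls outside a window around the controlled location. The sketch below is a hypothetical illustration: the sample format, the field names, the pixel coordinates, and the 50-pixel radius are all assumptions, not part of any eye-tracker API.

```python
import math

def within_window(sample, center, radius):
    """True if the gaze sample lies within `radius` of the controlled location."""
    return math.hypot(sample["x"] - center[0], sample["y"] - center[1]) <= radius

def controlled_samples(samples, center, radius):
    """Keep only samples recorded while gaze stayed inside the controlled window."""
    return [s for s in samples if within_window(s, center, radius)]

# Hypothetical samples: time index, gaze position (pixels), pupil size (arbitrary units).
samples = [
    {"t": 0, "x": 512, "y": 384, "pupil": 1000},
    {"t": 1, "x": 515, "y": 380, "pupil": 1004},
    {"t": 2, "x": 520, "y": 40, "pupil": 960},  # glance toward the top of the screen
]
kept = controlled_samples(samples, center=(512, 384), radius=50)
print([s["t"] for s in kept])
```

Filtering in this way shortens the usable portion of each trial, which is the price of excluding free-viewing samples when no PFE correction is applied.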

Note that we do not claim that all published pupillometric research is infested with PFE artifacts and is thereby useless. The constant-fixation and counterbalancing methods are widely used and do mitigate the PFE, although the example above shows that they are not as foolproof as is commonly assumed. The stimuli in many experiments are presented near the center of the screen where the PFE is smaller compared to the far periphery. Even imperfect measurements can provide a wealth of evidence that is relevant and replicable. Still, at best the PFE erodes the statistical power and complicates the interpretation, and at worst it can introduce spurious regularities and/or change the magnitude of the real effects. The main purpose of this article is not to throw a dark shadow on the pupillometry literature, but to provide a method for correcting the PFE and encourage other researchers to adopt it in their future work.

### Limitations

One potential limitation of our study is that we do not explicitly model the refracting effect of the cornea, the transparent curved layers in front of the iris (Atchison and Smith 2000). The image recorded by video-based eye trackers measures the *entrance pupil*, that is, the aperture stop of the iris as viewed through the refractive surfaces of the cornea. The refractive power of the cornea is considerable; it is approximately two times stronger than that of the lens of a relaxed eye (Atchison and Smith 2000). The absolute magnification provided by the cornea does not affect the relative PFE. Rather, our concern is about corneal *anisotropy*. If different segments of the cornea have different refractive indices, the pupil diameter can appear to change when the eye rotates, turning different segments towards the camera. There are empirical data that bear on this issue (e.g., Spring & Stiles, 1948; Jay, 1962; Jennings & Charman, 1978). These studies photographed human eyes from a range of oblique angles, fitted ellipses to the pupil images, and estimated the foreshortening coefficient as a function of the angle. A recent article (Mathur et al. 2013) re-analyzed the data from six published studies plus sophisticated new measurements. The overall conclusion was that, “Off-axis pupil shape is well described by a cosine function that is both decentered by a few degrees [towards the temporal side] and flatter by about 12 % than the cosine of the viewing angle” (Mathur et al. 2013, p. 7).

*θ* in our Eq. 4, yielding:

*θ*(*x*, *y*) is determined from Eq. 3, in degrees. Note that the data of Mathur et al. (2013), as well as the earlier data re-analyzed therein (Spring and Stiles 1948; Jay 1962; Jennings and Charman 1978, etc.), were all collected along the horizontal equatorial line. One area for future research would be to systematically map the apparent pupil foreshortening across the entire visual field in participants with dilated eyes, and compare the human eye data to our artificial eye data to quantify the effect of corneal refraction.

### Conclusions

Pupil foreshortening error is a potentially large confound that should be taken into account prior to interpreting pupillary data from image-based eye trackers. We introduced an artificial eye and socket model and systematically mapped the PFE across the horizontal and vertical visual field in three separate experimental layouts. In light of previous cognitive pupillometric research, our data indicate that the PFE can be larger than many cognitive pupillometric effects. The PFE is not currently corrected by popular commercial image-based eye-tracking systems. We formulated a simple parameter-free geometrical model that reduced the deviation of the PFE by 82.5 % and an optimized model that reduced it by 97.5 %. Thus, very accurate PFE correction is possible and the corrected pupillometric data have the precision necessary to measure typical cognitive effects without limiting the types of tasks that can be studied by researchers.

^{1} In Fig. 3, these are the hypotenuses of the triangle with vertical side *C*_{y} = 310 mm giving the height of the eye relative to the camera and with horizontal side *C*_{z} = 495, 525, and 625 mm, respectively.

^{2} We use the paraxial approximation tan *ϕ* ≈ sin *ϕ* ≈ *ϕ* for small angles measured in radians (Atchison and Smith 2000).

^{3} The typical camera-to-eye distance in the tower mount system is 380 mm, which is comparable to our near layout (425 mm).

^{4} We urge the eye-tracking manufacturers to provide clearer descriptions of the broad type of algorithms used to estimate pupil size (e.g., pixel counting vs. ellipse fitting) and the scale type (e.g., interval vs. ratio) of the resulting measurements. We understand the motivation to keep the detailed algorithms proprietary, but the manual should provide enough information to allow the user community to interpret the data.

## Acknowledgments

This research was supported by the National Institutes of Health National Eye Institute Grant R21 EY022745.