Researchers studying infant behavior have traditionally used methods such as habituation/dishabituation to make inferences about infant cognition (Fantz, 1964). These experiments assess infants’ gross orienting behavior in controlled experimental paradigms by presenting the same stimulus repeatedly across a number of trials and measuring the durations of infants’ looks toward that stimulus on a time scale of seconds. These methods are useful as a way of assessing whether infants perceive that a stimulus has changed (for example, whether they can discern the difference between seven colored circles and eight); measuring the rate of change of looks over time is also useful as an index of infants’ speed of learning (Colombo & Mitchell, 2009). However, the restrictions of the paradigm mean that its utility as an assessment of infants’ attention deployment in spontaneous, unconstrained settings may be limited (see Aslin, 2007). The basic experimental paradigm, in which a stimulus is presented repeatedly across a number of separate but immediately contiguous trials, is one that is seldom if ever encountered in the real world.
Researchers wishing to study infants’ spontaneous attention deployment in real-world settings have traditionally done this by videotaping infants playing in naturalistic contexts and by hand-coding their direction of gaze post hoc (e.g., Choudhury & Gorman, 2000; Kannass & Oakes, 2008; Swettenham et al., 1998). These techniques have also yielded a number of vital insights into infant cognition—in particular, into infants’ spontaneous orienting and learning behavior in social settings (e.g., Carpenter, Nagell, & Tomasello, 1998; Mundy & Newell, 2007). However, they also have limitations. Hand-coding infants’ direction of gaze from a video has a relatively low spatial resolution, and temporal resolution is also low: Although resolutions as high as 50 Hz can be obtained using video coding (Elsabbagh et al., 2009), this coding is extremely time consuming. It can take up to 5 h for one researcher to code 10 min of video, which limits the amount of data that can be processed using these methods. A more typical temporal resolution for video coding is 1 Hz (Kannass & Oakes, 2008; Ruff & Capozzoli, 2003; Wass, 2011; Wass, Porayska-Pomsta, & Johnson, 2011), although resolutions as low as 0.2 Hz (i.e., one sample every 5 s) are also sometimes reported. Because it is performed by humans, video coding is also more error-prone.
Given the limitations of traditional methods, the advent of eyetrackers has brought a number of changes to the study of infant cognition (Aslin, 2012; Gredebäck, Johnson, & von Hofsten, 2010; Morgante, Zolfaghari, & Johnson, 2012; Oakes, 2011). As a noninvasive technique, eye tracking offers the potential to study infants’ spontaneous attention deployment in unconstrained, naturalistic settings. Relative to video coding, the advantage offered by eyetrackers is that the spatial resolution is much higher (typically ~1˚ of visual angle), as is the temporal resolution (typically, 50–500 Hz). Furthermore, the data processing can be performed automatically, meaning that there is effectively no limit on the volume of data that can be processed. This increased temporal and spatial resolution offered by eyetracker data opens up the possibility of analyzing in detail the subsecond correlates of attentional allocation—namely, how attention is apportioned through individual fixations and saccades.
When attending to a visual array, such as a natural visual scene or a sparse screen-based display, we spontaneously manifest a sequence of eye movements in order to ensure that light from objects of interest is projected onto the most sensitive part of the retina, the fovea (Holmqvist et al., 2011; Land & Tatler, 2009). When our eyes are stable (during a fixation), visual processing and encoding in working memory occur. Fixations are separated by rapid, ballistic eye movements (saccades), during which visual sensitivity is suppressed to avoid the perception of blur as the image rapidly sweeps across the retina (Matin, 1974).
Within the adult literature, research has suggested that fixation duration can be influenced both by bottom-up visual features of scenes, such as edges and motion (Itti & Koch, 2001), luminance (Loftus, 1985), or blur (Mannan, Ruddock, & Wooding, 1995), and by top-down factors such as viewing task and personal preference (Henderson, Weeks, & Hollingworth, 1999; Yarbus, 1967; see also Nuthmann, Smith, Engbert, & Henderson, 2010; Tatler & Vincent, 2008). Research has also suggested the existence of an internal stochastic timer mechanism that triggers saccades irrespective of immediate processing demands (Engbert, Longtin, & Kliegl, 2002; Henderson & Smith, 2009).
Relatively less research has examined fixation durations during spontaneous orienting in infants. When infants of 1–2 months examine static visual stimuli, they tend to view each stimulus in a series of long fixations that are located close together (Bronson, 1990). By 3–4 months, however, they show a more controlled, strategic method for scanning static stimuli, with a greater proportion of shorter (<500 ms) fixations (Bronson, 1994; see also Hunnius, Geuze, & van Geert, 2006). Bronson (1994) also reported that fixation durations in 6-week-old infants are relatively more influenced than those of older infants by whether the fixation falls on a stimulus contour. The change in orienting style is thought to be mediated by a reduction in the early difficulties that infants encounter with disengaging their attention—known as “sticky fixation” or “obligatory attention” (Hood & Atkinson, 1993; see also Atkinson, 2000; Hunnius, 2007; Johnson, 1993, 2010; Johnson, Posner, & Rothbart, 1991). By 4 months, however, problems with disengaging from static stimuli have largely dissipated, although the problem of “sticky fixation” may be more long-lasting with dynamic stimuli (Bronson, 1994; Hood & Atkinson, 1993). Bronson (1990) examined changes in infants’ scanning to geometric patterns across the 2- to 14-week period. He found that as infants grew older, they became increasingly disposed to scan between different stimulus features while viewing static stimuli. When the stimulus was moving, however, the infants’ scanning characteristics reverted to those typically found at younger ages, suggesting that “sticky fixation” behaviors may persist longer into development for dynamic than for static stimuli.
A substantial body of research with adults has pointed to the validity and reliability of fixation durations as an index of online cognitive processing (see, e.g., Nuthmann et al., 2010; Rayner, 1998). Within the infant literature, however, a number of important research questions remain unaddressed. The degree to which fixation durations are influenced by endogenous versus exogenous factors in infancy, the degree to which differences in fixation duration relate to individual differences on other cognitive measures, and the degree to which fixation durations can be a marker for early disrupted development are all questions that remain to be explored.
Analyzing fixation duration – preexisting fixation-parsing algorithms
Most of the work described above with infants has used hand-coding techniques to analyze fixation durations. Bronson (1990, 1994) recorded infants’ gaze positions using an early corneal reflection-based device, replayed infants’ recorded position of gaze (POG) onto a rear projection screen post hoc, and identified fixations by hand. De Barbaro, Chiba, and Deak (2011) took close-up video footage of an infant’s eye and defined fixations as instances in which the eye remained static for at least 230 ms (seven frames at 30 fps). Although it can be done in a variety of ways, any type of hand-coding is extremely labor-intensive, and both temporal and spatial resolution are lower than those offered by eyetrackers, suggesting the desirability of finding an automated solution.
Methods for recording eye movements have been around for over a hundred years (Wade & Tatler, 2005), but only with the relatively recent advent of high-speed infrared (IR) cameras and fast computers has eye tracking become noninvasive enough to be used with infants. Most remote, video-based eyetrackers operate by illuminating the user with IR light and using computer vision techniques to identify either a dark pupil (created by off-camera-axis illumination) or a bright pupil (a “red eye effect” caused by on-camera-axis illumination) (for a more detailed summary, see Holmqvist et al., 2011). The IR illuminators also create bright glints off the user’s cornea. By triangulating the movement of the pupil center relative to these glints when the user is looking at five to nine calibration points on a screen, the eye-tracking software is able to build a 2-D model of the user’s eye movements relative to the screen. Once the eye model has been built, the tracker can identify the location of a user’s gaze on the screen in real time. Eyetrackers vary in their sampling speed and spatial accuracy, in whether they are binocular or monocular, in whether they require the user’s head to be stabilized or allow head movement within a tracking volume, and in the complexity of the eye model (2-D or 3-D), but the general principles of IR pupil and corneal reflection tracking are similar across systems (Holmqvist et al., 2011).
The raw gaze data returned by an eyetracker include periods during which the eyes are relatively stable and visual encoding occurs (fixations), periods when the velocity of the gaze is high (saccadic eye movements), periods during which moving objects are tracked relative to the viewer (smooth pursuits), periods during which the viewer moves in depth (vergence), and periods when gaze is lost due to blinks. There are three standard measures used to separate fixations from other eye movement events: dispersal, velocity, and acceleration (for more detailed reviews, see Duchowski, 2007; Nyström & Holmqvist, 2010; Wade, Tatler, & Heller, 2003). Dispersal is defined as the distance (expressed as pixels or degrees of visual angle) in the POG reported by the eyetracker between samples; velocity and acceleration are differentials derived from the POG.
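All three measures can be derived directly from the raw POG stream. As a minimal illustrative sketch (written in Python rather than the MATLAB environment referred to later in this article; the function name, the 50-Hz sampling rate, and the assumption that POG is expressed in degrees of visual angle are ours):

```python
import numpy as np

def gaze_differentials(x, y, hz=50.0):
    """Derive per-sample dispersal (deg), velocity (deg/s), and
    acceleration (deg/s^2) from raw point-of-gaze coordinates."""
    dt = 1.0 / hz                                   # 20 ms per sample at 50 Hz
    dispersal = np.hypot(np.diff(x), np.diff(y))    # between-sample displacement
    velocity = dispersal / dt                       # first differential of POG
    acceleration = np.diff(velocity) / dt           # second differential of POG
    return dispersal, velocity, acceleration
```

At 50 Hz, one sample spans 20 ms, so a between-sample displacement of 0.6° already corresponds to a velocity of 30°/s; the choice of threshold on one of these three measures is what distinguishes the algorithm families discussed next.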
It is traditional to draw a distinction between dispersal- and velocity-based algorithms (e.g., Blignaut, 2009; Holmqvist et al., 2011; Nyström & Holmqvist, 2010; Shic, Chawarska, & Scassellati, 2008, 2009; van der Lans, Wedel, & Pieters, 2011). Dispersal-based algorithms search for periods in which the reported POG remains below a displacement threshold that is often user-defined. Thus, for example, they identify a period of raw gaze data as belonging to a fixation by starting with a window size of the minimum fixation duration (e.g., 80 ms) and expanding it until the average displacement of the eyes during the window is greater than the displacement threshold (e.g., 0.5°). Periods not identified as fixations using this method are assumed to be either saccades or periods of lost data. Velocity-based algorithms, in contrast, search for saccades (i.e., instances in which the rate of change of POG surpasses a threshold) and label periods between saccades as fixations (Blignaut, 2009; Nyström & Holmqvist, 2010). Velocity-based algorithms can either use one fixed velocity criterion (e.g., 30°/s; SR Research EyeLink User Manual, 2005) or have a variable velocity threshold set according to the level of noise in the data (e.g., Behrens et al., 2010). The “ramping up” of the saccade velocity can also be identified via an acceleration threshold (e.g., 8,000°/s2; SR Research). Versions of all three algorithms are in widespread use in several commercial software packages provided by companies such as Applied Science Laboratories, SensoMotoric Instruments, Tobii Technology, and SR Research.
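To make the dispersal-based procedure concrete, a window-growing parser of the kind described above can be sketched as follows. This is an illustrative reconstruction under our own assumptions (function name, NumPy conventions, and the exact window-growing strategy), not any manufacturer's actual implementation; the 80-ms minimum duration and 0.5° threshold are the example values given above:

```python
import numpy as np

def parse_fixations_dispersal(x, y, hz=50.0, max_disp=0.5, min_dur_s=0.08):
    """Identify fixations as windows in which POG dispersal stays below
    a threshold, starting from the minimum fixation duration and
    expanding; all other periods are left as saccades or lost data."""
    def dispersal(a, b):
        # dispersal over samples [a, b): horizontal plus vertical extent
        return (x[a:b].max() - x[a:b].min()) + (y[a:b].max() - y[a:b].min())

    win = int(round(min_dur_s * hz))        # 80 ms -> 4 samples at 50 Hz
    fixations, i, n = [], 0, len(x)
    while i + win <= n:
        if dispersal(i, i + win) <= max_disp:
            j = i + win
            while j < n and dispersal(i, j + 1) <= max_disp:
                j += 1                      # grow until threshold is exceeded
            fixations.append((i / hz, j / hz))   # (onset_s, offset_s)
            i = j
        else:
            i += 1                          # slide past non-fixation samples
    return fixations
```

Note that a lost sample (NaN) makes the dispersal comparison evaluate false and thus terminates the current fixation, mirroring the treatment of lost contact discussed later in this article.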
One issue that has received attention in the literature is that the fixations returned by traditional velocity- and dispersal-based algorithms appear highly sensitive to user-defined parameter settings, such as the level at which the dispersal or velocity threshold is set. Karsh and Breitenbach (1983) showed that varying the parameters of a fixation detection algorithm led to qualitative differences in the scan patterns that emerged (see also Widdel, 1984). Shic et al. (2008) showed that changing the user-definable parameters of a distance-dispersal algorithm can lead to different patterns of between-group or between-individual differences in mean fixation duration. Thus, for example, they found that when the dispersal setting of their fixation duration algorithm was set at below 3° (equivalent to a radius of 1.5°), mean fixation duration when viewing faces was greater than that for viewing color blocks, but that when the dispersal setting was set to above 3°, mean fixation duration for blocks was returned as being greater than that for faces. It should, though, be remembered that 3° (corresponding to 125 pixels on a 1,024 × 768 pixel monitor at 60-cm viewing distance) is 20% higher than the maximum recommendation for dispersal fixation algorithms in adults (Blignaut, 2009). Shic et al. (2009) similarly reported that changing the dispersal levels could reverse the pattern of typically developing versus autism spectrum disorder group differences on fixation duration.
Analyzing fixation duration – the special challenges posed by low-quality infant data
Gathering accurate eye movement recordings from infants is significantly more difficult than from adults, for a variety of reasons. Adults are compliant during eye-tracking recording. They can be persuaded to keep their head on a chinrest and to minimize blinks or head movements and can be expected to behave in line with the demands of the task. By comparison, infants and children below the age of about 4 are likely to be less compliant than adult viewers and are more likely to move during eye tracking. To compensate for this, eyetrackers intended for use with infants allow for head movement within a “head-box.” For example, the Tobii 1750 has a head-box of 30 × 16 × 20 cm centered 60 cm from the screen (Tobii Eye Tracker User Manual, 2006). Analysis of the accuracy of the gaze data reveals that movement of the head toward the edges of this head-box, as well as changes in luminance caused by room lighting or the angle of the user’s head in relation to light sources, can all significantly decrease accuracy (Tobii test specification, 2011). Additionally, the recording may also include periods during which gaze data are absent completely. This is due either to the head moving out of the head-box or to either the corneal reflection or the pupil image becoming unidentifiable for some other reason.
Figure 1 shows the results when a standard dispersal-based algorithm designed for processing adult data is applied to infant data. This algorithm is the fixation detection algorithm supplied with Clearview 2.7 (Tobii Eye Tracker User Manual, 2006) at the default settings (dispersal threshold of 30 pixels [corresponding to 0.9°] and a minimum temporal duration of 100 ms). As with all stimuli presented here, the viewing material was presented on a monitor at a 60-cm viewing distance subtending 24° × 29°.
The figure shows frequency distributions of fixation durations returned for 3 sample participants. All 3 participants were typically developing 6-month-old infants viewing a 200-s corpus of dynamic viewing material. For 1 participant (participant 2), a positively skewed normal distribution with a mode of 279 ms, a mean of 516 ms, and a median of 379 ms was returned. The shape of this distribution is broadly similar to the distributions of fixation durations described in the adult literature (e.g., Henderson, Chanceaux, & Smith, 2009; Nuthmann et al., 2010; Nyström & Holmqvist, 2010; Tatler & Vincent, 2008). For the other participants, however, a radically different distribution was returned. Participant 1, for example, shows an inverse exponential distribution with a mode of 100 ms, a mean of 217 ms, and a median of 160 ms; more than three times as many fixations are being identified in the 100- to 200-ms range than in the 200- to 300-ms range. Participant 3 is intermediate, with a mean of 354 ms, a mode of 100 ms, and a median of 279 ms. For 2 of the 3 participants, the mode is 100 ms, which is the shortest possible fixation duration (due to the minimum temporal duration criterion identified above, all fixations shorter than this are excluded). Furthermore, there are extremely large differences in the mean fixation durations being reported across the 3 participants (from 217 to 516 ms).
Although it is possible that such radically differing response distributions arise because of differences in infants’ spontaneous orienting behavior, we wished to assess the possibility that they may be artifactual in origin. In particular, we considered the possibility that infant eyetracker data may differ in some way from adult data, so that a fixation detection algorithm designed for adult data may be suboptimal for infant data. We were able to find no discussion of this issue in the literature.
In order to assess how data quality might differ between adult and infant eyetracker data, we first plotted samples of raw data. Figure 2 shows examples of data quality as 3 participants (typically developing 11-month-old infants) viewed an identical 8 s dynamic clip of multiple actors talking concurrently against a busy background. These data were recorded using a Tobii 1750, with stimuli presented in MATLAB using the Talk2Tobii toolbox.
Visual inspection of these data suggests considerable interindividual variation in data quality, and also that there may be more than one separable dimension of data quality. The data from participant 1 appear to be of high quality relative to adult data (see, e.g., Holmqvist et al., 2011). They are unbroken (i.e., continuous), and sections where the eye is stationary (fixations) are clearly distinguishable from sections where the eye is transiting (saccades). Participants 2 and 3, however, show lower-quality data. Participant 2 shows greater variance in reported POG between one sample and the next. We assume that this individual’s eye is stable during fixation, because we were able to find no reports in the infant oculomotor literature of such high-frequency (50 Hz) “jitter” in infant eye movement behavior (Atkinson, 2000; Bronson, 1990, 1994; Johnson, 2010; see also Holmqvist et al., 2011), and our own video analysis of infant eye movements during viewing confirmed this conclusion. We concluded, therefore, that this high-frequency variance arises from lower than normal precision in the reporting of the POG from the eyetracker—that is, a larger than normal random error, arising from one iteration to the next, between the participant’s actual POG and the POG as reported by the eyetracker.
Participant 3 shows a different problem. For this individual, the precision appears to be as high as that for participant 1. However, contact with the eyetracker appears to be “flickery”—that is, absent for periods of variable length. Because head movement is unconstrained, all infant eyetrackers may show increased unreliability—that is, instances in which the infant is looking at the screen, but either the pupillary reflection or the corneal glint is unavailable or judged unreliable, leading to no POG being recorded. Visual inspection of the raw data obtained from infants suggested a high degree of variability in these periods of data loss. Contact is lost for variable periods of time, ranging from a single iteration (20 ms) through to longer periods. In examples other than the one shown here, contact may also occasionally be lost for one eye but not the other.
In order to quantify these different aspects of data quality, we analyzed a corpus of 300 s of dynamic viewing material presented to 17 six-month-old infants, 16 twelve-month-old infants, and 16 adult viewers. Stimuli were presented on a Tobii 1750 eyetracker using ClearView 2.7. The viewing material presented was a collection of low-load dynamic clips of objects moving against a blank background (described in more detail in Dekker, Smith, Mital, & Karmiloff-Smith, 2012).
Flickery or unreliable contact with the eyetracker
We quantified this aspect of data quality in two ways. First, we calculated the total proportion of unavailable data across the whole trial, defined as the proportion of the presented viewing material for which no gaze data were obtained. Second, we calculated a more direct measure of flickery contact: the mean duration (in milliseconds) of each unbroken segment of raw data. Together, these two measures allow us to differentiate between (1) cases in which the participant showed unbroken looking data during the first half of a trial, followed by completely absent data for the second half of the trial, and (2) instances in which the infant was looking continuously throughout the trial but contact with the eyetracker was inconsistent throughout (as shown in sample 2; see Fig. 2).
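Both measures can be computed in a single pass over one gaze stream. A sketch under our own conventions (Python rather than MATLAB; lost contact is assumed to be coded as NaN, and the 50-Hz sampling rate is that of the eyetracker used here):

```python
import numpy as np

def flicker_metrics(x, hz=50.0):
    """Quantify flickery contact from one gaze coordinate stream:
    (1) the proportion of samples for which no POG was reported
    (NaN = lost contact), and (2) the mean duration, in ms, of each
    unbroken run of valid samples."""
    valid = ~np.isnan(x)
    prop_unavailable = 1.0 - valid.mean()
    # find the start and end of each unbroken run of valid samples
    edges = np.diff(np.r_[0, valid.astype(int), 0])
    starts, ends = np.where(edges == 1)[0], np.where(edges == -1)[0]
    segment_ms = (ends - starts) * 1000.0 / hz
    mean_segment_ms = segment_ms.mean() if len(segment_ms) else 0.0
    return prop_unavailable, mean_segment_ms
```

A stream that is half present and half absent in one unbroken block and a stream that flickers on and off throughout can yield the same first measure but very different second measures, which is exactly the distinction drawn above.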
Precision: Variance in reported position of gaze
Quantifying this aspect of data quality is challenging, because simply calculating the between-sample variance (i.e., the average interiteration variability) leaves open the possibility that interindividual differences occur simply because one individual saccades around the screen more than another does (see the related discussion in Holmqvist et al., 2011, chap. 11). In order to quantify unreliability in reported POG, therefore, we performed the following calculation. First, we performed an initial coarse dispersal-based parsing to eliminate all saccades. (Given that most of the issues we have identified concern the false-positive identification of saccades and that, therefore, most of the data segments identified by the dispersal-based filtering are still real fixations, albeit incomplete ones, we felt that this method was free of any systematic bias.) For each of the remaining data segments, we then calculated the average variance (i.e., the average Euclidean distance of each individual sample within each fixation from the central point of that fixation). This is expressed in degrees of visual angle. High variance indicates low precision—that is, inaccurate or inconsistent reporting of POG.
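After the coarse parsing, the per-segment calculation reduces to the mean Euclidean distance of each sample from the segment's central point. A sketch, with function names and array conventions of our own choosing:

```python
import numpy as np

def fixation_precision(fix_x, fix_y):
    """Average variance of one coarse-parsed segment: the mean Euclidean
    distance (in degrees) of each sample from the segment's central
    point. Larger values indicate lower precision."""
    cx, cy = fix_x.mean(), fix_y.mean()          # central point of the segment
    return np.hypot(fix_x - cx, fix_y - cy).mean()

def mean_precision(segments):
    """Average the per-segment values over all segments for one viewer."""
    return float(np.mean([fixation_precision(sx, sy) for sx, sy in segments]))
```

A perfectly stable, noiselessly reported fixation yields a value of 0; larger values mean the reported POG scatters further, on average, from the fixation's center.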
Univariate ANOVAs conducted on these results suggested that all three parameters vary significantly as a function of age. The proportion of unavailable data is higher in 6-month-olds (M = .33 [SD = .17]) and 12-month-olds (.31 [.16]) than in adults (.06 [.06]), F(1, 46) = 19.52, p < .001. The mean duration of raw data fragments is lower in 6-month-olds (M = 2.3 s [SD = 1.8]) than in 12-month-olds (4.3 [3.5]) and adults (9.9 [9.7]), F(1, 46) = 6.68, p = .01. Variance in reported POG follows the opposite pattern and is lower in 6-month-olds (M = 0.18° [SD = 0.05°]) and 12-month-olds (0.18° [0.02°]) than in adults (0.25° [0.05°]), F(1, 46) = 12.3, p < .001.
Bivariate correlations were also calculated to examine whether these different parameters of data quality intercorrelate with each other. Although proportion of unavailable data and flicker (i.e., mean duration of raw data fragments) correlated in each of the three separate samples we looked at [6 months, r(1, 16) = −.62, p(two-tailed) = .01; 12 months, r(1, 15) = −.59, p(two-tailed) = .02; adults, r(1, 15) = −.53, p(two-tailed) = .03], we found no consistent pattern of correlations between flicker and precision (i.e. variance in reported POG) [6 months, r(1, 17) = .11, p(two-tailed) = .68; 12 months, r(1, 16) = −.09, p(two-tailed) = .75; adults, r(1, 16) = −.39, p(two-tailed) = .15]. This suggests that flicker and precision are independent dimensions of data quality.
Evaluating how data quality relates to fixation duration in an infant data sample
Using the same sample, we then evaluated whether relationships could be identified between data quality and the fixation durations returned by standard dispersal-based algorithms. Fixation parsing was performed by the fixation detection algorithms supplied with Clearview 2.7 (Tobii Eye Tracker User Manual, 2006) at the default settings as described above. However, to the best of our knowledge, the points we include in this discussion should apply equally to all of the preexisting fixation detection algorithms discussed in the introductory section.
Figure 3a shows the relationship between fixation duration as measured using standard dispersal algorithms and the mean duration of raw data segments. Across the three age groups we examined, consistently positive correlations were found, suggesting that longer data segments (i.e., less flickery data) were associated with longer fixation durations as assessed using standard dispersal algorithms. Nonparametric bivariate correlation analyses suggested that the observed relationships were significant for the 6-month group, r(1, 17) = .66, p = .004; marginally nonsignificant for the 12-month group, r(1, 16) = .47, p = .07; and not significant for the adult group, r(1, 15) = .19, p = .50. Figure 3b shows the relationship between fixation duration as measured using standard dispersal algorithms and variance in reported POG. Here, bivariate correlations suggested that the relationship was significant for the 6-month-old group, r(1, 16) = −.673, p(two-tailed) = .003, but not for the 12-month-old group, r(1, 14) = −.141, p(two-tailed) = .626, or the adult group, r(1, 15) = −.469, p(two-tailed) = .078. These findings represent significant methodological confounds that substantially limit the interpretability of the results of standard dispersal-based fixation detection paradigms.
Comparing with hand-coded data
In order further to understand the relationship between data quality and performance of the standard dispersal-based algorithm, we compared the performance of the standard dispersal-based algorithm with the results of hand-identified fixations. Hand-coding of gaze data is sometimes performed in order to identify a “gold standard” for fixation detection—that is, a “true” parsing of gaze data into fixations and saccades with which the results of the automated processing can then be compared (Holmqvist et al., 2011; Munn, Stefano, & Pelz, 2008; Tatler, Gilchrist, & Land, 2005).
In order to do this, we trained a novice coder (who was not one of the authors of the article and was naive as to the expected outcomes) to identify fixations by hand, on the basis of a visual output of the raw gaze data returned by the eyetracker. The coder viewed the data in 8 s segments containing plots of the x- and y-coordinates and the velocity, with time on the x-axis. The coder was asked to identify fixations as segments in which the POG stayed static (i.e., deviated by <0.5°) for longer than 100 ms. The coder was instructed to ignore fixations in which contact was lost either during the fixation or during the saccades before and after. The coder was also instructed to record only fixations in which the saccades that marked the start and end of the fixation were both genuine (i.e., in which the saccade [a period of high velocity] was clearly distinguishable from the fixations [periods of low velocity] before and after). These were distinguished from “false saccades,” in which the period of high-velocity movement was not clearly distinguishable from the periods of low-velocity movement before and after. Sections of data that showed these “false saccades” were excluded from the analysis. The start and end times of fixations that were considered valid were recorded by the coder to the nearest 20 ms.
This hand-coding was conducted on a sample of data from 5 typically developing 11-month-old infants watching 150 s of mixed static and dynamic viewing material consisting of pictures and video clips of faces and objects.
Figures 4a and 4b show reliability—that is, the proportion of agreement between the results of the automatic coding and the hand-coding. Figure 4a shows a strong relationship between the degree of flicker in an individual’s data and the proportion of agreement between automatic and hand-coding. For the individual with less flickery (i.e., higher-quality) data (mean raw data fragment duration, 3.8 s), we found interrater agreement of .83 (corresponding to Cohen’s κ of 0.66), whereas the individual with lower-quality (i.e., more flickery) data showed interrater agreement of .60 (Cohen’s κ = 0.20). The observed relationship between raw data fragment duration and the reliability of the standard dispersal algorithm was very strong, r(1, 4) = .98, p = .003. Figure 4b shows that for high-precision data (i.e., low variance in the reported POG), agreement between hand- and automatic coding is high, but for low-precision data, the reliability of the algorithm is poorer. This relationship is weaker than that between flickeriness and reliability, r(1, 4) = −.70, p = .19.
The relationships documented above leave open, however, the question of whether, for lower-quality data, standard dispersal-based algorithms tend to under- or overestimate fixation durations relative to hand-coding. Figure 4c shows a comparison between the mean duration of raw data segments and fixation duration as parsed using standard dispersal-based algorithms. Although they must be interpreted with caution due to the small sample size, the results in this figure are consistent with the relationship suggested by Fig. 3. For individuals with low-quality, flickery data (i.e., short raw data segments), the standard dispersal-based algorithm consistently underestimates fixation duration relative to the hand-coding, whereas for individuals with higher-quality data (i.e., long raw data segments), the algorithm consistently overestimates fixation duration. There is thus a significant correlation between flickeriness and fixation duration for the results of standard dispersal-based fixation parsing, r(1, 4) = .90, p = .04 (more flickery data being associated with shorter fixation durations), but no correlation between flickeriness and fixation duration for the hand-coding. Figure 4d shows a comparison between precision and fixation duration. A similar interaction appears to be present: For high-precision (i.e., low-variance) data, the dispersal-based parsing algorithm tends to overestimate fixation duration relative to the hand-coding, whereas for low-precision (i.e., high-variance) data, it tends to underestimate fixation durations. Again, we identified a significant relationship between precision and fixation duration as parsed using the standard dispersal-based algorithm, r(1, 4) = −.90, p = .04 (lower-precision data being associated with shorter fixation durations), but no correlation between precision and fixation duration for the hand-coding.
A simulation to assess how data quality might affect performance on a standard dispersal algorithm
The analyses above suggest that when parsing is performed using standard dispersal-based algorithms, individuals for whom the raw data were more flickery tend to be returned as showing shorter fixation durations. They also suggest that individuals for whom the raw data showed lower precision in the reported POG are returned by the standard dispersal algorithm as showing shorter fixations.
What, though, are the mechanisms driving these observed relationships? In order better to understand this issue, we subjected a single sample of high-quality data (5 min of viewing data taken from a typically developing 11-month-old infant viewing a mixed dynamic/static viewing battery) to two simulations.
First, a flicker simulation was conducted to replicate the nondeterministic dropout observed in our data (see Figs. 5 and 6). This was implemented in MATLAB by reprocessing the data iteration by iteration; if data for the previous iteration had been present, the algorithm removed data with a 5% likelihood, and if data for the previous iteration had been absent, the algorithm removed data with a 25% likelihood. This process was performed independently for the two eyes. Second, a precision simulation was conducted to replicate the problems of unreliable reporting of POG. Again, this was implemented to replicate the nondeterministic nature of the data corruption we found in our data, in which we often encountered brief “bursts” of noise. In our simulation, a burst of noise was triggered with a 2.5% likelihood, and once a burst was initiated, it was continued with a 20% likelihood. During a noise burst, Gaussian noise (±0.1 of screen proportion, corresponding to 2.4°) was added to the data. The effect of these simulations was then tested on a re-creation of a standard dispersal algorithm that we programmed ourselves (because the preexisting manufacturer-supplied fixation detection algorithms do not allow for the processing and reprocessing of the same data set). This algorithm was programmed to follow exactly the analysis protocol implemented in Clearview 2.7 (Tobii Eye Tracker User Manual, 2006).
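Both corruption processes are, in effect, two-state Markov chains applied sample by sample. The sketch below (Python rather than the MATLAB implementation described above, shown for a single gaze stream) reproduces the stated probabilities; the seeded random generator and the standard deviation chosen to approximate the ±2.4° noise range are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)             # seeded for reproducibility

def simulate_flicker(x, p_start=0.05, p_continue=0.25):
    """Markov dropout: a sample is removed with 5% likelihood if the
    previous sample was present, and with 25% likelihood if the
    previous sample had already been removed."""
    out, dropped = x.astype(float), False
    for i in range(len(out)):
        dropped = rng.random() < (p_continue if dropped else p_start)
        if dropped:
            out[i] = np.nan                # lost contact for this iteration
    return out

def simulate_noise_bursts(x, p_start=0.025, p_continue=0.20, sd=1.2):
    """Bursty imprecision: a noise burst starts with 2.5% likelihood and
    continues with 20% likelihood; during a burst, Gaussian noise is
    added to the POG (the sd, chosen so that most draws fall within a
    +/-2.4 deg range, is an assumption on our part)."""
    out, burst = x.astype(float), False
    for i in range(len(out)):
        burst = rng.random() < (p_continue if burst else p_start)
        if burst:
            out[i] += rng.normal(0.0, sd)
    return out
```

Applying these functions to clean data and re-running a fixation parser on the output reproduces the logic of the simulations reported here.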
Why does flickery or unreliable contact with the eyetracker pose a potential challenge to the accurate identification of fixations? The reason is that most preexisting fixation detection algorithms treat an instance in which contact with the eyetracker was lost during a fixation as signaling the end of that fixation. Flickery data may, therefore, lead to the storing of multiple incomplete fixations: a single long fixation is instead stored as several separate, shorter fixations. This is indeed the effect that we appear to observe in our flicker simulation (shown in Fig. 5b). The frequency distribution of detected fixations shows an increase in the proportion of short (<200 ms) fixations being identified (Fig. 5d). The effect of the flicker simulation on mean fixation duration is also substantial (Fig. 5f): Mean fixation duration is 471 ms for the clean data, 240 ms for the flicker simulation, and 367 ms for the low-precision simulation. Figure 6a shows an example of data obtained from a typically developing 11-month-old infant in which a similar effect appears to have occurred.
Why is low precision a potential challenge in obtaining accurate estimations of fixation durations? Most of the commonly available fixation duration parsing algorithms operate either via a displacement threshold, according to which a fixation is treated as ending following a change in POG above a certain, often user-defined, displacement threshold (see Blignaut, 2009; Holmqvist et al., 2011), or via a velocity threshold, according to which a fixation is treated as ending following an increase in velocity above a certain velocity threshold. (If data are obtained at a constant sampling rate, these two criteria are equivalent.) Figure 5c shows the effect of the low-precision simulation on the performance of the standard dispersal algorithm. The velocity plot shows that the noise bursts lead to an increase in the velocity (i.e., the rate of change of reported POG from one sample to the next) that exceeds the saccade detection threshold, leading to a saccade being incorrectly identified. Thus, multiple incomplete fixations are stored, instead of one long fixation. Figure 5e shows that the number of fixations being identified is higher in the low-precision simulation. Figure 5d shows that an increased number of very short fixations are also being stored in this simulation, which is associated with a shorter mean fixation duration (Fig. 5f). This effect is less strong than in the flicker simulation, although it would presumably have been stronger had the amplitude of the noise added to the data been increased. Figure 6b shows a sample of eyetracker data in which a similar effect has occurred.
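Both failure modes described above can be made concrete with a minimal dispersal-style parser. This is an illustrative sketch under our own assumptions (threshold values, function name, and minimum-duration criterion are ours), not a reconstruction of the Clearview 2.7 algorithm: any gap in the data, or any sample-to-sample displacement above the threshold, ends the current fixation, so both dropout and noise bursts fragment one long fixation into several short ones.

```python
import numpy as np

def parse_fixations(x, y, threshold=0.035, min_samples=5):
    """Minimal dispersal-style parser. A fixation ends when a sample
    is missing (NaN) or when the displacement from the previous
    sample exceeds `threshold` (screen proportions). Returns a list
    of (start, end) sample indices for fixations lasting at least
    `min_samples` samples."""
    fixations = []
    start = None
    for i in range(len(x)):
        if np.isnan(x[i]) or np.isnan(y[i]):
            # lost contact: treated as the end of the current fixation
            if start is not None and i - start >= min_samples:
                fixations.append((start, i))
            start = None
            continue
        if start is None:
            start = i
            continue
        # sample-to-sample displacement; with a constant sampling
        # rate this is proportional to velocity
        d = np.hypot(x[i] - x[i - 1], y[i] - y[i - 1])
        if d > threshold:
            # displacement above threshold: saccade detected
            if i - start >= min_samples:
                fixations.append((start, i))
            start = i
    if start is not None and len(x) - start >= min_samples:
        fixations.append((start, len(x)))
    return fixations
```

Feeding this parser a steady 100-sample fixation returns one fixation; blanking even a single sample in the middle splits it into two shorter fixations, which is exactly the fragmentation effect the flicker simulation produces at scale.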
These analyses indicate that when parsing is performed using standard dispersal-based algorithms, individuals for whom the raw data were more flickery (i.e., there was a greater number of instances in which the eyetracker was unable to detect where they were looking) tend to be returned as showing shorter fixation durations (see Figs. 3 and 4). They also show that individuals for whom the raw data showed lower precision (i.e., higher variance in the reported POG, presumably because of a larger than normal random error between the participant’s actual POG and the POG as reported by the eyetracker) are returned by the standard dispersal algorithm as showing shorter fixations (see Figs. 3 and 4). The fact that flicker and precision do not correlate with each other suggests that they are two independent dimensions of data quality that can separately influence the accuracy of fixation parsing.
The simulation we conducted has given insight into the mechanisms that may be driving these relationships (see Figs. 5 and 6). Flickery contact appears to influence the fixation durations returned by standard dispersal algorithms because almost all fixation detection algorithms are set up to treat an instance in which contact is lost during a fixation as signaling the end of that fixation. Therefore, flickery contact is associated with multiple incomplete fixations being stored. Low-precision data appear to influence fixation duration as identified by standard dispersal algorithms because inconsistent reporting of the POG leads to bursts of high velocity, which can lead to the saccade detection threshold being crossed erroneously. Both of these conclusions are supported by inspection of the raw and semiprocessed data sets (Fig. 6).
Both of these findings represent potentially serious and independent confounds that substantially limit the interpretability of fixation duration as assessed using standard dispersal-based algorithms. These findings have been based on a replication of the standard dispersal-based algorithm implemented in Clearview 2.7, but from our analysis of similar algorithms by other groups, we can find no reason why these same artifacts should not also affect the performance of other algorithms when processing low-quality data. These findings agree with those of other authors who have questioned the reliability of standard dispersal-based fixation detection algorithms (Blignaut, 2009; Camilli, Nacchia, Terenzi, & Di Nocera, 2008; Karsh & Breitenbach, 1983; Shic et al., 2008, 2009).