Flipping the stimulus: Effects on scanpath coherence?

Děchtěrenko, Filip; Lukavský, Jiří; Holmqvist, Kenneth

doi:10.3758/s13428-016-0708-2

Flipping the stimulus: Effects on scanpath coherence?

Published: 02 March 2016

Volume 49, pages 382–393, (2017)
Cite this article

Download PDF

Behavior Research Methods Aims and scope Submit manuscript

Flipping the stimulus: Effects on scanpath coherence?

Download PDF

Filip Děchtěrenko^1,2,
Jiří Lukavský² &
Kenneth Holmqvist³

1223 Accesses
4 Citations
2 Altmetric
Explore all metrics

Abstract

In experiments investigating dynamic tasks, it is often useful to examine eye movement scan patterns. We can present trials repeatedly and compute within-subjects/conditions similarity in order to distinguish between signal and noise in gaze data. To avoid obvious repetitions of trials, filler trials must be added to the experimental protocol, resulting in long experiments. Alternatively, trials can be modified to reduce the chances that the participant will notice the repetition, while avoiding significant changes in the scan patterns. In tasks in which the stimuli can be geometrically transformed without any loss of meaning, flipping the stimuli around either of the axes represents a candidate modification. In this study, we examined whether flipping of stimulus object trajectories around the x- and y-axes resulted in comparable scan patterns in a multiple object tracking task. We developed two new strategies for the statistical comparison of similarity between two groups of scan patterns, and then tested those strategies on artificial data. Our results suggest that although the scan patterns in flipped trials differ significantly from those in the original trials, this difference is small (as little as a 13 % increase of overall distance). Therefore, researchers could use geometric transformations to test more complex hypotheses regarding scan pattern coherence while retaining the same duration for experiments.

A comparison of scanpath comparison methods

Article 25 December 2014

Detecting single-target changes in multiple object tracking: The case of peripheral vision

Article 04 March 2016

Gaze position lagging behind scene content in multiple object tracking: Evidence from forward and backward presentations

Article 26 July 2016

In natural tasks, humans direct their eyes toward objects of interest to make the best use of the high acuity vision in the fovea. Many factors influence eye movements. However, it is possible to get consistent scan patterns and, therefore, to separate signal from noise if the stimuli are presented repeatedly. However, repetition introduces the risk that the participants will recognize the repeated scenes and alter their gaze behavior. In many tasks, repeated presentation of the same scene is undesirable, because some subjects will recognize the scene and examine previously unexplored areas, encode additional details, and/or find targets more efficiently on the basis of their previous experience. Dorr, Martinetz, Gegenfurtner, and Barth (2010) compared the variability of eye movements during free viewing of dynamic natural scenes and found that whereas between-subjects coherence was maximal at the first presentation of the movie, it decreased during later presentations throughout the day. Dorr et al. suggested that this decrease in the between-subjects coherence was caused by a rising influence of individual viewing strategies. Gaze patterns are even affected when the stimulus is not explicitly recognized. In studies on contextual cueing, Chun and Jiang (1998) showed that people implicitly learn the positions of targets over time, which then results in shorter response times.

One way to decrease the chance that participants will recognize the repetition is to increase the number of filler trials. The disadvantage of this approach is an increase of the experiment’s duration, which can have a negative impact on the performance of the participants. Another possibility is to modify the stimuli in a way that will keep scan patterns similar to the unmodified version. Geometrical transformations of the stimuli are one way to achieve this goal. The use of geometrical transformations is limited with structured scenes such as movie clips, in which the orientation is meaningful; however, scenes without an inherent structure offer a wider range of possibilities.

In this study, we utilized Multiple Object Tracking (MOT; Pylyshyn & Storm, 1988). MOT is an experimental paradigm in which subjects must track several moving target objects among other moving distractor objects. It has been found that when MOT trials are presented repeatedly, participants fail to notice the repetition (Lukavský, 2013; Ogawa, Watanabe, & Yagi, 2009). This task is convenient for two reasons: First, because the stimuli in MOT are simple displays with moving objects/dots, they can be easily transformed without any change in meaning. Second, MOT is a dynamic task, and people need to sustain their attention over the course of the whole trial. Thus, their eye movements are likely to be connected to the scene properties or object positions rather than to other factors (e.g., guidance or search strategies). Additionally, examining human eye movement strategies in a situation of distributed attention can be useful for understanding behavior in many other everyday tasks.

In a dynamic task like MOT, eye movement data can be easily represented as scan patterns (see Fig. 1 below). Examining scan patterns instead of saccades or fixations on areas of interest is useful, since it can account for similarities in smooth pursuit or cases in which people fixate empty space between objects. Similarly, scan pattern coherence has been used to examine human eye movements while viewing movies (Dorr et al., 2010).

Here, we asked whether human observers would transform spatio-temporal scan patterns in a way that corresponds to geometric transformations that were applied to MOT trials by the experimenter. If the two transformations are similar, then the data from both the transformed and the original MOT trials could be pooled. One of the possible transformations would be to flip the stimuli around axes. This operation should be plausible if the eye movements were symmetrical with respect to the left and right hemifields (or the upper and lower hemifields, depending on the axis of a transformation).

Behaviorally, the existence of a left–right asymmetry is unclear. Unlike the upper–lower asymmetry (Levine & McAnany, 2005), a left–right asymmetry is rarely reported in healthy subjects (Greene, Brown, & Dauphin, 2014; Petrov & Meleshkevich, 2011; cf. Corballis, Funnell, & Gazzaniga, 2002). In the case of MOT, if people track a group of targets as a single object (Yantis, 1992), we should expect no left–right difference, since the optimal viewing position is found at the center of the perceived object, with no lateral bias (Foulsham & Kingstone, 2010). Conversely, others have reported a preference for early fixations to the left part of the scene (Dickinson & Intraub, 2009; Foulsham, Gray, Nasiopoulos, & Kingstone; 2013; Nuthmann & Matthias, 2014; Ossandón, Onat, & König, 2014). This bias is often discussed with respect to “pseudoneglect”: a leftward bias in the line-bisection task in healthy humans (Bowers & Heilman, 1980; Jewell & McCourt; 2000).

When viewing natural scenes, people make more horizontal than vertical saccades and show no left–right asymmetry in these saccades (Foulsham, Kingstone, & Underwood, 2008). Performance in an antisaccade task has shown that people are better prepared to make rightward saccades, which are performed faster and with fewer errors (Evdokimidis et al., 2002; Tatler & Hutton, 2007). This asymmetry may be a result of a learned behavior, since it is consistent with the findings of Abed (1991) comparing the directions of saccades when looking at simple dot patterns in Western, Middle Eastern, and East Asian participants.

All of the abovementioned studies used static stimuli. The left–right asymmetries in fixation patterns happen within a few fixations at the beginning of a trial and later disappear (Nuthmann & Matthias, 2014; Ossandón et al., 2014). In a dynamic task like MOT, it is an open question whether a left–right asymmetry would be observed. To our knowledge, there are currently no studies regarding the symmetry of scan patterns using dynamic stimuli.

To summarize the aims of our study: We wanted to test whether we could flip stimuli around the axes as a way of masking the repetition of trials. We chose the MOT paradigm because it has suitable properties for this goal. During preparation of the experiment, we found that few methods have been developed to test the statistical differences between groups of scan patterns. Any two scan patterns are almost certainly different from each other; the important question is how large that difference is, given the noise. Thus, for testing differences between two groups of scan patterns, the within-group coherence has to be computed first. The usual approach is to compute the average distance between all possible pairs of scan patterns. There are several pairwise comparison methods, such as the Levenshtein distance (Levenshtein, 1966), ScanMatch (Cristino, Mathôt, Theeuwes, & Gilchrist, 2010), and Multimatch (Dewhurst et al., 2012; Jarodzka, Holmqvist, & Nyström, 2010), that work with scan patterns represented using an event-based approach (Le Meur & Baccino, 2013). This representation is complicated in MOT, however, because of the prevalence of smooth pursuit, which is difficult to detect in eyetracking data. The alternative approach is to use the raw data, as in Dorr et al. (2010), and compute the distance using Normalized Scanpath Saliency (Peters, Iyer, Koch, & Itti, 2010), which works with a spatio-temporal fixation map (or three-dimensional saliency map).

Usually, a saliency map is computed for scan patterns collapsed over time and represents areas in the scene that participants fixated during the trial. There are two sets of methods for using saliency maps to compare scan patterns. First, both scan patterns can be expressed in the form of saliency maps and then compared to one another. Second, we can compute how well a scan pattern can be explained by a given saliency map. Both approaches are similar, and most of the methods can be used for both representations. When both scan patterns are represented as saliency maps, there are several methods for computing similarity, such as ROC analysis (Tatler, Baddeley, & Gilchrist, 2005), Kullback–Leibler distance (Rajashekar, Cormack, & Bovik, 2004; Tatler et al., 2005), and correlation-based metrics (Jost, Ouerhani, von Wartburg, Müri, & Hügli, 2005; Toet, 2011). When only one scan pattern is represented as a saliency map, researchers can use Percentile metric (Peters & Itti, 2008) or Normalized Scanpath Saliency. ROC analysis and the Kullback–Leibler distance can be used in this case. For dynamic tasks, using the spatio-temporal fixation map is an interesting extension, because it keeps the time dimension in the analysis. Correlation-based metrics and Percentile metric can still be used for the spatio-temporal fixation maps. Metrics that work with the saliency map can compute within-group coherence by using the leave-one-out method, in which all of the scan patterns except one are summed into one saliency map, and then the distance is computed using one of the available metrics. Because a saliency map can be created from a single scan pattern, these methods can be also used for pairwise comparison.

With within-group coherence computed for each group, it is an open question how to test correctly whether the between-group variability is larger than the within-group variability—in other words, whether the scan patterns from one group are significantly different from the scan patterns in other group. Currently, only one method, that of Feusner and Lukoff (2008), addresses this situation, and it is only applicable using the pairwise comparison approach. Their method computes the distance for all pairs within each group and between groups using one metric selected a priori. It then tests the significance of the differences between the overall within-group distance and the overall between-group distance using permutation tests. The choice for the distance metric is dependent on the researcher’s task. Tang, Topczewski, Topczewski, and Pienta (2012) extended this method for scan patterns of unequal lengths.

In this study, we tested whether the scan patterns from MOT trials are similar to the scan patterns from the same trials flipped around the y-axis (Exp. 2) and around the x-axis (Exp. 3). Prior to this, we developed two additional strategies for comparing groups of scan patterns and tested their discrimination capabilities on simulated data (Simulation Experiment 1).

Experiment 1 (Simulation)

To determine which method for statistical group comparison could most accurately determine differences between groups, we conducted a simulation experiment with artificial scan patterns. The scan patterns were divided into two groups to achieve a ground truth classification. We manipulated the variability within groups relative to the intergroup distance and used three methods to determine their classification sensitivity. To summarize, the purpose of this experiment was to determine the relationship between methods for measuring statistical differences between groups of scan patterns.

Method

In this experiment, we used artificial trajectories similar to the scan patterns from behavioral data. Those artificial trajectories were then divided into two groups, and similarity between the groups was tested using three comparison strategies. In following subsections, we first describe the metric for evaluating the similarity of individual scan patterns (or groups of scan patterns). Then we describe three comparison strategies using this metric to evaluate the statistical significance of the differences. Finally, we describe the process of creating artificial scan patterns and using those patterns to evaluate the three analysis strategies.

In the following experiments, we will be working with scan patterns (vectors x, y, t). For both the artificial and behavioral scan patterns, the data were first binned into a 3-D spatio-temporal matrix with a bin size of 0.25° × 0.25° × 20 ms, in which the number in each bin represented how many gaze samples fell into that bin. The data were analyzed using the statistical program R (R Development Core Team, 2014).

Metrics for comparisons

In this study, we distinguished between comparison metrics (to determine the distance between scan patterns) and comparison strategies (to test the significance of a difference, described later). To evaluate the similarity of scan patterns or a group of scan patterns, we employed the correlation distance (CD) metric.

The CD (Jost et al., 2005; Le Meur, Le Callet, Barba, & Thoreau, 2006; Rajashekar, van der Linde, Bovik, & Cormack, 2008) can be used to evaluate the similarity of two saliency maps or to compare the similarity of a set of fixation points with a saliency map. The saliency map represents areas in the scene where participant fixated during the task. Those fixation points are usually smoothed by an isotropic bidimensional Gaussian function (Le Meur & Baccino, 2013). Due to smoothing, two scan patterns slightly shifted in one of the spatial coordinates are treated as similar. Because this approach does not take the temporal order of fixations into account, we extended the saliency maps into the 3-D variant, denoted as spatio-temporal fixation maps. Convolving scan patterns with a spatio-temporal Gaussian filter preserves identity for the time scale, as well. Therefore, identical scan patterns shifted in space or in time would be treated as very similar up to some degree, defined by the properties of the Gaussian filter. In our case, the Gaussian filter had the parameters σ _x = 1.2°, σ _y = 1.2°, σ _t = 26.25 ms, but as was shown by Lukavský (2013), similar results were obtained for filters with different parameters. We then computed the correlation between the scan patterns, preserving the temporal order of the fixations.

Since spatio-temporal fixation maps can consist of several scan patterns, we compared the distance between two groups of scan patterns. In that case, the parts of the spatio-temporal fixation map in which several scan patterns were similar would have higher values. The computation of the similarity of two scan patterns is shown in Fig. 1. More specifically, the CD metric is computed as follows: Each of two groups of scan patterns (if we calculate the distance between two individual scan patterns, each group can contain only one scan pattern) is convolved with a spatio-temporal Gaussian. Thus, we obtain two spatio-temporal fixation maps. The similarity between two scan patterns is preserved even if they are slightly shifted in the time coordinate. Using the Pearson correlation coefficient, the correlation between those two maps is computed, and the CD metric is 1 – r. Due to the nature of fixation maps, the correlation coefficient can occasionally be less than 0. Therefore, for negative correlation coefficients, we set the CD metric as 1. As a consequence, the values of the CD metric can range from 0 (absolute correspondence) to 1 (completely different trajectories). Lukavský (2013) showed that the most similar scan patterns are two patterns from the same subject and same trial; followed by scan patterns from different subjects and the same trial, and from the same subject on different trials; and the least similar were scan patterns from different subjects on different trials. On the scale of the CD metric, two random scan patterns would have CD values around .95, and two scan patterns from the same subject and repeated presentation of the same trial would have a CD value around .47. The main advantages of the CD metric are the limited range [0, +1] and intuitive evaluation of the results in comparison to other metrics, such as Normalized Scanpath Salience (Dorr et al., 2010) and Kullback–Leibler divergence (Rajashekar et al., 2004).

Comparison strategies

The CD was used to measure the distance between scan patterns or between groups of scan patterns. However, given the observed variability of the scan patterns, the statistical significance of a measured difference was uncertain. In the context of repeated presentation, we must discriminate whether the variability introduced by repeated presentation differs from variability from the experimental manipulation (e.g., flipping around the y-axis). For the following text, we will work with two groups of scan patterns, G₁ and G₂, each consisting of six scan patterns. We proposed three strategies for comparing groups of trajectories to achieve this goal.

The first strategy (subset comparison) compared the within-group variabilities for random subsets of the merged G₁ and G₂ groups. If the scan patterns from each group were similar, we should get the same within-group distance for subsets of scan patterns, irrespective of whether they were all selected from one group or were instead selected from both groups. Therefore, using this strategy, we randomly sampled a subset of scan patterns and measured their overall distance. Then we compared whether these distances differed when the scan patterns were selected from a single group or from both groups. The number of scan patterns forming each subset was preselected to allow for multiple possible samples within each group. Thus, we chose four out of six trials. Next, we defined a method to measure the mutual similarity within a group by computing the CD of each scan pattern with others. Finally, we compared these mutual similarities for all quadruples sampled solely from one category with mixed quadruples having half of the elements from each category. Specifically, we compared \( 2*\left(\begin{array}{c}6\\ {}4\end{array}\right)=30 \) quadruples for each of the G₁ and G₂ groups (G values), and \( \left(\begin{array}{c}6\\ {}2\end{array}\right)*\left(\begin{array}{c}6\\ {}2\end{array}\right)=225 \) mixed quadruples (M values). The difference between the G and M values was tested using a two-sample t-test.

The second strategy (pairwise comparison) computed the differences between the G₁ and G₂ groups using the permutation test introduced by Feusner and Lukoff (2008). This method computed differences between two groups of trajectories using pairwise comparisons and compared the distances of pairs from the same and from different categories. Note that we could have used an arbitrary distance metric for the comparison of two trajectories. For each possible assignment into one of two groups of equal sizes, the overall distance is defined as d* = d _betw – d _ingrp, where d _betw and d _ingrp denote the grand mean distances of pairs selected from different groups or from the same group, respectively. The value d* expressed the overall distance between two groups. Finally, we compared the d* values for the experimental condition assignment (G₁ and G₂ groups) with the distribution of d* for all other possible groupings. Because we had six scan patterns in both the G₁ and G₂ groups, we had a distribution of \( \left(\begin{array}{c}12\\ {}6\end{array}\right)=924 \) d* values. The gaze patterns from each groups were significantly different (i.e., the G₁ and G₂ groups were nonrandomly grouped), if the corresponding d* exceeded the 95 % quantile of the distribution.

The third strategy (groupwise comparison) combined both previous strategies. First, we computed summed spatio-temporal fixation maps for groups G₁ and G₂ and expressed the similarity of those two maps using the CD metric. We then applied a permutation test and computed the similarity for all assignments of all trajectories in the two groups, and again compared the similarity of groups G₁ and G₂ with the 95 % quantile of this distribution.

All three strategies are schematically captured in Fig. 2.

Artificial trajectories

We created artificial scan patterns to match parts of the gaze trajectories without the saccades from an MOT experiment (here reported as Exp. 2). To obtain such parts, we first identified the saccades from the recorded eye movement data using an algorithm from Nyström and Holmqvist (2010). We then selected only intersaccadic segments, which consisted of both fixations and smooth pursuit eye movements with a varying number of samples. For those samples, we computed the average distances between subsequent points in the scan patterns and the total scan pattern lengths. Next, we created artificial trajectories similar to smooth pursuit eye movements (similar with respect to those two respective measures) and used artificial data for evaluating the methods for testing significance. Although we wanted the trajectories to be similar to the observed trajectories, we believe that the exact algorithm and parameters are not crucial for our argument, given that the main purpose was to introduce trajectories with the amount of variability observed in the task. Accordingly, using trajectories of low variability might inflate the similarity measure over time. Artificial trajectories were created as random walks, in which each subsequent position was sampled from a bivariate normal distribution centered at the last position with covariance matrix α·I, in which α varied from .0005 to .005 (step size .0005) and I denoted the identity matrix. The initial position of an artificial scan pattern was varied as described in the Design section. We also varied the number of samples between each two generated values and interpolated the positions between them. The interpolation factor varied from 1 (no subsequent samples) to 5 (four interpolated samples between two generated samples), with step size .5. From all of the possible artificial trajectories, we selected those with high similarity to the segments of behavioral data previously identified. More specifically, we attempted to match the subsequent distances between the samples and the total scan pattern lengths. Values for α and the interpolation factor from the most similar artificial trajectories were used for generating the trajectories for the experiment. To obtain more robust results, we also used parameters for α and the interpolation factor, which resulted in more variable scan patterns, measured in the subsequent distances and scan pattern lengths.

A comparison of the artificial trajectories with the human gaze trajectories from Experiment 2 is presented in Table 1. To compute two descriptive measures of the behavioral scan patterns, we selected from the behavioral data only parts with 500 to 600 subsequent samples (corresponding to 2.0–2.4 s recorded with an EyeLink II at 250 Hz). We found 457 smooth pursuit parts of the desired length. Correspondingly, we modeled the artificial trajectories consisting of 500 gaze samples.

Table 1 Mean and standard deviations (SD in brackets) of two characteristics of generated data

Full size table

For the present experiment, we generated two types of trajectories with the parameters α = .001, interp = 1.5 (scan patterns with low variability) and α = .003, interp = 1 (scan patterns with high variability).

Design

For evaluating the discrimination properties of our strategies, we created the following setting. There were two groups of trajectories (G₁ and G₂), each of which consisted of six trajectories. The trajectories started on a separate circle with a 1° diameter, and the distance of the centers of both groups was kept constant (10°). The first sample in each trajectory from the G₁ group started on odd multiples of π/6, and the trajectories for the G₂ group started on even multiples of π/6, as depicted in Fig. 3. Then we created additional identical copies of the trajectories from both the G₁ and G₂ groups and moved all of the spatial coordinates of the trajectories in the direction of the arrows from 0° to 5° by a step size of 0.5° (therefore, the radii varied from 1° to 6°). Therefore, with the increasing radii in both the G₁ and G₂ groups, otherwise identical trajectories were becoming more distant from the trajectories in the starting group, and the distinction between groups G₁ and G₂ would become less evident. This setting was generated randomly 50 times. Similarity between the groups was evaluated using the three strategies. We report the distance for which strategies reached chance level for rejecting the null hypothesis as the ratio of the group diameter over the distance of the centers of the groups.

Results

As expected, the classification accuracy decreased with the increasing diameter of each group for all strategies. In general, the subset strategy exhibited the best precision, followed by the groupwise strategy, and finally the pairwise strategy. When circles on which the scan patterns were generated were separated by more than 3°, all three strategies discriminated two groups in 100 % of the cases. Similarly, for distances lower than 1°, all of the strategies failed to discriminate two groups in all cases. If we define the group variability as the maximum initial distance, the distances reported above mean that all strategies discriminated the groups at the distance of 143 % or more of the group variability, and failed at distances lower than 111 %.

For all three strategies, the data were fit with a cumulative Gaussian to obtain threshold values. For the less variable trajectories (α = .001), the subset strategy reached chance level when the ratio of the group radius over the distance between the centers reached .41; the groupwise strategy reached chance level when the ratio reached .40; and the pairwise strategy reached chance level at the .39 threshold (Fig. 4). For the more variable trajectories, the values were similar: .43 for subset, .41 for groupwise, and .38 for pairwise.

Discussion

All three strategies showed similar capabilities to discriminate between the scan patterns from two groups. The subset strategy nonetheless had better precision than the other two, and was selected for the following experiments. Similar results were found for more variable scan patterns, but the fitted function was less steep. Here we only checked the situation in which two groups differed in their initial positions. In general, scan patterns could differ in other properties, such as the general shape of a scan pattern, but for the basic assessment of the comparison strategies, we assumed that manipulating the initial positions would be sufficient.

Experiment 2

In Experiment 2, we presented participants with MOT in original and flipped trials (mirrored around the y-axis), and we evaluated the effect of symmetry on the eye movements. Each trial was repeatedly presented several times in both the original and flipped versions, and we used the CD metric to evaluate the scan pattern distances and the subset strategy to evaluate the statistical significance, on the basis of the results of Experiment 1. This experiment would determine whether flipping trials around the y-axis can provide comparable scan patterns, and therefore we could use this technique for masking trial repetition.