Introduction

Encouraged by the ever-increasing accessibility and quality of tools for web-based experimentation, psychological researchers have gradually moved behavioural and self-report studies online (see Anwyl-Irvine, Dalmaijer, et al., 2020a, for a brief history). One technique that has yet to make this transition is eye tracking, a popular method that yields rich data on gaze fixation patterns, saccade dynamics, and pupil size (Holmqvist et al., 2015). While webcam-based eye tracking was a relatively niche topic in cognitive and behavioural sciences until recently, pandemic-related lockdowns and related lab closures have caused a sudden surge in interest.

Modern approaches to webcam eye tracking typically employ a combination of facial landmark detection (Kazemi & Sullivan, 2014; Saragih et al., 2011), sometimes aided by pupil detection (Papoutsaki et al., 2015), and (regularised) regression models to map landmarks to gaze positions (Papoutsaki et al., 2015; Xu et al., 2015). Others have taken a slightly more complex approach, in which extracted features (e.g. each eye, the face, facial location within the image) were cropped from the webcam stream and passed through a neural network (Krafka et al., 2016; Meng & Zhao, 2017). Many algorithms for sub-components of eye and gaze detection cascades have been proposed; see Gómez-Poveda and Gaudioso (2016) for a summary and evaluation.

One currently popular and accessible package, built as an extension of TurkerGaze (Xu et al., 2015), is WebGazer (Papoutsaki et al., 2015). When independently tested under ideal conditions (e.g. a high-resolution webcam, a clean camera, no reflections on the eye that obscure the pupil, a participant who is neither too far nor too close, and little to no participant movement), WebGazer produces reasonable accuracy and precision (about 17% of screen size, which translates into a variable size in degrees of visual angle due to participants’ non-standardised home environments) (Semmelmann & Weigelt, 2018). It is good enough for studies that need only rough gaze estimation, and is thus cause for optimism among many.

Unfortunately, when employed “in the wild”, webcam-based eye tracking suffers from high attrition (participants who start the study, but fail the eye-tracking calibration): 62% in Semmelmann and Weigelt (2018) and 61% in Yang and Krajbich (2020). This is not due to operator error: both studies were conducted by able programmers who adapted the WebGazer source code for their own purposes. Instead, participants were likely unable to pass the calibration procedure due to poor image quality, suboptimal lighting conditions, and reflections on the cornea or their glasses. In addition, the technique obviously excludes participants who do not own a webcam. In sum, samples in web-based eye tracking are biased by definition (because over 60% of participants who try to take part fail the calibration), and suffer from heavy attrition due to fundamental limitations of the eye signal available in webcam streams.

Here, we describe a method that is designed to simulate gaze tracking without the need for a webcam. We achieve this by mimicking the visual system’s peripheral blur and foveal clarity; specifically by allowing participants to move a high-fidelity aperture on an otherwise obscured field with their mouse. The result is akin to providing participants with a narrow-beam torch, and placing them in a dark room in which the experimenter has arranged the furniture in a particular way.

We are by no means the first to suggest such a technique. In fact, the idea of limiting viewing with a gaze-contingent aperture is almost half a century old (McConkie & Rayner, 1975). It has been used profitably in reading research (Rayner, 2014), and to simulate the viewing behaviour of macular degeneration patients (Lingnau et al., 2008, 2010), who develop “preferred retinal locations” that act somewhat like the fovea (Bethlehem et al., 2014). Decoupling the aperture from gaze is itself at least two decades old: apertures have been locked to computer mouse cursors rather than gaze (Blackwell et al., 2000; Jansen et al., 2003), or viewing has been limited to randomly placed (Gosselin & Schyns, 2001) or participant-operated (Deng et al., 2013) “bubbles” (perforations in a blurring mask through which participants could view the underlying stimulus material).

More recent work has taken both the bubble (Deng et al., 2013; Kim et al., 2017, 2015) and the moving aperture approaches (Gomez et al., 2017; Jiang et al., 2015) to the Internet, usually with a view to investigate specific types of stimuli (e.g. data visualisations) or to accrue large datasets for the development of visual saliency models. Aside from producing excellent names (“Fauxvea” by Gomez et al., 2017), these efforts have been particularly successful in demonstrating that mouse-guided visual exploration overlaps relatively closely with free viewing patterns.

While highly encouraging, the cited work has not produced readily available software for psychological experiments (although similar code snippets exist, e.g. this PsychoPy demonstration: gitlab.pavlovia.org/demos/dynamic_selective_inspect). In addition, it remains unclear whether non-saliency aspects of gaze behaviour are equally well approximated by the mouse-locked aperture paradigm. We address both of these issues by presenting an open-source JavaScript library, MouseView.js, and by testing this in free viewing experiments that probe more than just visual saliency. MouseView.js can be used as a standalone library, and is integrated into popular experiment-building platforms Gorilla (www.gorilla.sc, Anwyl-Irvine, Massonnié, et al., 2020b), jsPsych (www.jspsych.org, de Leeuw, 2015), and PsychoPy/PsychoJS (www.psychopy.org, Peirce et al., 2019).

We present two validation studies, in which we compare MouseView.js and eye tracking in preferential looking experiments. The first validation study replicated an existing eye-tracking study (Armstrong et al., 2020) in a web-based experiment, and we compared data between the original and our online sample. In a second validation study, we recruited a new sample to take part in two lab-based experiments to directly compare gaze and mouse within the same participants.

MouseView.js

At its core, MouseView.js is a highly configurable JavaScript library. It obscures a webpage with an overlay, permits the user to view the page through an aperture, and records the coordinates of the mouse cursor or screen touches. The library is built to be as flexible as possible. It can create an overlay over an online experimental task (as we demonstrate in our validation study), or it can be used on a dynamic website for user-experience research. The overlay does not prevent user interaction, so users can click or press buttons, and type on the keyboard as usual.

In this section, we give an overview of the available configuration options and methods. For a more complete and up-to-date overview, including some tutorials on implementation, we recommend reading our documentation at www.mouseview.org/docs. For the purposes of this description, we will use the term user to describe the participant or page viewer, and the term researcher to describe the person setting up MouseView.js.

Mechanism

The library is designed using the relatively recent ES6 module architecture. This means it can be included in an existing website or app with minimal scripting. Once included in a webpage, the library creates a globally accessible object with variables and functions that any other piece of code can access. This object is called mouseview, and it contains all the methods needed to produce an overlay and to track mouse movements. This architecture is analogous to that of other libraries, such as WebGazer.js (Papoutsaki et al., 2015).
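As a minimal sketch of what this looks like in practice (the file name and path below are placeholders; see the documentation at www.mouseview.org/docs for how to obtain and reference the library), the module is loaded inside a script element of type “module”, after which the global object is available:

```javascript
// Minimal sketch: loading MouseView.js as an ES6 module.
// './mouseview.js' is a placeholder path; obtain the actual file or URL
// from www.mouseview.org/docs or from your experiment builder's integration.
import './mouseview.js';        // side-effect import: registers the global `mouseview` object

// Configuration variables and methods hang off the global object:
console.log(mouseview.params);  // configuration options (see Table 1)
mouseview.init();               // draw the overlay and aperture (see Table 2)
```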

Configuration

As alluded to above, there are several options that researchers can choose to set. These pertain to three main areas: aperture, overlay, and recording. These are customised by specifying variables in the mouseview.params object. These variables are summarised in Table 1, and Fig. 1 illustrates some of the possible configurations.

Table 1 Configuration options, description and default values for MouseView.js
Fig. 1

Screenshots of different MouseView.js configurations. a Solid black overlay with Gaussian Edge SD of 5 pixels (overlayColour=’black’, overlayAlpha=1, overlayGaussian=0, apertureGauss=5). b Gaussian overlay and Gaussian aperture edge with SD of 50 pixels (overlayColour=’black’, overlayAlpha=0.8, overlayGaussian=20, apertureGauss=50). c Gaussian overlay with solid aperture edge (overlayColour=’black’, overlayAlpha=0.8, overlayGaussian=20, apertureGauss=0). d No Gaussian blur but overlay with 0.9 alpha opacity (overlayColour=’black’, overlayAlpha=0.9, overlayGaussian=0, apertureGauss=10). e Gaussian blurred overlay with 0.0 opacity (overlayColour=’black’, overlayAlpha=0, overlayGaussian=20, apertureGauss=10). f Pink overlay (overlayColour=’#FF69B4’, overlayAlpha=0.8, overlayGaussian=0, apertureGauss=10)

Aperture

Researchers can specify the size of the viewing aperture in pixels, or as a percentage of screen width. Specifying the size as a percentage ensures that scaling is consistent across devices with different screen sizes, whilst specifying it in pixels offers a greater level of control. The relevant setting is mouseview.params.apertureSize. It accepts either a string in the format ‘X%’ for percentage scaling, or an integer to specify pixels. The default setting is 5%. For most researchers, we recommend the percentage option, as this will ensure a reasonable level of consistency across participants. Some restriction of browser window sizes is sensible when using the percentage setting, as small windows will lead to a very small aperture; this is possible in most experiment builders.

The edge of the aperture can also be blurred. The purpose of this is to roughly simulate the edges of the foveated area (Reisfeld et al., 1995), and to avoid the potentially distracting peripheral motion effects of a solid edge (Lingnau et al., 2008). We implement a Gaussian blur here, with the researcher specifying the standard deviation of the Gaussian function in pixels. The relevant setting is mouseview.params.apertureGauss. Currently, this is specified as an integer only, with 0 representing a solid edge (i.e. no blur).
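As an illustrative sketch (values are arbitrary; settings are typically assigned before mouseview.init() draws the overlay):

```javascript
// Sketch: aperture configuration via the global mouseview.params object.
mouseview.params.apertureSize = '5%';  // percentage of viewport width (default); an integer would mean pixels
mouseview.params.apertureGauss = 10;   // SD of the Gaussian edge blur in pixels; 0 gives a hard edge
```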

Overlay

The overlay configuration allows researchers to customise the attributes of the obscuring layer that the aperture cuts through. The overlay is an HTML canvas that we insert into the webpage over all the content, and it is drawn based on the settings. Firstly, the overlay can be transparent or any colour supported by the cascading style sheet (CSS) language (e.g. a named colour such as ‘green’, a hex code, or an RGB or HSL value). If a colour is specified, its transparency can be set in the form of an alpha value between 0 (fully transparent) and 1 (fully opaque).

The library’s most complex feature is the ability to dynamically blur the contents of anything that is already on a webpage. This can be done by passing a non-zero value to mouseview.params.overlayGaussian. The value entered here represents the standard deviation of the Gaussian kernel applied to the underlying page content. This is not straightforward, as there is currently no way to apply a blurring filter to an entire webpage while also allowing a cut-out. MouseView.js achieves this by taking a screenshot of the webpage, applying Gaussian blur to the screenshot, and then drawing this blurred image onto the overlay. The screenshot step is computationally intensive, and uses the open-source library html2canvas (https://github.com/niklasvh/html2canvas), which renders the entire webpage off-screen and turns it into an image.

This process takes time, so we provide the option for the researcher to pass in a callback function, which will be executed once the overlay has been generated. The researcher can use this to hide the contents of the experiment whilst they are being blurred, to avoid an unobscured preview. This function can be passed into mouseview.params.overlayGaussianFunc, and is executed once the blur has rendered (via a JavaScript Promise). We recommend using arrow syntax to define these functions (i.e. “() => {}”), as this ensures the function retains access to external variables in the browser environment. Up-to-date examples of how to implement this can be found on the documentation website (www.mouseview.org).

Lastly, we also provide the ability to control when this blur overlay is updated. It should not be done every frame, as it is a computationally heavy process. The default behaviour is thus to only update the screenshot on window resizing and scrolling events (i.e. when the page changes with a user interaction). Researchers looking to blur a video or animation are likely better off opting for an opacity filter. However, if they do opt for Gaussian blur, we provide a refresh interval setting: specifying a non-zero value for mouseview.params.overlayGaussianInterval will tell MouseView.js to regenerate the blur at a set interval. Some fast computers might manage a sub-second refresh rate, but we recommend using an interval no shorter than 1–2 s, to avoid crashing the web page.
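A combined sketch of the overlay options discussed above (values are arbitrary, and the element id in the callback is hypothetical):

```javascript
// Sketch: overlay configuration (parameter names as in Table 1 and Fig. 1).
mouseview.params.overlayColour = '#000000'; // any CSS colour: a name, hex code, rgb() or hsl()
mouseview.params.overlayAlpha = 0.8;        // 0 = fully transparent, 1 = fully opaque
mouseview.params.overlayGaussian = 20;      // SD (px) of the blur applied to the page screenshot; 0 disables blurring

// Callback executed once the blurred screenshot has been drawn, e.g. to
// reveal content that was hidden while the blur was being generated.
// Arrow syntax keeps access to variables in the enclosing scope.
mouseview.params.overlayGaussianFunc = () => {
  // 'task' is a hypothetical element id, used for illustration only.
  document.getElementById('task').style.visibility = 'visible';
};

// Optional: regenerate the blur at a fixed interval (in milliseconds) for
// changing content. Keep this slow; the screenshot step is expensive.
mouseview.params.overlayGaussianInterval = 2000;
```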

Recording

MouseView.js’ recording functionality is discussed in greater detail below. Here, we highlight one optional setting that configures the sampling rate of the mouse tracking. The mouseview.timing.sampleRate parameter sets a target sampling rate for MouseView.js to record mouse positions at, defined as the time between consecutive (x,y) samples in milliseconds. The sampling rate is constrained by the animation refresh rate, which also determines how often the aperture position is updated. Slower computers or higher computational loads will reduce the consistency of the sampling rate, which is why we describe it as a “target” sampling rate. The default value is 16.66 ms, which corresponds to a single screen refresh on a 60-Hz monitor.
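For example (a sketch; the value is the target interval between samples, in milliseconds):

```javascript
// Sketch: target inter-sample interval in milliseconds.
// 16.66 ms corresponds to one frame at 60 Hz (the default);
// 33.33 ms would target roughly 30 Hz.
mouseview.timing.sampleRate = 16.66;
```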

Functions

MouseView.js provides several functions (also called “methods”, as they are object-bound functions) to control elements on screen, and the recording of data. Like the configuration variables above, the methods are accessed through the global mouseview object. Table 2 gives a summary of all of these methods.

Table 2 Overview of methods available via the mouseview object

The functions mouseview.init() and mouseview.removeAll() render the overlay and aperture or remove them, respectively. The functions mouseview.startTracking() and mouseview.stopTracking() control the recording of mouse movements (or screen touches) in pixel coordinates. The function mouseview.getData() returns a list of coordinates and associated timestamps, which can be piped into whatever format the experimenter prefers for saving data. These are the core functions required for operating the library.
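Put together, a single trial might be scripted roughly as follows (a sketch only; saveData is a hypothetical researcher-supplied function, and the exact structure returned by mouseview.getData() is described in the documentation):

```javascript
// Sketch of a typical trial flow using the core methods listed in Table 2.
mouseview.init();                 // draw the overlay and aperture
mouseview.startTracking();        // start recording cursor/touch coordinates

// ... participant explores the display for the trial duration ...

mouseview.stopTracking();         // stop recording
const data = mouseview.getData(); // coordinates with associated timestamps
saveData(data);                   // hypothetical: pipe into your preferred storage format
mouseview.removeAll();            // remove the overlay and aperture from the page
```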

We also provide additional utility functions, including those needed for data persistence across webpages. This is particularly helpful for those wishing to conduct multi-page user-experience research. By storing data with the browser’s localStorage object, data can be passed between separate webpages; MouseView.js automatically detects and appends the path of each page across these sessions. Persistence is achieved with the mouseview.storeData() and mouseview.getData() functions. Researchers can also log custom events (sometimes referred to as “triggers”), with timestamps relative to recorded samples. This functionality can be used to log dynamic events, and to investigate mouse data relative to these events. Further utility functions are illustrated in Table 2.
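A sketch of how persistence could be used across two pages (method names as above; the split into a first and a later page is illustrative, and the return structure of getData() is described in the documentation):

```javascript
// Sketch: persisting data across webpages via the browser's localStorage.
// On the first page, after tracking has finished:
mouseview.stopTracking();
mouseview.storeData();               // write recorded samples (and the page path) to localStorage

// On a later page, once MouseView.js has been loaded again:
const allData = mouseview.getData(); // retrieve data recorded on earlier pages
```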

Between-participants validation study

Plenty of studies have established that cursor-locked apertures produce exploration behaviour that resembles gaze behaviour during free viewing (Blackwell et al., 2000; Gomez et al., 2017; Jiang et al., 2015). The purpose of the current validation study was to extend previous validation efforts into a preferential looking paradigm with pairs of affective stimuli. This is fundamentally different from single-stimulus free viewing, as participants generally divide their attention between the stimuli as a function of their affective qualities. In addition, stimuli are repeated over trials, thereby rendering initial differences in visual saliency increasingly less important to gaze behaviour.

Experiment

To validate the use of a mouse-locked aperture in overt attention research, we employed an affective preferential looking task. In this type of task, two images are shown for a relatively long time (e.g. 10 s), and participants are free to view where they would like. The affective component is in the image content: one evokes a specific emotion (independently verified by self-report), whereas the other is neutral. Image pairs are repeated over several trials, while locations (left versus right) are randomised. The measure of interest is how long participants’ gaze dwells on each image.

While the pairs are roughly matched for low-level features using a classic visual saliency model (Itti et al., 1998), the long exposure times and repeated exposures ensure that cognitive and affective processes will have an effect on participants’ dwell times for each stimulus. In the lab, this method has been used to show sustained avoidance of disgust stimuli (Dalmaijer et al., 2021), and to demonstrate how these are affected by pharmacologically altered gastric state (Nord et al., 2021). In addition, the method has been used to map overt attentional bias during trials, showing sustained biases for pleasant, threat, and suicide-related images, as well as an initial short-lived bias towards disgust stimuli that is followed by sustained avoidance (Armstrong et al., 2020).

Here, we replicate two of the summarised findings, specifically for disgusting and pleasant images, which in Experiment 2 of Armstrong et al. (2020) were the strongest elicitors of oculomotor avoidance and approach, respectively. The original gaze data (N = 83) are directly compared to mouse data collected with MouseView.js (N = 165).

Procedure

Trials started with a central fixation cross, where the mouse had to be placed to advance the trial. Stimuli were then presented for 10 s, followed by a 1-s blank screen, after which the next trial started. A total of five stimulus pairs per condition were each presented four times, resulting in 20 trials with disgust and 20 trials with pleasant images. In the original eye-tracking study, two further image categories were included (suicide and threat). For simplicity, we did not include those in the MouseView.js replication, and thus do not report on them here.

Stimuli were shown in pairs, with one image appearing on the left of the screen and another on the right. Stimulus pairs were kept constant, but their left/right locations were randomised. The stimuli were scaled to the “viewport” (available display within the browser) so that their width was 25% of the viewport width; and their midpoints were positioned at 33% and 66% of the viewport width, and 50% of screen height. For example, on a 1920 x 1080 display, a typical viewport would be 1920 by 937, the stimuli would be sized 480 by 360 pixels, and their midpoints would fall on (641, 469) and (1278, 469). We did not estimate or attempt to control participants’ distance from their displays, and thus real stimulus size in degrees of visual angle was unknown.

Stimulus presentation lasted for 10 s, during which the screen was blurred (Gaussian blur with the default settings reported in Table 1). Participants could move a clear aperture with Gaussian edges (to avoid a hard boundary with the blurred area) and a size of 5% of the viewport width, which are the default settings for apertureGauss and apertureSize as reported in Table 1.

Stimuli

There were five disgust stimuli, each depicting bodily effluvia. They depicted a man throwing up, a toilet with excrement in and around it, a toilet with throw-up in it, a man throwing up directly into a mint-green toilet, and a close photo of loose stool (type 6 on the Bristol stool chart).

Pleasant images depicted three young children, a couple enjoying a bicycle trip, an elderly couple waving to the camera on a sunny quay, several laughing children, and four children in a catalogue-type photo.

The neutral images to which the above affective images were matched portrayed a close-up of buttons (for sewing onto clothing), assorted electrical and general-purpose tools hanging on a wall, a picnic bench, an electric clothes iron, a glass mug on a table, a metal dustpan lying on a floor, a hair dryer, four clothes pins in different colours, the bottom of a candle holder that is lying down, and a wire stripper.

Participants

Participants were recruited via Prolific Academic, and tested via the Gorilla experimentation platform. This platform has been shown to have sufficient display timing for our purposes of relatively long stimulus displays (Anwyl-Irvine, Dalmaijer, et al., 2020a; Bridges et al., 2020).

Data were collected in September 2020. Out of 165 participants, 45% self-identified as a woman, 55% as a man, and 0% as non-binary. Their age distribution spanned from 18 to 76 years, with an average of 31.5 (median = 29) and a standard deviation of 10.9. Participants reported being born in Brazil (1.2%), Canada (50.6%), China (4.8%), Germany (0.6%), Hong Kong (2.4%), India (1.8%), Iran (0.6%), Mexico (0.6%), Nigeria (3.0%), Pakistan (1.2%), Peru (0.6%), Philippines (1.2%), Saint Lucia (0.6%), South Korea (1.2%), Sweden (0.6%), Taiwan (1.2%), Ukraine (0.6%), United Kingdom (1.8%), United States (24.7%), and Venezuela (0.6%); and all lived in either Canada (75%) or the United States (25%).

The MouseView.js data were then compared with data from 83 participants who took part in an eye-tracking study that was published elsewhere (Armstrong et al., 2020). This sample was skewed towards university undergraduate students at Queen’s University (Kingston, ON, Canada). Their average age was 19.71 years (SD = 2.06 years); their gender identity was 83.3% women, 14.3% men, and 1.2% non-binary; and their racial/ethnic identity was 46.4% White, 36.9% Asian, 6% Indigenous, 4.8% Latino/a, 3.6% Black, and 1.2% Middle Eastern or North African.

It should be noted that details were subtly different between the original study and the current mouse-aperture replication. Due to the nature of web-based data collection, display size (and thus stimulus size) was different for each MouseView.js participant. In addition, the original eye-tracking experiment had two additional stimulus conditions (threat and suicide-related), which were omitted for simplicity in the MouseView.js experiment. Finally, while the affective-neutral stimulus pairing remained constant in the MouseView.js experiment, it did not in the eye-tracking experiment. For all of these reasons, direct statistical comparisons would not be particularly informative. Heatmap and scanpath comparisons are presented as they are, and interpreted qualitatively.

Reliability

We examined whether MouseView.js produced reliable results by computing the difference in dwell time between the affective and neutral stimulus in each trial. We then computed Cronbach’s coefficient α, and the split-half reliability as the average Spearman–Brown coefficient (ρ) over 100 different halfway-splits of all trials within each condition (disgust or pleasant).
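With r the Pearson correlation between the dwell-time differences summed within the two random halves, and k = 20 trials per condition, these estimators take their standard forms:

\[
\rho_{\mathrm{SB}} = \frac{2r}{1 + r}, \qquad
\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_{i}^{2}}{\sigma_{\mathrm{total}}^{2}}\right),
\]

where σ²_i is the variance across participants of the dwell-time difference on trial i, σ²_total is the variance of the summed score, and ρ_SB is averaged over the 100 random splits.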

The results are summarised in Table 3, and show that differences in dwell time between the affective and the neutral image in each trial are of good reliability for disgust stimuli for both gaze (ρ = 0.81, α = 0.78) and mouse (ρ = 0.89, α = 0.86). Dwell time difference was of lower reliability for pleasant images for both gaze (ρ = 0.54, α = 0.53) and mouse (ρ = 0.71, α = 0.69). These results indicate that reliability was similar between eye-tracking and the mouse-locked aperture experiments, and in fact slightly better for mouse-aperture dwell.

Table 3 The reliability of dwell-time differences between neutral and affective stimuli in 20 trials, estimated by the Spearman–Brown split-half reliability ρ (and the standard error of its mean across splits) over 100 random splits, and Cronbach’s coefficient alpha

Behavioural consistency

Related to reliability is how similarly participants respond to all images. Five different images were employed in each condition, and we computed dwell-time differences between the affective and neutral stimulus for each participant (over whole trials, then averaged across four presentations). We then correlated the dwell-time differences for each stimulus with those for all other stimuli, resulting in Fig. 2. Values close to 1 would illustrate that participants avoided (or approached) stimuli in the same way, whereas values close to 0 indicate that participants were less consistent in their gaze or mouse behaviour towards the stimuli.

Fig. 2

Correlations between affective-neutral dwell time differences for all stimuli (averaged across all four presentations). Higher correlations indicate that participants showed similar dwell time differences between stimuli. Included here are five disgust stimuli (top row) and five pleasant stimuli (bottom row). Dwell times were computed from eye tracking (left column) or MouseView.js (middle column), and the difference between them is reported in the right column

Dwell differences were more consistent for disgusting compared to pleasant images, but similar between gaze and mouse aperture: Four out of twenty correlations were statistically significantly different between gaze and mouse. For these significant cells (pairs of stimuli), mouse data were more consistent between participants compared to eye-tracking data.

We computed similar metrics for each presentation number, averaged across stimuli within each condition. Three out of twelve correlations differed between gaze and mouse (Fig. 3).

Fig. 3

Correlations between affective-neutral dwell time differences for all repetitions of the same stimuli (averaged across all five stimuli within each condition). Higher correlations indicate that participants showed similar dwell time differences between repetitions. Included here are four repetitions of the five disgust stimuli (top row) and of the five pleasant stimuli (bottom row). Dwell times were computed from eye tracking (left column) or MouseView.js (middle column), and the difference between them is reported in the right column

It should be noted that one would not necessarily expect the above correlations to be high, nor that they directly reflect measurement reliability. This is because real differences exist between the first and later presentations of stimulus pairs in the current design: the first presentation of a disgust-neutral stimulus pair provokes less disgust avoidance than later presentations of the same pair (Armstrong et al., 2020; Dalmaijer et al., 2021; Nord et al., 2021).

In sum, these results show that behaviour is more stable between stimuli and presentations for disgusting compared to pleasant images. They also indicate that dwell behaviour is consistent between gaze and MouseView.js experiments, although it is more consistent in mouse recordings for a minority of stimuli, and different in consistency only between presentation numbers 1 and 4 (more consistent for gaze) and 2 and 3 (more consistent for mouse).

Validity

The objective of the following analyses was to determine whether mouse-locked apertures genuinely track participants’ overt attention. We established this by directly comparing MouseView.js data with the common attention-tracking method of gaze tracking. Specifically, we compared scan paths and dwell-time differences between stimuli.

In addition, we correlated dwell time differences between affective and neutral stimuli with self-reported disgust and pleasantness ratings for those stimuli. It has previously been shown that the oculomotor avoidance of disgust stimuli correlates with self-reported disgust (Dalmaijer et al., 2021). We thus expected mouse dwell to show the same.

Pre-registered hypotheses

We pre-registered the following hypotheses, which correspond to the main eye-tracking findings in Armstrong et al. (2020): 1) Participants will view disgusting images less overall compared to accompanying neutral images. 2) Participants will view pleasant images more overall compared to accompanying neutral images. 3) Participants' disgust ratings of the disgusting images will correlate negatively with their overall viewing time of the disgusting images. 4) Participants' pleasantness ratings of the pleasant images will correlate positively with their overall viewing time of the pleasant images. 5) Both disgusting and pleasant images will initially capture "attention" (gaze directed at the image) relative to the neutral image. Then disgusting images will be viewed less as the trial progresses (eventually less than the accompanying neutral image), whereas pleasant images will continue to be viewed more than the neutral image. 6) Dwell on the disgusting image will be greatest on the first exposure to an image, and then will decrease once the disgusting image becomes familiar. 7) Disgust ratings of disgust images will be associated with a greater slope of decreasing viewing across trials. The full pre-registration can be found at https://osf.io/mta2d.

Comparing mouse and gaze dwell time

Dwell time was computed as the total duration of gaze or mouse samples for which the coordinate was within an image. The difference in dwell time for affective and neutral stimuli was computed between each of the stimulus pairs, and then averaged over all stimuli within each condition (disgust or pleasant). We also computed one-sample t tests between the dwell time for the affective (disgust or pleasant) and neutral stimulus, after averaging dwell times over all stimuli within each condition (disgust or pleasant). The results are plotted in the bottom rows of Fig. 4 for gaze, and Fig. 5 for MouseView.js.
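For illustration (a sketch only, not the analysis code used for the studies reported here), dwell time within a rectangular image region can be computed from a list of timestamped samples as follows:

```javascript
// Illustrative sketch: dwell time as the summed duration of samples whose
// coordinates fall inside an image's bounding box. Assumes `samples` is an
// array of {x, y, t} objects (t in milliseconds) and `roi` is
// {left, top, right, bottom} in the same coordinate system.
function dwellTime(samples, roi) {
  let dwell = 0;
  for (let i = 1; i < samples.length; i++) {
    const s = samples[i];
    const inside = s.x >= roi.left && s.x <= roi.right &&
                   s.y >= roi.top && s.y <= roi.bottom;
    if (inside) {
      // Attribute the preceding inter-sample interval to this region.
      dwell += s.t - samples[i - 1].t;
    }
  }
  return dwell; // milliseconds
}
```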

Fig. 4

Gaze dwell time difference (in percentage points) between affective and neutral stimuli as obtained in an eye-tracking task. Positive values indicate participants spent more time looking at the affective (disgust or pleasant) stimulus than the control stimulus; negative values indicate the opposite. In the top row, solid lines indicate averages and shading the within-participant 95% confidence interval. In the bottom row, t values from one-sample t tests of the dwell difference against 0 are reported, but only for those tests where p < 0.05 (uncorrected)

Fig. 5

Mouse dwell time difference (in percentage points) between affective and neutral stimuli as obtained in a MouseView.js task. Positive values indicate participants spent more time looking at the affective (disgust or pleasant) stimulus than the control stimulus; negative values indicate the opposite. In the top row, solid lines indicate averages and shading the within-participant 95% confidence interval. The sharp return to 0 at the end of the trial duration is an artefact of mouse recording cutting out slightly too early. In the bottom row, t values from one-sample t tests of the dwell difference against 0 are reported, but only for those tests where p < 0.05 (uncorrected)

These results indicate that participants showed sustained bias towards pleasant stimuli, and away from disgust stimuli. The exception to this is an initial bias towards disgust stimuli, which is particularly apparent in the eye-tracking experiment, but also in the first trial of the MouseView.js data.

We directly compared gaze and mouse dwell-time biases (Fig. 6, top row). We employed linear regression with an intercept, and with experiment sample membership (eye tracking or MouseView.js) as the sole predictor. This is analogous to an independent-samples t test, but has the benefit of allowing for a prior-free Bayes factor computation, using the Bayesian Information Criterion (BIC) from the regression and from a null model with only an intercept (Wagenmakers, 2007). The log(BF10) is plotted in Fig. 6 (bottom row), and could be considered evidence for a difference between gaze and mouse from about 1.1 (corresponding to BF10 = 3), or evidence for the lack of a difference from – 1.1 (corresponding to BF10 = 1/3).
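Under this BIC approximation (Wagenmakers, 2007), the Bayes factor follows directly from the two information criteria:

\[
\mathrm{BF}_{10} \approx \exp\!\left(\frac{\mathrm{BIC}_{\mathrm{null}} - \mathrm{BIC}_{\mathrm{alternative}}}{2}\right),
\]

so that log(BF10) is simply half the BIC difference, and log(BF10) = ln(3) ≈ 1.1 marks the conventional BF10 = 3 threshold referred to above.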

Fig. 6

Quantification of the difference between gaze dwell time differences (eye tracking, Fig. 4) and mouse dwell times (MouseView.js, Fig. 5). Positive values indicate higher avoidance of the affective stimulus (compared to the neutral stimulus) in MouseView.js compared to eye tracking, and negative values indicate higher avoidance of the affective stimulus in eye tracking. In the top panels, the dashed line indicates the average, and the shaded area the 95% confidence interval (based on between-participant pooled standard error of the mean, computed through Satterthwaite approximation). In the bottom row, Bayes factors quantify evidence for the alternative hypothesis (gaze and mouse are different) or the null hypothesis (gaze and mouse result in similar dwell-time differences). A log(BF10) of 1.1 corresponds with a BF10 of 3 (evidence for alternative), whereas a log(BF10) of – 1.1 corresponds with a BF01 of 3 (evidence for null)

These results indicate that the initial approach of affective stimuli was stronger in gaze data, and that disgust avoidance in later stimulus repetitions is stronger in mouse data. However, there was no difference between eye tracking and MouseView.js for the majority of time in trials. This suggests that, after the first 1–1.5 s, MouseView.js is a good approximation of eye tracking in preferential looking tasks.

Relation to self-report

Averaged across all stimuli and stimulus repetitions, the average difference between dwell time for disgust and neutral items correlated with the average disgust rating for stimuli (Fig. 7, top row). This was true for gaze (R = – 0.47, p < 0.001) and for MouseView.js dwell (R = – 0.19, p = 0.017), although this correlation was significantly weaker for mouse compared to gaze (Z = – 2.39, p = 0.017).
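A standard way of comparing two correlations from independent samples such as these, and presumably what underlies the reported Z values, is Fisher’s r-to-z test:

\[
Z = \frac{\operatorname{artanh}(R_{1}) - \operatorname{artanh}(R_{2})}{\sqrt{\frac{1}{n_{1} - 3} + \frac{1}{n_{2} - 3}}},
\]

with R1 and R2 the two correlations and n1 and n2 the respective sample sizes.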

Fig. 7

Correlation between the average self-reported disgust (top row) or pleasantness rating (bottom row) and the affective-neutral difference in dwell time. Ratings were averaged across all stimuli within each condition, and dwell times across all stimuli and stimulus presentations within each condition. The reported Z and uncorrected p values quantify the difference between the correlations obtained with eye tracking (left column) and MouseView.js (right column). Solid lines indicate the linear regression line, and the shaded area the error of the estimate

The average difference between dwell time for pleasant compared to neutral images was correlated with average pleasantness ratings (Fig. 7, bottom row) for gaze (R = 0.24, p = 0.028), but not for MouseView.js dwell times (R = 0.07, p = 0.377); although there was no significant difference between the two correlations (Z = 1.31, p = 0.192).

We also analysed whether self-reported stimulus disgust and pleasantness ratings impacted dwell times by employing linear mixed models. Here, we predicted gaze and mouse dwell time using a model with main factors condition (levels: disgust and pleasant), stimulus rating, and presentation number, and their interactions; and with participant number as random effect. These models showed highly similar outcomes for gaze (Table 4) and MouseView.js data (Table 5). Results indicated that participants showed more approach to pleasant stimuli compared to disgusting ones, less approach to stimuli with higher ratings (likely driven by disgust stimulus ratings), and increasingly less approach with presentation number. In addition, ratings had opposite effects between conditions (likely driven by higher avoidance for disgust, and no change or higher approach with higher pleasantness ratings), and there was also an interaction effect of condition and presentation number (likely driven by an increased tendency of avoidance of disgust stimuli at increasing presentations).

Table 4 Outcomes of a linear mixed model of gaze dwell time difference (affective - neutral) with participant number as random effect, and as fixed effects condition (disgust or pleasant), self-reported stimulus rating, presentation number, and their interactions
Table 5 Outcomes of a linear mixed model of MouseView.js dwell time difference (affective - neutral) with participant number as random effect, and as fixed effects condition (disgust or pleasant), self-reported stimulus rating, presentation number, and their interactions

In sum, these data show that eye tracking and MouseView.js resulted in qualitatively similar patterns of correlation between disgust avoidance and self-report, and quantitatively similar patterns for pleasantness approach and self-report. However, as a general pattern, the concordance between self-report and dwell was somewhat weaker for MouseView.js than for eye tracking. MouseView.js participants used less of the available rating range than eye-tracking participants, potentially because we used a narrower scale in the MouseView.js design that lacked intermediate scale labels (e.g. “slightly”, “moderate”, etc.). This could have impacted the reported correlations.

MouseView.js and gaze scanpath similarity

Previous reports have already established a good overlap between mouse-aperture viewing, gaze, and visual saliency (Gomez et al., 2017; Jiang et al., 2015). Here, we aimed to replicate these findings by comparing traditional heatmaps, but also by comparing scanpaths between eye-tracking and MouseView.js experiments.

In eye-tracking research, heatmaps usually quantify the location and duration of gaze fixations, because these moments of relative stability of the eye are when most active vision occurs. In MouseView.js, because the fovea-like aperture is mouse-guided, the concept of fixations as moments of stability between ballistic saccades does not hold up. To be able to directly compare gaze and mouse data, we resampled each to 30 Hz, resulting in 300 samples per trial. In addition, we scaled all viewports (which vary in size between participants) so that display coordinates ranged from 0 to 1. Data were also flipped so that the affective stimulus appeared to the left of the neutral stimulus.
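An illustrative sketch of this pre-processing step (not the pipeline used for the reported analyses; a simple zero-order hold stands in for whatever interpolation was used):

```javascript
// Sketch: resample a trial's samples to a fixed 30 Hz grid and scale
// coordinates to the [0, 1] range of the viewport. Assumes `samples` is an
// array of {x, y, t} objects with t in milliseconds relative to trial onset.
function resampleAndNormalise(samples, viewportWidth, viewportHeight,
                              durationMs = 10000, rateHz = 30) {
  const n = Math.round((durationMs / 1000) * rateHz); // 300 samples per 10-s trial
  const out = [];
  for (let i = 0; i < n; i++) {
    const t = (i / rateHz) * 1000;
    // Take the most recent sample at or before time t (zero-order hold).
    let j = 0;
    while (j + 1 < samples.length && samples[j + 1].t <= t) j++;
    out.push({
      x: samples[j].x / viewportWidth,
      y: samples[j].y / viewportHeight,
    });
  }
  return out;
}
```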

Heatmaps were then constructed as two-dimensional histograms (Fig. 8 for disgust, and Fig. 9 for pleasant stimuli; see the Supplementary Material for larger versions). They illustrate that while some differences exist (likely due to subtle differences in experiment design), gaze and mouse patterns are qualitatively similar. Specifically, the spatial distribution across stimuli of the hottest areas (longest and/or most frequent dwell) aligned between gaze and mouse. This is apparent in two aspects: the global affective-neutral distribution, and the more local patterns within each image. Subtle differences exist too, most notably in the prominent central hotspot in mouse data, and the smearing of hot areas around specific points (particularly apparent in pleasant stimuli 1 and 5). Both were likely the result of mouse movements being slower than eye movements, first to move away from the location of the fixation cross that preceded stimulus presentation (also apparent in Fig. 5), and then in movement between points of interest.

Fig. 8

Heatmaps (two-dimensional histogram of resampled (x,y) coordinates) for all five disgust stimuli. The top row quantifies samples obtained from an eye-tracking experiment and the bottom row from a MouseView.js experiment. Brighter colours indicate more observations falling within that area. Note that, in reality, stimulus position was pseudo-random, but that samples were flipped where necessary so that the affective stimulus appeared on the left. Stimulus images are strongly blurred to obscure their content, as their usage license prevents publication

Fig. 9

Heatmaps (two-dimensional histogram of resampled (x,y) coordinates) for all five pleasant stimuli. The top row quantifies samples obtained from an eye-tracking experiment and the bottom row from a MouseView.js experiment. Brighter colours indicate more observations falling within that area. Note that, in reality, stimulus position was pseudo-random, but that samples were flipped where necessary so that the affective stimulus appeared on the left. Stimulus images are strongly blurred to obscure their content, as their usage license prevents publication

Where heatmaps are two-dimensional representations of gaze patterns, scanpaths also take into account the order of fixations, and sometimes their dwell duration. Hence, scanpaths are three-dimensional representations of gaze patterns. Traditionally, gaze fixations are extracted, after which their spatiotemporal patterns can be analysed (Cristino et al., 2010). However, as outlined before, mouse movements do not lend themselves well to fixation detection. Instead, to be able to directly compare gaze and mouse scanpaths, we constructed a single vector for each trial with 300 horizontal and 300 vertical coordinates from the resampled data (see above). Across all conditions, stimuli, and repetitions, this resulted in a combined matrix of 9852 rows (9920 trials; 9852 after excluding missing data) by 600 coordinates (300 horizontal and 300 vertical). This was then reduced into two dimensions (9852 x 2) using multi-dimensional scaling (MDS; Fig. 10) (Kruskal, 1964a, 1964b) or uniform manifold approximation and projection (UMAP; Fig. 11) (McInnes et al., 2018), so that the scanpaths of all participants and trials could be fit into a single two-dimensional plot. This approach is similar to that taken by others to compare line path drawings to each other (Ang et al., 2018).
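Formally, each trial was thus represented as a single row vector (the ordering of horizontal before vertical coordinates is shown here for illustration only):

\[
\mathbf{s} = (x_{1}, \ldots, x_{300},\, y_{1}, \ldots, y_{300}) \in \mathbb{R}^{600}, \qquad
\mathbf{X} \in \mathbb{R}^{9852 \times 600} \rightarrow \mathbb{R}^{9852 \times 2},
\]

where the mapping to two dimensions is performed by MDS or UMAP.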

Fig. 10

Three-dimensional scanpaths (horizontal and vertical coordinate, and time) reduced into two dimensions using multi-dimensional scaling (MDS). Each dot represents a single scanpath (i.e. a single trial). The top row shows scanpaths for disgust and neutral stimuli and the bottom row from the pleasant and neutral stimuli. The left column shows gaze (from eye tracking) scanpaths in colour and mouse (from MouseView.js) in grey; whereas the right column shows the opposite

Fig. 11

Three-dimensional scanpaths (horizontal and vertical coordinate, and time) reduced into two dimensions using uniform manifold approximation and projection (UMAP). Each dot represents a single scanpath (i.e. a single trial). The top row shows scanpaths for disgust and neutral stimuli and the bottom row from the pleasant and neutral stimuli. The left column shows gaze (from eye tracking) scanpaths in colour and mouse (from MouseView.js) in grey; whereas the right column shows the opposite

It was apparent, for both gaze and mouse data, that scanpaths for disgust and pleasant stimuli generally inhabited opposite ends of the reduced space, and that this is particularly true for later repetitions. This was expected, given the opposite avoidance and approach responses for disgust versus pleasant stimuli, and suggests that our scanpath reduction method was able to recognise dissociable patterns in the data.

Crucially, gaze and mouse data showed spatial differentiation in the MDS projection, and to a lesser extent in the UMAP projection. This means that gaze and mouse scanpaths shared features, but were qualitatively dissociable.

Within-participants validation study

Experiment

Our between-participant study showed that patterns of results from web-based data collection resembled those from an eye-tracking experiment. In this within-participant validation, we investigated how well MouseView.js approximates eye tracking within an individual. This is an important extension because it allowed us to directly compare gaze and mouse viewing behaviour in the same individuals.

The experiment and stimuli were identical to the MouseView.js experiment in the between-participants validation.

Procedure

Participants first completed demographic information, and provided ratings for each of the stimuli. For each methodology, the 40 trials were presented in two blocks, with participants alternating methodologies between blocks. They were randomly assigned to one of two orders: eye tracking first or MouseView.js first.

Eye tracking was conducted with a headrest, at a distance of 60 cm from the monitor (Dell P2219H, 22”, operating at a resolution of 1280 x 1024 at 60 Hz). Due to the COVID-19 pandemic, participants and experimenters wore masks, and abided by social-distancing rules. Because the eye tracker occasionally confused reflections on masks for those on the cornea, we introduced a policy of recalibrating prior to the second block. The experiment was run in OpenSesame (Mathôt et al., 2012), using the Expyriment (Krause & Lindemann, 2013) graphics back-end, and PyGaze (Dalmaijer et al., 2014) to control a GazePoint HD3 eye tracker (operating at 150 Hz).

The MouseView.js experiment was run via the Gorilla platform (Anwyl-Irvine, Dalmaijer, et al., 2020a; Anwyl-Irvine, Massonnié, et al., 2020b), on a separate computer from the eye-tracking experiment. Participants were not asked to use a headrest, so they were at a variable distance from the monitor (Dell P2219H, 22”, operating at a resolution of 1920 x 1080 at 60 Hz).

Participants

Participants were recruited among the student population at Whitman College (WA, USA). They were compensated $15 for the 35-min session.

Data were collected in March and April of 2021. Out of 50 participants, 70% self-identified as a woman, 24% as a man, and 6% as non-binary. Their age distribution spanned 18 to 23 years, with an average of 20.1 (median = 20) and a standard deviation of 1.22. Their racial/ethnic identity was 60% White, 22% Asian, 8% Latino/a, 4% Black, and 4% multiracial; 2% did not provide this information.

One participant was excluded because the eye tracker could not be calibrated on them.

Reliability

As before, we examined whether MouseView.js produced reliable results by computing the difference in dwell time between the affective and neutral stimulus in each trial. We then computed Cronbach’s coefficient α, and the split-half reliability as the average Spearman–Brown coefficient (ρ) over 100 different halfway-splits of all trials within each condition (disgust or pleasant).

The results are summarised in Table 6, and show that differences in dwell time between the affective and the neutral image in each trial were of good reliability. This was especially true for disgust stimuli, for both gaze (ρ = 0.92, α = 0.89) and mouse (ρ = 0.94, α = 0.91); and to a lesser extent for pleasant stimuli for both gaze (ρ = 0.73, α = 0.73) and mouse (ρ = 0.76, α = 0.74). These results illustrate that the reliability of dwell-difference measures is excellent (for disgust stimuli) to acceptable (for pleasant stimuli). Crucially, reliability is highly similar between eye-tracking and MouseView.js.

Table 6 The reliability of dwell-time differences between neutral and affective stimuli in 20 trials, estimated by the Spearman–Brown split-half reliability ρ (and the standard error of its mean across splits) over 100 random splits, and Cronbach’s coefficient α

Behavioural consistency

As before, we computed the correlation in affective-neutral dwell-differences between different images (across different presentations), and between different presentations of the same images. This quantified the consistency in participants’ oculomotor avoidance (or approach) between different stimuli and repeated presentations of the same stimuli. Consistency partially reflects reliability, but is also impacted by the differences between stimulus images, and by stimulus repetition effects (Armstrong et al., 2020; Dalmaijer et al., 2021; Nord et al., 2021).

Dwell behaviour was highly consistent between disgust stimuli for both gaze (R = 0.72–0.84) and mouse (R = 0.77–0.86) dwell times, with no statistically significant differences between the two methods (Fig. 12). Behaviour was less consistent between pleasant stimuli for both gaze (R = 0.23–0.54) and mouse (R = 0.16–0.58), again with no statistically significant differences between the two.

Fig. 12

Correlations between affective-neutral dwell time differences for all stimuli (averaged across all four presentations). Higher correlations indicate that participants showed similar dwell time differences between stimuli. Included here are five disgust stimuli (top row) and five pleasant stimuli (bottom row). Dwell times were computed from eye tracking (left column) or MouseView.js (middle column), and the difference between them is reported in the right column

Dwell behaviour was also consistent between repeated presentations of the same disgust stimuli for both gaze (R = 0.45–0.79) and mouse (R = 0.64–0.87) dwell times (Fig. 13), with only one (out of six) cell showing a statistically significant difference between gaze and mouse (second and fourth presentation, Rgaze = 0.54, Rmouse = 0.82, Z = – 2.66, p = 0.008). Consistency was lower for repetitions of the same pleasant stimuli for both gaze (R = 0.32–0.75) and mouse (R = 0.32–0.58), with no statistically significant differences between them.

Fig. 13

Correlations between affective-neutral dwell time differences for all repetitions of the same stimuli (averaged across all five stimuli within each condition). Higher correlations indicate that participants showed similar dwell time differences between repetitions. Included here are four repetitions of the five disgust stimuli (top row) and of the five pleasant stimuli (bottom row). Dwell times were computed from eye tracking (left column) or MouseView.js (middle column), and the difference between them is reported in the right column

These results indicate that participants’ gaze behaviour between different stimuli and repetitions was consistent, particularly for disgust stimuli. Crucially, behavioural consistency was not different between eye tracking and MouseView.js.

Validity

In this validation study, we could directly compare eye tracking and MouseView.js within the same individuals. We did so by computing the difference in approach or avoidance between the two methods, to quantify whether and when there were offsets between the methods. In addition, we estimated the relationship between gaze and mouse dwell time, to establish whether the methods were similar even if there were any systematic offsets.

Approach or avoidance in gaze and mouse dwell times

As before, dwell time was computed as the total duration of gaze (Fig. 14) or mouse (Fig. 15) samples for which the coordinate was within an image. The difference in dwell time for affective and neutral stimuli was computed between each of the stimulus pairs, and then averaged over all stimuli within each condition (disgust or pleasant). The dwell time difference thus quantified the extent to which participants preferred the affective over the neutral stimulus, i.e. whether they showed approach or avoidance of the affective stimulus.

Fig. 14

Gaze dwell time difference (in percentage points) between affective and neutral stimuli as obtained in an eye-tracking task. Positive values indicate participants spent more time looking at the affective (disgust or pleasant) stimulus than the neutral stimulus and negative values indicate the opposite. In the top row, solid lines indicate averages, and shading the within-participant 95% confidence interval. The bottom row shows t values from one-sample t tests of the dwell difference compared to 0, for those tests where p < 0.05 (uncorrected). Positive t values (pink) indicate higher dwell time for the affective stimulus (approach), and negative t values (green) indicate higher dwell time for the neutral stimulus (avoidance)

Fig. 15.

Mouse dwell time difference (in percentage points) between affective and neutral stimuli as obtained in a web-based MouseView.js task. Positive values indicate participants spent more time looking at the affective (disgust or pleasant) stimulus than the neutral stimulus and negative values indicate the opposite. In the top row, solid lines indicate averages, and shading the within-participant 95% confidence interval. The sharp return to 0 at the end of the trial duration is an artefact of mouse recording cutting out slightly too early. The bottom row shows t values from one-sample t tests of the dwell difference compared to 0, for those tests where p < 0.05 (uncorrected). Positive t values (pink) indicate higher dwell time for the affective stimulus (approach), and negative t values (green) indicate higher dwell time for the neutral stimulus (avoidance)

Dwell differences are plotted in Fig. 14 for eye tracking and in Fig. 15 for MouseView.js. In addition, one-sample t tests of the difference were computed, and plotted in the bottom rows. These quantify the magnitude of the difference, and whether it is statistically significantly different from zero.

As in the between-participants study, participants showed sustained bias towards pleasant stimuli and away from disgust stimuli. As before, the exception to this was an initial, short-lived bias towards disgust stimuli that was apparent in the gaze but not the mouse data.

Direct comparisons of eye tracking and MouseView.js dwell times

We directly compared gaze and mouse dwell time (Fig. 16, top row), and used linear mixed models to statistically test differences between the two methods. Specifically, for each time bin and each presentation, we computed the average difference between gaze and mouse dwell-differences between affective and neutral stimuli (i.e. the difference in approach/avoidance between gaze and mouse). The alternative model of this dwell difference had method (gaze or mouse) as fixed effect and participant as random effect; whereas the null model comprised only an intercept and participant as random effect. We computed the Bayesian Information Criterion for each alternative and null model, and used this to compute Bayes factors (Wagenmakers, 2007). To improve the accessibility of the visualisation, we then computed the log of each Bayes factor (Fig. 16, bottom row). Here, positive values (blue) indicate evidence for the hypothesis that dwell time differences are different between eye tracking and MouseView.js, and negative values (brown) indicate evidence for the hypothesis that dwell time differences are similar between the methods.
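In approximate notation (the coding of the method factor and the error structure are written out here for illustration), the two models fit within each time bin and presentation were of the form:

\[
\Delta\mathrm{dwell}_{ij} = \beta_{0} + \beta_{1}\,\mathrm{method}_{ij} + u_{j} + \varepsilon_{ij}
\quad \text{(alternative)}, \qquad
\Delta\mathrm{dwell}_{ij} = \beta_{0} + u_{j} + \varepsilon_{ij}
\quad \text{(null)},
\]

where Δdwell is the affective-neutral dwell-time difference, u_j a random intercept for participant j, and the Bayes factor follows from the two models’ BICs as in the between-participants study.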

Fig. 16

Quantification of the difference between gaze dwell time differences (eye tracking, Fig. 14) and mouse dwell time differences (MouseView.js, Fig. 15). Positive values indicate higher avoidance of the affective stimulus (compared to the neutral stimulus) in MouseView.js compared to eye tracking and negative values indicate higher avoidance of the affective stimulus in eye tracking. In the top panels, dashed lines indicate the average and the shaded area the 95% within-participant confidence interval. In the bottom panels, Bayes factors quantify evidence for the alternative hypothesis (gaze and mouse are different) or the null hypothesis (gaze and mouse are not different); each quantified as a linear mixed model with participant as random effect, and the alternative model with method (gaze/mouse) as fixed effect. A log(BF10) of 1.1 corresponds with a BF10 of 3 (evidence for the alternative), whereas a log(BF10) of – 1.1 corresponds with a BF01 of 3 (evidence for the null)

These results showed that initial approach of the affective stimulus was stronger in eye tracking, but that overall there was evidence for eye tracking and MouseView.js producing similar dwell time differences between affective and neutral stimuli. This suggests that, after the first 1–1.5 s, MouseView.js is a good approximation of eye tracking in preferential looking tasks.

Because participants in this study did the same experiments with eye tracking and MouseView.js, we could compute the correlation between dwell time differences for both techniques. For each technique and time bin, we computed the difference in dwell time between affective and neutral stimuli, averaged across all stimuli and presentations. We then correlated the gaze and mouse dwell time differences averaged across time bins, and for each time bin.

Average gaze and mouse dwell time differences correlated strongly for both disgust (R = 0.72, p < 0.001) and pleasant (R = 0.54, p < 0.001) stimuli (Fig. 17, top row). For both stimulus types, the correlation between gaze and mouse was statistically significant from about 1.5 s into a trial, albeit more consistently so for disgust stimuli than for pleasant stimuli (Fig. 17, bottom row).

Fig. 17

Quantification of the relationship between eye tracking and MouseView.js dwell time differences between disgust (red) or pleasant (blue) and neutral stimuli. The top panels show the regression line, with the error of the estimate shaded, and individuals plotted as dots. The bottom panels show the Pearson correlation between gaze and mouse per time bin (solid line) and its standard error (shaded area). Values that fall within the grey area are not statistically significant, with the dotted lines indicating the critical values for R where p = 0.05
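The critical values mentioned in the caption presumably follow from the standard t-test for a Pearson correlation; a sketch of the relation, assuming a two-tailed test with n participants:

```latex
t \;=\; \frac{R\sqrt{n-2}}{\sqrt{1-R^{2}}}
\quad\Longrightarrow\quad
R_{\mathrm{crit}} \;=\; \frac{t_{1-\alpha/2,\,n-2}}{\sqrt{t_{1-\alpha/2,\,n-2}^{2}+n-2}},
\qquad \alpha = 0.05
```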

Relation to self-report

Averaged across all stimuli and stimulus presentations, the difference in dwell time between disgust and neutral stimuli correlated with the average disgust rating (Fig. 18, top row). This was true for eye tracking (R = –0.38, p = 0.008) and for MouseView.js (R = –0.50, p < 0.001), and there was no statistically significant difference in the magnitude of these correlations (Z = 0.70, p = 0.484).

Fig. 18

Correlation between the average self-reported disgust (red, top row) or pleasantness rating (blue, bottom row) and the affective-neutral difference in dwell time. Ratings were averaged across all stimuli within each condition, and dwell times across all stimuli and stimulus presentations within each condition. The reported Z and (uncorrected) p values quantify the difference between the correlations for eye tracking (left column) and MouseView.js (right column). Solid lines indicate the linear regression line and the shaded area the error of the estimate. Dots represent individual participants

The average difference between dwell time for pleasant and neutral images correlated with the average pleasantness rating (Fig. 18, bottom row) for eye tracking (R = 0.31, p = 0.032), but not for MouseView.js (R = –0.01, p = 0.949), although there was no statistically significant difference in the magnitude of these correlations (Z = 1.57, p = 0.117).

These findings closely resemble those of the between-participants study.

What drives dwell time differences

We used linear mixed models to analyse which factors related to the difference in dwell time between affective and neutral stimuli. We considered main effects of method (levels: gaze and mouse), condition (levels: disgust and pleasant), stimulus rating, and presentation number; their interactions; and participant number as a random effect. The best-fitting model included condition and stimulus rating as fixed effects, their interaction, and participant number as a random effect. It is summarised in Table 7.

Table 7 Outcomes of a linear mixed model of dwell time difference (affective-neutral) with participant number as a random effect, and condition (disgust or pleasant), self-reported stimulus rating, and their interaction as fixed effects

The second-best fitting model was similar to the first, but additionally included method (gaze or mouse) and its interactions. It showed a statistically significant main effect of method (β = –0.15 [–0.24, –0.07], t = –3.52, p = 0.001). However, the fact that this model did not fit the data as well (ΔBIC = 7.78) could be taken as evidence against a meaningful effect of method on dwell time differences.
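As a rough illustration of how such a comparison can be run, and of how the ΔBIC translates into a Bayes factor, the following Python sketch uses hypothetical column names and model formulas (statsmodels syntax); it is not the original analysis code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant, stimulus pair,
# presentation, and method, with 'dwell_diff' = affective minus neutral dwell time.
df = pd.read_csv("dwell_differences_long.csv")

def ml_fit(formula):
    # Fit with maximum likelihood (not REML) so that models with different
    # fixed effects can be compared on BIC.
    return smf.mixedlm(formula, df, groups=df["participant"]).fit(reml=False)

def bic(result):
    # Approximate count of estimated parameters (fixed effects + variance
    # components); terms shared by both models cancel in BIC differences.
    k = result.params.shape[0]
    return -2 * result.llf + k * np.log(len(df))

best = ml_fit("dwell_diff ~ condition * rating")
with_method = ml_fit("dwell_diff ~ condition * rating * method")

delta_bic = bic(with_method) - bic(best)
# BIC approximation to the Bayes factor (Wagenmakers, 2007)
bf_against_method = np.exp(delta_bic / 2)
```

Under this approximation, the reported ΔBIC of 7.78 corresponds to a Bayes factor of roughly exp(7.78/2) ≈ 49 in favour of the model without method.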

Discussion

We presented MouseView.js, a JavaScript tool that mimics human vision and eye movements with a mouse-locked aperture. We investigated its reliability and validity for preferential looking tasks in a replication of an eye-tracking study, in which participants were repeatedly presented with the same pairs of affective (disgust or pleasant) and neutral stimuli. We did so in a new sample of participants recruited for a web-based experiment, which we compared with existing gaze data. In addition, we ran a study in which participants were invited to the lab to complete both experiments, allowing for a direct comparison of gaze and mouse methods in the same individuals. We found the expected dwell differences in both gaze and mouse data, specifically sustained avoidance of disgust stimuli and sustained approach of pleasant stimuli. Importantly, we found that MouseView.js was as reliable as (disgust condition) or more reliable than (pleasant condition) eye tracking. We also found that MouseView.js was a valid alternative to eye tracking in preferential looking tasks for three main reasons: 1) there was evidence against a difference between results obtained from gaze and mouse methods (both between and within participants), 2) there was a high correlation between results obtained from gaze and mouse methods (within participants), and 3) gaze and mouse dwell time differences were explained by the same best-fitting linear mixed models (between-participants), or were best explained by a linear mixed model that did not include mouse/gaze method as a fixed effect (within-participants).

Similarities between MouseView.js and eye tracking

We have replicated earlier work (Gomez et al., 2017; Jiang et al., 2015) by showing that gaze and mouse exploration heatmaps are qualitatively similar.

Preferential looking (sometimes referred to as “selective looking”) has a long history in developmental psychology (Hirsh-Pasek & Golinkoff, 1996; Teller, 1979), and has emerged as a popular tool in affective science (Armstrong & Olatunji, 2012; Mogg et al., 2000). Here, we replicated such a study, and showed that MouseView.js dwell differences between affective and neutral stimuli quantitatively and qualitatively replicated the sustained oculomotor avoidance of disgusting stimuli and approach of pleasant stimuli reported in earlier work (Armstrong et al., 2020; Dalmaijer et al., 2021).

In addition, we show that differences in dwell time for affective compared to neutral stimuli (i.e. disgust avoidance or pleasant approach) correlate between MouseView.js and eye tracking, when these are measured in the same individuals. Notably, this correlation emerges only after about 1.5 s into a trial, suggesting that the overlap between mouse and gaze dwell time is driven by deliberate exploration of stimuli.

Differences between MouseView.js and eye tracking

In the gaze data analysed here and in previous studies (Armstrong et al., 2020; Dalmaijer et al., 2021; Nord et al., 2021), disgust stimuli are subject to a brief initial approach before sustained avoidance. MouseView.js data for the same stimuli showed a blunted and elongated initial approach on the first exposure to a disgust stimulus, and only sustained avoidance in all subsequent presentations. For pleasant and neutral stimulus pairs, the initial approach of the pleasant stimulus was also more obvious in gaze than in MouseView.js data, and was followed by sustained approach for both methods thereafter. This pattern was obvious in dwell time differences for disgusting and neutral stimuli, and a likely driver of the qualitative differences between gaze and mouse scan paths.

Another difference between eye-tracking and MouseView.js methodology was the degree to which dwell time differences correlated with self-report in the between-participants study. For pairs of disgusting and neutral stimuli, gaze dwell difference correlated negatively with self-reported disgust. Mouse dwell difference also correlated negatively, but to a lesser extent. However, this difference was not replicated in the (in-lab) within-participant study, which did not show a statistically significant difference between mouse and gaze (and the difference was numerically in the opposite direction compared to the between-participants study). Hence, the reduced correlation between dwell-based disgust avoidance and self-reported disgust could be a consequence of running the study via the Internet as opposed to in the lab. For pairs of pleasant and neutral stimuli, no such differences were apparent.

One reason for the initial gaze bias towards affective stimuli could be that eye movements are harder to suppress than mouse movements. This would make eye tracking more sensitive to the early and relatively automatic capture of attention by affective stimuli (Bradley et al., 2015; Mulckhuyse & Dalmaijer, 2016), and MouseView.js more sensitive to deliberate exploration.

When to use MouseView.js to replace eye tracking

The results presented here suggest that MouseView.js approximates results from eye tracking when participants engage in voluntary exploration of stimuli, but that it fails to detect more reflexive capture of attention. As a consequence, MouseView.js would be a poor replacement for eye tracking in research on early automatic processes, such as oculomotor capture (Theeuwes et al., 1998) or the global effect (Van der Stigchel & de Vries, 2015).

Perhaps obviously, participants cannot produce saccade-like data with a mouse: it moves only as quickly as the hand, and its position is logged at a considerably lower frequency than the sampling rate of high-end eye trackers. It would thus be unwise to turn to MouseView.js for research on saccade dynamics, including trajectories (Van der Stigchel et al., 2006), inhibition or facilitation of return (Mills et al., 2015), and peak velocity (Muhammed et al., 2020).

MouseView.js is a good replacement for eye tracking in research on more deliberate behaviour, including preferential looking (used here), teleforaging (Manohar & Husain, 2013), and free viewing. In addition, MouseView.js can stand in for gaze-contingent apertures in reading research (Rayner, 2014).

Conclusion

MouseView.js is a new implementation of the decades-old method of forcing participants to move a clear aperture to visually explore an otherwise blurred stimulus display. We showed that this method is as reliable as eye tracking for preferential looking paradigms. In addition, we demonstrated qualitative, if not always quantitative, overlap between gaze and mouse dwell times and scan paths. Our software has been made available as an open-source JavaScript library, and has been implemented in the web-based experiment builders Gorilla and PsychoPy, and in the scripting toolbox jsPsych.