Humans are the only species on earth with visible sclera (i.e., the white of the eye; Kobayashi & Kohshima, 1997). Even great apes, humans’ closest evolutionary relatives, do not have visible sclera. It has been claimed that the white surrounding humans’ darker-colored iris evolved to make it easier for humans to follow the gaze direction of their conspecifics (Kobayashi & Kohshima, 2001). Following others’ gaze is a valuable ability because gaze direction is an indicator of human visual attention (Just & Carpenter, 2018), and events that capture one person’s attention could also be relevant for their conspecifics.

What a person is looking at is not only interesting for nonverbal communication in social interactions, but is also useful for exploring more general questions concerning human attention. Since the late twentieth century, video-based eye-trackers have been used to track the eyes in real time (Singh & Singh, 2012) by measuring the position of an infrared light reflection on the cornea (i.e., the transparent layer forming the front of the eye) relative to the pupil (Carter & Luke, 2020). This method allows researchers to track gaze behavior and identify what guides visual attention. In the last 20 years, eye-tracking research has gained much popularity and has become a common measurement tool in many areas of science (Carter & Luke, 2020).

However, even though eye-tracking research has led to very interesting insights in recent years, the method has some important limitations. The need for a lab, an expensive eye-tracking device, an experienced researcher who is familiar with the method, and a calibration procedure makes eye-tracking research rather elaborate, expensive, and time-consuming. These limitations also make field research in natural environments difficult. These restrictions recently sparked an interest in the use of common webcams to infer the eye-gaze locations of participants (e.g., Bott et al., 2017; Semmelmann & Weigelt, 2018). Moreover, the social and economic pressures of the COVID-19 pandemic reinforced this existing interest in webcam-based eye-tracking, as it would allow studies to move online.

The use of a webcam as an eye-tracker would make the research quicker, easier, and cheaper, as no lab, experimenter, or dedicated hardware is needed. Moving from lab to web could also allow researchers to reach a larger and more diverse participant pool more quickly, or to reach a hard-to-reach sample (e.g., patients with dementia, or Chinese participants for a US-based researcher wishing to compare US and Chinese samples). Data collection would no longer be limited by time or location, as individuals could participate whenever they wanted from the comfort of their homes. Importantly, research in other fields has already shown that the benefits of online research do not necessarily come at a price: data quality has been shown to be similar to that of lab research (Kees et al., 2017; Walter et al., 2019), and several effects have already been replicated in online settings (e.g., Dodou & de Winter, 2014; Gosling et al., 2004; Klein et al., 2014; Semmelmann & Weigelt, 2018).

The movement toward online webcam-based eye-tracking research has been facilitated by recent advances in eye-tracking software. Open-source eye-tracking libraries such as WebGazer and TurkerGaze enable researchers to use participants’ webcams to infer their gaze position in real time (Papoutsaki et al., 2016; Xu et al., 2015). To do this, the libraries build a mapping between characteristics of the eye (e.g., pupil position) and gaze positions on the screen. They can be easily integrated into any script or experiment with only a few lines of code.
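To make this concrete, the sketch below illustrates the general idea of such a mapping (not WebGazer’s actual implementation): a regularized regression that maps eye-feature vectors, collected while the participant clicks on known screen locations, onto screen coordinates. All data, dimensions, and variable names are made up for illustration, and the sketch assumes NumPy and scikit-learn are available.

```python
# Conceptual sketch (not WebGazer's actual code): a calibration-style mapping from
# eye features to on-screen gaze coordinates via ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical calibration data: one feature vector per calibration click
# (e.g., flattened eye-patch pixels or pupil/corner positions) ...
eye_features = rng.normal(size=(50, 120))        # 50 calibration samples, 120 features
# ... and the known on-screen click positions (x, y) in pixels.
click_positions = rng.uniform(0, [1920, 1080], size=(50, 2))

# Fit one regularized linear mapping from eye features to screen coordinates.
gaze_model = Ridge(alpha=1.0).fit(eye_features, click_positions)

# At run time, new eye features are mapped to an estimated gaze position.
new_features = rng.normal(size=(1, 120))
estimated_xy = gaze_model.predict(new_features)[0]
print(f"Estimated gaze position: x = {estimated_xy[0]:.0f} px, y = {estimated_xy[1]:.0f} px")
```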

Some studies have already successfully implemented eye-tracking libraries in their online experiments (Semmelmann & Weigelt, 2018; Slim & Hartsuiker, 2021; Yang & Krajbich, 2021). Semmelmann and Weigelt (2018), for example, demonstrated some basic gaze properties (i.e., a fixation task, a pursuit task, and a free-viewing task) with online webcam-based eye-tracking. Slim and Hartsuiker (2021) and Yang and Krajbich (2021) each successfully replicated a behavioral eye-tracking experiment (a visual world experiment and a food choice task, respectively), although in both studies most participants did not pass the initial calibration/validation phase and were excluded (73% and 61% exclusions, respectively). Moreover, neither of the latter two studies directly compared online webcam-based eye-tracking to lab-based eye-tracking. In conclusion, it remains to be established to what extent online webcam-based eye-tracking could be a valid replacement for lab-based eye-tracking, and at what cost in terms of capturing cognitive effects on gaze behavior.

To validate online webcam-based eye-tracking, we conceptually replicated three classic, robust eye-tracking studies online using the participant’s webcam as an eye-tracker. Furthermore, in the third study, we directly compared data from participants who completed both an online webcam-based and a classic lab-based version of the same eye-tracking study. Based on Semmelmann and Weigelt (2018), Slim and Hartsuiker (2021), and Yang and Krajbich (2021), we expected to be able to replicate these effects in a web-based setting. Even though some loss of accuracy can be expected, online eye-tracking creates unprecedented opportunities, as it makes research easier, quicker, and cheaper. This would create great possibilities for studies that require large or hard-to-reach samples or that have limited funding. It would also enable research to continue during pandemic lockdowns.

Study 1: Cascade effect

The first effect we aimed to replicate was the cascade effect, originally shown by Shimojo et al. (2003). The cascade effect refers to the phenomenon that when people choose which of two presented faces they find more attractive, their gaze is initially distributed evenly between the faces, but then gradually shifts toward the face that they eventually choose. Here, we operationalize the cascade effect as the likelihood, during the 100 ms before the decision is reported, of looking at the face that is eventually selected.

Method

This study was preregistered (https://osf.io/ykd25). All materials, data, and analytic scripts have been made publicly available and can be accessed at https://osf.io/p3xac/.

Participants

An a priori power analysis revealed that for a one-sided one-sample binomial test with an alpha of 0.05, the minimum required sample size was 119 participants to reach 90% power to detect a small effect (g = 0.13; which was the observed size of the cascade effect in our pilot study of N = 20 when conducting a one-sided one-sample binomial test). Anticipating exclusions, 152 participants were recruited via the online crowdsourcing platform Prolific (https://www.prolific.co). Afterward, we decided that a one-sided one-sample t-test would be a more appropriate test. To achieve 90% power to detect a medium effect of d = 0.61 (i.e., the effect size of the pilot when conducting a one-sided one-sample t-test), we only needed to test 25 participants, which we greatly exceeded in our study. Eligibility was restricted to English-speaking participants with a computer connected to a functioning webcam who did not wear glasses at the time of the experiment and did not participate in the pilot study.
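As an illustration of how such calculations can be reproduced, the sketch below approximates both power analyses in Python, assuming statsmodels and SciPy are available; the analytic t-test calculation should land near the 25 participants reported above, and the simulation-based binomial check near 119, although exact values depend on the software used.

```python
# Sketch of the two a priori power calculations (alpha = .05, power = .90, one-sided).
import numpy as np
from scipy.stats import binomtest
from statsmodels.stats.power import TTestPower

# One-sample t-test: minimum N to detect d = 0.61 (pilot effect size).
n_ttest = TTestPower().solve_power(effect_size=0.61, alpha=0.05, power=0.90,
                                   alternative='larger')
print(f"One-sample t-test: N >= {int(np.ceil(n_ttest))}")   # roughly 25

# One-sample binomial test: simulation-based power for g = 0.13 (p = 0.63 vs. 0.50).
def binomial_power(n, p_true=0.63, n_sims=2000, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    hits = rng.binomial(n, p_true, size=n_sims)
    pvals = [binomtest(k, n, p=0.5, alternative='greater').pvalue for k in hits]
    return np.mean(np.array(pvals) < alpha)

for n in (100, 119, 140):
    print(f"Binomial test, N = {n}: simulated power = {binomial_power(n):.2f}")
```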

Based on our preregistered exclusion criteria, we excluded five participants for showing no variation in estimated eye gaze across all trials, four for showing no variation in the selected responses across all trials, and seven for having more than 50% of the measurement points falling outside any of the AOIs. Our final sample contained 136 participants (89% of the original sample; 69% male, 31% female, and 0% other) with a mean age of 25 years (SD = 7 years, range 18–48 years). They originated from 28 different countries, of which Portugal had the largest share (24%).

Procedure

All participants gave informed consent before taking part in the study. The task was computerized and completed online. In the first part of the experiment, participants provided some demographic information and we double-checked whether they had a working webcam and whether it was placed correctly on their computer. Next, participants saw an instruction screen detailing optimal conditions for webcam-based eye-tracking (see Semmelmann & Weigelt, 2018). Once participants indicated that they had successfully set up according to the instructions, they proceeded to an eye-tracking calibration phase (i.e., participants were instructed that they would see a series of white squares and that they had to look at them and click on them), followed by the main task.

Each trial of the main task started with a fixation cross (2000 ms), followed by a pair of faces (see Fig. 1). Participants were instructed to select the face they deemed more attractive by pressing the corresponding key on their keyboard (“F” for the left face, “J” for the right face). They could take as long as they needed to make a decision. There were 18 trials in total. The faces were selected from the London Face Research database (DeBruine & Jones, 2017), which contains images of 102 adult faces with accompanying attractiveness ratings from 2513 individuals. Face pairs were combined based on minimal differences in average attractiveness ratings to replicate the difficult condition of the face attractiveness task of Shimojo et al. (2003). The selected face pairs were matched for gender and ethnicity and were limited to a maximum age difference of 4 years. The order and composition of the face pairs were fixed, meaning that the presentation order and the location of each face were consistent across all participants. Faces were presented on a light gray background and vertically centered. The images were 173 × 173 px in size and spaced 295 px apart.

Fig. 1

Schematic overview of the course of a trial of the face attractiveness task. Participants were required to select the more attractive face

After completing 18 trials, participants received a short debriefing and were thanked for their participation. The demographics and webcam check were programmed in and hosted on Qualtrics (https://www.qualtrics.com), and the eye-tracking part was programmed in PsychoJS and hosted on Pavlovia (https://pavlovia.org/). For the eye-tracking part, we made use of the WebGazer open-source eye-tracking library (Papoutsaki et al., 2016). The entire study was conducted in English.

Results

Preregistered analyses

As previous studies using online webcam-based eye-tracking lost many participants who did not pass the initial validation phase, we used an alternative approach in which we corrected the data after they had been collected, rather than excluding participants whose offset was too large. We extracted the estimated gaze position during the final 80% of the central fixation period at the beginning of each trial and used it to estimate the measurement bias for that trial. To account for the offset, the estimated bias was then added to the midline and area of interest (AOI) bounds of that trial (see Fig. 2). For instance, if we found an offset of 50 px to the right of the fixation cross, we shifted the midline and the AOI bounds 50 px to the right. As this experiment was limited to a left–right distinction, we applied this correction only to x-values. On average, the midline was corrected by 119 px in either direction (median = 97 px, min = 0 px, max = 480 px). Our confirmatory analyses are based on the midline-corrected data. For a comparison between the raw data and the midline-corrected data, see our non-preregistered analyses.
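The sketch below illustrates this per-trial correction for the horizontal dimension; the function name, variable names, and example AOI bounds are ours for illustration and do not come from our analysis scripts.

```python
# Sketch of the per-trial midline/AOI correction (hypothetical variable names).
import numpy as np

def corrected_bounds(fix_x, fix_t, fixation_cross_x, midline_x, aoi_bounds_x):
    """fix_x: estimated gaze x-positions during the fixation cross;
    fix_t: corresponding timestamps; aoi_bounds_x: dict of AOI name -> (left, right)."""
    # Keep only the trailing 80% of the fixation period.
    cutoff = np.quantile(fix_t, 0.20)
    tail_x = np.asarray(fix_x)[np.asarray(fix_t) >= cutoff]
    # The systematic offset is how far the mean estimated gaze lies from the cross.
    offset = np.mean(tail_x) - fixation_cross_x
    # Shift the midline and every AOI bound by the same offset (x only in study 1).
    new_midline = midline_x + offset
    new_bounds = {name: (left + offset, right + offset)
                  for name, (left, right) in aoi_bounds_x.items()}
    return offset, new_midline, new_bounds

# Example: estimated gaze during fixation drifts ~50 px to the right of a cross at x = 640.
offset, midline, bounds = corrected_bounds(
    fix_x=[685, 692, 688, 695], fix_t=[0, 500, 1000, 1500],
    fixation_cross_x=640, midline_x=640,
    aoi_bounds_x={"left": (197, 467), "right": (813, 1083)})   # placeholder bounds
print(round(offset), round(midline), bounds)
```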

Fig. 2

Illustration of how the midline and AOI bounds were corrected based on the estimated gaze during the fixation cross

Based on our preregistered exclusion criteria, no trials were excluded because the corrected midline deviated more than 25% of the total screen width from the true midline; no trials were excluded because the standard deviation of the corrected midline was larger than 25% of the screen width; 12% of the measurement points were excluded because they fell outside any of the specified AOIs; and 16% of the trials were excluded because the response time was below 0.5 s or above 30 s.

Contrary to what we described in the preregistration, we decided that a t-test was ultimately a more appropriate test for this study than a binomial test. We also ran all analyses as described in the preregistration, which led to the same conclusions; these analyses can be found at https://osf.io/kwnxc. The results revealed that the likelihood of looking at the face that was eventually selected as more attractive, during the 100 ms leading up to the decision, was 62% (compared to the 50% chance level). A one-sided one-sample t-test revealed that this rate was significantly larger than chance, t(135) = 7.58, p < .001, d = 0.65, 95% CI [0.62; ∞]. Additionally, a Bayesian one-sample t-test with the default Cauchy scale of 0.707 showed that the data were 3.10 × 10⁹ times more likely under the alternative model, in which participants looked more at the eventually chosen face, than under the null model of no difference in viewing proportion. This cascade effect is also shown in Fig. 3, which demonstrates a steady increase in viewing proportion over time and resembles the overall trend reported by Shimojo et al. (2003), although smaller in magnitude (the original study reported an 83% likelihood of looking at the selected face in the 100 ms leading up to the decision).
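For illustration, this analysis can be sketched in Python as follows, assuming one viewing proportion per participant; the data below are simulated placeholders, and pingouin’s bayesfactor_ttest returns the default two-sided JZS Bayes factor, whereas we report a one-sided Bayesian test, so the values will not match exactly.

```python
# Sketch: one-sided one-sample t-test against chance (0.5) plus a JZS Bayes factor.
import numpy as np
from scipy.stats import ttest_1samp
import pingouin as pg

rng = np.random.default_rng(42)
chosen_prop = rng.normal(0.62, 0.18, size=136)   # placeholder per-participant proportions

t_res = ttest_1samp(chosen_prop, popmean=0.5, alternative='greater')
d = (chosen_prop.mean() - 0.5) / chosen_prop.std(ddof=1)     # one-sample Cohen's d
bf10 = pg.bayesfactor_ttest(t_res.statistic, nx=len(chosen_prop), r=0.707)  # two-sided BF10

print(f"t({len(chosen_prop) - 1}) = {t_res.statistic:.2f}, p = {t_res.pvalue:.4f}, "
      f"d = {d:.2f}, BF10 = {float(bf10):.2f}")
```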

Fig. 3

The proportion of time gaze was directed toward the chosen stimulus with respect to the decision time. The period on which we based our analyses is in yellow

Non-preregistered analyses

Midline correction

In the raw data, 7% of the measurement points fell outside any of the AOIs; in the midline-corrected data, 5% fell outside any of the AOIs. This indicates that we captured slightly more measurement points when adjusting for the offset. When we reran the analyses without the midline correction, we found a 61% likelihood of looking at the face that was eventually selected (compared to 62% with midline correction). This proportion was also significantly higher than chance level, t(134) = 7.84, p < .001, d = 0.67, 95% CI [0.65; ∞]. A Bayesian one-sided one-sample t-test showed that the data were 1.19 × 10¹⁰ times more likely under the alternative model than under the null model.

Study 2: Novelty preference

The second effect we replicated was the novelty preference effect. This effect refers to the finding that people are more likely to attend to new stimuli than to stimuli they have already seen. This effect is typically demonstrated with the visual paired-comparison task and has been shown by Crutcher et al. (2009), among others.

Method

The preregistration, materials, data, and analytic scripts of this study are available at https://osf.io/eqg2n/.

Participants

An a priori power analysis revealed that for a one-sided one-sample t-test (with α = 0.05), a minimum sample size of only six participants was required to reach 90% power to detect a large effect (d = 1.47; the effect observed in our pilot study). To make sure we had a sufficiently large sample (as the pilot sample size may not suffice to reliably estimate the true effect size; Brysbaert, 2019), and to account for exclusions, we decided to use a Bayesian stopping rule (Schönbrodt et al., 2017). As described in the preregistration, we first opened the experiment to 50 participants on Prolific. Eligibility was restricted to English-speaking participants with a computer connected to a functioning webcam who did not wear glasses at the time of the experiment and did not participate in the pilot study. Then, after applying the preregistered exclusion criteria (see below), we ran a Bayesian one-sided one-sample t-test. The decision to stop collecting data was based on the observed Bayes factor (BF). We planned that once we reached substantial evidence for either the alternative hypothesis (i.e., BF10 larger than 5; participants look at novel stimuli more than expected by chance) or the null hypothesis (i.e., BF10 smaller than 1/5; participants do not look more at the novel stimuli), we would stop testing; otherwise, we would open up the experiment for another 50 participants. After our first batch of 50 participants (which ended up being only 49, because one participant did not consent to the use of their data), we reached a BF10 of 18.16, so data collection was stopped.
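The stopping logic can be sketched as follows, with the batch size and evidence thresholds as preregistered; the data-collection function and Bayes factor computation are simplified placeholders (including a two-sided default Bayes factor) rather than our actual pipeline.

```python
# Sketch of the Bayesian stopping rule: test in batches of 50 until BF10 > 5 or BF10 < 1/5.
import numpy as np
from scipy.stats import ttest_1samp
import pingouin as pg

def batch_bf10(novel_prop):
    """JZS BF10 for viewing proportions vs. chance (pingouin default is two-sided)."""
    t = ttest_1samp(novel_prop, popmean=0.5).statistic
    return float(pg.bayesfactor_ttest(t, nx=len(novel_prop), r=0.707))

def run_sequential_design(collect_batch, batch_size=50, upper=5, lower=1/5, max_batches=4):
    data = np.array([])
    for _ in range(max_batches):
        data = np.concatenate([data, collect_batch(batch_size)])  # recruit next batch
        bf10 = batch_bf10(data)
        if bf10 > upper or bf10 < lower:
            return bf10, len(data)                                # substantial evidence: stop
    return bf10, len(data)                                        # stopped at the cap

# Simulated example in place of real data collection:
rng = np.random.default_rng(7)
bf10, n = run_sequential_design(lambda k: rng.normal(0.57, 0.15, size=k))
print(f"Stopped after N = {n} with BF10 = {bf10:.2f}")
```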

Based on the preregistered exclusion criteria, one participant was excluded for showing no variation in estimated gaze position across all trials, and three participants were excluded for having more than 50% of the measurement points falling outside any of the AOIs. The final sample consisted of 45 participants (92% of the original sample; 62% female, 38% male, and 0% other) with a mean age of 27 years (SD = 6 years, range 19–41 years). Participants originated from 14 countries, with South Africa having the largest share (45%).

Procedure

The first part of the procedure was the same as in study 1 (i.e., informed consent, demographics and camera check, instructions about optimal conditions for webcam-based eye-tracking followed by a calibration phase). After this first part, participants proceeded to the main novelty preference task.

Each trial of the main task started with a fixation cross (2000 ms), followed by a familiarization phase which consisted of two identical images on the left and right side of the screen (5000 ms). After the familiarization phase, participants saw a black screen (2000 ms), followed by the test phase (5000 ms), in which participants saw two images: one that was the same as the one presented during the familiarization phase and another one that was novel. The left or right positioning of the novel stimulus was randomized across trials. Each experimental trial ended with a black screen (7000 ms; Fig. 4). There were 10 trials in total. Stimuli were black and white, horizontally oriented clipart images selected from the Snodgrass and Vanderwart (1980) database. They were presented on a light gray background, centered vertically, were 472 × 331 px in size, and were 295 px apart from each other.

Fig. 4

Schematic overview of the course of a trial of the novelty preference task

After the main task, an attention check was performed in which participants were asked to select which of three image pairs they recognized from the experiment. Afterward, they were asked to estimate the reliability of their own data on a five-point Likert scale (with 1 as “unreliable, do not use my data,” and 5 as “very reliable, use my data”). The demographics and webcam check were programmed in and hosted on Qualtrics (https://www.qualtrics.com); the eye-tracking part was programmed in PsychoJS and hosted on Pavlovia (https://pavlovia.org/). For the eye-tracking part, we made use of the open-source eye-tracking library WebGazer (Papoutsaki et al., 2016). The entire study was conducted in English.

Results

Preregistered analyses

In the current study, we corrected the midline on average by 128 px in either direction (median = 122 px, min = 36 px, max = 256 px). Our confirmatory analyses are based on these midline-corrected data. For a comparison between the raw data and the midline-corrected data, see our non-preregistered analyses. Based on our preregistered exclusion criteria, we excluded 14% of the measurement points because they fell outside any of the specified AOIs; no trials were excluded because the corrected midline deviated more than 25% of the total screen width from the true midline; and 8% of the trials were excluded because the standard deviation of the corrected midline was larger than 25% of the screen width.

On average, participants looked at the novel stimulus 57% of the time (compared to the 50% chance level). A one-sided one-sample t-test revealed that this viewing proportion was significantly higher than chance level (.5), t(44) = 3.06, p = .002, d = 0.46, 95% CI [0.42; ∞]. Additionally, a Bayesian one-sided one-sample t-test with the default Cauchy scale of 0.707 showed that the data were 18.16 times more likely under the alternative model, in which participants looked at the novel stimulus more than 50% of the time, than under the null model of no difference in viewing proportion. The effect is visualized in Fig. 5. The results are similar to those reported in previous studies on the novelty preference (e.g., Crutcher et al., 2009), but with a smaller effect size; in the original study of Crutcher et al. (2009), participants looked at the novel stimulus 71% of the time.

Fig. 5

The proportion of time participants looked at the novel stimulus during the test phase of the novelty preference task

Non-preregistered analyses

Midline correction

In the raw data, 15% of the measurement points fell outside any of the AOIs; in the midline-corrected data, 14% fell outside any of the AOIs. When we reran the analyses without the midline correction, we found a 54% likelihood of looking at the novel stimulus (compared to 57% with midline correction). This proportion was still significantly higher than chance level, t(44) = 2.05, p = .023, d = 0.31, 95% CI [0.27; ∞]. A Bayesian one-sided one-sample t-test showed that the data were 2.12 times more likely under the alternative model than under the null model.

Sensitivity analysis: Can we improve the quality of the data?

In this non-preregistered analysis, we examined whether alternative ways of analyzing the data could improve data quality. This could indicate which choices are most consequential for the outcomes. To that end, we sequentially applied additional exclusion criteria, making the retained dataset increasingly strict. First, we excluded participants who failed more than one attention check at the end of the study. Second, we excluded participants who clicked through the instruction screen too fast (i.e., < 7.5 s). Third, we excluded participants who indicated at the end of the experiment that their data were unreliable. Lastly, we excluded measurement points during which the live webcam feed appeared (indicating that the eye-tracker had lost the participant’s eyes). The results can be found in Table 1. Note that these exclusion criteria led to no or very few exclusions, making the results in the different rows very similar.
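The cumulative filtering logic can be sketched as follows, assuming a pandas DataFrame with one row per participant; all column names are hypothetical, and the measurement-point-level webcam-feed criterion is omitted for brevity.

```python
# Sketch: apply increasingly strict exclusion criteria and recompute the effect each time.
import pandas as pd

def effect_size(df):
    scores = df["novel_prop"]                      # per-participant novelty proportions
    return (scores.mean() - 0.5) / scores.std(ddof=1)

def sensitivity_table(df):
    # Hypothetical column names; each criterion is added on top of the previous ones.
    filters = {
        "baseline": pd.Series(True, index=df.index),
        "+ attention checks": df["n_failed_checks"] <= 1,
        "+ instruction time": df["instruction_secs"] >= 7.5,
        "+ self-rated reliability": df["self_rating"] > 1,
    }
    rows, keep = [], pd.Series(True, index=df.index)
    for label, criterion in filters.items():
        keep &= criterion                          # criteria are cumulative
        rows.append({"criteria": label, "N": int(keep.sum()),
                     "d": round(effect_size(df[keep]), 2)})
    return pd.DataFrame(rows)

# Usage (hypothetical): sensitivity_table(participants_df)
```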

Table 1 Non-preregistered sensitivity analysis

Study 3: Visual world paradigm

The first two studies of this validation project indicate that it is possible to replicate robust eye-tracking effects with the participant’s webcam while retaining 89–92% of the original sample. However, in both studies, the effect was noticeably smaller than in the original studies. This could be because webcam-based eye-tracking data are noisier, but also because these studies are replications, as it has been demonstrated that the effect sizes of replications are on average 50% smaller than the original effect sizes (Camerer et al., 2018). To estimate how much of the effect size reduction could be attributed to webcam-based eye-tracking, we set up a third study in which each participant conducted the visual world paradigm both online and in the lab with a standard state-of-the-art laboratory eye-tracker (Eyelink 1000 Plus; SR Research Ltd., Mississauga, Ontario, Canada).

We replicated the visual world paradigm effect, which demonstrates that when people hear utterances while looking at a visual display of common objects, some of which are mentioned in the utterances, they tend to look more at the images of the objects they hear mentioned. This effect has been shown by Huettig and Altmann (2005), among others.

Method

The preregistration, materials, data, and analytic scripts of this study are available at https://osf.io/jucge/.

Participants

As described in our preregistration, we first opened the experiment to 50 participants on the participant recruitment site of the University of Amsterdam. Eligibility was restricted to English-speaking participants with a laptop or computer with a functioning webcam and audio device who were not wearing glasses at the time of the experiment. Then, after applying our preregistered exclusion criteria (see below), we ran two Bayesian t-tests using the JASP default Cauchy scale of 0.707. First, we ran a Bayesian one-sided one-sample t-test on the data of the online version comparing the mean proportion viewing time at the target word to chance level (25%). Second, we compared the mean proportion of viewing time towards the target between the two conditions (lab vs. online), using a Bayesian one-sided paired-sample t-test. The decision to stop collecting data was based on the BFs of both t-tests. We planned that once we reached substantial evidence for either the alternative hypothesis (i.e., BF10 larger than 5) or the null hypothesis (i.e., BF10 smaller than 1/5) for both tests, we would stop testing; if not, we would test another 25 participants. We intended to repeat this procedure until we reached substantial evidence in both t-tests for either the null or the alternative model or until we had tested N = 150. We reached substantial evidence favoring the alternative hypotheses for both models after testing 50 participants.

Based on the preregistered exclusion criteria, no participants were excluded for showing no variation in estimated gaze position across all trials, no participants were excluded for having more than 50% of the measurement points falling outside any of the AOIs, and 18 participants were excluded for having more than 50% missing data (often because they completed only the online version or only the lab version of the experiment). The final sample consisted of 32 participants (64% of the original sample) who performed both tasks. The sample consisted of 72% females, 28% males, and 0% other and had a mean age of 20 years (SD = 2 years, range 18–27 years). German was the most commonly reported nationality (32%).

Procedure

This study had a within-subjects design, in which participants completed the experiment both online and in the lab. The order (first online or first in the lab) was counterbalanced. The online part started with a procedure similar to that of studies 1 and 2 (i.e., a webcam and audio check, instructions about optimal conditions for webcam-based eye-tracking, and a calibration phase). In the lab part, we immediately started with the calibration phase, as eye-tracking conditions in the lab are already close to optimal. The task in the lab was displayed on a 23-inch Samsung SyncMaster monitor with a 120 Hz refresh rate and a 1024 × 768 screen resolution. Monocular gaze position was tracked at 1000 Hz with an Eyelink 1000 Plus (SR Research Ltd., Mississauga, Ontario, Canada). The participant’s head was stabilized using a chinrest situated 60 cm from the screen. The experiment started with the standard nine-point calibration and validation procedure provided with the eye-tracker and proceeded only if the validation procedure yielded an average error of less than 1 degree of visual angle. After the calibration phase, the main task (the visual world paradigm) started, which was identical for the online and lab versions but used different stimuli. The set of stimuli used online or in the lab was counterbalanced across participants.

Each trial of the main task started with a fixation cross (2000 ms), followed by a screen with four images (e.g., a desk, a car, a foot, and a horse), one in each quadrant of the screen (9000 ms). Each screen with images was paired with a sentence such as “Eventually, the man looked around thoroughly, and then he spotted the desk and realized that it was magnificent.” One of the four images of each scene was a target object that was mentioned in the sentence (in the above example this was the desk), and the other three images (car, foot, and horse) were unrelated distractors (see Fig. 6). Participants were instructed to listen to the sentences carefully and were told that they could look wherever they wanted (they were not asked to perform any task). There was a 1000 ms preview of the display before the onset of the sentence, and the trial was automatically terminated after 9000 ms, which is typically 2000 ms after the end of each sentence. The target word typically appeared 4000 ms after the onset of the sentence. These materials (sentences and scenes) were recreated based on the materials of the experiment of Huettig and Altmann (2005). The images were presented on a white background, were centered on each of the four quadrants of the screen, and had a size of 265 × 189 px. There were 12 trials in each version of the experiment, so 24 trials in total per participant.

Fig. 6

Schematic overview of the course of a trial of the visual world task

After the main task, participants were asked to rate the reliability of their own data on a five-point Likert scale (with 1 as “unreliable, do not use my data,” and 5 as “very reliable, use my data”). For the online part, the webcam and audio check were programmed in and hosted on Qualtrics (https://www.qualtrics.com), and the eye-tracking part was programmed in PsychoJS and hosted on Pavlovia (https://pavlovia.org/), in which we made use of the WebGazer open-source eye-tracking library (Papoutsaki et al., 2016). For the lab part, the entire experiment was programmed in Experiment Builder (SR Research Ltd., Mississauga, Ontario, Canada). The study was conducted in English.

Results

Preregistered analyses

In the online part, we corrected the horizontal midline on average by 133 px in either direction (median = 131 px, min = 21 px, max = 265 px), and the vertical midline on average by 134 px in either direction (median = 134 px, min = 48 px, max = 355 px). Our confirmatory analyses are based on the midline-corrected data. For a comparison between the raw data and the midline-corrected data, see our non-preregistered analyses. Based on our preregistered exclusion criteria, no measurement points were excluded because the live webcam feed indicated that the eye-tracker had lost the participant’s eyes; we excluded 15% of the measurement points because they fell outside any of the specified AOIs; and we excluded 23% of the trials because the corrected midline deviated from the true midline by more than 25% of the screen height (vertical midline) and more than 37.5% of the screen width (horizontal midline). Finally, no trials were excluded because the standard deviation of the corrected midline was larger than 25% of the screen width.

In the lab part, 1% of the measurement points were excluded because they fell outside any of the specified AOIs.

Replication of visual world paradigm online with participants’ webcams

Participants looked more at the target items (52%) than at the control items (17%) in the online version of the experiment. The proportion of direct fixations on the target items was significantly higher than expected by chance (25%), t(31) = 6.39, p < .001, d = 1.13, 95% CI [1.06; ∞]. Additionally, a Bayesian one-sided one-sample t-test showed that the data were 81,572.97 times more likely under the alternative model, in which participants looked more at the target item, than under the null model of no difference in viewing proportion.

Comparison between lab data and online data

The proportion of fixations on the target item was higher in the lab version (71%) than in the online version (52%). This difference was significant, t(31) = 3.58, p < .001, d = 0.63, 95% CI [0.54; ∞]. Also, a Bayesian one-sided paired-samples t-test showed that the data were 56.75 times more likely under the alternative model (a larger effect size in the lab than in the online version) than under the null model (no difference between lab and web). A visualization of the effect in both the lab version and the web version can be found in Fig. 7.

Fig. 7

The proportion of time participants looked at the target versus distractors in the online version (top) versus the lab version (bottom) of the experiment. The 400 ms time interval on which we based our analyses is shown in yellow

Non-preregistered analyses

Midline correction

In the raw data, 15% of the measurement points fell outside any of the AOIs; in the midline-corrected data, 16% fell outside any of the AOIs. When we reran the analyses without the midline correction, we found a 38% likelihood of looking at the target (compared to 52% with midline correction). This proportion was also significantly higher than the 25% chance level, t(32) = 4.11, p < .001, d = 0.72, 95% CI [0.66; ∞]. A Bayesian one-sided one-sample t-test showed that the data were 215.70 times more likely under the alternative model than under the null model.

Sensitivity analysis: Can we make the online data more similar to the lab data?

We examined whether alternative ways of analyzing the data could bring the online data closer to the lab data. This could give an indication of which choices are most consequential for the results. To do this, we added an extra exclusion criterion to the online data, making it stricter: we excluded participants who indicated at the end of the experiment that their data were unreliable. The results can be found in Table 2.

Table 2 Non-preregistered sensitivity analysis

Calibration score

Inspired by Slim and Hartsuiker (2021), we calculated the proportion of estimated gaze positions that fell on the center of the screen during a three-second fixation cross immediately after the calibration phase. Similar to Slim and Hartsuiker (2021), we found a mean calibration score of 41% (SD = 26%, range 7–93%) in the online webcam-based data. This was considerably lower than in the lab-based data, where the mean calibration score was 99% (SD = 4%, range 75–100%). To test whether higher calibration scores were associated with larger effect sizes in the online webcam-based data, we calculated the Pearson correlation between participants’ calibration scores and effect sizes in the online webcam-based data. This analysis showed no relation between the two (r = −0.04, p = .843).
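A sketch of how such a calibration score and its correlation with effect size can be computed is shown below; the central AOI size, gaze samples, and per-participant values are placeholders rather than our actual data.

```python
# Sketch: calibration score = share of gaze samples inside a central AOI during the
# post-calibration fixation cross, then its correlation with per-participant effect size.
import numpy as np
from scipy.stats import pearsonr

def calibration_score(gaze_xy, screen_w=1920, screen_h=1080, center_frac=0.2):
    """Proportion of (x, y) gaze samples in a central box covering `center_frac` of each
    axis; the AOI size here is a placeholder, not necessarily the one used in the paper."""
    gaze_xy = np.asarray(gaze_xy)
    inside_x = np.abs(gaze_xy[:, 0] - screen_w / 2) <= center_frac * screen_w / 2
    inside_y = np.abs(gaze_xy[:, 1] - screen_h / 2) <= center_frac * screen_h / 2
    return float(np.mean(inside_x & inside_y))

rng = np.random.default_rng(3)

# One hypothetical participant: noisy gaze samples around the screen center.
gaze = rng.normal([960, 540], [150, 120], size=(90, 2))
print(f"Calibration score: {calibration_score(gaze):.2f}")

# Hypothetical per-participant calibration scores and effect sizes (placeholder values).
scores = rng.uniform(0.07, 0.93, size=32)
effects = rng.normal(0.27, 0.15, size=32)
r, p = pearsonr(scores, effects)
print(f"Pearson correlation: r = {r:.2f}, p = {p:.3f}")
```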

General discussion

Can we validly run eye-tracking studies with participants’ own webcams? We replicated three robust eye-tracking studies (the cascade effect, the novelty preference, and the visual world paradigm) online with the participant’s webcam as eye-tracker. All three effects were replicated successfully, and the research was conducted considerably faster, easier, and cheaper than comparable studies conducted in the lab. To illustrate, data collection for studies 1 and 2 was completed within 1 day, and participants from 14 to 28 different countries took part. Moreover, contrary to previous online webcam-based eye-tracking studies, we could retain most of our sample (64–92%). This indicates the potential of an online webcam-based eye-tracking procedure that captures gaze directed to the left/right halves or the four quadrants of the screen. However, even though the overall effects were replicated, the effect sizes of all three studies shrank by 20–27% compared to lab-based eye-tracking.

Smaller effect sizes

There are several reasons that could explain why the effect sizes of the online webcam-based studies were smaller than those of lab-based eye-tracking studies. For example, it is possible that webcam-based eye-tracking leads to an underestimation of the true effect size. This could be due to smaller numerators (i.e., if the mean difference from the test value is smaller for online webcam-based eye-tracking) or larger denominators (i.e., if the standard deviations of online webcam-based eye-tracking are larger). The third study of the current paper revealed that the numerator of online webcam-based eye-tracking was 41% smaller than that of lab-based eye-tracking, and that the denominator was 32% larger.

At the same time, original effect sizes often overestimate the true effect sizes. It has been shown that the effect sizes of replications are typically 50% smaller than the effect sizes of original studies (Camerer et al., 2018). It is argued that this is caused by exaggerated effect size estimates in the existing literature due to a combination of publication bias and questionable research practices (e.g., Simmons et al., 2011; Sterling, 1959). In the third study of the current paper, we found the effect size of the lab-based replication indeed to be lower than that of the original study. However, the effect size of the online webcam-based replication was even lower than that of the lab-based replication, so replication per se could not fully account for the effect size shrinkage in online webcam-based eye-tracking. This indicates that the decreased effect sizes of online webcam-based eye-tracking are probably caused by a combination of both factors.

As in other conceptual replications (i.e., replications where there are changes to the original procedures), the decrease in effect size could also be related to several methodological differences between the original studies and our replications (Zwaan et al., 2018). First, while all original studies were conducted in the lab, which is a very controlled environment, all webcam-based replications were done online. Second, as we did not have the original materials for studies 1 and 2, we intuitively reconstructed them ourselves. Third, we used different measurement tools. The original studies used standard laboratory eye-trackers; we used webcams. It is possible that we did not measure the exact same construct as the original studies because of these changes, which could result in different effect sizes.

Lower data quality

In the context of eye movement experiments, data quality refers to the extent to which the collected eye-tracking data accurately and reliably reflect the participants’ visual behavior (Holmqvist et al., 2012). To ensure data quality, researchers typically employ various strategies such as calibration and validation procedures, adherence to standardized guidelines for the setup of the eye-tracker and environment, consistent monitoring during data collection, and preprocessing techniques (e.g., noise reduction, outlier removal; Holmqvist et al., 2011). Many of these strategies are not straightforward to implement for online webcam-based eye-tracking because of the less controlled environment of online testing. Moreover, the quality of webcam hardware is known to be lower than that of the cameras used in standard eye-trackers. For these reasons, the data quality of online webcam-based eye-tracking is expected to be lower than that of lab-based eye-tracking. We tested whether this quality could be improved in two ways: midline correction and stricter exclusion criteria.

Midline correction

The use of validation trials is a common method of correcting for deterioration of gaze estimation accuracy. Although effective, validation trials can take up a large proportion of the experimental duration, and it has been debated how much validation is required to ensure high gaze estimation accuracy (e.g., Semmelmann & Weigelt, 2018). For example, in previous studies that used online eye-tracking, the validation phase took up 40–50% of the experimental duration (Semmelmann & Weigelt, 2018; Yang & Krajbich, 2021). Moreover, only 27–39% of participants passed the initial validation phase in the studies of Slim and Hartsuiker (2021) and Yang and Krajbich (2021). In the current study, we tested an alternative approach that involves correcting the data after they have been collected, rather than monitoring the offset throughout the experiment and excluding participants based on this offset (Hornof & Halverson, 2002). This approach has some important benefits, such as reducing computational demands and experimental duration, and losing fewer participants during the validation phase. Based on the systematic measurement error estimated from the fixation period preceding each trial, we shifted the midline and AOI bounds in the corresponding direction. Our non-preregistered analyses showed that the effect sizes of the midline-corrected data were higher than those of the raw data, suggesting a successful improvement in data quality.

Stricter exclusion criteria

It has been argued that exclusion criteria and attention checks can greatly improve the quality of the data. In our non-preregistered analyses, we tested whether applying stricter exclusion criteria could improve data quality and decrease the differences between online webcam-based and lab-based eye-tracking. However, attention checks did not seem to make a large difference in our study: participants were largely attentive, which led to few or no exclusions based on these criteria. This is in line with the literature indicating that online participants provide data of similar quality to participants recruited via university pools (e.g., Goodman & Paolacci, 2017; Gould et al., 2015). However, our exclusion criteria focused only on inattention and on the appearance of the live webcam feed (which indicates that the eye-tracker lost the participant’s eyes). Checks for poor environmental circumstances (e.g., too much light, too little light, poor webcam quality, slow internet) might have been more successful in improving data quality.

In sum, in all three studies, we were able to slightly improve the data quality. However, in all three studies, the conclusions remained exactly the same with or without midline correction, and with or without stricter exclusion criteria. Furthermore, higher calibration scores did not seem to predict higher effect sizes. This shows that extensive validation phases, data manipulation, and extreme exclusion criteria might not be necessary to reach valid results. Excluding more participants and measurement points might improve the estimated effect size, but might also lead to higher costs and a longer process. Furthermore, hard-to-reach samples are likely excluded in lab-based experiments. The ideal trade-off between data quality and data exclusion/manipulation probably depends on the specific study characteristics and study goal.

Practical considerations for online webcam-based eye-tracking

Determining the suitability of an eye-tracking study for online webcam-based data collection depends on several factors. For instance, studies that involve eye-tracking effects that are known to be stable and robust, such as the replication studies presented in the current paper, are strong candidates for web-based eye-tracking. Additionally, new experiments that are expected to have large effects and a limited number of large areas of interest could be effectively conducted using web-based eye-tracking methodologies. This way, webcam-based eye-tracking could be used for both original and replication studies. For example, studies exploring attentional orienting, such as investigations into the spatial cueing effect, research on social attention like face perception studies, or research on memory and visual attention such as change detection studies, could potentially benefit from online webcam-based eye-tracking. These studies often involve a limited number of large areas of interest and global gaze measures, which aligns well with the capabilities of the webcam-based setup.

On the other hand, studies requiring high precision and accuracy, such as those involving small or intricate areas of interest, demanding fine-grained temporal measurements, or investigating subtle eye movements such as blinks, saccade trajectories, fixation durations, or pupil dilatations, may still necessitate traditional laboratory-based eye-tracking systems. For example, studies focusing on psycholinguistics and language processing, such as investigations of sentence reading or word recognition, may be less suitable for webcam-based eye-tracking. It is crucial for researchers to evaluate the specific requirements of their study, considering the precision and level of detail needed, to determine the appropriateness of online webcam-based eye-tracking.

When opting for online webcam-based eye-tracking, researchers should consider several best practices. First, determining an appropriate sample size and conducting a power analysis remain crucial steps in ensuring statistical robustness. However, as the effect sizes of online webcam-based eye-tracking are relatively small, with large variability among participants, sample sizes will need to be larger than in traditional studies. We suggest anticipating a 20–30% reduction in the expected effect size in the power calculation. Second, it is important to provide detailed instructions to the participant regarding the ideal experimental setting and, where possible, to include valid checks of adherence to these instructions. Such instructions include avoiding head movements, sitting at a suitable distance from the screen, and ensuring suitable room luminance, and they can help compensate for the less controlled environment of online testing. Third, researchers will need to establish clear criteria for participant and data exclusion to maximize data quality. For shorter experiments, calibration and validation procedures can be used, while for longer experiments, midline correction may be a more reasonable alternative. Additionally, conducting pilot and exploratory preliminary studies can help identify effective checks to facilitate the exclusion of uncooperative participants or participants whose environments are unsuitable for eye-tracking.
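As an illustration of the first recommendation, the sketch below (assuming statsmodels is available) deflates a hypothetical lab-based effect size by 25%, the midpoint of the suggested range, before solving for the required sample size.

```python
# Sketch: inflate the planned sample size by assuming the online effect is ~25% smaller.
from math import ceil
from statsmodels.stats.power import TTestPower

d_lab_based = 0.80                      # effect size expected from lab work (placeholder)
d_online = d_lab_based * 0.75           # anticipate a 25% reduction for webcam-based tracking

analysis = TTestPower()
for label, d in [("lab-based d", d_lab_based), ("deflated online d", d_online)]:
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.90, alternative='larger')
    print(f"{label} = {d:.2f}: required N = {ceil(n)}")
```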

To end on a positive note, webcams, cameras, and processing systems are rapidly improving. Over time, the described limitations of online webcam-based eye-tracking will become less significant, the method will become feasible for a greater number of studies, and fewer requirements will have to be met to still obtain high-quality data.

Limitations of the current study and moving forward

The experiments in the current paper were quite simple; they took only 5–7 min and had only 2–4 AOIs. It would be interesting to investigate how far the effects could be pushed and how many distinct AOIs could be effectively used online. Based on WebGazer’s spatial precision, Yang and Krajbich (2021) suggested that up to six AOIs could be used without degradation in data quality. Their study took 30 min, and with several recalibrations they were able to maintain appropriate data quality. For longer studies, or studies with more than six AOIs, we hypothesize that more recalibrations might be needed.

Once the groundwork has been established, conceivable applications of online webcam-based eye-tracking are numerous. Among other benefits, the possibility of reaching difficult-to-access populations is a major advantage of online over lab-based research. For instance, eye-tracking has been found valuable for predicting the onset of Alzheimer’s disease (Bourgin et al., 2018; Crawford et al., 2015; Crutcher et al., 2009). Conducting this or similar assessments from the comfort of one’s home would not only save resources and reduce interpersonal contact, but would also enable thousands of (at-risk) patients to be reached, broaden the availability of eye-tracking methodology (even during pandemic lockdowns), and change the way we conduct eye-tracking research in general.

Conclusions

Our study provides evidence for the applicability of online webcam-based eye-tracking. In three replications of robust eye-tracking studies, we demonstrated that similar findings can be obtained when conducting a study online with the participant’s webcam as eye-tracker. While the conclusions are limited by the simplicity of our tasks, these replications collectively serve to inform gold standards for the application of online webcam-based eye-tracking. Although the data quality was considerably lower, the speed and convenience of online research could be worth the switch for studies with large effect sizes and relatively few AOIs. Moreover, with the ever-improving quality of computer webcams, the future of online eye-tracking looks promising.