In 2021, the average rate of Internet access exceeded 90% in OECD countries (OECD, 2021), and young people in particular spend an increasing amount of time online (Anderson et al., 2018). The Internet has thus become an important part of our lives, and a growing number of researchers use eye-tracking technology to examine attention and behavior in online environments, for instance, in user experience (Lewandowski & Kammerer, 2021), online retailing (Tupikovskaja-Omovie & Tyler, 2021; Ladeira et al., 2019), advertising (Kaspar et al., 2019), and online education (Alemdag & Cagiltay, 2018).

Many of these studies, however, do not actually use web pages in their research but rely on static images of web pages or images of specific web page elements (Huddleston et al., 2015; Kanaan & Moacdieh, 2021; Luo, 2021; Schröter et al., 2021). While there may be several advantages to using images of web pages rather than actual web pages, the approach reduces ecological validity since participants cannot interact with or scroll up and down an image of a web page. This misalignment between research design and reality is most likely due to the complexity of recording and processing eye-tracking data in online environments. Current methods face a significant challenge when it comes to mapping eye-tracking data to scrollable web pages. Typically, eye tracking on web pages results in a video file of the participant browsing and scrolling through web pages with superimposed eye movements. At the analysis stage, however, the gaze position in the video recording is usually mapped to areas of interest (AOIs) on the web page. There are two main methods for processing eye movements in online settings.

The first method entails manually mapping fixations in the video recording to AOIs on a reference image of the web page (Holmqvist et al., 2011). The method is similar to how mobile eye-tracking data is usually coded with the key exception that in mobile eye tracking the reference image is a photograph of the environment that participants are navigating (e.g., Gidlöf et al., 2013).

The manual mapping method entails recording the screen while participants perform the study and afterwards manually mapping each gaze and/or fixation coordinate on each video frame to its corresponding location on a reference image of the web page. When there are a large number of participants, web pages, or AOIs in a study, the manual method becomes extremely time-consuming. Another disadvantage of manual coding is that it may be difficult to replicate due to subjective judgments about the precise location of fixations in relation to the reference image.

The second method entails using commercial eye-tracking software such as Gazepoint, Tobii, WebLink, or iMotions, for automatic or assisted gaze-mapping to coordinates on a reference image. This method, however, can also have a number of significant drawbacks. First, these software programs can be prohibitively expensive, limiting researchers’ access to the method. Second, since this software sometimes relies on heuristic computer vision algorithms, it does not always map gaze coordinates with satisfying accuracy, and in some situations, it can be unpredictable and unreliable, making it difficult to anticipate when or why the software will or will not work. Third, commercial software typically uses closed-source code, which limits transparency and reproducibility, since other researchers need access to exactly the same software version to reproduce the study. Finally, the majority of this software has user-friendly but imprecise interfaces (e.g., drag-and-drop boxes to define AOIs). As a result, even the most diligent researcher using the same software and the same version may be unable to exactly reproduce the study.

In addition, similar to manual coding, most commercial gaze-mapping software is based on a video recording of the screen. However, monitors typically refresh at least twice as often as video recordings capture frames, which significantly reduces the maximum possible accuracy of the mapping process. Furthermore, online screen capturing is computationally expensive, and fluctuations in available computational power may cause occasional frames to lag or be skipped entirely, generally without the user’s knowledge. As a result, video recordings are unreliable representations of what was presented on the monitor during a study.

We aimed to develop a third method for processing eye-tracking data from real-world online environments. The purpose of this article is to introduce this novel method and the eyeScrollR software, an open-source R package (R Core Team, 2022) that provides researchers with a free, reproducible, and reliable method for mapping eye-tracking data to web pages. The eyeScrollR package re-maps eye-tracking gaze coordinates on the screen (e.g., screen dimensions 1920*1080 pixels) to coordinates on a full-page reference image (e.g., web page dimensions 1920*5000 pixels) by correcting the gaze y-coordinate up or down as a function of participant-generated actions such as mouse scrolling. Similar to manual coding and commercial software, the gaze-mapped eyeScrollR data can be analyzed with AOIs. Unlike the two other methods, gaze-mapping with the eyeScrollR package makes it completely transparent where changes have been made to the data, allowing researchers to go beyond AOI-based analyses to, for instance, examine scan paths, saccadic amplitudes, micro-saccades, etc. The eyeScrollR package relies on a deterministic rather than heuristic algorithm, and it allows researchers to control every stage of data processing, reliably predict when and where the approach can or cannot be employed, and improve it at will.

Fig. 1 The typical seven steps of the workflow when using the eyeScrollR package. Steps 5 and 7 are optional

Package description and usage

eyeScrollR is an R package designed for researchers who wish to apply eye-tracking methods to real-world online environments. The package implements a gaze-mapping method that transforms eye-tracking gaze coordinates from screen dimensions to web page dimensions. The package can be used with any type of scrollable stimuli but is primarily intended for eye tracking on web pages. Its core function (eye_scroll_correct) loops through a chronologically ordered dataset containing, at a minimum, timestamps, participant-generated scroll events (such as key presses or mouse scroll events), and either gaze coordinates, fixation coordinates, or both. In other words, eyeScrollR reads through the dataset line by line, looking for moments when the user turned the scroll wheel, and then applies an offset to all subsequent y-coordinates where applicable. For example, if eyeScrollR detects a down scroll worth 100 pixels, all subsequent gaze and fixation y-coordinates are increased by 100 pixels; however, this does not occur if the web page content has already been scrolled down to the very bottom. The method effectively allows the researcher to convert gaze or fixation data measured in screen dimensions to web page dimensions. For interpreting the gaze-mapped data, it is useful to collect reference images in the form of full-page images of the web page, which can be gathered with either dedicated software or browser plug-ins. The gaze-mapped data can then be superimposed on the full-page image as a heatmap.
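To make the core idea concrete, the following minimal sketch illustrates the scroll-offset logic in plain R. It is not the package's internal implementation; all object names and values are invented for the example.

# Illustrative sketch of the scroll-offset idea (not eyeScrollR's actual code)
page_height <- 5000   # full-page image height in pixels (assumed)
view_height <- 1080   # height of the viewing area in pixels (assumed)

# toy data: gaze y-coordinates on the screen and pixels scrolled at each sample
samples <- data.frame(gaze_y       = c(500, 520, 300, 310),
                      scroll_delta = c(0, 0, 100, 0))

total_scrolled <- 0
mapped_y <- numeric(nrow(samples))
for (i in seq_len(nrow(samples))) {
  # accumulate scrolling, clamped so the page cannot scroll past its ends
  total_scrolled <- min(max(total_scrolled + samples$scroll_delta[i], 0),
                        page_height - view_height)
  # map the screen y-coordinate to the full-page y-coordinate
  mapped_y[i] <- samples$gaze_y[i] + total_scrolled
}
mapped_y  # 500 520 400 410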

Web pages often contain fixed areas that remain visible when the user scrolls and these areas may even change dynamically as a result of the scroll (e.g., a top navigation menu or a search bar that shrinks or disappears after the user has scrolled down a certain number of pixels). Such areas often remain in the same position on the screen and are therefore affected differently by scrolling. Consequently, gaze coordinates on these areas are not mapped in the same way as gaze coordinates on the regular web page content. The eyeScrollR package contains a function for handling fixed areas but it requires input from the researcher concerning the exact location of the fixed areas. The entire procedure can be summarized in a seven-step workflow, as shown in Fig. 1. The following sections describe these steps in detail and provide a complete reproducible example including commented code to be used in R.

Step 1. Prepare the study setting

Reliable and accurate gaze-mapping with eyeScrollR depends on the quality of the recorded scroll data. It is therefore important that participants use a mouse with a scroll wheel that has tactile notches, where each notch corresponds to a fixed amount of scrolling on the screen. This means that participants cannot use navigation devices with analog scrolling, such as an Apple mouse or laptop trackpads. To ensure useful and unambiguous scroll data, it is also necessary to change certain browser settings before data collection begins. Omitting the correct browser settings may result in a few pixels of inaccuracy in the fixation coordinates, the mapping may be temporarily ahead by up to 150 ms during scrolling blur periods, and in some cases the entire file may be corrupted, for instance if a participant has a very fast and erratic scrolling pattern. It is therefore recommended to disable smooth scrolling in the browser and remove the side scrollbar. Deactivating smooth scrolling in Google’s Chrome browser can be done by typing ’chrome://flags/’ in the address bar and disabling the ’Smooth Scrolling’ option. A similar setting can be achieved in all Chromium-based browsers (e.g., Edge, Opera, etc.), and the Firefox settings menu includes an option to deactivate it. It is also recommended to hide or deactivate the side scrollbar. In Chrome/Chromium-based browsers, activating the ’Overlay Scrollbars’ option in the same menu is sufficient, while free browser extensions can be downloaded for Firefox.

Finally, it is important to plan in advance if, how, and when full-page images of web pages will be gathered. Web pages with highly dynamic content may make it difficult to collect web page images before or after the data collection (e.g., social media web pages). Less dynamic content may allow researchers to collect the image right after eye-tracking data collection (e.g., online shopping, where the likelihood of products appearing or disappearing in a matter of seconds is low), while images of completely static web pages can be collected even before the eye-tracking data collection has begun.

Step 2. Calibrate scrolling

eyeScrollR must be calibrated to the screen and the browser used during the eye-tracking study. Essentially, the calibration consists of gathering three pieces of information: a) the screen resolution, b) the exact coordinates of the top leftmost and bottom rightmost pixels of the viewing area (the visible part of the web page), and c) the number of pixels scrolled whenever the user scrolls up or down one notch on the scroll wheel. This information is passed to the eye_scroll_correct function in Step 6, where it determines how many pixels to correct for each mouse scroll event and where the scrollable content is placed on the screen. Calibration can be done automatically or manually. Automatic calibration requires taking a screenshot of the calibration page (https://larigaldie-n.github.io/eyeScrollR/calibration.html) using the browser for which settings were prepared in Step 1 and then loading the screenshot image into R. The calibration function scroll_calibration_auto uses the image to compute all relevant calibration information. The calibration web page includes colored squares in the top left and bottom right corners of the viewing area, which the automatic calibration function detects in order to obtain the coordinates of the top leftmost and bottom rightmost pixels of the viewing area. Screen resolution is inferred from the size of the image. Manual calibration can be done by gathering the three types of information by hand and passing them directly to the manual calibration function scroll_calibration_manual. Automatic calibration is recommended, as it is easier and more reliable than manual calibration. The latter should be reserved for cases where eyeScrollR is used in contexts other than web pages.
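As a rough illustration only, a manual calibration call could look like the sketch below. The argument names of scroll_calibration_manual, as well as all coordinate values, are assumptions and should be checked against the package documentation.

# Hedged sketch of manual calibration; argument names and values are assumptions
calibration <- scroll_calibration_manual(
  screen_width   = 1920,  # screen resolution
  screen_height  = 1080,
  top_left_x     = 0,     # top leftmost pixel of the viewing area (assumed)
  top_left_y     = 131,
  bottom_right_x = 1919,  # bottom rightmost pixel of the viewing area (assumed)
  bottom_right_y = 1079,
  scroll_pixels  = 125    # pixels scrolled per notch of the mouse wheel
)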

Fig. 2 Three rectangular fixed areas on the screen (left) are specified, and redirected to the full-page image (right)

Step 3. Eye-tracking data collection

Any combination of eye tracker and software will work with the eyeScrollR package, as long as it produces a data set that contains the variables described in Step 6. The only critical point is that the participants use the browser prepared for the study in Step 1. The eye-tracking data collection can also take place after Step 4.

Step 4. Get full-page image

This step consists of getting the full-page image of the web page to which gaze data will be mapped. Its purpose is to: a) give the user a visual representation of the complete web page, b) obtain the total length of the web page, c) produce measurements of fixed areas if there are any (see Step 5), and d) optionally create a heatmap (see Step 7). Several browser plugins and standalone programs can produce full-page images of scrollable web pages (e.g., “GoFullPage” for Chromium browsers). Any tool is acceptable, as long as the resulting full-page image has the same pixel width as the viewing area and contains all visible content. Note that complex and dynamic web pages may require the researcher to perform some manual editing of the full-page image to ensure fidelity with the web page as it appears during browsing.

Step 5. Specify locations of fixed areas on the full-page image (optional)

Some web pages contain fixed areas, meaning that their content does not scroll up or down with the rest of the page. Typical examples of fixed areas are a menu at the top of a web page, a sidebar, or advertising that stays in place so that it is always visible to the user. A key feature of the eyeScrollR package is the possibility to specify such fixed areas. Specifying these areas informs the package not to apply any scroll-based correction to gazes and fixations that are recorded inside them. Instead, gazes and fixations inside fixed areas are redirected to the specified locations on the full-page image acquired in Step 4. The specification of fixed areas is done by manually mapping rectangular areas on the screen to specific areas on the full-page image, as shown in Fig. 2.

Sometimes the appearance of fixed areas changes during user interaction. For instance, top menus sometimes retract or disappear after a certain number of pixels have been scrolled, when a button has been clicked, and so on. To accommodate such situations, the eyeScrollR package handles fixed areas with bundles and rules. In eyeScrollR terminology, a bundle is a set of one or more fixed areas corresponding to a given web page configuration. A rule is attached to each bundle to specify conditions when the bundle of fixed areas should or should not be used. As an example, consider a web page with a top menu (Fixed area 1) and a sidebar (Fixed area 2), similar to the one in Fig. 2, which do not move or change when the user is scrolling down. The top menu and the sidebar are two fixed areas that together comprise a bundle, each with a manually specified location on the full-page image, and an associated rule specifying that this bundle (and therefore all of its fixed areas) is always active.

Another possible situation is that the top menu and the sidebar disappear when the user has scrolled down more than 1000 pixels, but reappear if the user scrolls up again. In this situation, the bundle should be associated with a rule stating that the bundle is only active when the user has scrolled fewer than 1000 pixels, and inactive otherwise.

As a final example, consider a situation where, instead of completely disappearing, the fixed areas shrink when the user has scrolled down more than 1000 pixels. This implies that the web page has two different configurations, both with their own slightly different coordinates of fixed areas. Each one of these configurations leads to a different bundle: one bundle that specifies the first configuration (with its own fixed areas and their specified locations on the full-page image) and is associated with a rule making it active when the user has scrolled fewer than 1000 pixels and inactive otherwise, and another bundle that specifies the second configuration and is associated with a rule making it inactive when the user has scrolled fewer than 1000 pixels and active otherwise.

In practice, rules are custom R functions that return the logical value TRUE when the coupled bundle must be active, and the value FALSE when the bundle must be inactive. The necessary code for the creation of bundles and rules can be demanding for researchers with limited programming experience. To reduce the need for programming experience, the eyeScrollR package contains a Shiny gadget interface that outputs the necessary code. However, only a handful of the most common rules have been pre-specified in the gadget, and it may be necessary for some users to write their own code. More details about how the bundles of fixed areas and their related rules work can be found in Appendix A.
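To illustrate, a rule might look like the following sketch. We assume here that the rule receives the current number of pixels scrolled from the top; the actual signature expected by eyeScrollR may differ (see Appendix A and the online documentation).

# Hedged sketch of a rule: the coupled bundle is active only while fewer than
# 1000 pixels have been scrolled down (the rule's argument is an assumption)
rule_menu_visible <- function(scrolled) {
  scrolled < 1000
}

rule_menu_visible(400)   # TRUE  -> bundle active
rule_menu_visible(1200)  # FALSE -> bundle inactive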

Step 6. Load data set and correct

This is the critical step in the method, in which all of the previous steps’ data are used to map screen coordinates from the eye tracker to full-page image coordinates. The mapping is performed by calling the eye_scroll_correct function. It requires a correctly formed and chronologically ordered data set, including at least the following columns with these exact names:

  • Timestamp: Timestamp for each row of data.

  • Data: Column including events, such as mouse scrolls, key presses, or mouse clicks. The column must contain a string value when the user scrolls, which consists of key:value pairs separated by semicolons as follows: X:{mouse X-coordinate}; Y:{mouse Y-coordinate}; MouseEvent:WM_MOUSEWHEEL; ScrollDelta:{positive or negative integer value indicating respectively a scroll up or down}. The mouse X- and Y-coordinates are necessary because scroll events usually do not affect web page scrolling when the mouse cursor is placed outside the viewing area, for instance, if the user tries to scroll while the mouse is placed over the browser tabs. Generating a data column that contains scrolling input will differ for each data collection method. Some eye-tracking software will directly output this column using the right structure. In other cases, it may be necessary to concatenate or rewrite the relevant pieces of information from different columns or sources to fit this structure of key:value pairs separated by semicolons, and/or join this column with the rest of the eye-tracking data by timestamps (Footnote 1); a sketch of such a conversion is shown after this list.

  • Gaze.X and Gaze.Y: X- and Y-coordinates of gaze data (mandatory if there are no fixation data).

  • Fixation.X and Fixation.Y: X- and Y-coordinates of fixation data (mandatory if there are no gaze data). Each row must have timestamps that match the gaze points composing the fixations. Some eye-tracking software can directly output data in this form, while others will require the user to join data sets by timestamps or fixation ID (Footnote 2).
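As referenced in the Data column description above, the following sketch shows one way such a scroll-event string could be assembled from raw input values. The raw values and variable names are purely illustrative, and the exact spacing of the key:value pairs should be checked against the package documentation.

# Hedged sketch: building the Data string for a single mouse scroll event
mouse_x      <- 960
mouse_y      <- 540
scroll_delta <- -120   # negative value indicates a scroll down in this example
data_string  <- sprintf("X:%d; Y:%d; MouseEvent:WM_MOUSEWHEEL; ScrollDelta:%d",
                        mouse_x, mouse_y, scroll_delta)
data_string
# "X:960; Y:540; MouseEvent:WM_MOUSEWHEEL; ScrollDelta:-120"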

A short extract from a correctly formed data set including a mouse scroll event is shown in Table 1. The eye_scroll_correct function returns a data set that includes all original columns plus the following new ones:

  • Scroll: How many pixels have been scrolled up or down at this timestamp.

  • Timestamp.Shifted: The timestamps, shifted by the value of the time_shift argument.

  • Corrected.Gaze.X and Corrected.Gaze.Y: The gaze coordinates mapped to the full-page image (if gaze data were included in the data set).

  • Corrected.Fixation.X and Corrected.Fixation.Y: The fixation coordinates mapped to the full-page image (if fixation data was included in the data set).

Table 1 Extract from a correctly formed data set including both gazes and fixations

When called, the eye_scroll_correct function starts by shifting the entire Timestamp column of the data set by the optional time_shift argument passed to the function. This is to ensure that the researcher can synchronize a data set to a desired timestamp. If the optional starting_scroll argument is passed to the function, it will also consider that this number of pixels had already been scrolled down before it starts iterating over the data set.

Every line of data before the timestamp_start and after the timestamp_stop arguments is removed from the gaze-mapped data set. All remaining X- and Y-coordinates are translated to match a coordinate system with an origin at the top left corner of the viewing area. If the outside_image_is_na argument is set to TRUE, every fixation or gaze that falls outside the viewing area will be set to NA in the resulting gaze-mapped data set.

The function then iterates over the data set line by line. During each iteration, the function first checks if a mouse wheel event has been recorded, and, if so, adds or subtracts the scroll_pixels value (obtained in the calibration step) to the total number of pixels scrolled from the top of the web page.

If the scroll_lag argument is passed to the function, changes in the y-coordinate are delayed (lagged) by the specified amount. The argument is optional but recommended to increase accuracy, because changes on the monitor never happen immediately when an input message is received: since monitors have a finite refresh rate and computational resources are limited, a change only becomes visible on a later frame. The average input lag is hardware- and software-dependent and can be difficult to measure accurately, but it can be approximated. If the temporal distance between an input and the next monitor frame is assumed to be uniformly distributed between 0 ms and the monitor frame duration, then on average the next frame will be displayed half a frame duration after any given input. Any frame after the very next one is then shifted by entire frame durations. Hence the equation:

$$\begin{aligned} scroll\_lag = (n\_frame-1)\cdot \frac{1000}{refresh\_rate} + \frac{1}{2}\cdot \frac{1000}{refresh\_rate} \end{aligned}$$

with \(n\_frame\) being a positive integer indicating the rank of the monitor frame on which the change is assumed, on average, to become visible, and \(refresh\_rate\) the monitor refresh rate in Hz. As an example, assuming that inputs tend to be visible on the next frame of a 60-Hz monitor (\(n\_frame=1\)), scroll_lag should be set to 8.333 ms, which is the minimum recommended for 60-Hz monitors. Any number below 50 ms should in general be reasonable (\(n\_frame=\{1, 2, 3\}\) for a 60-Hz monitor), but as a general rule, \(n\_frame=2\) is recommended. The get_scroll_lag function returns a recommended value as a function of \(refresh\_rate\) and \(n\_frame\).
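For example, applying the equation to a 60-Hz monitor with the recommended \(n\_frame=2\) gives 25 ms:

# scroll_lag for a 60-Hz monitor, assuming the change becomes visible on the
# second frame after the input (n_frame = 2)
refresh_rate <- 60
n_frame      <- 2
scroll_lag   <- (n_frame - 1) * 1000 / refresh_rate + 0.5 * 1000 / refresh_rate
scroll_lag   # 25 ms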

The eye_scroll_correct function then evaluates each rule in the rules argument under the current conditions. For each rule that returns TRUE, the function checks whether the current data point falls within one of the fixed areas in the associated bundle. If so, the data point is translated to its corresponding area in the full-page image. If not, the y-coordinate of the data point is corrected by adding the current number of pixels scrolled from the top.

Once all lines in the data set have been read, and if the output_file argument is not an empty string, the function writes the resulting data set to that file in CSV format.

Step 7. Output heatmap (optional)

eyeScrollR offers a heatmap-generating function that produces a heatmap from the full-page image created in Step 4 and a single data set that has been mapped by the core function in Step 6. Inspecting the heatmap is useful for ensuring that the method has produced a valid mapping of the eye-tracking data to the full-page image.

A reproducible example

In this section, we describe a reproducible example that demonstrates how to use the eyeScrollR package in each of the seven steps of the workflow for a single participant, including commented code snippets. The web page used in this example can be found at https://larigaldie-n.github.io/eyeScrollR/test_page.html. Please keep in mind that simply copying and pasting our code may not produce the expected result, as several values may need to be adapted to a specific work environment. The example is complex enough to demonstrate all functionalities of the package, as the web page contains several fixed areas, some of which change in size when the user has scrolled down a certain amount. An additional example, an alternative version of this reproducible example, and complete documentation are available at https://larigaldie-n.github.io/eyeScrollR/.

Step 0. Install and load the package

To install and load eyeScrollR, first install and source the devtools package, then install the eyeScrollR package from the GitHub repository, and finally load the package:

figure a
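A sketch consistent with the description above, assuming the GitHub repository path given at the end of this article, would be:

# Sketch of Step 0, assuming the repository larigaldie-n/eyeScrollR
install.packages("devtools")
library(devtools)
install_github("larigaldie-n/eyeScrollR")
library(eyeScrollR)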

Step 1. Prepare the study setting

In this example, we used the Chrome browser, so we deactivated the ’Smooth Scrolling’ option and enabled the ’Overlay Scrollbars’ option in the ’chrome://flags/’ menu in order to disable smooth scrolling and hide the side scrollbar. We used a Logitech M310 mouse with a standard scroll wheel that scrolls a fixed amount with each notch turned. Finally, since we designed the web page ourselves, we knew that the full-page image could be captured with a browser plugin at any convenient time.

Step 2. Calibrate scrolling

We used the automatic calibration. We visited the calibration page https://larigaldie-n.github.io/eyeScrollR/calibration.html on the same computer and browser as the participant in Step 3. We took a screenshot using the “Snip & Sketch” tool in Windows. We saved the screenshot as “calibration_image.png” in the R working directory. We manually read the number of scroll_pixels from the screenshot - the value was 125. We then typed the following code in R:

figure b
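A sketch consistent with this step might look as follows; the argument names of scroll_calibration_auto are assumptions, and we assume the screenshot is loaded with an image package such as magick.

# Hedged sketch of automatic calibration; argument names are assumptions
library(magick)
calibration_image <- image_read("calibration_image.png")  # screenshot from above
calibration <- scroll_calibration_auto(calibration_image = calibration_image,
                                       scroll_pixels     = 125)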

Step 3. Eye-tracking data collection

In this example, one participant browsed the web page https://larigaldie-n.github.io/eyeScrollR/test_page.html. The iMotions software was used to record and export eye-tracking data. iMotions is an experiment builder and data collection software that allows one to run studies, log inputs, record the screen, and perform data analysis and visualization. It can integrate several biosensors, including eye trackers. A Smart Eye Aurora eye tracker with a 120-Hz refresh rate was used in this example.

Step 4. Get full-page image

We obtained the full-page image by using the free extension “Make screenshot for Chrome” for the Chrome browser. We saved the image, with its default dimensions of 1920 x 4377 pixels, as “example_website.png” in the R working directory.

Step 5. Specify locations of fixed areas on the full-page image

An examination of the web page https://larigaldie-n.github.io/eyeScrollR/test_page.html revealed that the top, right, and bottom right areas all remain visible when the user scrolls, indicating that these three areas are fixed areas. However, after a certain number of pixels have been scrolled, the top and right areas change in size. This implies two different web page configurations and the need for two different bundles of fixed areas. In this step, we map the three fixed areas to the full-page image in the two bundles, and specify rules using the bundles gadget to activate and deactivate each bundle when appropriate. We opened the bundles gadget by typing the following code in R:

figure c
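A minimal sketch of this call, using the bundles_gadget function named at the end of this step:

# Open the Shiny gadget that generates bundle and rule code
bundles_gadget()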

The code opens the gadget in the RStudio Viewer. We clicked on the button “Add new bundle” and then on the button “Add fixed area in Bundle”. The gadget requires the following information for fixed area 1: the X-Y coordinates of its top left and bottom right corners on the screen, and the X-Y coordinates of its top left and bottom right corners on the full-page image. The process was then repeated for fixed areas 2 and 3 inside the first bundle: we clicked “Add fixed area in Bundle” and then input the X-Y coordinates of the top left and bottom right corners of each area on the screen and on the full-page image.

The first bundle of fixed areas used the rule “True when Scrolled < value” to indicate that the configuration of fixed areas in the bundle should be applied until a certain number of pixels have been scrolled down from the top.

Determining the number of pixels at which the change in fixed areas occurs requires some investigation, for instance, counting the number of scrolls (the number of scroll-wheel notches turned) until the web page configuration changes and inferring the corresponding number of pixels with the help of the calibration page. In our case, Fixed area 1 (the top area) shrinks when 1000 pixels or more have been scrolled down and expands again when the page has been scrolled up to a total of less than 1000 pixels. The gadget automatically generated the following code for the first bundle (we only changed the value in the rule to suit our needs):

figure d
figure e
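As a rough, heavily hedged illustration of what gadget-generated bundle code might resemble, based on the bundle structure described in the Fig. 4 caption (a 4 x 2 x n array of corner coordinates) and the rule described above: all coordinate values, object names, and the rule's argument are assumptions, and for brevity only two of the three fixed areas are shown.

# Hedged illustration of a gadget-generated bundle (all coordinates are made up).
# Rows: x/y of the top-left and bottom-right pixels; column 1: location on the
# screen; column 2: location on the full-page image; one layer per fixed area.
bundle_1 <- array(
  c(   0,   0, 1919,  149,    # fixed area 1 (top menu) on the screen
       0,   0, 1919,  149,    # fixed area 1 on the full-page image
    1670, 150, 1919, 1079,    # fixed area 2 (sidebar) on the screen
    1670, 150, 1919, 1079),   # fixed area 2 on the full-page image
  dim = c(4, 2, 2)
)

# Rule: bundle 1 is active while fewer than 1000 pixels have been scrolled
rule_1 <- function(scrolled) {
  scrolled < 1000
}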

We then built a new bundle by choosing “Add new bundle”, and in this second bundle we specified the fixed areas of the second web page configuration. This second configuration (visible after scrolling more than 1000 pixels down) produced a different set of measurements of the fixed areas and required a different rule specifying that the bundle should only be used after 1000 pixels had been scrolled (which corresponds to the rule “True when Scrolled >= value”). In other words, by using this rule, we ensured that the specified locations of all fixed areas in the second bundle were inactive when fewer than 1000 pixels had been scrolled and active when 1000 pixels or more had been scrolled. For this second bundle, the gadget generated the following code (we only changed the value in the rule to suit our needs):

Since the core function requires all bundles and rules to be put in lists, the generated code ends as follows:

figure f
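Consistent with the object names used in Step 6, the final lists might be assembled as in this sketch; bundle_2 and rule_2 stand for the hypothetical objects generated for the second bundle.

# Collect all bundles and their rules into lists for the core function
fixed_areas <- list(bundle_1, bundle_2)
rules       <- list(rule_1, rule_2)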

Although the code in this section appears to be lengthy, it is almost always auto-generated by the bundles_gadget function and only needs to be done once per web page containing fixed areas.

Step 6. Load data set and correct

The next step is to import the data set with the correct column names and content (Footnote 3). We extracted the data from the iMotions software in the following way. We clicked on the button “Add analysis” and selected the participant and the recording of the web page browsing. Then, we clicked on the buttons “Export”, “Sensor data”, and “Export”. In the tab “Respondents” we selected our participant. In the tab “Stimuli” we selected the browsing gaze recording. In the tab “Sensors”, among the “R Analysis GazeAnalysis I-VT filter” options, we selected “X-coordinate of fixation” and “Y-coordinate of fixation”. In the tab “File info” we selected “user-generated events”. Then, we clicked on the button “Export” in the bottom right. We saved the data set as “data.csv” in the R working directory and loaded the data with the following code:

figure g
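A minimal loading sketch; the exact read arguments (separator, rows to skip, decimal mark) depend on the iMotions export format and are assumptions:

# Hedged sketch: load the exported data set (adjust sep/skip to the export format)
data <- read.csv("data.csv", stringsAsFactors = FALSE)
head(data)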

We want eyeScrollR to map fixations only within the period when the web page was fully loaded, and we therefore need the timestamp at which the web page finished loading and the timestamp at which the participant finished browsing it. By watching the screen recording with superimposed gaze, we manually obtained these starting and ending timestamps as 3577 and 30864. We also manually obtained the width and height of the full-page image (1920*4377 pixels) by right-clicking on the image and selecting “Properties”. We then passed these values to the eye_scroll_correct function and, to account for input lag on a 60-Hz monitor, set the scroll_lag argument using the get_scroll_lag function with n_frames = 2:

figure h
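A sketch of the call based on the values reported above; the argument names for the data set, the image dimensions, the calibration object, the fixed areas, and get_scroll_lag are assumptions and should be checked against the package documentation.

# Hedged sketch of the core call (several argument names assumed)
scroll_lag <- get_scroll_lag(refresh_rate = 60, n_frames = 2)  # ~25 ms

corrected_data <- eye_scroll_correct(
  data            = data,         # data set loaded above (name assumed)
  timestamp_start = 3577,         # web page fully loaded
  timestamp_stop  = 30864,        # participant finished browsing
  image_width     = 1920,         # full-page image dimensions (names assumed)
  image_height    = 4377,
  calibration     = calibration,  # from Step 2
  fixed_areas     = fixed_areas,  # from Step 5
  rules           = rules,        # from Step 5
  scroll_lag      = scroll_lag
)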

The objects “calibration”, “fixed_areas”, and “rules” were created in Steps 2 and 5.

Step 7. Output heatmap

We loaded the full-page image “example_website.png”, and produced a heatmap with the generate_heatmap function:

figure i
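A sketch of this final step, assuming the full-page image is loaded with the magick package; the argument names of generate_heatmap are assumptions.

# Hedged sketch of Step 7 (argument names assumed)
library(magick)
full_page_image <- image_read("example_website.png")
heatmap <- generate_heatmap(image = full_page_image,
                            data  = corrected_data)  # output of Step 6
heatmap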

Appendix B contains the resulting heatmap from our example.

Validation study

To assess the validity of the eyeScrollR package, four different web pages were browsed by a participant in four different hardware and software settings. A random sample of eyeScrollR fixation points was chosen and compared to manual coding of the same data. The participant navigated the web pages, deliberately including potentially challenging cases (e.g., fast scrolling patterns, scrolling beyond the total length of the page, and prolonged gazes at all fixed areas when relevant). The participant had no time limit and simply navigated the web pages long enough to collect data from both basic and complex mapping situations. Different hardware and software settings were used in order to test the robustness of eyeScrollR and its usefulness in combination with commercial software (iMotions) as well as open-source software (Python). A Chrome browser was used in all study settings. The four study settings were the following:

  • We used a desktop computer with an SR Research EyeLink 1000 eye tracker with a sample rate of 1000 Hz and a 19” monitor with a native resolution of 1680 x 1050 and a 60-Hz refresh rate. The study was run using custom Python code to log input, record the screen at 30 fps, and launch Google Chrome with the correct web page. The eye tracker itself was controlled using the SR Research PyLink Python library version 2.1.762.0.

  • We used the same hardware and software settings as above but replaced the monitor with a 24” one with native resolution 1920 x 1080 and a 60-Hz refresh rate.

  • We used the same hardware and software settings as above but replaced the monitor with another 24” one with native resolution 1920 x 1200 and a 60-Hz refresh rate.

  • We used a laptop computer with a Smart Eye Aurora eye tracker with a sample rate of 120 Hz and a 15.6” monitor with a 1920 x 1080 resolution and a 60-Hz refresh rate. The study was run using iMotions version 9.3 to log input, record the screen at 24 fps, and control the eye tracker. Google Chrome was opened manually on the correct web pages.

The four web pages were the following:

This resulted in 16 different situations (4 settings x 4 web pages). After data collection, the raw eye-tracking data containing all gaze and fixation coordinates, timestamps, and user input were exported. Because the EyeLink Data Viewer does not automatically output a data file that combines gaze and fixation data, an R script was written to merge the two and synchronize them with the video and input recordings. All data points were then gaze-mapped separately for each full-page image using eyeScrollR. All full-page images were created with the free extension “Full Page Screenshot” for the Chrome browser and were slightly modified when necessary to include all areas of the web pages.

Within each situation, eyeScrollR was run with a conservative 8.333ms scroll_lag argument. Thirty eyeScrollR-mapped fixations lasting longer than two video frames were randomly selected for manual coding. The reason for using fixations lasting longer than two video frames was to limit the chances of the fixations not being captured by the video recording (for details see the Discussion below). One fixation was removed due to an incorrect initialization of the web page presentation during data collection, resulting in a total of 479 randomly selected data points. The manual coding was performed by two coders who were blind to the eyeScrollR mapping. To perform the manual coding on the full-page image, the coders were given the set of fixation coordinates on the screen, the full-page image, and for each fixation a frame extracted from the video at the correct timestamp with a red pixel indicating the location of the fixation. The video frame was extracted at the median timestamp for each fixation. All data, code, and materials for the validation study are available at https://osf.io/6e8sr/.

Fig. 3 Violin and box plots of Euclidean distances between eyeScrollR mappings and manual coding by web page (a) and study setting (b). Distances larger than 3 pixels (eight data points out of 479) are excluded for readability

Results

We calculated the Euclidean distance between each pair of X-Y coordinates mapped by manual coding and by eyeScrollR. All Euclidean distances greater than or equal to 3 pixels were further examined to determine whether the discrepancy was due to eyeScrollR inaccuracies or to the manual coding. The examination revealed that the manual coders had made 47 errors out of 479 data points (9.81%, mean difference of 689.97 pixels). It should be noted that, even without coding errors, some irreducible inaccuracy is expected, because eye trackers (and thus eyeScrollR) return continuous coordinates, which cannot be exactly matched by a manual coder working with discrete pixel coordinates. After correcting all manual coding errors, only eight differences larger than 3 pixels remained (1.67%, mean difference of 112.37 pixels) that could be due to eyeScrollR. These remaining differences were all consistent with a shift of exactly one (seven data points with a distance in the range [97; 103] pixels) or two (one data point with a distance in the range [197; 203] pixels) mouse scrolls. The distributions of Euclidean distances between eyeScrollR and manual coding for all study settings and web pages are shown in Fig. 3. The descriptive statistics are shown in Table 2.
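For reference, the distance for each data point \(i\) is the standard Euclidean distance between the two sets of coordinates:

$$\begin{aligned} d_i = \sqrt{\left( x_i^{manual} - x_i^{eyeScrollR}\right) ^2 + \left( y_i^{manual} - y_i^{eyeScrollR}\right) ^2} \end{aligned}$$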

Table 2 Descriptive statistics of Euclidean distance between eyeScrollR mappings and manual coding, excluding distances larger than 3 pixels (eight data points out of 479)

Discussion

The results show that eyeScrollR produces gaze-mappings that are on average very close to manual coding, and differences between the two methods are generally no greater than 1–2 pixels. The results are very similar across different web pages, hardware, and software settings. The manual coding resulted in human coding errors in 9.81% of the data, with an average distance of 689.97 pixels, while potential software errors were present in 1.67% of the data, with an average distance of 112.37 pixels. Among the eight differences larger than 3 pixels, one was unambiguously due to a video recording inaccuracy. In the study setting using the Smart Eye Aurora eye tracker with the iMotions data collection software, the recorded video frame number 2658, starting at timestamp \(t=111022\) ms, shows a change on the screen corresponding to the shift of a single down scroll. However, a manual examination of the data file reveals that the timestamp at which the scroll occurred was \(t=111163\) ms, that is, 141 ms after the visible change in the video, during video frame number 2661. In other words, the video recording shows a change that could not have occurred on the participant’s monitor for at least another 141 ms. The random fixation point selected for manual coding occurred during frame number 2660, beginning at \(t=111113\) ms, 50 ms before the mouse scroll event. Consequently, the manual coding was performed on a frame that was ahead of the monitor display. On the other hand, eyeScrollR did not consider that any change occurred until a few milliseconds after the scrolling event (with an offset corresponding to the scroll_lag argument of 8.333 ms passed to the core function).

This situation points to a general challenge with manual coding and other gaze-mapping methods based on video recordings. Video recordings typically have fewer frames per second than what is actually displayed on the monitor. They are therefore not a real-time representation of what happened during the study and should not be treated as such. Consider a video frame captured at \(t=0\) ms in a 24-fps video, with a fixation and a scroll starting at the same timestamp. Due to the scroll, the display will change at the next 60-Hz monitor refresh, say at \(t=1\) ms, but the video will only change on its next frame, at \(t=41.67\) ms. A manual coder will map the fixation coordinates for the duration of the video frame and will therefore be incorrect for 40.67 ms. If eyeScrollR performed its mapping assuming that the display changed at \(t=0\) ms, the manual coder would conclude that eyeScrollR made an error of one scroll worth of pixels. While eyeScrollR would be incorrect for 1 ms, it would still be correct for the rest of the time, something the manual coder cannot check. A slight change in the scenario produces even worse results: if a fixation starts at \(t=1\) ms instead of \(t=0\) ms, a manual coder would still work on the video frame starting at \(t=0\) ms. Furthermore, this reasoning assumes maximally accurate video recordings with no frame skipping or delays caused by, for example, fluctuations in computational power. In the presence of computer lag, inaccuracies from video-based gaze-mapping methods could easily reach 100 ms or more. These unavoidable issues mean that very short fixations can occur on frames that were displayed on the monitor but never recorded on video. In order to reduce such occurrences in our study, we excluded all fixations shorter than two video frames from the analysis above.

On the other hand, by using eyeScrollR’s scroll_lag argument, a user can delay the scroll corrections by a fixed amount of time. Assuming that scrolls tend to be displayed on the following monitor frame, setting scroll_lag to half of the monitor frame duration ensures that eyeScrollR will, on average, start its scroll correction at the right timestamp. In this study, all distances larger than 3 pixels attributed to eyeScrollR were in the near vicinity of both a mouse scroll input message and a video frame change, similar to the ambiguous situation described here. This suggests that we cannot know for sure whether it was eyeScrollR or the manual coding that was incorrect. However, we believe that, because eyeScrollR does not suffer from low frame rates, potential frame lag, or frame skipping, it will on average make fewer and shorter errors than any video-based method, whether manual, assisted, or automatic.

General discussion

This article introduces eyeScrollR, an open-source R package for mapping eye-tracking data from screen coordinates to full-page image coordinates. The validation study showed that, despite a few discrepancies, eyeScrollR and manual coding are in very high agreement.

After a manual inspection, we discovered that the vast majority of consequential differences between the methods were due to human error. As irreducible temporal inaccuracies are expected in both methods, none of the remaining differences could be unambiguously attributed to eyeScrollR. All substantial differences between the methods were a multiple of one mouse scroll and could be due either to eyeScrollR or to video frames lagging behind or running slightly ahead of the true display. All in all, temporal accuracy should be expected to be slightly better for eyeScrollR than for manual coding. However, users should be aware that temporal accuracy can never be perfect with either eyeScrollR or video-based mapping methods, especially when scrolling happens in the middle of a fixation.

Of all currently known methods for gaze-mapping in scrolling situations, eyeScrollR is the only one that is both reproducible and transparent. Furthermore, as free software, it is available to all researchers, including those who cannot afford expensive alternatives. Code availability and an open license facilitate collaborative evaluation and improvement and allow researchers to predict its possible uses and limitations. This makes it an ideal tool for researchers who wish to adhere to open science principles. Finally, for researchers who are familiar with R, eyeScrollR is easy and fast to use thanks to its Shiny gadget (Chang et al., 2022) and online tutorials.

Fig. 4 A) Two rectangular fixed areas on the screen (left) are specified, and redirected to the full-page image (right). B) 3D array representing the bundle containing these two fixed areas. Rows are x and y values from the top left and bottom right pixels. The first column represents the coordinates of the fixed area on the screen, and the second column the coordinates of the fixed area on the full-page image. Each layer (left and right) represents a fixed area specification, with the coordinates of the redirection from the screen to the full-page image

Although eyeScrollR allows researchers to apply coordinate corrections to both gaze and fixation data, it is recommended that fixations and saccades be inferred from raw gaze data before using the package, particularly for web pages that include fixed areas. If fixations and saccades are inferred after using eyeScrollR, gaze clusters positioned between a fixed and a scrollable area could have only the gazes in the scrollable area corrected, which may be interpreted as either unrealistic saccades or as two different fixations. Similarly, stationary gaze clusters during a scroll will be interpreted as saccades if the correction is applied before inferring fixations. It should also be noted that web pages that include subcomponents with their own scrollbars (e.g., Overleaf or Gmail) may not work properly, because eyeScrollR will interpret all mouse scrolls in the browsing area as scrolling on a single full-page document. Other methods of scrolling up and down (e.g., the “page up” and “page down” keys) are not yet supported, which may require researchers to either deactivate those keys or simply instruct participants not to use them. The same applies to dragging the scrollbar directly to move the content.

The complete code for the eyeScrollR package is available at https://github.com/larigaldie-n/eyeScrollR, and the code, data, and materials for the validation study are available at https://osf.io/6e8sr/. The validation study does not use confirmatory inferential statistics, so it was not preregistered.