Daily tasks often require attending to the visual world while simultaneously maintaining verbal information in memory. We might mentally rehearse a phone number while searching our surroundings for a pen, or maintain a grocery list while scanning shelves at the store. These tasks require us to both move our eyes and engage working memory.

Working memory (WM) is thought to consist of independent, limited-capacity, visual and verbal subsystems (Baddeley, 1986; Baddeley & Hitch, 1974; Luck & Vogel, 1997; Smith, Jonides, & Koeppe, 1996). For this reason, verbal WM tasks are frequently used as control conditions for dual-task paradigms investigating visual WM and its related processes (e.g., Cronin & Irwin, 2018; Hollingworth, Richard, & Luck, 2008). Further, articulatory suppression is frequently used in conjunction with visual WM tasks to prevent participants from offloading visual load into verbal memory (e.g., Schmidt, Vogel, Woodman, & Luck, 2002). Using verbal WM tasks as a control relies on the assumption that verbal WM load will minimally interfere with visual tasks.

Contrary to this assumption, there is some evidence that verbal WM can interfere with attention control mechanisms like distractor suppression (de Fockert, 2013; Lavie, Hirst, de Fockert, & Viding, 2004) and attentional guidance (Soto & Humphreys, 2007, 2008). Given this cross-modal interference, an interesting and open question is whether a verbal WM load can change eye movements during scene-viewing. Saccadic eye movements are thought to be closely related to visual WM (e.g., Cronin & Irwin, 2018; Hollingworth et al., 2008), and the contents of visual WM can influence scene-viewing behavior (Bahle, Beck, & Hollingworth, 2018). However, no such relationship has been demonstrated between verbal WM load and scene-viewing behavior. If such a relationship exists, it would add to the evidence cautioning against the use of verbal WM tasks and articulatory suppression as control conditions for visual WM tasks.

The present study

The present study examined the effects of WM load on eye movements during scene-viewing and the effects of eye movements during scene-viewing on WM performance. Participants maintained a verbal or visual WM load while free-viewing scenes. The same participants also viewed scenes without a memory load (see Fig. 1). A separate set of participants performed the visual and verbal WM tasks without free-viewing the scenes. Given the literature outlined above, we predicted the visual WM and scene-viewing tasks would interfere with each other under dual-task conditions, while the verbal WM task and scene-viewing task would not. In contrast with previous work, we found interference from both the visual and verbal WM tasks on eye movements during scene-viewing, and no evidence that eye movements during scene-viewing interfered with either WM task.

Fig. 1. Task procedures for the four conditions

Method

Participants

Fifty-eight experimentally naïve participants from the University of California, Davis, participated for course credit. Four participants were replaced due to poor eye-tracking (≥25% data loss) and six were replaced due to below-chance memory task performance. Data from the remaining 48 participants (31 female; mean age = 20.6 years; mean data loss = 10.1%) are included in the subsequent analyses.

A separate group of 59 participants recruited from the same pool participated in a memory-only condition. Eleven participants were dropped due to below-chance memory task performance. Data from the remaining 48 participants (38 female; mean age = 20.9 years) are included in the analyses.

Apparatus and stimuli

Dual-task and scene-viewing conditions

Participants were seated 85 cm from a 24.5-in. LCD display. Head movements were limited by chin and forehead rests. Scenes were displayed at a resolution of 1,024 × 768 px and subtended 26.5° × 20°. Eye movements were monitored with a tower-mounted EyeLink 1000+ eye-tracker sampling the right eye at 1,000 Hz (SR Research, Ottawa, Ontario, Canada). Experimental stimuli were presented using the SR Research Experiment Builder software (Version 2.1.512). Each participant underwent a 9-point calibration procedure before each block and continued on to the experimental trials once the eye-tracker’s average and maximum error were below 0.5° and 0.99°, respectively.
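The viewing geometry reported above can be checked with a short calculation. The following is a minimal Python sketch, not part of the original study materials; the function names are illustrative, and it assumes the scene is centered on the line of sight and viewed on a flat display.

```python
import math

def size_cm_from_angle(angle_deg, distance_cm):
    """Physical extent (cm) of a centered stimulus subtending angle_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg / 2.0))

def pixels_per_degree(pixels, angle_deg):
    """Average pixels per degree across the full extent (linear approximation)."""
    return pixels / angle_deg

VIEWING_DISTANCE_CM = 85.0  # reported viewing distance

# Reported scene size: 1,024 x 768 px subtending 26.5 x 20 degrees.
scene_width_cm = size_cm_from_angle(26.5, VIEWING_DISTANCE_CM)
scene_height_cm = size_cm_from_angle(20.0, VIEWING_DISTANCE_CM)
ppd_horizontal = pixels_per_degree(1024, 26.5)
ppd_vertical = pixels_per_degree(768, 20.0)

print(f"Scene size: {scene_width_cm:.1f} x {scene_height_cm:.1f} cm")
print(f"Pixels per degree: {ppd_horizontal:.1f} (h), {ppd_vertical:.1f} (v)")
```

Under these assumptions the scene works out to roughly 40 × 30 cm, which matches the 4:3 aspect ratio of the 1,024 × 768 px resolution, at about 38 to 39 px per degree.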

Ninety-six real-world scenes were presented during the scene-viewing task. Scenes were originally collected from online image searches. Scenes were luminance-matched by converting each to LAB color space (0 = darkest, 100 = brightest), scaling the luminance channel of all scenes from 0 to 1, then adjusting them to the set’s average luminance (M = 0.46 L). Each scene was presented once over the course of the experiment, with 24 scenes appearing in a random order in each condition. Scenes appeared equally in each condition across participants.
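The luminance-matching step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' procedure: it assumes the L channel has already been extracted from each scene as a flat list of values in the 0 to 100 range, and it assumes the adjustment is an additive shift of each scene's mean to the set mean (the text does not specify whether the adjustment was additive or multiplicative). All function names are hypothetical.

```python
def normalize_l_channel(l_channel):
    """Rescale CIELAB L values (0 to 100) to the 0 to 1 range."""
    return [value / 100.0 for value in l_channel]

def mean(values):
    return sum(values) / len(values)

def match_to_set_mean(images):
    """Shift each image so that its mean luminance equals the set average.

    ASSUMPTION: additive shift. Values pushed outside 0 to 1 are not handled
    here; a real pipeline would need to decide how to treat them.
    """
    normalized = [normalize_l_channel(img) for img in images]
    set_mean = mean([mean(img) for img in normalized])
    matched = []
    for img in normalized:
        offset = set_mean - mean(img)
        matched.append([value + offset for value in img])
    return matched, set_mean

# Two toy "scenes", each reduced to three L values for illustration.
toy_scenes = [[20.0, 40.0, 60.0], [50.0, 70.0, 90.0]]
matched_scenes, set_mean = match_to_set_mean(toy_scenes)
print(set_mean, [round(mean(img), 3) for img in matched_scenes])
```

After matching, every toy scene has the same mean luminance as the set, which is the property the procedure described above is meant to guarantee.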

Working memory control

Participants were seated 85 cm from a 24.5-in. LCD display or a 21-in. CRT display. Head movements were limited by chin and forehead rests.

Procedure

Dual-task and scene-viewing conditions

There were four experimental tasks: visual WM, verbal WM, and visual and verbal control conditions (see Fig. 1). Participants completed 24 trials in each task. The four tasks were presented in separate blocks in a counterbalanced order across participants.

During the visual WM task, five colored squares were presented for 250 ms. Colors were chosen from nine possible colors (RGB: [255, 0, 0], [255, 255, 0], [255, 124, 27], [0, 255, 128], [0, 128, 0], [0, 165, 253], [0, 0, 255], [255, 0, 255], [128, 0, 128]). Squares subtended 0.4° × 0.4° and were presented on an invisible circle of radius 0.4° centered on the screen. After the memory array disappeared, the fixation cross remained for 500 ms until scene onset. The scene remained for 3,500 ms. Participants were instructed to free-view the scenes. After scene offset, five colored squares appeared and remained visible until response. On 50% of trials, the color of one of the five squares was changed to a color not used in the original array. Participants responded whether or not a color changed.
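The change-detection logic of the visual WM task can be sketched in Python as below. This is a hypothetical illustration rather than the Experiment Builder implementation used in the study. It assumes that colors are sampled without replacement and that exactly half of the 24 trials in a block are change trials; the text states only that 50% of trials contained a change.

```python
import random

# The nine RGB colors listed in the Method section.
COLORS = [
    (255, 0, 0), (255, 255, 0), (255, 124, 27),
    (0, 255, 128), (0, 128, 0), (0, 165, 253),
    (0, 0, 255), (255, 0, 255), (128, 0, 128),
]
SET_SIZE = 5

def make_visual_wm_trial(rng, change):
    """Build one memory array and its test array."""
    sample = rng.sample(COLORS, SET_SIZE)  # ASSUMPTION: no repeated colors
    test = list(sample)
    changed_index = None
    if change:
        # Replace one square with a color not used in the original array.
        unused = [color for color in COLORS if color not in sample]
        changed_index = rng.randrange(SET_SIZE)
        test[changed_index] = rng.choice(unused)
    return {"sample": sample, "test": test,
            "change": change, "changed_index": changed_index}

def make_block(rng, n_trials=24):
    """ASSUMPTION: exactly half of the trials in a block are change trials."""
    changes = [True] * (n_trials // 2) + [False] * (n_trials - n_trials // 2)
    rng.shuffle(changes)
    return [make_visual_wm_trial(rng, change) for change in changes]

rng = random.Random(1)
block = make_block(rng)
print(sum(trial["change"] for trial in block), "change trials of", len(block))
```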

During the verbal WM task, seven consonants were sequentially presented at fixation. Each consonant subtended 0.29° and was presented for 300 ms. Participants were instructed to verbalize and rehearse the letters throughout the trial. After the last letter offset, a checkerboard mask was presented for 250 ms, then a fixation cross for 250 ms. Then, a scene appeared for 3,500 ms. After scene offset, a single letter was presented at central fixation. Participants responded whether or not that letter was one of the seven they held in memory.
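A comparable sketch for the verbal WM task is given below. Again, this is an illustrative Python sketch and not the study's implementation. The consonant pool (including whether Y was used), sampling without replacement, and the absence of blank intervals between letters are all assumptions, since the text does not specify them.

```python
import random

# ASSUMPTION: the pool is all 21 non-vowel letters; the actual pool is not reported.
CONSONANTS = list("BCDFGHJKLMNPQRSTVWXYZ")
LIST_LENGTH = 7

def make_verbal_wm_trial(rng, probe_present):
    """Build one letter list and a single-letter recognition probe."""
    letters = rng.sample(CONSONANTS, LIST_LENGTH)  # ASSUMPTION: no repeats
    if probe_present:
        probe = rng.choice(letters)
    else:
        probe = rng.choice([c for c in CONSONANTS if c not in letters])
    return {"letters": letters, "probe": probe, "probe_present": probe_present}

def encoding_duration_ms(n_letters=7, letter_ms=300, mask_ms=250, fixation_ms=250):
    """Time from first letter onset to scene onset.

    ASSUMPTION: letters follow one another with no blank interval.
    """
    return n_letters * letter_ms + mask_ms + fixation_ms

rng = random.Random(7)
present_trial = make_verbal_wm_trial(rng, probe_present=True)
absent_trial = make_verbal_wm_trial(rng, probe_present=False)
print(present_trial["letters"], present_trial["probe"])
print(encoding_duration_ms(), "ms from first letter to scene onset")
```

With the reported timings and no inter-letter interval, the scene would begin 2,600 ms after the first letter appears.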

The visual and verbal control tasks proceeded in the same way as the visual and verbal WM tasks, but participants were not asked to respond to the memory items.

Working memory control

A second group of participants completed the memory tasks as outlined above, without interleaved scenes. The delay interval consisted of a gray screen with a fixation cross at center. The task order was counterbalanced across participants. A previous study found these visual and verbal WM single tasks were matched in difficulty (Cronin & Irwin, 2018).

Data processing

Fixations and saccades were segmented with EyeLink’s standard algorithm using velocity and acceleration thresholds (30°/s and 9,500°/s²; SR Research, Ottawa, Ontario, Canada). Eye-tracking data files were converted to ASCII format using the EDFConverter tool (SR Research, Ottawa, Ontario, Canada) and imported into MATLAB (The MathWorks, Inc., Natick, MA). Off-screen fixations, saccades with amplitudes >27°, and the first fixation on each scene (always located at the center of the display because of the prescene fixation marker) were eliminated from analysis. Statistical analyses were performed in R (R Core Team, 2019).
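The exclusion rules can be expressed compactly. The sketch below is written in Python for illustration only (the original processing was done in MATLAB). The data layout, with one dictionary per fixation or saccade, and the field names are hypothetical. It also assumes that "off-screen" can be approximated by the 1,024 × 768 px scene bounds.

```python
SCENE_WIDTH_PX = 1024
SCENE_HEIGHT_PX = 768
MAX_SACCADE_AMPLITUDE_DEG = 27.0

def clean_fixations(fixations):
    """Drop the first (central) fixation of a trial, then any out-of-bounds fixation."""
    after_first = fixations[1:]
    return [
        fix for fix in after_first
        if 0 <= fix["x"] < SCENE_WIDTH_PX and 0 <= fix["y"] < SCENE_HEIGHT_PX
    ]

def clean_saccades(saccades):
    """Drop saccades whose amplitude exceeds 27 degrees."""
    return [s for s in saccades if s["amplitude_deg"] <= MAX_SACCADE_AMPLITUDE_DEG]

# Toy trial: a central first fixation, two valid fixations, one off-screen.
trial_fixations = [
    {"x": 512, "y": 384, "duration_ms": 250},
    {"x": 300, "y": 200, "duration_ms": 310},
    {"x": -15, "y": 400, "duration_ms": 180},
    {"x": 900, "y": 700, "duration_ms": 275},
]
trial_saccades = [
    {"amplitude_deg": 4.2},
    {"amplitude_deg": 31.0},
    {"amplitude_deg": 27.0},
]

kept_fixations = clean_fixations(trial_fixations)
kept_saccades = clean_saccades(trial_saccades)
print(len(kept_fixations), "fixations kept;", len(kept_saccades), "saccades kept")
```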

Results

Eye-movement measures

To determine whether visual and verbal WM loads interfere with participants’ scene-viewing behavior, we employed linear mixed-effect (LME) models, with memory load (load vs. no load) and load type (visual vs. verbal) as fixed effects and subject and scene as random effects. All p values reported below are adjusted to correct for family-wise error using the Holm–Bonferroni method. The reported effect sizes for LMEs were calculated according to Westfall, Kenny, and Judd (2014; Brysbaert & Stevens, 2018). LMEs were completed using the R package lme4 (Bates, Maechler, Bolker, & Walker, 2015). Means and standard deviations for each measure are reported in Table 1.
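For readers unfamiliar with the Holm–Bonferroni adjustment, the step-down procedure can be sketched in Python as follows. This is a generic illustration of the method, not the authors' analysis code (the analyses were run in R), and the example p values are invented for demonstration.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p values in the original order.

    The smallest p is multiplied by m, the next by m - 1, and so on.
    Adjusted values are forced to be non-decreasing and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[index])
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted

# Invented p values for illustration only.
example_p = [0.01, 0.04, 0.03, 0.5]
print(holm_bonferroni(example_p))
```

Because adjusted values are capped at 1, a family containing many tests can produce adjusted values of exactly p = 1.00 for clearly nonsignificant effects.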

Table 1. Mean (SD) eye-movement measures across task conditions

First, we tested whether a WM load influenced fixation-level behavior. Participants made fewer fixations with a memory load than without one, χ²(1) = 485.25, p < .001, d = 0.37. The main effect of load type on the number of fixations was also significant, χ²(1) = 31.71, p < .001, d = 0.33, as was the interaction between load and load type, χ²(1) = 53.41, p < .001, d = 0.37. Participants made fewer eye movements in the visual load condition than in the verbal load condition, but moved their eyes a similar number of times in the two no-load conditions.

There were also effects of load on fixation durations and saccade amplitudes: participants made longer fixations, χ²(1) = 43.33, p < .001, d = 0.07, and shorter saccades, χ²(1) = 31.71, p < .001, d = 0.33, with a memory load than with no load. The main effect of load type was not significant for either measure (fixation duration: χ²(1) = 0.82, p = 1.00, d = 0.13, BFJZS = 0.75 ± 1.84%; saccade amplitude: χ²(1) = 3.33, p = 1.00, d = 0.26, BFJZS = 0.17 ± 1.97%), but the interaction between load and load type was significant for both fixation duration, χ²(1) = 14.49, p = .002, d = 0.21, and saccade amplitude, χ²(1) = 69.97, p < .001, d = 0.42. Participants’ fixations were longer and their saccades were shorter with a visual load than with a verbal load.

Next, we assessed how a memory load influenced scene-viewing over the course of the entire 3.5-s viewing period. We examined participants’ scan path length, the percentage of the scene fixated (Castelhano, Mack, & Henderson, 2009), and the dispersion of fixations from the center of the screen (Anliker, 1976). The main effects of load and load type were significant for all three measures of fixation spread, as was the interaction between load and load type (see Table 2). Participants’ scan paths were shorter, they fixated a smaller percentage of the scene, and their fixations were less dispersed from center with a memory load than with no memory load. As evidenced by the interactions, this effect was more pronounced in the visual load condition than in the verbal load condition (see Fig. 2).

Table 2. Summary of linear mixed-effects models for measures of fixation spread
Fig. 2. Distribution of all participants’ fixations through all scenes in the four scene-viewing conditions. Points represent individual fixations
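Two of the fixation-spread measures described in this section can be illustrated with a short Python sketch. The definitions used here are assumptions: scan path length is taken as the summed Euclidean distance between successive fixations, and dispersion is approximated as the mean distance of fixations from screen center. The exact formulation from Anliker (1976) is not given in the text and may differ, and the percentage-of-scene-fixated measure is not covered.

```python
import math

def scan_path_length(fixations):
    """Summed Euclidean distance between successive fixation positions."""
    return sum(math.dist(a, b) for a, b in zip(fixations, fixations[1:]))

def mean_distance_from_center(fixations, center):
    """ASSUMPTION: dispersion approximated as mean distance from screen center."""
    return sum(math.dist(fix, center) for fix in fixations) / len(fixations)

SCREEN_CENTER = (512, 384)  # center of a 1,024 x 768 px scene

# Toy fixation sequence in pixel coordinates.
toy_fixations = [(512, 384), (512, 484), (612, 484)]

path_px = scan_path_length(toy_fixations)
dispersion_px = mean_distance_from_center(toy_fixations, SCREEN_CENTER)
print(f"Scan path: {path_px:.1f} px; mean distance from center: {dispersion_px:.2f} px")
```

Pixel values can be converted to degrees of visual angle using the display geometry reported in the Method section.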

Working memory measures

Participants in the dual-task experiment performed well on both the visual (accuracy M = 71.18%, SD = 10.09) and verbal (M = 76.65%, SD = 8.92) memory tasks. The difference in performance was significant, t(94) = −2.813, p = .006, d = 0.57, suggesting the verbal task was easier than the visual task.
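The reported effect size and test statistic can be approximately recovered from the summary statistics. The sketch below, in Python, assumes a pooled-variance independent-samples t test with 48 observations per condition, which is what the reported 94 degrees of freedom imply. Small discrepancies are expected because the inputs are rounded.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

def independent_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic with n1 + n2 - 2 degrees of freedom."""
    standard_error = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / standard_error

# Dual-task accuracy (%): visual M = 71.18, SD = 10.09; verbal M = 76.65, SD = 8.92.
d_value = cohens_d(71.18, 10.09, 48, 76.65, 8.92, 48)
t_value = independent_t(71.18, 10.09, 48, 76.65, 8.92, 48)
print(f"d = {abs(d_value):.2f}, t(94) = {t_value:.3f}")
```

Under these assumptions the calculation gives values close to the reported d = 0.57 and t(94) = −2.813.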

To assess whether participants’ WM performance was influenced by the concurrent scene-viewing task, a separate group of 48 participants completed just the WM tasks. Their performance on the visual (M = 72.66%, SD = 8.72) and verbal (M = 76.04%, SD = 9.86) tasks was statistically similar to dual-task performance (visual: t(94) = 0.767, p = .445, d = 0.16, BFJZS = 3.59; verbal: t(94) = 0.316, p = .752, d = 0.06, BFJZS = 4.46). Whereas the memory tasks influenced eye movements in the scene-viewing task, the scene-viewing task did not significantly affect memory task performance.

General discussion

In this study, participants viewed scenes while maintaining a WM load or with no WM load. The WM load was visual (remember five colors) or verbal (remember seven letters). We saw profound differences in how overt attention moved through scenes when participants maintained a memory load regardless of load modality. Visual and verbal WM loads influenced when participants moved their eyes (fixation durations) and where they moved them (saccade amplitudes, scan paths, percentage fixated, dispersion; see Fig. 2). We found no differences between dual-task and single-task performance on the WM tasks.

Effects of working memory load on eye movements

Previous work has shown visual WM and eye movement control are closely linked (e.g., Bahle et al., 2018; Cronin & Irwin, 2018; Irwin, 1992a, 1992b; Hollingworth & Luck, 2009; Hollingworth et al., 2008; Tas, Luck, & Hollingworth, 2016). Given this literature, it is unsurprising that a visual WM load influenced eye movements during scene-viewing in the present study. However, it is surprising that a verbal memory load also interfered with eye movements during scene-viewing. Our results caution against the use of verbal tasks as a control for studies of attention and visual WM: the verbal task may affect the baseline patterns of eye movement behavior despite relying on a different subsystem of WM.

There are many possible reasons why the verbal WM task interfered with overt attention in our study. Recently, the long-held assumption of independent verbal and visual WM subsystems has been challenged. Studies have found interference between concurrent visual and verbal WM tasks (Allen, Baddeley, & Hitch, 2006; Bae & Luck, 2018; Hardman, Vergauwe, & Ricker, 2017; Makovski, Shim, & Jiang, 2006; Morey & Beiler, 2013; Morey & Cowan, 2004; Ricker, Cowan, & Morey, 2010; Saults & Cowan, 2007; Vergauwe, Barrouillet, & Camos, 2010) and evidence that verbal WM contents can influence visual attention (Soto & Humphreys, 2007, 2008). In the perceptual load literature, a verbal WM load has been shown to interfere with distractor suppression, another function of attention (de Fockert, 2013; Lavie et al., 2004). In line with these literatures, our WM tasks may have taxed both their modality-specific WM subsystem and a more general resource related to overt attention control, such as the central executive (Baddeley & Hitch, 1974).

Because our verbal stimuli were presented visually, it is possible our verbal WM task taxed visual WM to some extent, leading to the interference between the verbal WM task and overt attention. There is evidence that visual information is filtered through visual WM even while it remains on screen (Tsubomi, Fukuda, Watanabe, & Vogel, 2013). We took many precautions to discourage persisting visual representations of our letter stimuli: letters were presented sequentially in the same spatial location and the last letter was followed by a mask. Participants were also instructed to verbalize and rehearse the letter stimuli throughout the trial. While it remains possible that participants used visual WM to support their verbal WM task performance, the verbal WM task we employed in this study is similar to others commonly used in the literature (e.g., Cronin & Irwin, 2018; Lavie & de Fockert, 2005). Even articulatory suppression tasks frequently use visually presented stimuli (e.g., Luck & Vogel, 1997; Peterson, Decker, & Naveh-Benjamin, 2019; Schmidt et al., 2002). Our results suggest that these verbal WM tasks may interfere with performance on visual tasks.

Finally, participants may have been less engaged in the free-viewing task when they had to simultaneously maintain a WM load. When attention is focused on internal information (i.e., a WM load), attention to external information suffers (Buetti & Lleras, 2016; Kiyonaga & Egner, 2013). Further, when engagement is high in a given task, changes to or onsets of task-unrelated stimuli go unnoticed (e.g., Fougnie & Marois, 2007; Mack & Rock, 1998; Most et al., 2001; Neisser, 1979; Neisser & Becklen, 1975; Simons & Chabris, 1999). Here, participants may have been relatively insensitive to the scene stimuli while maintaining a WM load. The similar WM performance under dual- and single-task conditions fits with this interpretation, as does the difficulty level of our two WM tasks: The verbal task was slightly easier than the visual task, and impeded eye movements less. However, task difficulty does not necessarily equate to task-engagement (Buetti & Lleras, 2016).

Regardless of the mechanism behind the verbal WM task’s unexpected interference with overt attention, our results stand: A commonly used verbal WM task interfered with overt attention. Future researchers should take this possibility into account when designing their studies. If a verbal WM task or articulatory suppression will be used, it should be used across all conditions to ensure the effect of the verbal WM load can be accounted for.

Effects of eye movements on working memory

Aside from the effect of verbal WM load on overt attention, it is also interesting and surprising that visual WM task performance was equivalent under dual-task and single-task conditions. Recent evidence suggests that the target of an eye movement is automatically placed in visual WM, overwriting items already in WM (Cronin & Irwin, 2018; Hollingworth et al., 2008; Tas et al., 2016). Menneer et al. (2019) similarly found visual WM impairment when participants were simultaneously engaged in a search task. The present results contrast with these findings. Participants maintaining a visual WM load during scene-viewing made around seven eye movements during the 3.5-s delay period. However, their WM task performance was comparable to participants who made no eye movements in the single-task condition. Therefore, we find no evidence that participants’ eye movements under dual-task conditions impaired their WM performance.

Conclusions

Our results suggest visual and verbal WM loads impede normal scene-viewing behavior, influencing both the timing and spread of eye movements through a scene. This finding cautions against the use of verbal WM tasks and articulatory suppression as control conditions for visual WM tasks. Researchers should account for this effect of verbal WM loads when designing experiments. Our results also suggest eye movements may not always tax visual WM, a stark contrast to the standing assumptions about the relationship between visual WM and overt attention. Future work should explore the conditions under which eye movements engage visual WM and the conditions under which they do not.