Working memory (WM) is a limited-capacity system that keeps information temporarily accessible for ongoing cognition. One proposed mechanism to keep information active in WM is refreshing. In contrast to articulatory rehearsal, which is assumed to use the speech system to maintain verbal information, refreshing is assumed to be a domain-general maintenance mechanism that operates by bringing WM representations into the focus of central attention (e.g., Barrouillet & Camos, 2012; Cowan, 1995; Higgins & Johnson, 2009; Vergauwe & Cowan, 2014). Information in the focus of attention is assumed to be in a privileged state of heightened accessibility and, thus, the act of refreshing, or “thinking of”, is assumed to result in WM representations becoming highly accessible again. This, in turn, is proposed to protect the information from decay-based forgetting. Despite the increasing number of studies on refreshing, it is currently still unclear how refreshing operates to support the maintenance of a set of elements in WM.

One prominent hypothesis is that refreshing operates serially, with the focus of attention cycling from one item to the next, reactivating WM representations one after the other (e.g., Barrouillet & Camos, 2012; Cowan, 2011; McCabe, 2008; Nee & Jonides, 2013; Vergauwe, Camos, & Barrouillet, 2014). We tested this hypothesis in a recent study, but found no evidence for serial refreshing in verbal WM (Vergauwe et al., 2016). In particular, we used the probe–span task in which short series of red letters were presented for subsequent recall, while black probe letters were presented between these memory items, with each probe to be judged present in or absent from the list presented so far, as quickly as possible. The idea was to use response times to the probes to infer the status of the representations in WM and, in particular, to use the last-presented benefit to infer whether or not serial refreshing had occurred. The last-presented benefit is frequently observed in item-recognition studies, with response times (RTs) to the last-presented memory item being faster than to any other item of the list (e.g., Burrows & Okada, 1971; McElree & Dosher, 1989; Nee & Jonides, 2008), and is typically interpreted as reflecting the heightened accessibility of the last-presented memory item in the focus of attention (see Oberauer & Hein, 2012, for a recent review). However, we assumed that speeded responses do not need to be invariably tied to the last-presented item, and used the last-presented benefit to assess whether or not refreshing had occurred in the pre-probe delay. Specifically, we manipulated the delay between each studied item and the subsequent probe, and reasoned that, if the delay before the probe is short, refreshing would not yet have occurred and the last-presented memory item would still be in the focus of attention when the probe is presented. As a result, speeded responses to probes matching the last-presented memory item should be observed. 
If the delay is long, however, refreshing would have occurred and, thus, the last-presented memory item would have been replaced in the focus of attention by another memory item. As a result, we should no longer observe a last-presented benefit. Against our expectations, the duration of the pre-probe delay did not affect serial position curves: participants were the fastest to respond to the last-presented memory item at all probe delays, indicating that the last-presented item was still in the focus of attention when the probe was presented and, thus, that serial refreshing had not occurred. Importantly, these results were obtained in four experiments that created optimal conditions to detect the spontaneous operation of refreshing, using pre-probe delays that were similar to the time available for refreshing in studies providing evidence for refreshing (from 100 to 800 ms), using phonologically similar material to minimize the role of articulatory rehearsal, and using short presentation times to minimize refreshing during item presentation. The current study aims at understanding this unexpected pattern of results. In particular, we propose two alternative interpretations of the results of Vergauwe et al. (2016) and propose two experiments testing these alternative views.

According to the first alternative view, no evidence was found for serial refreshing by Vergauwe et al. (2016) because the memory lists were too short for participants to spontaneously use refreshing. Indeed, participants had to remember series of four letters in all experiments, and it has been proposed that people mainly use refreshing for longer lists (e.g., Doherty & Logie, 2016; Unsworth & Engle, 2007; Vergauwe et al., 2014). To examine this possibility, Experiment 1 used a probe–span task in which series of six memory items had to be remembered, each memory item being followed by a probe after 100, 400, or 800 ms. To anticipate, we replicated the observations of Vergauwe et al. (2016), with a clear last-presented benefit at all delays, indicating that no serial refreshing occurred. This led us to propose a second alternative view.

According to the second alternative view, the last-presented benefit does not reflect the privileged status of the final item in the focus of attention. Instead, it results from perceptual overlap between the last-presented red letter and the black probe letter. In Vergauwe et al. (2016), and in the current Experiment 1, perceptual overlap exists at short and long delays, and it is possible that the last-presented benefit in our probe–span task never disappeared because of this perceptual overlap. To test this, the current Experiment 2 replicated Vergauwe et al. (2016)’s final experiment, with two modifications to disallow straightforward perceptual matching (see McElree & Dosher, 1989; Nee & Jonides, 2008, for similar manipulations): (1) red memory letters were presented in upper case, but probes were presented in lower case, and (2) each red memory letter was followed by a brief visual mask. The data for all experiments can be accessed through the Open Science Framework: https://osf.io/gdfhq/

Experiment 1

Method

Participants

Thirty-three undergraduate students at the University of Missouri-Columbia participated and received partial course credits or were paid US$15. They were native speakers of English and had normal or corrected-to-normal vision.

Materials and procedure

The probe–span task was administered using E-prime software (Psychology Software Tools). Participants were asked to watch carefully and memorize series of six red letters presented sequentially on screen (see Fig. 1a). All consonants, excluding Y, were used approximately equally often. No consonant was repeated within a series. Red letters were presented at the center of the screen in 48-point Courier New font, in upper case. Stimuli were presented on a standard CRT monitor, and participants sat at a comfortable distance from the screen.

Fig. 1. Illustration of a trial within the probe–span task used in Experiment 1 (a) and in Experiment 2 (b). Series of red letters (six in Experiment 1; four in Experiment 2) were presented in upper case (and masked, in Experiment 2) for subsequent recall and black probe letters were presented (in upper case in Experiment 1; in lower case in Experiment 2) between the letters to be remembered, with each probe to be judged present in or absent from the list presented so far. At the end of the series, participants recall all red letters in order of appearance. The delay before the probe was manipulated (100, 400, or 800 ms).

Each series began with a fixation cross, centrally displayed on screen for 750 ms, followed by the first red letter. Red letters were presented for 1000 ms. At the end of each series, an empty rectangle appeared on screen, prompting the participant to recall the six red letters of the series in order of appearance by typing them on the keyboard. Participants were encouraged to fill in unknown letters with a guess. All entered letters appeared in the box in upper case, from left to right. Participants pressed Enter to end the recall response and initiated the next series by pressing a button on the button box after recall.

After each to-be-remembered red letter, one black letter (probe) was presented in upper case in the center of the screen in 24-point Courier New font. Participants were instructed to decide whether this black letter corresponded to one of the red letters they were to maintain on the current trial or not. This judgment was made by pressing the rightmost button of the button box when the black letter corresponded to one of the red letters in memory and pressing the leftmost button when the black letter did not correspond to one of the red letters in memory.

The pre-probe delay variable was manipulated within-subjects. Regardless of the delay condition, the interval between two red letters was kept constant at 2000 ms. Depending on the experimental condition defined by pre-probe delay, the delay between the offset of the red letter and the onset of the black letter was different (100, 400, or 800 ms). Pre-probe delay varied within a trial and, thus, probe onset was unpredictable. Black letters were always presented for 1000 ms. The remaining delay between the offset of the black letter and the onset of the next red letter differed as a function of the pre-probe delay (900, 600 or 200 ms, respectively).
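The timing constraint described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the E-prime script used in the experiment; the constant and function names are illustrative assumptions, and only the durations come from the text.

```python
# Durations taken from the procedure description (in ms).
INTER_ITEM_INTERVAL_MS = 2000   # offset of one red letter to onset of the next
PROBE_DURATION_MS = 1000        # black probe letter display time
PRE_PROBE_DELAYS_MS = (100, 400, 800)


def post_probe_delay(pre_probe_delay_ms: int) -> int:
    """Return the blank time between probe offset and the next red letter.

    Hypothetical helper: the remaining delay is whatever is left of the
    fixed inter-item interval after the pre-probe delay and the probe.
    """
    remaining = INTER_ITEM_INTERVAL_MS - pre_probe_delay_ms - PROBE_DURATION_MS
    if remaining < 0:
        raise ValueError("pre-probe delay does not fit in the inter-item interval")
    return remaining


# Map each pre-probe delay to its remaining (post-probe) delay.
schedule = {delay: post_probe_delay(delay) for delay in PRE_PROBE_DELAYS_MS}
print(schedule)  # {100: 900, 400: 600, 800: 200}
```

The three resulting values match the remaining delays reported above (900, 600, and 200 ms), so every delay condition fills the same 2000-ms window.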

The experiment consisted of 108 trials and, for each trial and each participant, black letters were sampled randomly from a pool of potential probes in such a way that the likelihood of receiving a positive probe was 50% at each probe position. Thus, each trial could have 0–6 positive probes. For each probe position, the pool of possible probes consisted of all the letters presented in the series so far plus a random new letter for that series. Across the entire experiment, and for each of the six probes, the black letter corresponded in half of the trials to one of the to-be-remembered red letters, and each red letter presented up to that point in the trial had equal chances of being used as a target-present probe. Importantly, in each of these six pools, every different probe type was associated equally often with each of the possible levels of the pre-probe delay.
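The probe-sampling logic can be illustrated with a short sketch. It is a simplification under stated assumptions: positive and negative probes are drawn here by an independent coin flip at each position, whereas the experiment balanced probe types across trials and delays; the new letter is assumed to be absent from the whole series; and the function name `make_trial` and the lag coding (0 = last-presented item) are hypothetical.

```python
import random

# All consonants except Y, as described in the Materials section.
CONSONANTS = list("BCDFGHJKLMNPQRSTVWXZ")


def make_trial(rng: random.Random, list_length: int = 6):
    """Build one series and one probe per memory item (illustrative only).

    Returns the series and a list of (probe, is_target_present, lag) tuples,
    where lag is 0 for the last-presented item and None for absent probes.
    """
    series = rng.sample(CONSONANTS, list_length)      # no repeats within a series
    unused = [c for c in CONSONANTS if c not in series]
    probes = []
    for position in range(1, list_length + 1):
        presented = series[:position]                  # letters shown so far
        if rng.random() < 0.5:                         # assumption: independent 50% draw
            probe = rng.choice(presented)              # each presented letter equally likely
            lag = position - 1 - presented.index(probe)
            probes.append((probe, True, lag))
        else:
            probes.append((rng.choice(unused), False, None))
    return series, probes


series, probes = make_trial(random.Random(1))
for position, (probe, present, lag) in enumerate(probes, start=1):
    print(position, probe, present, lag)
```

Because each position is drawn independently in this sketch, a trial can contain anywhere from 0 to 6 positive probes, as noted above.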

Before the experimental trials, participants received instructions that included a visualization of a trial together with the experimenter. This was followed by five practice trials. Throughout the experiment, participants were asked to respond as fast as possible to the probes, without making errors, while maintaining the six red letters. They were not informed about the varying delays. Responses in the processing task were collected by button presses on a Serial Response box. Recall performance was scored by counting the number of letters that were correctly recalled with respect to serial order within each series (max = 6). Next, an average across all series was calculated per participant.
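The recall scoring rule can be sketched as follows. This is a minimal illustration with made-up responses; the function names are hypothetical, and the handling of responses shorter than the list (missing positions simply score zero) is an assumption not stated in the text.

```python
from statistics import mean


def serial_recall_score(presented: str, recalled: str) -> int:
    """Count letters recalled in their correct serial position."""
    # zip stops at the shorter string, so unfilled positions score zero
    # (an assumption; participants were encouraged to guess instead).
    return sum(p == r for p, r in zip(presented.upper(), recalled.upper()))


def mean_recall(trials) -> float:
    """Average the per-series scores for one participant."""
    return mean(serial_recall_score(p, r) for p, r in trials)


# Made-up example: (presented series, typed response)
example_trials = [
    ("BKTMQR", "BKTMQR"),  # all six in position
    ("DFGHJL", "DFHGJL"),  # G and H transposed
    ("NPSVWX", "NPSV"),    # last two positions left empty
]
scores = [serial_recall_score(p, r) for p, r in example_trials]
print(scores)  # [6, 4, 4]
```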

Results and discussion

Performance-based exclusions

We applied the same performance-based exclusions as in Vergauwe et al. (2016), to keep the criteria consistent across studies. First, we discarded the data of participants whose recall score was, on average, less than 1.5 letters out of 6 (2 participants; see Footnote 1). Next, to ascertain that participants paid sufficient attention to the probe task, we planned to exclude the data of participants who performed poorly. As in Vergauwe et al. (2016), poor performance was operationalized as a rate of correct responses below 55%. All participants reached this criterion. Finally, we verified participants’ precise compliance with the instructions in the probe task. Because it is important that participants consider all of the red letters when judging the probe, we calculated the rate of correct responses to “not-last” probes (i.e., target-present probes that show any-but-the-last-presented red letter of a series) and excluded the data of participants who scored below 55% on these not-last probes (3 participants). These exclusions resulted in a final sample of 28. These participants correctly recalled several memory items at the end of the series (M = 4.33, SD = 1.11) and had high accuracy on the probes (M = .85, SD = .12).
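The three exclusion criteria are applied in sequence, which can be sketched as below. The thresholds come from the text; the data structure, function name, and the three example participants are hypothetical and serve only to show the order of the checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParticipantSummary:
    mean_recall: float        # mean letters recalled in position (max = 6)
    probe_accuracy: float     # proportion correct over all probes
    not_last_accuracy: float  # proportion correct on "not-last" target-present probes


def exclusion_reason(p: ParticipantSummary,
                     min_recall: float = 1.5,
                     min_accuracy: float = 0.55) -> Optional[str]:
    """Return the first criterion a participant fails, or None if retained."""
    if p.mean_recall < min_recall:
        return "recall"
    if p.probe_accuracy < min_accuracy:
        return "probe accuracy"
    if p.not_last_accuracy < min_accuracy:
        return "not-last accuracy"
    return None


# Made-up participants, one per outcome of interest.
examples = {
    "A": ParticipantSummary(4.3, 0.85, 0.80),
    "B": ParticipantSummary(1.2, 0.90, 0.85),
    "C": ParticipantSummary(4.0, 0.70, 0.40),
}
reasons = {name: exclusion_reason(p) for name, p in examples.items()}
print(reasons)  # {'A': None, 'B': 'recall', 'C': 'not-last accuracy'}
```

Participant C illustrates why the third check is needed: overall probe accuracy can exceed 55% even when a participant attends mainly to the most recent letter.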

Last-presented benefit

Serial position curves for the RTs collected at probe positions 2, 3, 4, 5 and 6 (i.e., the probe letters following memory items 2, 3, 4, 5 and 6, respectively) are shown in Fig. 2. As previously observed by Vergauwe et al. (2016), RTs were affected by the serial position of the matching memory item and became faster over time. Importantly, however, and still in line with the observations of Vergauwe et al. (2016), it can immediately be seen that there was no drastic change in the serial position curves over time. Instead, at all probe positions and for all pre-probe delay durations, the last-presented item received the fastest responses (see Footnote 2).

Fig. 2. Mean correct probe response RT in ms observed in Experiment 1, as a function of the serial position of the matching memory item (expressed as the lag between presentation and test; on the x axis) and probe position (Probe 2, Probe 3, and Probe 4 in the upper panels, and Probe 5 and Probe 6 in the lower panels). The pre-probe delay (100, 400, or 800 ms) appears as the graph parameter. Error bars show standard errors of the mean.

We tested the serial refreshing hypothesis by examining the evidence for or against a last-presented benefit at each pre-probe delay duration. Specifically, for each pre-probe delay duration (i.e., 100, 400, or 800 ms), we compared RTs to target-present probes matching the last-presented memory item to RTs to target-present probes that did not match the last-presented memory item (i.e., that matched another red letter of the current series). For each pre-probe delay duration, and for each probe position, a separate one-sided Bayesian t test was run, testing whether the RTs to probes matching the last-presented memory item were faster than RTs to other target-present probes (i.e., a last-presented benefit in RT; see Vergauwe & Langerock, 2017). For 1 out of the 15 tests (5 probe positions × 3 pre-probe delay durations), there was one participant with missing data in one of the cells because only correct RTs were analysed. For this t test (400-ms delay at probe position 6), the participant with missing data was omitted and we ran the analysis on the remaining participants. Table 1 presents the results of these analyses. If spontaneous refreshing occurs during the delay before the probe, the last-presented item is assumed to be replaced in the focus of attention by another memory item and, as a result, a last-presented benefit should no longer be observed. In sharp contrast to this prediction, the evidence for a last-presented benefit was extremely strong at all pre-probe delay durations, with Bayes factors ranging between 117.8 and 5.82 × 10⁶ for faster RTs to the last-presented memory item, relative to other target-present probes. This indicates that the last-presented item was still in the focus of attention at all delays and, thus, that no serial refreshing had occurred, even though six items needed to be maintained.
Alternatively, one could argue that the last-presented benefit results from perceptual overlap between the last-presented red letter and the black probe letter. Because perceptual overlap exists at short and long delays, this could explain why the last-presented benefit is observed at all delays in our probe–span task. This alternative account was tested in Experiment 2 by including a visual mask and making the probes lower case in order to disallow straightforward perceptual matching between the last-presented memory item and the probe.
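The comparison underlying each test (last-presented versus other target-present probes, correct responses only) can be sketched as a data-preparation step. This is an illustration with made-up RTs, not the analysis script; the row format and function name are assumptions, and the Bayesian t test itself is not implemented here. The per-participant differences it returns would be the input to a paired, one-sided test in whatever Bayesian software is used.

```python
from collections import defaultdict
from statistics import mean


def last_presented_differences(rows):
    """Per-participant RT benefit for one probe position and one delay.

    rows: iterable of (participant, lag, correct, rt_ms) for target-present
    probes, where lag 0 marks a probe matching the last-presented item.
    Returns {participant: mean RT (other) - mean RT (last)}; positive
    values indicate a last-presented benefit.
    """
    cells = defaultdict(lambda: {"last": [], "other": []})
    for participant, lag, correct, rt_ms in rows:
        if not correct:
            continue  # only correct RTs are analysed
        cells[participant]["last" if lag == 0 else "other"].append(rt_ms)

    differences = {}
    for participant, cell in cells.items():
        # Participants with an empty cell are omitted from this test.
        if cell["last"] and cell["other"]:
            differences[participant] = mean(cell["other"]) - mean(cell["last"])
    return differences


# Made-up rows: p2 has no correct response to a last-presented probe.
example_rows = [
    ("p1", 0, True, 500), ("p1", 1, True, 620), ("p1", 2, True, 640),
    ("p1", 1, False, 900),
    ("p2", 0, False, 480), ("p2", 1, True, 700),
]
diffs = last_presented_differences(example_rows)
print(diffs)  # {'p1': 130}
```

Dropping a participant only from the test in which a cell is empty mirrors the handling of the single missing-data case described above.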

Table 1 Evidence in the data for the last-presented benefit for each probe (Probe 2, 3, 4, 5 and 6) and for each pre-probe delay (100, 400 or 800 ms) in Experiment 1. Bayes factors are from paired, one-sided t tests testing the described benefit: faster responses for last-presented item, compared to other target-present probes

Experiment 2

Method

Participants

Forty-five undergraduate students at the University of Geneva participated and received partial course credits. All had normal or corrected-to-normal vision.

Materials and procedure

The probe–span task was the same as the one used in Experiment 4 reported by Vergauwe et al. (2016), except for the inclusion of a mask and the use of lower case probe letters in the current experiment (see Fig. 1b). As in Vergauwe et al. (2016), participants were asked to watch carefully and memorize series of four red letters presented sequentially on screen. A restricted pool of seven phonologically similar consonants was used as stimuli: B, C, D, G, P, T, and V (see Footnote 3). Each red letter was followed by a mask consisting of three superimposed black letters (A, I and O), presented in upper case 32-point Courier New font.

The trial procedure was very similar to the one used in the current Experiment 1, except for the following modifications. First, red letters were presented for 500 ms, followed by a 50-ms mask. Second, regardless of the delay condition, the interval between two red letters was kept constant at 2050 ms (50-ms mask, followed by 2000 ms-window during which the probe was presented). Depending on the experimental condition defined by the pre-probe delay, the delay between the offset of the mask and the onset of the black probe letter was different (100, 400 or 800 ms), and the remaining delay between the offset of the black letter and the onset of the next red letter also differed as a function of the pre-probe delay (900, 600 or 200 ms, respectively). Third, the experiment consisted of 144 trials and, as in Experiment 1, for each trial and each participant, black letters were sampled randomly from a pool of potential probes in such a way that the likelihood of receiving a positive probe was 50% at each probe position, and every different probe type was associated equally often with each of the possible levels of the pre-probe delay. And, fourth, the maximum recall score was 4.

Results and discussion

Performance-based exclusions

Following Vergauwe et al. (2016), we applied the same performance-based exclusions as in Experiment 1. The data of one participant were discarded because of poor recall performance (i.e., less than 1 letter out of 4). All participants reached the 55% criterion of correct responses in the probe task, but the data of 13 participants were discarded because they scored below 55% on the not-last probes (see Footnote 4). These exclusions resulted in a final sample of 31. These participants correctly recalled several memory items at the end of the series (M = 3.51, SD = .40) and had high accuracy on the probes (M = .89, SD = .08).

Last-presented benefit

Serial position curves for the RTs collected at probe positions 2, 3, and 4 (i.e., the probe letters following memory items 2, 3, and 4, respectively) are shown in Fig. 3. As previously observed by Vergauwe et al. (2016), and in line with what we observed in Experiment 1, RTs were affected by the serial position of the matching memory item and became faster over time, but the serial position curves did not drastically change over time and, most importantly, the last-presented item received the fastest responses, again at all probe positions and for all pre-probe delay durations (see Footnote 2).

Fig. 3. Mean correct probe response RT in ms observed in Experiment 2, as a function of the serial position of the matching memory item (expressed as the lag between presentation and test; on the x axis) and probe position (Probe 2, Probe 3, or Probe 4 in the left, middle and right panels, respectively). The pre-probe delay (100, 400, or 800 ms) appears as the graph parameter. Error bars show standard errors of the mean.

We tested the serial refreshing hypothesis again by examining the evidence for or against a last-presented benefit at each pre-probe delay duration. For each pre-probe delay duration, and for each probe position, a separate one-sided Bayesian t test was run, testing whether the RTs to probes matching the last-presented memory item were faster than RTs to other target-present probes. All 31 participants could be included in all 9 t tests (3 probe positions × 3 pre-probe delay durations). Table 2 presents the results of these analyses. In line with the observations of Vergauwe et al. (2016), as well as with our observations in Experiment 1, the evidence for a last-presented benefit was again extremely strong at all pre-probe delay durations, with Bayes factors ranging between 823.8 and 2.55 × 10⁸ for a last-presented RT benefit. Thus, despite the fact that straightforward perceptual matching between the last-presented letter and the probe letters was disallowed, faster responses were observed for the last-presented memory item, indicating that this item was still in the focus of attention at all delays and, thus, that no serial refreshing had occurred.

Table 2 Evidence in the data for the last-presented benefit for each probe (Probe 2, 3 and 4) and for each pre-probe delay (100, 400 or 800 ms) in Experiment 2. Bayes factors are from paired, one-sided t tests testing the described benefit: faster responses for last-presented item, compared to other target-present probes. The data of 31 participants were included in all tests reported in this table

General discussion

Based on the assumption that participants are faster to respond to the item that is currently in the focus of attention, the serial refreshing hypothesis predicts that the item that receives the fastest RT should change over time. In line with Vergauwe et al. (2016), the findings of the current Experiment 1 contrasted sharply with this prediction; RTs to the last item were the fastest at all delays, suggesting that the last-presented item was still in the focus of attention when the probe appeared and, thus, that no serial refreshing had spontaneously occurred in the probe–span task, even though six letters were to be maintained. Furthermore, despite disallowing perceptual matching in Experiment 2, very strong evidence was again found for a last-presented benefit at all delays. This indicates that the last-presented benefit is not simply reflecting perceptual overlap between the last-presented item and the probe, and lends support to the idea that the last-presented benefit can be used to track the content of the focus of attention.

Across the current experiments and those reported by Vergauwe et al. (2016), a coherent pattern of invariance emerges, with a clear last-presented benefit at all delays. Under the assumption that the last-presented benefit reflects the last-presented memory item being in the focus of attention, this pattern indicates that serial refreshing does not occur spontaneously in the probe–span task. Because the invariant pattern was obtained across a series of experiments that created optimal conditions to detect the spontaneous operation of refreshing, it is possible that refreshing might never occur spontaneously in the probe–span task. The current observation helps to define the boundary conditions of spontaneous refreshing of verbal material. In particular, the invariant pattern observed in the probe–span task can be contrasted with a recent study using an item-recognition task in which a single probe is presented after list presentation, rather than in between the memory items (Vergauwe & Langerock, 2017). In this study, a last-presented benefit was found after fast list presentation (350 ms/memory item), but this last-presented benefit disappeared when the list was presented more slowly (1000 ms/item). This indicates that serial refreshing occurred spontaneously during slow list presentation in the item-recognition task, resulting in the last-presented memory item being replaced in the focus of attention. The pattern is in contrast with the present probe–span findings using just as slow a presentation pace. Because there is evidence for spontaneous serial refreshing in the item-recognition task (Vergauwe & Langerock, 2017), but evidence against spontaneous serial refreshing in the probe–span task (current findings and findings of Vergauwe et al. 2016), it appears that serial refreshing of verbal material does not occur spontaneously in all task situations.

Further research will have to examine the boundary conditions of the spontaneous occurrence of serial refreshing in verbal WM in more detail. In particular, future work will have to examine which task characteristics differ between the item-recognition and the probe–span experiments in such a way that spontaneous occurrence of serial refreshing might be encouraged in the first but discouraged in the latter (e.g., the presence of probes in between memory items vs. at the end of list presentation, objective differences in time-based parameters, or subjective feelings of time pressure). If serial refreshing does not occur spontaneously in all situations, at least three different views can be proposed: (1) task characteristics determine whether people opt to use attention to consolidate the last-presented memory item or to refresh all items in WM, (2) task characteristics play a role in whether refreshing occurs serially or in parallel (see Footnote 5), or (3) task characteristics play a role in whether people opt to refresh all items in verbal WM or to use some other mnemonic strategy, or no specific strategy, during inter-item pauses.

To conclude, the current results are in contrast with the prominent hypothesis of serial refreshing in WM. Together with some of our previous work, they strongly suggest that serial refreshing does not occur spontaneously in all situations requiring the maintenance of verbal material. Here, we show that it fails to occur in a paradigm that was originally developed in search of a direct marker of refreshing. Uncovering the boundary conditions for the spontaneous occurrence of serial refreshing will be important for understanding the circumstances in which a decay-and-refreshing account of limited WM capacity is viable.