Working memory (WM) has received considerable attention in psychology because its capacity is a major determinant of achievement in cognitive tasks and is related to measures of general fluid intelligence (see Conway, Jarrold, Kane, Miyake, & Towse, 2007). Recently, researchers have become increasingly interested in the role of WM in long-term episodic memory (e.g., Loaiza & McCabe, 2012; McCabe, 2008; Rose & Craik, 2012; Rose, Myerson, Roediger, & Hale, 2010; Unsworth & Engle, 2007). The aim of the present study was to examine the impact of an essential factor of WM performance, namely cognitive load, on episodic memory.

More specifically, we examined the long-term retention of information studied in a WM span task through a subsequent delayed test. A few studies have examined this issue (Loaiza & McCabe, 2012, 2013; McCabe, 2008). McCabe (2008) showed that recall performance in a delayed test is greater for items that have been memorized during complex rather than simple span tasks. This advantage on a delayed test for items studied in complex span tasks was accounted for by the covert retrieval model. According to this model, participants maintain items in complex span tasks by covertly retrieving them between processing episodes. These covert retrievals provide retrieval cues for delayed recall that are not created in simple span tasks (McCabe, 2008). Loaiza and McCabe (2012) further tested the hypothesis that the number of covert retrieval opportunities, one opportunity occurring after each processing episode, would predict recall performance in a delayed test. In accordance with this hypothesis, items that benefited from more covert retrieval opportunities in complex span and adapted Brown–Peterson tasks were subsequently recalled better in a delayed test.

From these results, Loaiza and McCabe (2012) suggested that refreshing is “the mechanism that underlies the delayed recall effect predicted by the covert retrieval model” (p. 192), because it encourages the association of items with their temporal context, which is used as a retrieval cue in a delayed recall test. In a follow-up study, they varied in two distinct experiments the opportunities for rehearsal and refreshing in young and old adults (Loaiza & McCabe, 2013). To vary the opportunity for rehearsal in an operation span task, the segments of arithmetic problems were presented on screen either simultaneously or sequentially, the latter of which required continuous reading and more strongly prohibited rehearsal than did the former (Hudjetz & Oberauer, 2007). Varying the number of opportunities for refreshing was achieved by comparing two complex span tasks that presented either one or two operations interspersed between the presentations of two memoranda. The two-operation span task should allow more refreshing opportunities than the one-operation task, because one covert retrieval could occur after each operation. Whereas variation in rehearsal did not induce any difference in the delayed test, delayed-recall performance did increase with the number of refreshing opportunities (Loaiza & McCabe, 2013). The authors concluded that refreshing is important for maintaining information in both working and episodic memory.

However, it is known that the efficiency of refreshing on WM maintenance depends on the cognitive load of the secondary task—that is, the proportion of time during which attention is distracted from maintenance activities. Increasing cognitive load reduces the availability of refreshing and results in lower WM performance (Barrouillet & Camos, 2012, 2015). As a consequence, if delayed recall of items studied in a complex span task relies on refreshing, it should also depend on the cognitive load of the secondary task. It should be noted that in the three series of experiments reported above (Loaiza & McCabe, 2012, 2013; McCabe, 2008), the cognitive load of the secondary tasks was never varied. For example, in the final study above, the pace of presentation of one or two operations between memory items was the same in the two tasks—that is, one operation every 5,250 ms. Thus, we tested the hypothesis that delayed recall, like immediate recall, depends on the cognitive load of the secondary task, in two experiments.

In both experiments, participants performed a complex span task in which they maintained series of five words while performing a concurrent task. Unlike in Loaiza and McCabe’s (2013) study, we kept the number of refreshing opportunities constant, while the cognitive load varied. The cognitive load of the concurrent task was varied in two ways, by manipulating either the nature of the task (a simple reaction time task vs. a parity judgment task, Exp. 1) or its pace (slow vs. fast, Exp. 2). Both manipulations of cognitive load are known to affect refreshing efficiency. Barrouillet, Bernardin, Portrat, Vergauwe, and Camos (2007) have shown that choice reaction time (CRT) tasks, like parity judgment, have a detrimental effect on immediate recall commensurate with their duration, whereas simple reaction time (SRT) tasks that do not solicit attention for a sizable portion of time have no measurable impact on WM span. Moreover, in CRT tasks, a fast pace reduces the time during which attention is available to refresh memory traces, leading to reduced immediate-recall performance. Thus, if working and episodic memory depend on refreshing, any increase in cognitive load should impede both immediate- and delayed-recall performance. Moreover, this effect on delayed-recall performance should depend on the reliance on attentional refreshing as a maintenance mechanism. Conversely, impeding another maintenance mechanism for verbal information, namely subvocal rehearsal, should have no effect on delayed recall. We should nevertheless replicate the well-known detrimental effect in immediate recall when rehearsal is impeded by a concurrent articulation. In this regard, it should be noted that Loaiza and McCabe (2013) reported that rehearsal did not affect delayed recall. However, the two conditions that they contrasted both involved concurrent articulation, the amount of which varied. Thus, rehearsal was always impeded, at least in part, in both conditions. In the present study, we introduced a stronger experimental manipulation, contrasting conditions with or without concurrent articulation. Indeed, in both experiments, our participants performed the concurrent task either silently (by keypress) or aloud.

Experiment 1

In this experiment, we orthogonally varied the availabilities of refreshing and rehearsal, by introducing two different concurrent tasks (SRT vs. parity judgment) and eliciting two different types of responses (silent vs. aloud), respectively. If episodic memory, like working memory, depends on refreshing, both immediate- and delayed-recall performance should be reduced by the parity judgment task relative to the SRT task. Conversely, the type of responses should interact with the recall test. In particular, responses aloud should impede immediate- but not delayed-recall performance.

Method

Participants

A group of 24 students (14 females, 10 males; mean age = 21 years, 11 months (21;11), SD = 2;8) at the University of Bristol received partial course credit or £10 for participating.

Materials and procedure

Each participant was presented with four complex span tasks, defined by a factorial design crossing two types of responses (key vs. oral) and two concurrent tasks (SRT vs. parity judgment). The complex span tasks were presented in distinct experimental blocks randomly ordered for each participant. In each block, two training trials were presented, followed by eight trials of five words each. The words were randomly chosen without replacement from a set of 40 one-syllable nouns. Four different sets of words were created, and across participants, all sets were associated with all complex span tasks.
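The assignment of words and blocks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word lists are placeholders (the actual nouns are not listed here), the function name `build_session` is hypothetical, and the rotation of word sets across participants is one possible way of pairing every set with every complex span task, not necessarily the scheme that was used.

```python
import random

RESPONSES = ("key", "oral")
TASKS = ("SRT", "parity")
N_TRIALS, WORDS_PER_TRIAL = 8, 5


def build_session(word_sets, participant_index, seed=None):
    """Return four blocks (condition plus eight five-word trials) for one participant."""
    rng = random.Random(seed)
    conditions = [(r, t) for r in RESPONSES for t in TASKS]
    blocks = []
    for i, condition in enumerate(conditions):
        # Assumed rotation: shifts the set-to-condition pairing by one per participant.
        word_set = word_sets[(i + participant_index) % len(word_sets)]
        # Sampling without replacement: each of the 40 words is used exactly once.
        shuffled = rng.sample(word_set, N_TRIALS * WORDS_PER_TRIAL)
        trials = [shuffled[k:k + WORDS_PER_TRIAL]
                  for k in range(0, len(shuffled), WORDS_PER_TRIAL)]
        blocks.append({"condition": condition, "trials": trials})
    rng.shuffle(blocks)  # random block order for each participant
    return blocks


# Placeholder word sets standing in for the four sets of 40 one-syllable nouns.
word_sets = [[f"set{s}_word{w}" for w in range(40)] for s in range(4)]
session = build_session(word_sets, participant_index=0, seed=1)
```

The two training trials that precede each block are left out of the sketch.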

The four complex span tasks had similar structures. Each trial began with a fixation asterisk displayed for 1,500 ms centered on the screen, followed by a 500-ms blank screen and the first memory word. The words were displayed for 1,000 ms, followed by a delay of 500 ms, before a series of six digits or dots, with as many even as odd digits being displayed in each series. For the parity judgment task, each digit was displayed for 700 ms, followed by a delay of 300 ms. The SRT task was created by replacing each digit of the parity judgment task by a black dot appearing in the center of the screen. At the end of the series of digits or dots, the next word appeared after a 500-ms delay.
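To make the trial structure concrete, the following Python sketch lays out the event sequence of one five-word trial from the durations given above. Two points are assumptions rather than reported facts: the 500-ms delay at the end of a series is added after the final 300-ms interval, and the same delay follows the last series before recall. The constant and function names are illustrative.

```python
FIXATION_MS = 1500
BLANK_MS = 500
WORD_MS = 1000
POST_WORD_DELAY_MS = 500
STIMULUS_MS = 700            # digit (parity judgment) or dot (SRT)
INTER_STIMULUS_MS = 300
STIMULI_PER_WORD = 6
POST_SERIES_DELAY_MS = 500   # assumed to follow the last 300-ms interval
WORDS_PER_TRIAL = 5


def trial_timeline():
    """Return ([(onset_ms, event, duration_ms), ...], total_ms) for one trial."""
    events, t = [], 0

    def add(name, duration):
        nonlocal t
        events.append((t, name, duration))
        t += duration

    add("fixation", FIXATION_MS)
    add("blank", BLANK_MS)
    for w in range(1, WORDS_PER_TRIAL + 1):
        add(f"word {w}", WORD_MS)
        add("delay", POST_WORD_DELAY_MS)
        for s in range(1, STIMULI_PER_WORD + 1):
            add(f"stimulus {w}.{s}", STIMULUS_MS)
            add("blank", INTER_STIMULUS_MS)
        add("delay", POST_SERIES_DELAY_MS)
    return events, t


events, total_ms = trial_timeline()
```

Under these assumptions each word occupies an 8,000-ms cycle, so the sketch yields 42,000 ms from fixation onset to the recall prompt.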

Participants were asked to read aloud and remember the words. Depending on the response condition, they had to judge the parity for each digit either by pressing one of two keys on the keyboard (right key for “even” and left key for “odd”) or by saying aloud “even” or “odd.” For the SRT conditions, participants had to either press a key or say a word aloud as soon as the dot appeared. To make the responses similar in the SRT and parity judgment tasks, participants had to alternate between the two keys or to say “odd” and “even” in alternation. Keyed and oral responses were recorded by the computer and the experimenter, respectively. When the word “Recall” appeared on screen, participants had to recall aloud the series of words in their order of presentation.

At the end of each block, participants had to count backward for 1 min by threes from a given three-digit number. This distracting task was highly attention-demanding and was performed aloud to reduce the opportunity to use refreshing or rehearsal. The experimenter verified accuracy. After this distracting task, participants were invited to recall in any order the 40 words presented in the previous eight trials by filling in an 8 × 5 matrix.
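Because the delayed test allowed recall in any order, scoring amounts to order-free matching between the studied words and the written protocol. A minimal Python sketch follows. The function name, the example words, and the handling of case, empty cells, repetitions, and intrusions are all assumptions made for illustration, not a description of the scoring rules actually applied.

```python
def percent_recalled(studied, recalled):
    """Percentage of studied words present in the recall protocol, ignoring order."""
    studied_set = {w.strip().lower() for w in studied}
    recalled_set = {w.strip().lower() for w in recalled if w.strip()}
    # Intrusions (words never studied) and repetitions earn no credit.
    hits = studied_set & recalled_set
    return 100.0 * len(hits) / len(studied_set)


studied = ["lamp", "shoe", "tree", "boat", "milk"]   # hypothetical items
recalled = ["boat", "Lamp", "", "chair", "boat"]     # hypothetical protocol
score = percent_recalled(studied, recalled)          # two of five words: 40.0
```

The same function could be applied to a five-word immediate-recall trial when serial position is ignored.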

Results and discussion

The mean percentages of correct responses for the two concurrent activities were high: in the parity judgment task, 93 % (SD = 3 %) for keyed responses, 99 % (2 %) for oral responses, and greater than 99 % in the SRT task. Performance for the counting-backward task was also good, with an error rate below 2 %, which did not vary across conditions, Fs < 1.

A 2 (test: immediate vs. delayed) × 2 (response: silent vs. aloud) × 2 (task: SRT vs. parity judgment) analysis of variance (ANOVA) was performed on the percentages of correctly recalled items, regardless of position. Following McCabe (2008; Loaiza & McCabe, 2012, 2013), and to allow for comparison between the immediate and delayed tests, analyses were performed on free recall scores. Unsurprisingly, performance in delayed recall (40 %) was reduced as compared to the immediate test (86 %), F(1, 23) = 307.19, p < .0001, ηp² = .93. Relative to the silent condition (65 %), concurrent articulation reduced recall performance (61 %), F(1, 23) = 5.83, p = .02, ηp² = .20. Also, recall performance was poorer with the more-demanding parity-judgment task (57 %) than with the SRT task (69 %), F(1, 23) = 56.14, p < .001, ηp² = .71 (see Fig. 1).
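For readers who wish to relate the F ratios to the effect sizes, partial eta squared can be recovered from F and its degrees of freedom through the standard identity F × df_effect / (F × df_effect + df_error). The Python sketch below applies it to the three main effects reported above; the function and variable names are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects reported above for Experiment 1: (F, df_effect, df_error)
reported = {
    "test": (307.19, 1, 23),
    "response": (5.83, 1, 23),
    "task": (56.14, 1, 23),
}
effect_sizes = {name: round(partial_eta_squared(*stats), 2)
                for name, stats in reported.items()}
# effect_sizes -> {'test': 0.93, 'response': 0.2, 'task': 0.71}
```

The rounded values agree with the effect sizes given in the text for these three effects.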

Fig. 1 Mean percentages of recall in immediate- and delayed-recall tests, according to the types of responses and the concurrent tasks (simple reaction time [SRT] vs. parity judgment [CRT]), for Experiment 1. The error bars refer to standard errors.

Unexpectedly, the type of response interacted with the task, F(1, 23) = 8.45, p < .01, ηp² = .27, with the effect of concurrent articulation being reduced in the SRT as compared to the parity judgment task. More interestingly for our purpose, and as we predicted, the nature of the concurrent task did not interact with the recall tests, F(1, 23) = 2.67, p = .12, ηp² = .10. The parity judgment task reduced recall in both the immediate (82 %) and delayed (33 %) tests as compared with the SRT task (91 % and 47 %, respectively), F(1, 23) = 46.69, p < .0001, and F(1, 23) = 28.49, p < .0001, respectively (Fig. 1). However, as we also predicted, the concurrent articulation induced by oral responses affected the recall tests differently, F(1, 23) = 49.29, p < .0001, ηp² = .68. Concurrent articulation (80 %) reduced recall relative to silent responses (93 %) in the immediate test, F(1, 23) = 52.23, p < .0001, but slightly increased delayed recall (43 % vs. 37 %, respectively), F(1, 23) = 5.42, p = .03. Finally, the three-way interaction was nonsignificant, F < 1.

To summarize, increasing the cognitive load by introducing a more attention-demanding task reduced recall in both immediate and delayed tests. By contrast, inducing concurrent articulation diminished memory performance only in immediate, and not in delayed, recall tests. These findings are in agreement with the assumption that maintenance in both working and episodic memory relies on refreshing. Indeed, introducing a more attention-demanding task results in a reduction of the availability of attention for maintenance purposes through refreshing. However, the two contrasted secondary tasks differed in other respects (e.g., the stimuli to be processed) than solely their attentional demands. Thus, to verify that our findings were related to variation in cognitive load and not to other factors, we performed a second experiment in which participants performed the same attention-demanding concurrent task, in which cognitive load was varied through the pace of this secondary task.

Experiment 2

In Experiment 2, the same parity judgment task was introduced as the concurrent task in a complex span paradigm, and the pace at which the digits were presented was varied to manipulate cognitive load. The availability of rehearsal was still orthogonally manipulated by eliciting different types of responses (keypress vs. oral). Our predictions were identical to those in Experiment 1.

Method

Participants

A group of 27 students (23 females, four males; mean age = 21;3, SD = 3;4) at the University of Bristol received partial course credit or £10 for participating. None of them had participated in Experiment 1.

Materials and procedure

The materials and procedure were similar to those of Experiment 1, except that the four complex span tasks were defined by the factorial design crossing two types of responses (keypress vs. oral) with two paces of the parity judgment task (slow vs. fast). Each digit was displayed for either 600 ms (fast pace) or 1,125 ms (slow pace) and was followed by a delay of either 200 or 375 ms, for a total of 800 and 1,500 ms, respectively.
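The logic of the pace manipulation can be illustrated numerically. If cognitive load is taken as the proportion of each stimulus interval during which attention is occupied, then a fixed processing time per digit yields a higher load at the fast pace. In the Python sketch below, the two intervals come from the durations above, whereas the 500-ms attentional capture per parity judgment is a purely hypothetical value chosen for illustration; no such estimate is reported here.

```python
def cognitive_load(capture_ms_per_item, item_interval_ms):
    """Proportion of each stimulus interval during which attention is occupied."""
    return min(capture_ms_per_item / item_interval_ms, 1.0)


FAST_INTERVAL_MS = 600 + 200     # display + delay, from the text
SLOW_INTERVAL_MS = 1125 + 375    # display + delay, from the text
ASSUMED_CAPTURE_MS = 500         # hypothetical; not reported in the text

fast_load = cognitive_load(ASSUMED_CAPTURE_MS, FAST_INTERVAL_MS)   # 0.625
slow_load = cognitive_load(ASSUMED_CAPTURE_MS, SLOW_INTERVAL_MS)   # about 0.33
```

Whatever the true capture time, the ratio between the two loads is fixed by the intervals (1,500 / 800), as long as processing fits within the fast interval.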

Results and discussion

The data of three participants were discarded because they achieved less than 80 % correct responses on the parity judgment task with a fast pace and keyed responses, whereas the mean percentage of correct responses was 90 % or more in each span task (on average, 95 %, SD = 3 %). Participants did well on counting backward, with less than 2 % errors. Although the number of responses produced in 1 min did not differ after a slow- or a fast-paced task, p = .11, participants produced slightly more responses after a block with oral responses than after one with keyed responses (30 vs. 28), F(1, 23) = 8.20, p < .01, ηp² = .61. The pace and type of responses did not interact, F < 1.

A 2 (test: immediate vs. delayed) × 2 (response: silent vs. aloud) × 2 (pace: slow vs. fast) ANOVA was performed on the percentages of correct free recall. As in Experiment 1, immediate recall (80 %) was better than delayed recall (29 %), F(1, 23) = 338.11, p < .0001, ηp² = .94. The main effects of concurrent articulation and pace were both significant, with reduced recall under oral (51 %) versus silent (59 %) responses, F(1, 23) = 16.11, p < .001, ηp² = .41, and with a fast (51 %) versus a slow (59 %) pace, F(1, 23) = 25.34, p < .0001, ηp² = .53 (Fig. 2). More interestingly, we replicated the findings of Experiment 1. The effect of concurrent articulation differed across recall tests, F(1, 23) = 32.33, p < .001, ηp² = .58, whereas the pace effect did not interact with tests, F(1, 23) = 1.28, p = .27, ηp² = .05. The concurrent articulation reduced recall on an immediate test (72 % vs. 89 % for oral and silent responses), F(1, 23) = 38.30, p < .001, but not on a delayed test (30 % vs. 29 %, respectively), F < 1 (see Footnote 1). On the contrary, a fast pace reduced recall in both immediate (76 % vs. 85 % for fast and slow pace), F(1, 23) = 30.17, p < .001, and delayed (26 % vs. 33 %, respectively) tests, F(1, 23) = 7.19, p = .01. Finally, the effect of concurrent articulation did not interact with pace, F < 1, and the three-way interaction was nonsignificant, p > .25.
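The contrast between the two interaction patterns is easiest to see as a cost in percentage points for each test. The Python sketch below simply subtracts the means reported above; the variable and function names are illustrative, and the computation is descriptive arithmetic on the reported means, not a reanalysis of the data.

```python
# Mean percent recall from the text: (immediate, delayed)
articulation = {"silent": (89, 29), "oral": (72, 30)}
pace = {"slow": (85, 33), "fast": (76, 26)}


def cost_by_test(baseline, manipulated):
    """Drop in recall (percentage points) in the immediate and delayed tests."""
    return tuple(b - m for b, m in zip(baseline, manipulated))


articulation_cost = cost_by_test(articulation["silent"], articulation["oral"])
pace_cost = cost_by_test(pace["slow"], pace["fast"])
print(articulation_cost)  # (17, -1)
print(pace_cost)          # (9, 7)
```

Articulation costs 17 points in the immediate test and essentially nothing in the delayed test, whereas the fast pace costs a similar amount in both (9 and 7 points).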

Fig. 2 Mean percentages of recall in immediate- and delayed-recall tests, according to the types of responses and the paces of the concurrent tasks, for Experiment 2. The error bars refer to standard errors.

To summarize, these findings replicated the pattern of results of Experiment 1 with a different manipulation of cognitive load. Increasing the cognitive load through the pace of the concurrent task reduced performance in both immediate and delayed tests, whereas impeding rehearsal reduced only immediate-test performance.

General discussion

If refreshing is involved in both working and episodic memory, the long-term retention of items studied in complex span tasks should be reduced by increased cognitive load of the secondary task, since numerous studies have shown that the efficiency of refreshing depends on this cognitive load (Barrouillet & Camos, 2012, 2015). In the present study, we tested this hypothesis through two manipulations of cognitive load, varying either the nature of the secondary task or its pace. Moreover, to ensure that the observed effect was specific to attentional refreshing and not common to any maintenance mechanism, we orthogonally varied the availability of subvocal rehearsal. Despite the difference in how cognitive load was manipulated, the two experiments led to the same pattern of findings: Whereas impeding rehearsal reduced recall performance only in an immediate test, the manipulations that affected refreshing reduced performance in both immediate- and delayed-recall tests. This study completes and extends two streams of research, one examining the impact of activities during WM tasks on long-term retention, and the other documenting the dissociation of rehearsal and refreshing as maintenance mechanisms for verbal information.

The present study has provided the first evidence that long-term retention of items depends on the cognitive load of the concurrent task during their maintenance in WM. This impact of cognitive load on delayed recall contrasts with the absence of an effect of rehearsal on this recall. Similarly, Loaiza and McCabe (2013) failed to observe any effect of rehearsal on delayed recall. However, they compared two conditions that both involved concurrent articulation, with only a subtle variation in its pace. With highly contrasted conditions, we established the fact that verbal rehearsal of items in WM does not favor any long-term retention of these items. These findings echo studies in the ’70s that had contrasted Type I and Type II rehearsals (Glenberg, Smith, & Green, 1977). Type I rehearsal is described as rote repetition, and has only transitory and no long-term effect (Woodward, Bjork, & Jongeward, 1973). On the contrary, the added time for Type II rehearsal, which is assumed to involve a deeper processing of information, improves long-term retention (Craik & Watkins, 1973). Our results are nicely in line with these earlier findings.

Nevertheless, how can we understand that the two main mechanisms to maintain verbal information in WM, rehearsal and refreshing, have such distinct impacts on long-term retention? In his first study, McCabe (2008) suggested that any attempt at covert retrieval during maintenance favors long-term retention, whatever the process used (i.e., subvocal rehearsal, simple mental search, or refreshing). This proposal rests on the finding that delayed recall for the memoranda in complex span trials was equally good, whether or not participants were warned that a delayed test was forthcoming. McCabe argued that anticipating the delayed test should have motivated participants to pay attention to the meanings of the words to be remembered instead of subvocally rehearsing them. The present findings, as well as those of Loaiza and McCabe (2013), go against this argument. It is not sufficient that items be actively maintained in WM to favor their delayed recall; maintenance instead requires the engagement of refreshing. How does refreshing help delayed recall? Within a unitary view of memory, in which WM is the activated part of long-term memory, Loaiza and McCabe (2012) suggested that refreshing during WM maintenance would allow for temporal–contextual bindings, with the temporal–contextual cues being used as retrieval cues in a delayed recall test. Conversely, in a dual view of memory, in which the existence of mental representations is restricted to WM, the reconstruction of representations in WM through refreshing would leave instances in long-term memory (Barrouillet & Camos, 2015; Logan, 1988). More reconstructions would result in more instances and a greater probability of correct recall. However, it is premature to favor either of these two theoretical positions. Further studies will be needed to clarify the role of WM in long-term retention, and they should shed light on the debate about the unitary-versus-dual view of memory.

Finally, the present findings also provide further evidence on the dissociation between rehearsal and refreshing. Previous studies have already shown that these two mechanisms are independent, rely on distinct brain networks, can be used adaptively, and are affected by different constraints (Camos, Lagner, & Barrouillet, 2009; Camos, Mora, & Barrouillet, 2013; Camos, Mora, & Oberauer, 2011; Hudjetz & Oberauer, 2007; Mora & Camos, 2013; Raye, Johnson, Mitchell, Greene, & Johnson, 2007). We can now add that they also differ in their impact on long-term retention.

To conclude, the present study provides the first evidence of the impact of cognitive load on episodic memory. Although verbal information can be maintained in WM by either rehearsal or refreshing, its long-term retention requires the involvement of refreshing and depends on the cognitive load of the concurrent task during WM maintenance. Our findings also bring further support for the dissociation between rehearsal and refreshing.