Cognitive psychology defines working memory (WM) as a system devoted to the temporary storage and processing of information in goal-directed activities (Baddeley, 1986). It is usually assumed that the antagonistic functions of maintaining and transforming information require some mechanism to shield temporary memory traces against loss and interference. Several theories assume that this aim is achieved through maintenance mechanisms that strengthen memory traces. Baddeley (1986) has described an articulatory rehearsal mechanism that would recursively reactivate verbal memories within a phonological loop, while Logie (1995) has suggested the existence of an inner scribe that would fulfill the same function for maintenance of visuospatial information within the visuospatial sketchpad. Subsequently, Cowan (1992) suggested the existence of a refreshing mechanism through attentional focusing and covert retrieval assumed to counteract the decay of memory traces. Interestingly, Raye, Johnson, Mitchell, Greene, and Johnson (2007) identified the neural substrates of this mechanism as being distinct from the neural areas involved in verbal rehearsal. These different mechanisms are assumed to reactivate and strengthen WM traces in the face of decay and interference resulting from concurrent processing. However, a recent model of WM, SOB-CS, has rejected the idea of any refreshing or rehearsal mechanism and assumed that maintenance of information in WM would be achieved by actively removing distractors instead of strengthening memory traces (Oberauer, Lewandowsky, Farrell, Jarrold, & Greaves, 2012). The aim of this article was to assess the plausibility of this model by evaluating its capacity to account for the effects of a concurrent articulation during the maintenance of information in WM.

The disruptive effect of processing on maintenance and the need for some counteracting mechanism required to shield memory traces is well illustrated by a phenomenon known as the pace effect. In WM complex span tasks, a series of items is presented for further recall, each of them being followed by a phase of processing activity (e.g., reading words or solving simple operations). It has been abundantly demonstrated that increasing the pace of this processing activity, either by increasing the amount of information to be processed in a fixed period of time or by reducing the time allowed to process a fixed amount of information, induces more forgetting and results in poorer recall performance (see Barrouillet, Bernardin, & Camos, 2004; Barrouillet, Portrat, & Camos, 2011; Barrouillet & Camos, 2012, for a review). We provided the first account for this effect within the time-based resource-sharing model (TBRS, Barrouillet et al., 2004; Barrouillet & Camos, 2015). This model assumes that a central bottleneck constrains storage and processing activities to take place one at a time, WM traces suffering from temporal decay and interference when the central bottleneck is occupied by processing activities. However, attentional refreshing could counteract this damaging effect by reactivating memory traces when attention is available during pauses that can be freed between processing episodes. As a consequence, recall performance depends on the balance between the duration of processing episodes during which memory traces decay and the duration of free pauses during which they can be reactivated. We called cognitive load the proportion of time during which concurrent processing occupies attention, with high cognitive load resulting in lower recall, hence the pace effect. For verbal memoranda, the TBRS assumes that articulatory rehearsal can be used as an auxiliary mechanism of maintenance that works jointly with attentional refreshing to maintain memory traces. 
Both attentional and articulatory demands of concurrent processing in complex span tasks have independent and additive detrimental effects on recall performance (Camos, Lagner, & Barrouillet, 2009; Camos, Mora, & Oberauer, 2011; Camos, Mora, & Barrouillet, 2013; Mora & Camos, 2013, 2015). Thus, increasing the pace of a concurrent task involving attentional or articulatory demands, or both, impedes more and more the use of the maintenance mechanisms that could counteract decay and interference, resulting in poorer recall.
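
The definition of cognitive load given above can be made concrete with a short sketch. It assumes, for illustration only, that every processing episode captures attention for the same duration; the function name and the numerical values are hypothetical and do not come from any of the cited studies.

```python
def cognitive_load(n_operations, capture_ms, interval_ms):
    """Proportion of the interval during which processing occupies attention."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    busy_ms = n_operations * capture_ms
    # Attention cannot be occupied for longer than the interval itself.
    return min(busy_ms / interval_ms, 1.0)


# Hypothetical values: 500 ms of attentional capture per operation, 6-s interval.
slow = cognitive_load(n_operations=4, capture_ms=500, interval_ms=6000)
fast = cognitive_load(n_operations=8, capture_ms=500, interval_ms=6000)
print(f"slow pace: {slow:.2f}, fast pace: {fast:.2f}")
```

Under these assumed values, doubling the number of operations in the same interval doubles the cognitive load, which is the situation in which the TBRS expects lower recall.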

However, SOB-CS proposes a totally different account of the pace effect. This model assumes that there is no temporal decay of memory traces, concurrent processing interfering with memory through involuntary encoding of distractors. Free time in-between distractors would be used to remove irrelevant representations, thereby reducing interference. Slower pace provides longer and more frequent periods of free time, rendering distractor removal more efficient. It is worth noting that SOB-CS assumes no maintenance processes (either verbal rehearsal or attentional refreshing) for the strengthening of memory items. Because memory traces are assumed not to suffer from decay, there would be no need for reactivation mechanisms. This is not to say that the existence of verbal rehearsal is denied, but it would be a mere epiphenomenon without any causal role in maintenance (Lewandowsky & Oberauer, 2012). In support of this thesis, Oberauer et al. (2012) emphasize that benchmark findings such as the pace effect can be successfully modeled by SOB-CS without any recourse to maintenance mechanisms. The aim of the present study was to test the SOB-CS claim that verbal rehearsal does not play any role in the active maintenance of WM traces.

In the following experiment, participants were presented with letters for further recall, each letter being followed by one, three, or six presentations of the syllable ba evenly spaced in time during a 4-s interval, participants being asked to utter the syllable aloud at each of its appearances. The TBRS and the SOB-CS models make different predictions regarding the effect on recall performance of the number of repetitions of the syllable. Let us begin with the TBRS model. As we saw above, according to this model, the effect of the concurrent activity depends on the extent to which it prevents maintenance mechanisms from taking place. Varying the number of ba should have very little effect on attentional refreshing. Indeed, articulating this syllable amounts to a simple reaction time (SRT) task in which each occurrence of a signal triggers the production of a unique response that does not involve any selection. This type of task does not seem to occupy central processes sufficiently to disrupt attentional refreshing, as Barrouillet, Bernardin, Portrat, Vergauwe, and Camos (2007) demonstrated. In this latter study, we compared the effect on recall of an SRT task with that of a complex reaction time (CRT) task. The CRT task involved the successive presentation of squares that appeared either in the upper or the lower part of the screen, participants being asked to judge this location by pressing appropriate keys for the up and down responses. We hypothesized that response selection would occupy the central bottleneck and disrupt attentional refreshing. Accordingly, we observed that WM spans decreased as the number of squares to be processed increased in a fixed temporal interval of 6,750 ms. In the SRT task, participants were presented with the same stimuli during the same interval but had only to press the space bar each time a square appeared, whatever its location. 
As we predicted, varying the number of squares from 5 to 11 in this condition had no effect on concurrent maintenance and recall performance, even when the squares appeared at an irregular and unpredictable rate. In many respects, the ba task resembles the SRT task used in Barrouillet et al. (2007), with no response selection involved, each target requiring the same and a unique response. Its attentional demand is likely to be even lower because the syllables appear at a constant pace whereas Barrouillet et al. (2007) presented the squares at an irregular and unpredictable pace. This is why the TBRS predicts no strong effect of the number of ba on the availability of attentional refreshing. By contrast, the articulatory demand of the task should impede verbal rehearsal, this disruptive effect increasing with the number of syllables to be articulated. Because both verbal rehearsal and attentional refreshing contribute to the maintenance of verbal memory traces (Camos et al., 2009), the TBRS model predicts lower WM spans with more uttered syllables in a fixed time interval.

Let us now consider what SOB-CS predicts. Recall performance in SOB-CS depends on the interplay between two factors: the amount of interference created by distractors and the time available for the distractor-removal process to clear WM of interfering material. Concerning the amount of interference, because there is no functional verbal rehearsal in SOB-CS, there should be no effect of concurrent articulation beyond the interference created by processing ba. Following Farrell and Lewandowsky's (2002) SOB model, SOB-CS is a distributed model in which items are represented by vectors of features associated with a positional marker and superimposed into a common weight matrix. The model assumes that encoding is energy-gated, encoding strength depending on novelty or dissimilarity with the current content of WM. Novel items involve a large encoding weight, resulting in strong interference, whereas repeated items receive a negligible weight and involve no further forgetting. Because repeated distractors do not involve further forgetting, only the first ba presented after each letter would involve sizeable interference, and presenting ba one, three, or six times should not vary the amount of interference created by the articulation task.
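
The logic of novelty-gated encoding can be illustrated with a deliberately simplified toy. This is not the SOB-CS implementation and does not use its equations: the positional marker is omitted, memory is a single superimposed vector, and the novelty rule (one minus the normalized match between the item and the current memory content) is an assumption chosen only to show why a repeated distractor adds little after its first encoding. All names and vectors are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def encode(memory, item):
    """Superimpose item onto memory, weighted by its novelty; return the weight."""
    # Toy novelty rule (an assumption): 1 when the item is absent from memory,
    # 0 when it is already fully represented.
    match = dot(item, memory) / dot(item, item)
    weight = min(1.0, max(0.0, 1.0 - match))
    for k, feature in enumerate(item):
        memory[k] += weight * feature
    return weight


letter = [1, 1, 1, 1]    # hypothetical memory item
ba = [1, -1, 1, -1]      # hypothetical distractor, orthogonal to the letter

memory = [0.0] * 4
weights = [encode(memory, v) for v in (letter, ba, ba, ba)]
print(weights)  # [1.0, 1.0, 0.0, 0.0]
```

In this toy, the first ba is encoded at full strength and the following repetitions receive a weight of zero, mirroring the verbal claim that only the first distractor after each letter creates sizeable interference.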

Considering now the second factor, the removal of distractors in SOB-CS is described as an active process that competes with other attention-demanding processes through the occupation of an attentional bottleneck (Oberauer et al., 2012). Thus, the effect induced by varying the number of ba on the time available for distractor removal should depend on the attentional demand of each utterance of the syllable. If each utterance brings about a sizeable occupation of the attentional bottleneck, increasing the number of to-be-articulated syllables from one to six would reduce the time available for removal, resulting in lower recall performance. The longer the occupation of the attentional bottleneck by each utterance, the stronger this effect. However, Barrouillet et al.'s (2007) results showed that varying the number of distractors in the SRT square task did not have any effect on recall performance. This indicates that, within the SOB-CS framework, the attentional demand resulting from variations in the number of squares was not sufficient to affect the removal process. Now, the successive presentations of the syllable ba involve even less attentional capture than the squares presented in Barrouillet et al. (2007), which appeared at an unpredictable rhythm in two different locations, whereas the syllables ba were displayed at a constant pace at the same location on screen. Thus, it can be assumed that varying the number of syllables does not involve sufficient variation in attentional demand to have a differential impact on the removal process. As a consequence, under the reasonable hypothesis that the mere repeated utterance of the same syllable ba involves a negligible attentional capture, SOB-CS should not predict any effect of the number of appearances of ba to be articulated after each letter to be remembered. In the following experiment, we tested the TBRS prediction that increasing the number of utterances should affect verbal WM span. 
The results were compared with the output of a simulation of SOB-CS in order to determine the assumptions under which SOB-CS could account for them.

Experiment 1

Methods

Participants

Twenty-four undergraduate students (mean age = 21.63 years, SD = 3.83, 18 females) at the University of Geneva received partial course credit or were paid 20 CHF to participate.

Materials and procedure

In a complex span task, series of two to eight consonants were presented in ascending length, each consonant being followed by a 4-s interval filled with one, three, or six syllables ba to be uttered. All the consonants were used except W, which is tri-syllabic in French. Each trial started with an indication of the presentation rhythm of the syllable, with “slow,” “medium,” and “fast” corresponding to one, three, and six ba, respectively. Then, participants were familiarized with the forthcoming rhythm of utterance by two successive 4-s intervals with the corresponding number of syllables during which they performed the articulation task. Following this warm-up, the experimental trial began. Each letter was presented for 1,000 ms and followed by a blank screen for 500 ms and then by the 4-s interval during which one, three, or six ba were presented, each centered on the screen for 330 ms. In each condition, the first syllable was displayed on the screen at the beginning of the 4-s interval (i.e., 500 ms after the consonant), with the following syllables steadily spread over the interval (i.e., inter-onset intervals for the syllables ba were 1,333 ms and 667 ms for the three- and six-syllable conditions, respectively). While letters were presented in black, the syllable ba was presented in blue, except for the last of the series that was presented in red, thereby indicating the end of the articulation task and the upcoming appearance of a memory item. This color change was introduced to avoid participants continuing to repeat “ba” while the next letter appeared on screen, something that could have hindered its encoding. Participants were asked to read aloud the consonants and the syllables. The experimenter was present throughout the task to ensure compliance. At the end of the series, the word rappel (recall) appeared on the screen and participants had to orally recall the letters in strict serial order and were not allowed to go back and correct an item. 
There were two trials per experimental condition for each list length, resulting in six trials per length. Six series of consonants were created for each length and were randomly assigned by the computer to the three experimental conditions for each participant. The task ended when the participant failed to correctly recall all the trials of a given length. In each experimental condition, each series perfectly recalled in correct order added 0.5 to a basic score of 1, series of one letter being omitted. The resulting total corresponded to the span score in each condition, with a maximum score of 8.
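
The scoring rule can be sketched as follows. The function name and the input format (one boolean per administered series in a given condition) are assumptions made for illustration; the stop rule only determines which series are administered and is therefore not modeled here.

```python
def span_score(series_correct):
    """Span for one condition: base score of 1 plus 0.5 per perfectly recalled series."""
    return 1.0 + 0.5 * sum(1 for ok in series_correct if ok)


# Two series per length (lengths 2 to 8) in one condition, all recalled perfectly.
perfect = [True] * (2 * 7)
print(span_score(perfect))  # 8.0

# Perfect recall up to length 5, then both length-6 series failed.
partial = [True] * (2 * 4) + [False, False]
print(span_score(partial))  # 5.0
```

The first case reproduces the maximum score of 8 stated above: seven list lengths with two series each contribute 7 points to the basic score of 1.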

Results and discussion

We performed a repeated-measures analysis of variance (ANOVA) on the individual span scores with the number of syllables to be uttered (one, three, or six) as a within-subject factor. This analysis revealed a significant effect of the number of syllables, with more syllables uttered resulting in a lower span (mean spans of 5.75, 5.23, and 5.06 for one, three, and six syllables, respectively), F(2, 46) = 5.96, p = .005, MSe = 0.52, ηp² = .21, with a linear trend accounting for 92 % of the experimental variance, F(1, 23) = 9.88, p = .005, MSe = 0.52, the difference between the one- and six-syllable conditions remaining significant under the more conservative Tukey HSD test (p = .005). These results clearly indicated that, as the TBRS model predicts, increasing the rate of the concurrent articulation of a single syllable has a disruptive effect on concurrent maintenance.
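
The share of between-condition variance captured by the linear trend can be recomputed from the three condition means. The sketch below assumes equally spaced contrast weights (-1, 0, 1) and equal cell sizes; these are assumptions rather than details stated in the text, although they reproduce the reported value of about 92 %. The function name is hypothetical.

```python
def linear_trend_share(means, weights=(-1, 0, 1)):
    """Proportion of the between-condition sum of squares due to a linear contrast.

    Assumes equal cell sizes, so the number of participants cancels out.
    """
    grand_mean = sum(means) / len(means)
    ss_between = sum((m - grand_mean) ** 2 for m in means)
    contrast = sum(w * m for w, m in zip(weights, means))
    ss_linear = contrast ** 2 / sum(w * w for w in weights)
    return ss_linear / ss_between


# Mean spans for one, three, and six syllables in Experiment 1.
share = linear_trend_share([5.75, 5.23, 5.06])
print(f"{share:.0%}")  # 92%
```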

These results were compared with the predictions of the SOB-CS model. For this purpose, we used the simulation of SOB-CS (identified as Simulation 1 on Oberauer’s website) intended to reproduce the effects of cognitive load and number of operations (i.e., of distractors to be processed) in a complex span task with letters as memory items and the syllable ba as distractor, which is coded by Oberauer as a word in Simulation 1. The number of operations was set to either one, three, or six, and the duration of each operation (i.e., the hypothesized duration of the attentional capture of each utterance) was varied from 0 to 500 ms by increments of 50 ms. The resulting free times (the time available for removal after each distractor) were calculated by subtracting the duration of each operation from the quotient of the duration of the interletter interval divided by the number of operations (e.g., with a duration of operation of 100 ms, the free times are 3,900 ms, 1,233 ms, and 567 ms for one, three, and six syllables, respectively). The distractor similarity was set to its maximum (1) within and between bursts. We simulated an experiment involving 1,000 subjects. The outputs of this simulation are shown in Fig. 1.
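
The free-time calculation described above is simple enough to be written out. The sketch below only reproduces the arithmetic (interletter interval divided by the number of operations, minus the duration of one operation); it is not part of the SOB-CS simulation code, and the function and constant names are hypothetical.

```python
INTERVAL_MS = 4000  # duration of the interletter interval in Experiment 1


def free_time(n_operations, operation_ms, interval_ms=INTERVAL_MS):
    """Time left for distractor removal after each operation, in ms."""
    return interval_ms / n_operations - operation_ms


# Free times for every simulated operation duration (0 to 500 ms in 50-ms steps).
for operation_ms in range(0, 501, 50):
    row = [round(free_time(n, operation_ms)) for n in (1, 3, 6)]
    print(operation_ms, row)
```

With an operation duration of 100 ms, the printed row is 3900, 1233, and 567 ms, matching the example given in the text. With a duration of 0 ms, the values are simply the inter-onset intervals of the syllables (4,000 ms, 1,333 ms, and 667 ms).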

Fig. 1. Observed working memory span performance as a function of the number of repetitions of the syllable ba in Experiment 1 and results of the SOB-CS Simulation 1 for different durations of occupation of the attentional bottleneck by each repetition, from 0 ms to 500 ms

Concentrating first on the main effect of duration of the operation while collapsing across number of repetitions, rather surprisingly, Simulation 1 predicts that mean recall performance increases with the duration of the operations up to 150 ms and then decreases with longer durations. This first dramatic increase (from an overall mean span of 2.30 to 5.30) is totally at odds with the basic tenets of the SOB-CS theory. Indeed, all other things being equal, increasing the duration of attentional capture cannot result in better recall performance, because this increase reduces the opportunities of removal of an unchanged amount of interference. Moreover, when considering the effect of the number of distractors, the dramatic decrease in span with more distractors produced by Simulation 1 when the duration of attentional capture is set to 0 (3.52, 1.89, and 1.50 for one, three, and six syllables, respectively) does not fit with the verbally stated assumptions of SOB-CS. The absence of attentional capture leaves the same amount of time available to remove distractors in the three conditions (i.e., the entire interletter interval) while the amount of interference remains unchanged due to the repetition of an unchanged distractor. Thus, contrary to the results of the simulation, the theory predicts a high and unchanged level of recall performance. One possible explanation for these undesirable outputs of the simulation might be that because the model assumes that encoding of the distractor is not fully achieved before 150 ms, the distractor-removal process blindly removes the last item strongly encoded (i.e. the memory item) instead of the distractors. Another surprising output of Simulation 1 is that operation durations of 50 ms and 100 ms result in trends that are difficult to interpret in which a first decline in recall performance from one to three distractors is followed by an increase with six distractors (Fig. 1). 
These problems can probably be easily fixed for obtaining either an absence or a small effect on recall of the number of syllables that would progressively increase with operation duration to reach the level produced with 150 ms. Indeed, it is only when this duration exceeds 150 ms that the traditional cognitive load effect is successfully simulated with a progressive decline of memory performance as processing distractors becomes more and more attentionally demanding. The best fit with Experiment 1 results is obtained for a duration of operation of 250 ms, which reproduces the reduction of 12 % in span observed in the behavioral data between the one- and the six-syllable conditions.

In summary, SOB-CS can account for the findings of this experiment, but only by assuming that the syllable occupies the attentional bottleneck for about 250 ms at each of its utterances. According to Oberauer et al. (2012), who based their estimates on Jolicoeur and Dell’Acqua (1998) and Vogel, Woodman, and Luck (2006), the average time for encoding an item into WM is about 150–300 ms. Thus, SOB-CS can account for the data under the hypothesis that each repetition of an unchanging syllable involves the same attentional demand as encoding a new item in WM. This hypothesis lacks plausibility when considering that it is usually assumed that no encoding is required for a simple-immediate response task (e.g., Newell, 1990), and that it has been established that after a first stage of retrieval of an articulatory program, the repetitive execution of this program becomes more and more automatic (Naveh-Benjamin & Jonides, 1984).

Nonetheless, an additional test of the two competing theories is in order before reaching a firm conclusion. If the SOB-CS model is correct and if the effect of the number of distractors observed in Experiment 1 results from the cognitive load induced by the repetition of the syllable ba, then this effect should also be observed when combining the ba repetition task with visuospatial memoranda. Of course, SOB-CS predicts that verbal distractors generate less interference with spatial than with verbal memoranda. This is because the degree of interference does not only depend on the novelty of distractors but also on the similarity between distractors and memory items. When distractors and items emanate from different domains, their representations are encoded in only partially overlapping sets of units, resulting in reduced interference compared with associations of distractors and memory items that share the same set of units (Oberauer et al., 2012). However, though reduced, SOB-CS predicts an effect of the number of syllables on the maintenance and recall of visuospatial memoranda. Unfortunately, the available SOB-CS simulations do not allow for the maintenance of visuospatial memoranda, but only the combination of verbal memoranda and visuospatial distractors. Nonetheless, the test of such a combination would be informative in providing information about the size of the effect that can be expected in SOB-CS when distractors and memory items pertain to different domains and overlap only partially. For this purpose, we ran a simulation allowing for variations of the degree of overlap in cross-domain interference between verbal memoranda (letters) and visuospatial distractors (Simulation 6 in Oberauer’s website). Of course, a pending question is the degree of overlap between visuospatial and verbal representations. 
In a recent publication, Oberauer and Lewandowsky (2014) set such an overlap to 50 % between letters to be memorized and squares displayed on the screen for location judgment. Thus, for the sake of comparison, we simulated the same complex span task as previously used with letters as memory items and 1, 3, or 6 visuospatial distractors with a duration of operation of 250 ms (the value needed to simulate the effect observed in Exp. 1) while varying the degree of overlap (50 %, 25 %, and 0 %). It can be seen in Fig. 2 that, while a 50 % overlap results in an effect on spans that does not strongly differ from that observed with words as distractors, an overlap reduced to 25 % still produces a sizeable effect with a decline of 7 % in spans. Of course, with no overlap at all (0 %), the effect of repetition disappears because there is no interference to remove and consequently no effect of the factors preventing distractor removal. Nonetheless, assuming the absence of interference between verbal and visuospatial representations would make SOB-CS unable to account for the well documented cross-domain effects between distractors and memory items (e.g., Barrouillet et al., 2007; Barrouillet, Portrat, Vergauwe, Diependaele, & Camos, 2011; Camos et al., 2009; Camos et al., 2011; Lilienthal, Hale, & Myerson, 2014; Oberauer & Lewandowsky, 2014; Vergauwe, Barrouillet, & Camos, 2010; Vergauwe, Dewaele, Langerock, & Barrouillet, 2012). Thus, some overlap must be assumed between verbal and visuospatial stimuli in SOB-CS. Under the hypotheses of distractors occupying the attentional bottleneck for 250 ms and some overlap between verbal and visuospatial representations, the simulation confirms that SOB-CS predicts a reduced but still sizeable effect on WM spans when memory items and distractors do not pertain to the same domain. 
As a consequence, SOB-CS predicts that increasing the number of these repetitions in a fixed delay should have a detrimental effect on the maintenance and recall of visuospatial information.

Fig. 2. Results of the SOB-CS Simulation 6 for different degrees of overlap (0 %, 25 %, and 50 %) between memory items and distractors pertaining to different domains, shown together with the results of Simulation 1, in which items and distractors both pertain to the verbal domain (duration of attentional capture: 250 ms)

By contrast, the TBRS model predicts no effect of the mere repeated articulation of a syllable on the maintenance of visuospatial information, because this repetition induces a negligible cognitive load and only affects the availability of the articulatory rehearsal mechanism, which is not involved in visuospatial maintenance. It should be noted that Baddeley’s multicomponent model predicts the same pattern of results (Baddeley, 1986; Baddeley & Logie, 1999). These competing hypotheses were tested in a second experiment. However, instead of using a complex span task, we opted for a Brown-Peterson paradigm. Indeed, it could be argued that the changing color of the last ba of the series in Experiment 1 could have induced a higher level of novelty in the three- and six-syllable conditions compared with the one-syllable condition, resulting in a higher level of interference and more forgetting.

Thus, Experiment 2 used a Brown-Peterson paradigm in which all the memoranda were presented before the concurrent articulation, thus avoiding any possible overlap between processing and encoding phases. Series of either four to eight consonants or one to five spatial locations were presented for further recall, followed by a 12-s interval during which one, six, or 12 syllables ba were presented for articulation without any color change. It must be noted that the SOB-CS model has not been extended to the Brown-Peterson paradigm. However, it can be assumed that any extension of this model would have to be consistent with its core architectural principles. Thus, there is a strong expectation that the predictions of the SOB-CS model for the complex span task would extend to the Brown-Peterson task. Moreover, it has been shown that the effects related to variations in cognitive load are observed in Brown-Peterson as in complex span tasks (Liefooghe, Barrouillet, Vandierendonck, & Camos, 2008; Vergauwe, Langerock, & Barrouillet, 2014). The TBRS model predicts that increasing the number of repetitions of the syllable ba should have a detrimental effect on verbal recall performance while leaving visuospatial recall unaffected, whereas the SOB-CS model predicts a detrimental effect on both types of memoranda, with a smaller effect on visuospatial memoranda.

Experiment 2

Methods

Participants

Fifty-five undergraduate students (mean age = 22.53 years, SD = 6.21, 48 females) at the University of Geneva received partial course credit or were paid 20 CHF to participate. None of them took part in the first experiment. Twenty-seven participants were assigned to the verbal condition, and the others to the visuospatial condition.

Materials and procedure

As far as the verbal condition is concerned, series of four to eight consonants were presented in ascending order with two series per length for each of the three conditions of the articulation task. In each trial, the letters were successively displayed on screen for 750 ms followed by a 250-ms interval. The blank interval after the last letter was followed by a 12-s interval beginning with the presentation of a syllable ba. As in the previous experiment, each syllable was displayed on screen for 330 ms and then disappeared. Following this first syllable, the other distractors in the six- and 12-syllable conditions were steadily spread over the retention interval. After this interval, the word rappel (recall) appeared and participants were asked to recall the letters in strict serial order by typing them on the keyboard while using the “enter” key to validate their response. As in Experiment 1, each trial was preceded by an indication of the pace of the forthcoming articulation task and participants had a training phase of 12 s to get themselves in the requested rhythm. As in the first experiment, the experimenter was present throughout the task to ensure compliance. The stop rule and scoring method were the same as in Experiment 1.

The visuospatial condition followed exactly the same design except that memoranda were series of one to five spatial locations consisting of squares successively lighting up in gray among 16 possible locations indicated by 16 empty squares randomly distributed on the screen to avoid verbal coding of their position. After the 12-s interval, when the word rappel (recall) appeared, participants were asked to use the mouse to click successively on each location. Each click on a square turned it gray until the next square was clicked.

Results and discussion

We performed the same ANOVA as in Experiment 1 on the individual span scores for verbal and visuospatial memoranda. As in Experiment 1, increasing the number of repetitions of the syllable ba had a detrimental effect on verbal maintenance with spans progressively decreasing (mean spans of 5.44, 5.22, and 4.96 for one, six, and 12 syllables, respectively), F(2, 52) = 6.18, p = .004, MSe = 0.25, ηp² = .19, with a linear trend virtually accounting for all the experimental variance (99.7 %), F(1, 26) = 10.67, p = .003, MSe = 0.29. The difference between the one- and 12-syllable conditions was significant under the more conservative Tukey HSD test (p = .01). As far as visuospatial maintenance was concerned, in line with the TBRS, but contrary to SOB-CS, increasing the number of syllables had no effect at all on recall performance (mean spans of 3.82, 3.88, and 4.02 for one, six, and 12 syllables, respectively), F(2, 54) = 1.02, p = .367, MSe = 0.28, ηp² = .04. This resulted in a significant interaction between the number of repetitions and the nature of the memoranda, F(2, 106) = 5.95, p = .004, MSe = 0.27, ηp² = .10 (Fig. 3).

Fig. 3. Mean spans as a function of the number of repetitions of the syllable ba for verbal and visuospatial working memory in Experiment 2

These results replicated with another paradigm the detrimental effect of the repetition of the same syllable on verbal WM spans, while establishing that this repetition has no effect on visuospatial WM. This latter finding strongly suggests that the effect observed on verbal WM spans in Experiments 1 and 2 is not due to a cognitive load effect induced by the repetition task. If this were the case, the increase in cognitive load resulting from the increase in the number of repetitions of the syllable ba would have affected visuospatial recall. Indeed, there is ample empirical evidence that attention-demanding verbal tasks disrupt visuospatial maintenance (e.g., Lilienthal et al., 2014), and that this effect is commensurate with the cognitive load of these tasks (Vergauwe et al., 2010, 2012).

General discussion

In two experiments, we demonstrated that increasing the number of utterances of the same syllable during a fixed period of time had a detrimental effect on recall performance when verbal information has to be maintained. This effect contrasts both with the findings of Barrouillet et al. (2007), who did not observe any effect on verbal maintenance when increasing the number of distractors in an SRT task, and with the present finding that increasing the number of repetitions of the same syllable had no effect on visuospatial maintenance. However, it is worth noting that the present syllable task involved concurrent articulation whereas the square task in Barrouillet et al. (2007) was silent.

In summary, the maintenance of verbal items remains unaffected by a silent SRT task (Barrouillet et al., 2007), but is disrupted by the repeated articulation of a syllable (Experiments 1 and 2), this articulation having no effect on the maintenance of visuospatial information (Experiment 2). Thus, the present results suggest that the effect we observed in verbal WM is attributable to the requirement of articulating the syllable, with a higher rate of articulation resulting in more forgetting, and that this effect is specific to verbal maintenance. These findings are in line with the TBRS model, which assumes that, along with a general attention-based mechanism that can be used to refresh both verbal and visuospatial information, there is an articulatory rehearsal specifically devoted to the maintenance of verbal items. Consequently, a task impeding articulatory rehearsal while involving a negligible attentional demand such as the repetition of a syllable should disrupt verbal but not visuospatial maintenance.

We have seen that SOB-CS can simulate this pattern of results under two strong assumptions. Accounting for the effect of articulation on verbal maintenance requires the assumption that each utterance of ba occupies the attentional bottleneck for 250 ms. Under this assumption, the absence of an effect on visuospatial maintenance can be explained by hypothesizing a total absence of overlap between visual and verbal representations, which excludes any interference between the two domains. As we argued above, it is highly improbable that the simple utterance of a syllable occupies attention for 250 ms and even less plausible that this attentional demand remains at this level through hundreds of repetitions as was the case in our experiments (e.g., 280 utterances for a mean span of five in Exp. 1). Moreover, as we noted above, assuming that there is no overlap between verbal and visuospatial representations would make SOB-CS unable to account for the well-established fact that variations in the attentional demand of verbal tasks disrupt visuospatial maintenance (e.g., Lilienthal et al., 2014; Vergauwe et al., 2010). Thus, even if SOB-CS can find ways to account for our findings, the post hoc assumptions that have to be made severely limit the plausibility and significance of the model.

Finally, it could be argued that the effects we observed are small and negligible, and it could be concluded that the contribution of verbal rehearsal to memory performance is ancillary at best. First of all, it can be noted that the effects we observed in the maintenance of verbal material correspond to Cohen’s d values of .63 and .64 for Experiments 1 and 2, respectively, far from the values usually considered as reflecting small effects (i.e., about .20). Moreover, the effects on memory performance that can be expected from the introduction of a concurrent task like the syllable task we used should not be overestimated. Our results concerning visuospatial maintenance demonstrated that the syllable task involves at best a negligible cognitive load, leaving attention available for the maintenance of memory items through attentional refreshing (Barrouillet et al., 2004, 2011). Vergauwe, Camos, and Barrouillet (2014) have recently demonstrated that people are able in a Brown-Peterson paradigm to maintain up to four letters for further recall under articulatory suppression, a value very close to the four chunks that can be maintained in the focus of attention according to Cowan (2001). We concluded from this finding that at least four letters can be attentionally maintained at the central level of WM without any recourse to verbal rehearsal. Moreover, repeating the same syllable is known to involve smaller effects than articulating more complex material (Macken & Jones, 1995) such as that used in Vergauwe, Camos, and Barrouillet (2014), where participants were asked to repeat “ba…bi…boo.” Consequently, varying the number of utterances in a syllable task that leaves attention free for maintenance purposes could not be expected to decrease WM spans below 4. Thus, decreases in span from 5.75 to 5.06 as we observed in Experiment 1 or from 5.44 to 4.96 as in Experiment 2 are far from negligible.

Overall, the hypothesis of a maintenance mechanism based on verbal rehearsal of memory traces that is prevented from taking place by concurrent articulation still provides the simplest and most compelling account of our results. Faster articulation rates would increasingly impair verbal rehearsal, resulting in more forgetting and poorer verbal recall, while leaving visuospatial memory unaffected. Despite computational investigations such as SOB-CS (Lewandowsky & Oberauer, 2015) and recent claims from developmental inquiries (Jarrold & Citroën, 2013), it seems difficult to produce a plausible and coherent model of WM that does not attribute to verbal rehearsal a causal role in verbal WM.