Cognitive flexibility – the rapid change of currently relevant object representations or task-sets (Logan & Gordon, 2001) – is a hallmark of human cognition. To ensure such flexibility, inhibition of no-longer relevant representations has been proposed to complement the activation of currently relevant ones. Inhibition is assumed to reduce the activation of recently used representations, thereby enabling more efficient disengagement from irrelevant information and selection of newly relevant ones. The aim of this study was to investigate the putative use of inhibition to ensure the successful selection and implementation of currently relevant memory sets in working memory (WM).

In his WM model, Oberauer (2009, 2010) proposed a distinction between a declarative and a procedural WM sub-system. Whereas the declarative sub-system makes representations of the contents of processing available (i.e., the currently relevant objects, events, or symbols), the procedural part holds the representations that control ongoing processing (i.e., the currently relevant task set). Even though the two systems are largely independent (Gade, Druey, Souza, & Oberauer, 2014), the selection of representations within each sub-system is accomplished by analogous mechanisms. As a consequence, comparable behavioral effects are observed for the two sub-systems. For example, when participants are asked to switch between memory lists on a trial-by-trial basis, there are costs to keep more than one list available (list-mixing costs) and to switch between lists (list-switching costs), as is the case when participants have to switch between task sets (Souza, Oberauer, Gade, & Druey, 2012). In another study (Oberauer, Souza, Druey, & Gade, 2013), we further extended the analogy to the mechanisms of selecting individual elements within memory sets or task sets: After selecting and executing a response within a task set, that response is temporarily inhibited, making it harder to subsequently select the same response in the context of another task set (Druey, 2014). Analogously, after an item has been selected from a memory set, it is inhibited, temporarily rendering it harder to select the same item in the context of another memory set.

In the present study, we investigate whether we can extend the analogy to yet another effect often reported in studies of procedural WM, namely n-2 task-repetition costs (Mayr & Keele, 2000; see Gade, Schuch, Druey, & Koch, 2014 for a review). N-2 task-repetition costs are observed when people switch between three tasks (A, B, and C): Responses are slower and more error-prone when switching back to a task set abandoned two trials before (i.e., in sequences of the type ABA when compared to task sequences of type CBA). These costs are commonly taken as evidence for inhibition of competing task-sets to ensure successful switching among tasks (Mayr & Keele, 2000). If the analogy of mechanisms holds for declarative and procedural WM, we expect n-2 list-repetition costs: When people are switching between different memory lists, access to a memory list abandoned two trials before (ABA trials) should be slower and more error prone than access to a not recently used memory list (CBA trials).

N-2 task-repetition costs have been investigated extensively to map the boundary conditions for observing this effect. Most of the recent findings support the notion that the task inhibition reflected in n-2 repetition costs serves to reduce competition (and therefore conflict) between concurrently active task-sets (Gade, Druey, et al. 2014; Sexton & Cooper, 2015). Such competition can arise on various levels, for example in cue or stimulus processing (Altmann, 2007; Gade & Koch, 2008; Costa & Friedrich, 2012; Sdoia & Ferlazzo, 2008) or response selection (Koch, Gade, & Philipp, 2004). Accordingly, manipulation of these parameters influences the size of n-2 task-repetition costs. For instance, Gade and Koch (2005) manipulated residual activation of a recently abandoned task set and observed that less activation led to smaller n-2 repetition costs. Likewise, abolishing the need for response selection by introducing no-go trials (Schuch & Koch, 2003) or reducing the overlap of response-sets across the different tasks (Gade & Koch, 2007) both reduced the n-2 task-repetition costs to a non-significant level. In a related vein, task competition (and hence n-2 task-repetition costs) can also be reduced by facilitating task selection with spatial cues (Arbuthnott, 2005; Arbuthnott & Woodward, 2002) or by increasing cue-target overlap (Houghton, Pritchard, & Grange, 2009). Taken together, these findings have led to the suggestion that inhibition of abandoned task-sets is used when the degree of competition between tasks is high.

Our main aim is to establish further analogous processing principles in declarative and procedural WM. Hence, in a first step, we designed several paradigms, and manipulations within them, to test for n-2 list-repetition costs. Based on the literature reviewed above, we expected n-2 list-repetition costs to be observed more readily in situations that increase competition between memory lists in WM. In eight experiments we asked participants to switch between three different lists across trials. On each trial they had to access one element from a randomly selected memory list in order to perform speeded classification (Experiments 1–4), mental arithmetic (Experiments 5 and 8), or local recognition (Experiments 6–8). Moreover, across experiments, we manipulated list composition to vary the degree of list overlap (Experiments 1–5), or to draw on lists well established in long-term memory (LTM; Experiments 6 and 7) in contrast to recently learned, unfamiliar lists (Experiments 7 and 8).

To foreshadow our results, we observed n-2 list-repetition costs, but these costs were observed consistently only in conditions supposedly yielding a high degree of competition between lists in declarative WM.

In a second step, we assessed whether n-2 list-repetition costs could be explained without recourse to inhibition, based on episodic retrieval of the previously used (but currently irrelevant) item within the repeated list. This episodic-retrieval account could explain most of the n-2 list-repetition costs, leaving little variance for inhibition to account for. To assess whether this finding was specific to declarative WM, or whether it also applies to procedural WM, we reanalyzed data of eight task-switching experiments to gauge the contribution of episodic retrieval to n-2 task-repetition costs. We found that episodic retrieval also significantly contributes to n-2 task-repetition costs.

Experiments 1–4

In this first series of experiments we searched for n-2 list-repetition costs when switching among stimulus classification tasks. In addition, we investigated the role of list overlap for observing n-2 list-repetition costs.

General method

Participants

Across the first four experiments, a total of 115 students from the University of Zurich took part. Twenty-four students took part in Experiment 1, but the data of four participants were excluded due to low accuracy (<85 %; final sample n = 20; 18 women, two men; mean age 25.8 years). Twenty students participated in Experiment 2 (15 women, five men; mean age 24.0 years), 39 in Experiment 3 (29 women, ten men; mean age 25.7 years), and 32 in Experiment 4 (25 women, seven men; mean age 22.1 years). Experiments 1–3 consisted of one 1-h session, and Experiment 4 of two 1-h sessions.

For all studies reported in this paper, participation was compensated with course credit or 15 CHF per hour. Participants signed an informed consent form prior to the experiment, and were debriefed at the end.

Stimuli and tasks

Stimuli were digits from 1 to 9, excluding 5. For all experiments, participants learned three lists of three digits each in the beginning of the experimental session, and these lists remained constant throughout the experiment. Each list was displayed across a row of three boxes. The lists differed by the color of the box-frames (yellow, green, and blue in Experiment 1; red, green, and blue in Experiments 2, 3, and 4).

The experiments differed regarding list composition (see Fig. 1). In Experiment 1, list items were drawn without replacement from a pool of eight digits plus one (randomly selected) repeated digit. Lists were created with the constraint that the repeated digit appeared in different positions across two lists. Given that only one element was repeated across the lists, this condition is a low-overlap condition. In Experiment 2, six different digits were selected to create the three memory lists such that each list shared exactly one digit (in different positions) with every other list (yielding partially overlapping lists; see Footnote 1). In Experiment 3, all three lists consisted of the same three digits presented in different positions across lists (high-overlap lists). Finally, in Experiment 4, participants completed two sessions: one session with partially overlapping lists as in Experiment 2, and one session with fully overlapping lists as in Experiment 3 (session order was counterbalanced across participants). For all experiments, no list contained items from only one single digit classification category (see below).

Fig. 1. The top panel shows the different list compositions across all list-switching experiments (E). Lists in E6–E8 were considered to be of low overlap. The bottom panel illustrates the flow of events in the tasks used across E1–E8, and the relevant conditions (n-2 list-repetition and n-2 list-switch trials). List colors are written on top in parentheses to make list transitions unambiguous without color printing.

Participants were asked to perform a speeded classification task on a cued digit from one of the lists. In Experiment 1, participants classified the retrieved digit as (a) smaller or larger than 5 (magnitude decision), (b) odd or even (parity decision), or (c) being in an inner or outer position on a number line running from 1 to 9 (inner-outer decision). For this experiment, the classification task remained constant within one block of trials, but changed between blocks, such that task switching from one trial to the next was not required. The order of blocks with different classification tasks was counterbalanced across participants. Given that we observed no significant effect of task on n-2 list-repetition costs, in Experiments 2–4 we only used the magnitude task because it was found to be the easiest task in Experiment 1.

Procedure and design

Testing took place in a sound-dimmed individual booth. Viewing distance to the monitor was about 50 cm. In the beginning of the experiment, participants underwent a practice phase to learn the three memory lists by heart. In Experiment 1, training comprised ten blocks of 27 trials in which recall of the list digits was required and immediate feedback was provided after each response. We also excluded the first 24 trials in Blocks 1, 4, and 7 as task training trials, given that in these blocks a new classification task was introduced. In Experiments 2–4, the recall training was conducted until participants correctly recalled all three lists three consecutive times. Moreover, following list training, a short practice phase (40 trials) with the classification decision was introduced in these experiments. In this practice phase, participants had to classify individually displayed digits, and feedback was provided.

Following training and practice, the experiment proper began. In each trial, a row of three identically colored box frames was shown, and inside one of the boxes a cue indicated the target of the decision (see Fig. 1). The color of the frames indicated the relevant memory list, and the box in which the cue was shown indicated the relevant list item. List sequence was randomized with the constraint that no immediate list repetitions could occur, and the proportion of n-2 list repetitions was 50 %. In Experiment 1, the position cue consisted of a question mark, and in Experiments 2–4, it consisted of the symbol “iI” (transparently referencing the smaller-larger decision to be performed). Participants were instructed to retrieve from memory the cued item in the relevant list, classify it according to the instructed task, and respond as quickly and accurately as possible by pressing one of two keys. They responded using the ALT and ALT-GR keys (Experiment 1) or the left and right arrow keys (Experiments 2–4) of a Swiss-German keyboard. In Experiment 1, feedback was given only between blocks (mean reaction time (RT) and % errors were displayed), but not on a trial-by-trial basis. In between trials, a blank screen was shown for 250 ms. In Experiments 2–4, visual feedback was provided after each response (500 ms), followed by a blank interval of another 500 ms before the next trial started. Participants completed nine blocks of 72 trials in Experiment 1, and 11 blocks of 70 trials in Experiments 2–4.
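The sequencing constraints described above (three lists, no immediate list repetitions, 50 % n-2 list repetitions) can be sketched as follows. This is a hypothetical reconstruction rather than the original experiment script: the function name, the color labels, and in particular the choice to balance the two transition types exactly within each block are assumptions. With three lists and no immediate repetitions, every trial from the third onward is necessarily either an ABA or a CBA trial, so the sequence is fully determined by the first two lists and a shuffled deck of transition types.

```python
import random


def make_list_sequence(n_trials, lists=("red", "green", "blue"), seed=None):
    """Build a list sequence without immediate repetitions and with an
    (assumed) exactly balanced number of n-2 repetitions and n-2 switches."""
    rng = random.Random(seed)
    n_classifiable = n_trials - 2  # the first two trials have no n-2 history
    n_repeat = n_classifiable // 2
    transitions = ["repeat"] * n_repeat + ["switch"] * (n_classifiable - n_repeat)
    rng.shuffle(transitions)

    sequence = rng.sample(lists, 2)  # two different lists to start with
    for transition in transitions:
        n1, n2 = sequence[-1], sequence[-2]
        if transition == "repeat":
            sequence.append(n2)  # ABA: return to the list used two trials ago
        else:
            # CBA: the only list that is neither n-1 nor n-2
            sequence.append(next(x for x in lists if x not in (n1, n2)))
    return sequence


block = make_list_sequence(70, seed=1)
```

For a block of 70 trials this yields 68 classifiable trials, 34 of each transition type.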

Data analysis

Dependent variables were mean RTs and accuracy. The first two trials in each block were discarded from all analyses as they could not be classified as n-2 list repetition or switch. Data from the first block in Experiments 2–4 were removed from further analysis (E2 = 8.33 %; E3 = 8.5 %; E4a = 8.6 %; E4b = 8.5 % of the data), and in Experiment 1 the first 24 trials of each block with a new task were excluded (11.23 %). Those trials were considered as (additional) practice. In all experiments, RTs associated with errors and RTs from the two trials following an error were also excluded (E1 = 11.9 %; E2 = 13.4 %; E3 = 5.5 %; E4a = 8.9 %; and E4b = 6.8 %). Furthermore, we removed outliers, which were defined as RTs being more than 4.5 median absolute deviations above or below the individual median RT in each condition (Leys, Ley, Klein, Bernard, & Licata, 2013). This outlier identification procedure further removed 1.5 % of RTs in E1, 4.5 % in E2, 5.3 % in E3, 3.8 % in E4a, and 1.5 % in E4b. The mean of (non-transformed) RTs and accuracy for each condition are listed in Table 1.
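The outlier criterion described above can be illustrated with a minimal sketch that would be applied separately to each participant and condition. The function name `mad_trim` and its parameters are hypothetical. Whether the median absolute deviation was multiplied by a consistency constant (Leys et al., 2013, discuss the value 1.4826) is not stated in the text, so the `scale` argument is an assumption and defaults to the raw deviation.

```python
from statistics import median


def mad_trim(rts, threshold=4.5, scale=1.0):
    """Keep RTs within `threshold` median absolute deviations of the median.

    `scale` is an assumed optional consistency constant (1.0 = raw MAD).
    """
    center = median(rts)
    mad = scale * median(abs(rt - center) for rt in rts)
    lower = center - threshold * mad
    upper = center + threshold * mad
    return [rt for rt in rts if lower <= rt <= upper]


# Hypothetical RTs (ms) of one participant in one condition
kept = mad_trim([900, 950, 1000, 1050, 1100, 4000])
```

In this made-up example the median is 1025 ms and the median absolute deviation is 75 ms, so only values between 687.5 and 1362.5 ms are retained and the 4000-ms response is dropped.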

Table 1 Overview and descriptive statistics of Experiments (E) 1–8

For all statistical analyses, RTs were log-transformed to better approximate normality. We tested for n-2 repetition costs by comparing n-2 list-repetition trials to n-2 list-switch trials. Effect sizes and 95 % confidence intervals (CIs) around them were computed using the MBESS package (Kelley & Lai, 2012) implemented in the R environment for statistical computing (R Core Team, 2014).
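The comparison described above can be sketched as a paired t-test on per-participant condition means, together with a standardized effect size. The sketch below is an assumption-laden illustration and not the MBESS-based analysis: the function name is hypothetical, the inputs are assumed to be each participant's mean log RT per condition, and the effect size is taken to be the mean difference divided by the standard deviation of the differences, which equals t divided by the square root of n. The effect sizes reported in this paper are numerically consistent with that relation (e.g., 3.143 / √39 ≈ 0.503). Confidence intervals for d rest on the noncentral t distribution and are not reproduced here.

```python
import math
from statistics import mean, stdev


def paired_t_and_d(repetition, switch):
    """Paired t-test and standardized mean difference for two conditions.

    `repetition` and `switch` hold one (assumed) mean log RT per participant,
    in the same participant order. Returns (t, df, d).
    """
    diffs = [r - s for r, s in zip(repetition, switch)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    d = t / math.sqrt(n)  # equivalent to mean(diffs) / stdev(diffs)
    return t, n - 1, d


# Hypothetical mean log RTs of four participants
t, df, d = paired_t_and_d([7.2, 7.3, 7.1, 7.4], [7.1, 7.1, 7.0, 7.3])
```

Positive values of t and d indicate slower responses on n-2 repetition trials, that is, n-2 list-repetition costs.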

Results

In Experiment 1 (low list overlap), paired t-tests did not show a significant n-2 list repetition effect in RTs, t(19) = −0.675, p = 0.508, d = −0.151, 95 % CI of the effect size [−0.590, 0.292]. Accuracy data showed an n-2 list repetition benefit, t(19) = 2.29, p = 0.034, d = −0.511 [−0.972, −0.038]. To test whether this finding was consistent across all three classification tasks, we ran a list transition (n-2 repetition vs. n-2 switch) by classification decision (magnitude, parity, and inner-outer) repeated-measures analysis of variance (ANOVA) on RT and accuracy. For RT, the main effect of task was significant, F(2, 38) = 8.25, p = 0.001, ηp² = 0.303. The magnitude task was performed fastest (M = 1342 ms), followed by the parity task (M = 1528 ms), and the inner-outer task (M = 1577 ms). Most importantly, the interaction of list transition and decision was not significant, F(2, 38) = 0.028, p = 0.973, ηp² = 0.001. In accuracy, the ANOVA also yielded a significant main effect of task, F(2, 38) = 6.531, p = 0.004, ηp² = 0.26, which followed the same pattern as in the RT data (i.e., highest accuracy in the smaller/larger task and lowest in the inner-outer task). The main effect of transition was also significant, F(2, 38) = 6.930, p = 0.016, ηp² = 0.27 (indicating an n-2 repetition benefit in accuracy). Again no interaction was observed, F(2, 38) = 1.10, p = 0.344, ηp² = 0.06.

In Experiment 2 (partial list overlap), n-2 list-repetitions were not significantly different from n-2 list-switches, t(19) = −1.18, p = 0.254, d = −0.263 [−0.706, 0.186]. Accuracy did not differ as a function of n-2 list transition, t(19) = −0.681, p = 0.504, d = 0.152 [−0.291, 0.591].

The opposite pattern was observed in Experiment 3 (high list overlap), with slower responses in n-2 repetition trials than n-2 switch trials, a difference that was significant, t(38) = 3.143, p = 0.003, d = 0.503 [0.167, 0.834]. Overall accuracy was very high (97.8 %), but there was a trend for less accurate responses in n-2 repetition trials than n-2 switch trials, t(38) = 1.872, p = 0.069, d = 0.300 [−0.023, 0.619].

Finally, in Experiment 4 (partial vs. high list overlap), we conducted an ANOVA with the factors list transition and list overlap, which yielded evidence for a main effect of list transition, F(1, 31) = 4.349, p = 0.045, ηp² = 0.12, but no main effect of list overlap, F(1, 31) = 0.001, p = 0.982, ηp² < 0.01. The two-way interaction was also not significant, F(1, 31) = 1.348, p = 0.254, ηp² = 0.04. As shown in Table 1, descriptively, in both conditions, n-2 repetition costs were present, but the mean effect was smaller in the partial list overlap condition, similarly to the pattern observed in Experiments 2 and 3. To further explore this difference, we ran separate t-tests on the effects of n-2 list transition in each overlap condition. There was no significant effect of n-2 list transition with partial list overlap, t(31) = 1.059, p = 0.298, d = 0.187 [−0.164, 0.535]; in contrast, there was a small but significant n-2 list-repetition cost with high list overlap, t(31) = 2.461, p = 0.02, d = 0.435 [0.069, 0.795]. No significant effects were observed for accuracy data.

Discussion

Our first four experiments showed small n-2 list-repetition costs when list overlap was increased to its maximum (Experiments 3 and 4), but no such cost, or even a benefit of n-2 list repetition, when list overlap was low (Experiments 1, 2, and 4). We reasoned that classification tasks might not be ideal for investigating list switching in declarative WM because participants could gradually change their representations of the lists from a set of digits to a set of required responses (e.g., for the large-small task, the list 2 7 9 could be represented as “left key, right key, right key”). In this way they could merge the digit lists with the classification task set into new task sets directly mapping the three boxes to the corresponding responses. To avoid this ambiguity about the representations participants switch between, we tested list memory in different ways in Experiments 5–8.

Experiment 5

In Experiment 5 we replaced the digit classification task by a mental arithmetic task: On each trial one digit from one list had to be accessed, and an arithmetic operation applied to it. As the arithmetic operations were chosen at random, there was no way of recoding the digits in the memory lists in terms of the required responses. We tested again whether n-2 list-repetition costs are observed when highly overlapping lists have to be accessed, because Experiments 3 and 4 suggested that high list overlap encourages inhibition of lists switched away from.

Method

Participants, stimuli, and procedure

Twenty-two new participants (15 women, seven men; mean age 23.3 years) took part in Experiment 5. As in Experiment 3, participants were given three lists of fully overlapping sets of digits (see Fig. 1). Again, list color served as cue. Participants first underwent a training phase comprising recall of the memory lists until all lists were correctly recalled three times in succession. Next, participants completed three blocks of mental arithmetic on cued digits. In each trial, a randomly chosen operation (+1, −1, +2, or −2) was shown in one of the box-frames of a cued list. Participants had to apply the operation to the retrieved digit and enter the result using the number pad of a standard Swiss-German keyboard. After each response, visual feedback (500 ms) and a blank interval (500 ms) followed before the next trial started.

Each block consisted of 18 runs of 13 trials. Between runs, participants were allowed a short break. Within each run, 11 trials could be classified into n-2 list-repetition and n-2 list-switch trials. The first block was considered as practice and excluded from subsequent analyses (33.3 % of the data). The remaining two blocks yielded 396 trials for analyses, half of them n-2 list repetitions and the other half n-2 list switches. RTs were trimmed by removing errors and two trials following an error (6.33 %). Additionally, we excluded RTs more than 4.5 median absolute deviations above or below the individual median RT in each condition (1.5 % of the data). We again used log-transformed RTs and accuracies to test for n-2 list-repetition costs.
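The trial counts reported above follow from the design and can be checked with a short worked calculation. The variable and function names below are illustrative only; the numbers are those given in the text.

```python
BLOCKS = 3
RUNS_PER_BLOCK = 18
TRIALS_PER_RUN = 13
# The first two trials of a run have no n-2 history and cannot be classified
CLASSIFIABLE_PER_RUN = TRIALS_PER_RUN - 2


def trial_counts():
    """Return total trials, practice share, and classifiable trials analysed."""
    total = BLOCKS * RUNS_PER_BLOCK * TRIALS_PER_RUN
    practice_share = (RUNS_PER_BLOCK * TRIALS_PER_RUN) / total  # first block
    analysed = (BLOCKS - 1) * RUNS_PER_BLOCK * CLASSIFIABLE_PER_RUN
    return total, practice_share, analysed


total, practice_share, analysed = trial_counts()
```

The two experimental blocks contribute 2 × 18 × 11 = 396 classifiable trials, and the first block makes up one third (33.3 %) of all trials, in line with the figures stated above.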

Results

The mean (non-transformed) RTs and accuracies per condition are shown in Table 1. There was a tendency toward an n-2 repetition cost in RT, which did not reach significance, t(21) = 1.967, p = 0.063, d = 0.419 [−0.022, 0.851]. There was no effect of n-2 list transition on accuracy, t(21) = −0.045, p = 0.964, d = 0.009 [−0.409, 0.427].

Discussion

Our experiment using mental arithmetic, a standard paradigm for probing declarative WM, revealed n-2 list-repetition costs of 42 ms, which, however, were not significant. Hence, as in the preceding experiments, we obtained only a weak signal for n-2 list-repetition costs. Experiment 5 therefore suggests that the small and inconsistent n-2 list-repetition costs observed in the previous experiments are not simply due to the use of classification tasks and the possible re-coding of the memory representations (from digits to responses). Together, these findings could be taken as evidence for a smaller reliance on inhibitory processes in declarative WM than previously observed in procedural WM. However, such a conclusion would be premature, given that several differences remain between the procedures we have used so far to measure n-2 list-repetition costs in declarative WM and the procedures used in previous studies on n-2 task-repetition costs in procedural WM. These differences may explain the discrepancy.

In the following experiments we therefore aimed to remove yet another difference between the traditional task-switching paradigm and our list-switching paradigms to obtain n-2 list-repetition costs: Whereas the tasks in task switching usually require the classification of stimuli into meaningfully related categories, with which participants have long experience (therefore being well-established in LTM), our lists were learned through a few practice trials up to a certain degree. Arguably, the brief experimental learning experience did not result in the same degree of unification of the learned lists in memory as can be assumed for familiar classification schemes. Consequently, we reasoned that memory sets or task sets as a whole might become targets of inhibition only to the extent that they are unified. Accordingly, in our last series of experiments, we varied both the task participants had to perform on the list items (local recognition vs. mental arithmetic) and the degree of extra-experimental LTM learning in order to investigate the relevance of list unification and LTM contribution with respect to the n-2 list-repetition costs. In the remaining experiments we used a local-recognition task to test memory, because this task enabled us to increase conflict between the three memory lists. In our local-recognition task, a recognition probe appeared in one box-frame on each trial, and participants had to decide whether the probe matched the item in that box-frame of the currently relevant list. Critically, in some trials we presented other-list intrusion probes (i.e., items that were not present in the current list but in another, currently irrelevant list). Correctly rejecting other-list intrusion probes requires minimizing the contribution of the other lists to the recognition decision, thereby creating a strong incentive for inhibiting these lists. In addition, each other-list intrusion probe could act as a retrieval cue for the list that contains the matching item. This misleading retrieval cue might increase the degree of inter-list interference, and therefore enhance the degree of inhibition required to successfully select the relevant memory set (see Kuhns, Lien, & Ruthruff, 2007, for a similar argument in task switching).

Experiments 6–8

In this series of experiments, we tested the role of two variables for observing n-2 list-repetition costs: (1) the pre-experimental familiarity of the memory lists, and (2) the type of task to be performed. We tested the role of memory-set unification in LTM of the lists by varying the list composition in a between-subjects fashion (Experiment 6 vs. Experiment 8) and within-subjects (Experiment 7). Furthermore, in Experiment 8 we aimed at a within-subject test for the type of task and its contribution to n-2 list-repetition costs.

General method

Participants

Forty-eight students took part in this series of experiments (Experiment 6: n = 16, nine women, seven men; mean age 27.1 years; Experiment 7: n = 16; 12 women, four men; mean age 27.7 years; Experiment 8: n = 16, ten women, six men; mean age 24.7 years). The data of one participant in Experiment 7 and one in Experiment 8 had to be discarded because of low (<85 % correct) accuracy. Participants in Experiments 7 and 8 took part in two sessions.

Stimuli, task, and procedure

In Experiment 6, the dates of three historically important events, namely the Spanish discovery of the Americas (1492), the French Revolution (1789), and the end of the Second World War (1945), served as memory lists. Participants were selected based on a pre-test assessing knowledge of eight historical events such as the beginning of the First World War. Among the probed historical dates, the three dates used in the experiment were interspersed. The questionnaire was given to anybody taking part in experiments in the Cognitive Psychology Unit of the University of Zurich. When participants knew the three target dates, they were invited to take part in the study. Each list was presented in a row of boxes, using frame colors (blue, green, and yellow; see Fig. 1) to distinguish the lists. Each trial consisted of a local-recognition test: Participants had to judge whether a probe digit shown in one randomly selected list-box matched the list item in that location and then press the “ALT” and “ALT-GR” keys for “yes” and “no” responses, respectively. The first box in any list was never probed because this item was the same across lists. Participants performed 12 blocks of 72 trials. Within each block, recognition probes had a 50 % chance of being the correct item (positive probes). For the remaining 50 % of trials, which required a “no” response, the probe digit was equally likely to be an item shown in the same location in another list (25 %), another location within the same list (25 %), another location of another list (25 %), or a completely new digit (25 %; i.e., digits 3 or 6).
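The probe distribution described above can be illustrated with a short sketch that assigns a probe type to each trial of a block. This is a hypothetical illustration: the function and label names are invented, and the sketch fixes the proportions exactly within a block, whereas the text only states the probabilities (a 50 % chance of a positive probe and equal likelihood of the four negative probe types). Each negative probe type therefore accounts for 12.5 % of all trials.

```python
import random

# Hypothetical labels for the four kinds of negative ("no") probes
NEGATIVE_TYPES = (
    "same_position_other_list",
    "other_position_same_list",
    "other_position_other_list",
    "new_item",
)


def make_probe_types(n_trials=72, seed=None):
    """Shuffle a deck with 50 % positive probes and 12.5 % of each negative type."""
    if n_trials % 8 != 0:
        raise ValueError("n_trials must be divisible by 8 for exact proportions")
    rng = random.Random(seed)
    deck = ["positive"] * (n_trials // 2)
    for probe_type in NEGATIVE_TYPES:
        deck += [probe_type] * (n_trials // 8)
    rng.shuffle(deck)
    return deck


block = make_probe_types(72, seed=3)
```

With 72 trials per block, this gives 36 positive probes and 9 probes of each negative type.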

In Experiment 7, letters instead of digits were used. There were two conditions differing only regarding list composition (see Fig. 1). In the Word condition, the memory lists comprised three common German words: “NASE” (nose), “BROT” (bread), and “RING” (ring). In the Strings condition, meaningless letter strings comprising the recombination of the same letters as in the Word condition were used (i.e., “ANBR”, “STGI,” and “RENW”). These conditions were run in different sessions, and the order of the sessions was counterbalanced across participants. Participants performed a local recognition task again as in Experiment 6. New probes were the letters “P” and “U” in both sessions. Participants performed ten blocks of 96 trials. Before entering the experimental trials, participants underwent a training phase to learn the lists by heart. During the training phase, the lists were presented in randomized order to get accustomed to the later experimental procedure.

In Experiment 8, letter strings were used to create the memory lists, and two conditions were established which differed only on the type of task participants had to perform: local recognition (as in Experiment 7), or letter arithmetic (for comparison with Experiment 5). The two conditions were run in different sessions, and session order was counterbalanced across participants. The recognition condition was as in Experiment 7. In the letter arithmetic condition, participants were cued to retrieve a memory letter from one of the lists by presenting an arithmetic sign in one of the list boxes (“+” or “−”). Participants were instructed to use this sign as a cue to count one letter onwards or backwards in the alphabet from the retrieved letter, and to enter the resulting letter using the keyboard. To make sure that participants learned the to-be-used letter strings well, they had to perform a recall training phase comprising 168 trials (with immediate feedback) before starting the test session. When they entered the arithmetic session they had to practice both arithmetic operations on displayed letters (the same as in the lists) for one block each (one block of 96 trials with addition only, the second block with subtraction only). Again, immediate feedback was given. The test phase consisted of nine blocks of 96 trials in each condition.
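The letter arithmetic task described above amounts to shifting the retrieved letter by one position in the alphabet. A minimal sketch of the expected response is given below. The function name is hypothetical, the 26-letter basic Latin alphabet is assumed, and because the text does not say how the ends of the alphabet were handled, the sketch raises an error rather than wrapping around.

```python
import string

# Accept both the ASCII hyphen and the typographic minus sign as "backwards"
STEPS = {"+": 1, "-": -1, "\u2212": -1}


def letter_arithmetic(letter, sign):
    """Return the letter one position after ("+") or before ("-") `letter`."""
    alphabet = string.ascii_uppercase
    index = alphabet.index(letter.upper()) + STEPS[sign]
    if not 0 <= index < len(alphabet):
        # Assumption: no wrap-around at the ends of the alphabet
        raise ValueError("operation leaves the alphabet")
    return alphabet[index]


answer = letter_arithmetic("R", "+")
```

For example, a "+" cue on the letter R calls for the response S, and a "−" cue on N calls for M.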

In all experiments, lists were distinguished by frame color (blue, green, and yellow), and feedback was provided only at the end of a block of trials (mean RT and percentage of errors). Lists never repeated across trials, and blocks consisted of 50 % ABA list sequences and 50 % CBA list sequences. The first experimental block was used as practice and was performed under supervision of the experimenter. Participants were invited to take small breaks after each block, and before a block started, the lists in their respective colors were displayed as a reminder.

We removed from analyses the first block of trials (E6 = 9.2 %; E7a and 7b = 10 %; E8a = 12.0 %; and E8b = 12.1 %). In addition, the first two trials in each block, as well as trials in which participants committed an error plus the following two trials were discarded (E6 = 12.3 %; E7a = 12.9 %; E7b = 19.3 %; E8a = 9 %; and E8b = 19.4 %). We again removed RTs more than 4.5 median absolute deviations above or below the individual median RT in each condition (E6 = 1.2 %; E7a = 1.5 %; E7b = 1.2 %; E8a = 1.5 %; and E8b = 0.5 %).

Results

Mean (non-transformed) RTs and accuracy per condition are presented in Table 1. In Experiment 6, n-2 list-repetition costs were observed in RTs, t(15) = 7.019, p < .001, d = 1.755 [0.952, 2.535], and accuracy, t(15) = −3.065, p = 0.008, d = 0.766 [0.196, 1.317].

In Experiment 7, we ran a List transition by List composition ANOVA which yielded significant main effects of list transition, F(1, 14) = 28.802, p < 0.001, ηp² = 0.67, and list composition, F(1, 14) = 42.396, p < 0.001, ηp² = 0.75. The interaction was not significant, F(1, 14) = 2.928, p = 0.109, ηp² = 0.17, indicating that n-2 list-repetition costs were present in both sessions, although numerically smaller in the letter-string condition than in the word condition. When tested separately, n-2 list-repetition costs were significant in both list composition conditions: Word, t(14) = 7.381, p < 0.001, d = 1.906 [1.031, 2.757]; Strings, t(14) = 2.452, p = 0.028, d = 0.633 [0.067, 1.181]. In accuracy, the ANOVA revealed only a significant main effect of list composition, F(1, 14) = 14.024, p = 0.002, ηp² = 0.5. Participants made more errors in the Strings condition than in the Word condition. List transition did not yield a significant main effect, F(1, 14) = 0.166, p = 0.690, ηp² = 0.01. The interaction was also not significant, F(1, 14) = 0.443, p = 0.517, ηp² = 0.03. The accuracy difference between ABA and CBA trials yielded a d = 0.055 [−0.453, 0.560] for the Word condition, and a d = −0.178 [−0.685, 0.335] for the Strings condition.

In Experiment 8, we tested for n-2 list repetition costs in the Recognition and Arithmetic conditions separately. In the Recognition condition, we observed significant n-2 list-repetition costs in RT, t(14) = 3.542, p = 0.003, d = 1.176 [0.499, 1.829], accompanied by n-2 list repetition costs in accuracy, t(14) = −2.634, p = 0.02, d = 0.681 [0.107, 1.236]. In the Arithmetic condition, n-2 list-repetition costs were observed in RT, t(14) = 2.389, p = 0.032, d = 0.617 [0.054, 1.162]. For accuracy, no significant effect was found, t(14) = 0.371, p = 0.716, d = 0.096 [−0.413, 0.601].Footnote 2

Discussion

Across all experiments using local recognition, n-2 list-repetition costs were observed irrespective of the list status in LTM (pre-experimentally learned or not), although they were numerically larger when lists formed a unified chunk in LTM. We also observed significant n-2 list-repetition costs using the letter arithmetic task in Experiment 8, and this effect was similar in magnitude to the one observed in Experiment 5, despite the use of low-overlap lists in this experiment. Overall, the n-2 list-repetition costs were more substantial with the recognition task than with the arithmetic task.

Overall analysis across all experiments

Overall, our eight experiments suggest important boundary conditions for the observation of n-2 list-repetition costs. Figure 2 presents the size of n-2 list-repetition costs for RT (panel a) and accuracy (panel b) measures. Comparison of the panels shows that n-2 list-repetition costs were more consistently observed in RT, with most experiments showing positive values of d with the confidence intervals not including zero. For accuracy, most values were close to zero.

Fig. 2

Panel (a) shows the effect sizes of the n-2 list-repetition effects obtained from the analysis of the log-transformed reaction times in Experiments 1–8. Panel (b) shows the effect sizes of the n-2 list-repetition effects obtained from the accuracy data in each experiment. The data were coded regarding the type of task used, and list composition (i.e., degree of overlap). Chunked lists were of low content overlap. Positive values indicate n-2 list-repetition costs, and negative values n-2 list-repetition benefits. Error bars represent the 95 % confidence interval of the effect size.

To substantiate the differential influence of the variables tested across all experiments on the size of n-2 list-repetition costs in RT, we ran a linear mixed effects (LME) model using the data of all eight experiments. The model was fitted using the lme4 package (Bates, Maechler, & Bolker, 2012) implemented in R. Our main dependent variable was the n-2 list-repetition effect computed as the difference in log-transformed RTs between ABA and CBA trials. We entered in the model the data of each participant in each experiment (and, depending on the experiment, in each of the experimental conditions). The n-2 list-repetition effects were transformed into standardized effect sizes (Cohen’s d) for each experiment, thereby removing the mean RT difference across experiments and coding the effects in terms of standard-deviation units.
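The dependent variable described above can be sketched as follows. The original analysis was run in R; this Python sketch only illustrates the computation, and the function names are hypothetical. We assume here that standardization means dividing each participant's difference score by the standard deviation of the difference scores within the same experiment (or condition), so that the mean of the standardized scores equals Cohen's d for that experiment. The text does not spell out this step.

```python
from math import log
from statistics import mean, stdev

def n2_repetition_effect(aba_rts, cba_rts):
    """Difference in mean log RT between ABA and CBA trials for one participant."""
    return mean([log(rt) for rt in aba_rts]) - mean([log(rt) for rt in cba_rts])

def standardize_within_experiment(effects):
    """Express each participant's effect in SD units of the experiment's effects.

    Assumption: the SD of the difference scores is the standardizer.
    """
    sd = stdev(effects)
    return [effect / sd for effect in effects]
```

Positive values indicate n-2 list-repetition costs (slower responses on ABA than on CBA trials).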

Our aim was to estimate the effects of three variables on the size of the n-2 list-repetition costs: list overlap (low, partial, or full), LTM unification of the lists (chunked lists or not), and task (classification, mental arithmetic, or local recognition). We contrast-coded each of these variables to test for the effects of interest (see Table 2), and entered these predictors as fixed effects in the model. To account for the different numbers of participants that entered the analysis at each level of our predictors, we re-centered our predictor variables on zero by subtracting from each observation the mean across all contrast levels (taking into account only contrast values different from 0). Participant and experiment were treated as random intercept effects.
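One possible reading of the re-centering step is sketched below. This is an assumption-laden illustration, not the original R code: the function name `recenter_contrast` is hypothetical, and we assume that observations coded 0 (i.e., not part of the contrast) are left at 0 while the remaining codes are shifted by their observation-weighted mean.

```python
def recenter_contrast(codes):
    """Center one contrast predictor on zero, using only its nonzero codes.

    `codes` holds one contrast value per observation. Assumption: zero-coded
    observations stay at zero and do not enter the mean.
    """
    nonzero = [c for c in codes if c != 0]
    if not nonzero:
        return [0.0 for _ in codes]
    shift = sum(nonzero) / len(nonzero)
    return [c - shift if c != 0 else 0.0 for c in codes]
```

With unequal numbers of observations per level (e.g., three observations coded 0.5 and one coded −0.5), the shifted codes sum to zero across the observations that enter the contrast.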

Table 2 Contrast coding of the predictors entered in the linear mixed effects model with the data from Experiments 1–8

Table 3 presents the regression coefficients, t-statistics, and p-values for each predictor in the model. Statistical significance was assessed using the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2014) implemented in R. As can be seen in this table, the intercept reflecting the n-2 list-repetition costs across all experiments was significant. Among the predictors, LTM unification and the first Task contrast were significant, whereas List overlap and the second Task contrast did not explain variance to a significant degree. LTM unification of the lists substantially increased the n-2 list-repetition costs. In addition, there was a significant effect of the recognition memory task, which led to an increase in n-2 list-repetition costs. There was no evidence for a substantial difference between using a classification task and using mental arithmetic on the size of the n-2 list-repetition costs. Collectively, the data of these eight experiments provide evidence for n-2 list-repetition costs and for some boundary conditions for observing this effect.

Table 3 Results of the linear mixed effects model on the data of Experiments 1–8

Episodic memory as an alternative explanation for n-2 list-repetition costs?

So far we assumed that list inhibition causes the n-2 list-repetition cost. Yet, there is an alternative explanation in terms of episodic retrieval. Imagine participants are asked to retrieve a digit in the same list as two trials before (i.e., an n-2 list repetition). The cued list in trial n might evoke retrieval of an episodic record of trial n-2. This could be beneficial for performance when trial n demands retrieval of the digit in the same position as trial n-2 (i.e., in case of an n-2 position repetition, occurring in one-third of all cases), but harmful when retrieval of a different digit from a different list position is required (n-2 position switch, in two-thirds of all cases). Because the potentially harmful case is more frequent, the net effect of episodic retrieval could be an RT cost for n-2 list repetitions. To investigate this alternative explanation, we considered n-2 position (and thereby item) repetition as a further predictor.Footnote 3 We re-computed effect sizes separately for n-2 list-position repetitions and switches (see Fig. 3).
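The frequency argument above can be made concrete with a small calculation. The millisecond values used in the usage note are purely hypothetical and serve only to illustrate the logic; they are not estimates from our data. The function name `net_n2_effect` is likewise hypothetical.

```python
def net_n2_effect(benefit_ms, cost_ms, p_position_repetition=1 / 3):
    """Expected n-2 list-repetition effect under a pure episodic-retrieval account.

    Position repetitions (probability p) yield a benefit (negative effect);
    position switches (probability 1 - p) yield a cost (positive effect).
    """
    p = p_position_repetition
    return p * (-benefit_ms) + (1 - p) * cost_ms
```

For example, with a hypothetical benefit and cost of 30 ms each, the net effect is a cost of about 10 ms. The net effect vanishes only if the benefit on position repetitions is twice as large as the cost on position switches.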

Fig. 3

Effect sizes for the n-2 list-repetition effects (in reaction times (RTs)) in Experiments 1–8, presented separately for n-2 position-repetition trials and n-2 position-switch trials. Error bars depict the 95 % confidence intervals for the effect size.

Position repetition had a large impact on whether or not n-2 list-repetition costs were observed. To further substantiate this influence, we re-ran the LME, adding a contrast-coded n-2 position-transition predictor (position repetition = −0.5; position switch = 0.5) to the predictors already identified. Table 4 presents the results of this analysis. The data of our eight list-switching experiments showed a substantial moderation of n-2 list-repetition costs by n-2 position repetitions: In the presence of n-2 position repetitions, n-2 list repetitions yielded a benefit instead of a cost, in line with the assumption that episodic retrieval of the n-2 trial drives the effects of n-2 list repetitions.

Table 4 Results of the linear mixed effects model entering position transition as a further predictor for the data of Experiments 1–8

Given our assumption of analogous processing principles in declarative and procedural WM, we were then interested in assessing whether a comparable moderation is also observed for n-2 task-repetition costs. The observation of analogous effects across the two WM sub-systems is critical to test whether our theoretical model is a viable framework to describe cognitive processes in short-term memory and action control.

Assessing the role of episodic retrieval in n-2 task-repetition costs

The analogous effect in procedural WM for episodic memory contributions to n-2 task-repetition costs would be an influence of n-2 stimulus repetition on n-2 task-repetition costs. This is because stimulus repetitions serve an analogous function in task switching as list position repetitions in our list-switching tasks: the stimulus serves as the cue to retrieve a response within the currently relevant task-set, in the same way as the position serves as the cue to retrieve an item within the currently relevant list. To assess the role of episodic retrieval for n-2 repetition effects in procedural WM, we took two steps. First, we reviewed the published literature for any reports of n-2 stimulus repetition effects in experiments assessing n-2 task-repetition costs. Our review revealed that n-2 stimulus repetitions are not controlled for in some studies (i.e., Arbuthnott & Woodward, 2002; Arbuthnott & Frank, 2000; Gade & Koch, 2014; Mayr & Keele, 2000), whereas in other studies n-2 stimulus repetitions were excluded by design (i.e., Gade & Koch, 2005, 2007). Mayr (2002) explicitly addressed the question of episodic memory contributions to n-2 task-repetition costs. In his study, participants had to apply one of three displacement decision rules (i.e., vertical, horizontal, or diagonal displacement) to a given stimulus and indicate the resulting stimulus location with a keypress. Mayr analyzed response repetitions (which were coupled with stimulus repetitions) from trial n-2 to trial n and found no significant modulation of n-2 task-repetition costs. However, given the overall small n-2 task-repetition effects in his study, Grange and colleagues (2015) recently re-ran a closely matched experiment and found that n-2 response repetitions significantly reduced the size of the n-2 task-repetition costs. Thus, this latter result points to a similar modulation of n-2 repetition costs in declarative and procedural WM. However, given that this is only one study, more evidence is clearly needed to firmly establish the analogy.

Therefore, in a second step, we re-analyzed our own data from task-switching experiments. We used the data of eight task-switching experiments (partly published, see Gade & Koch, 2014) that were conducted by the first author (M.G.). For a short description of the incorporated experiments, see Table 5; for the experimental method used in the experiments, see the online supplemental material. Briefly, in these experiments, a column with three stimuli (a letter, a digit, and a symbol) was presented in each trial together with a task cue (“letter,” “digit,” or “character”). Furthermore, the type of stimulus (letter, digit, or symbol) changed position within the column on a trial-by-trial basis. Participants had to select the stimulus indicated by the task-cue, and classify it according to the response mapping defined by the respective task-set. Some of these experiments also included direct task repetitions as well as up to four different timing intervals (cue-target as well as response-cue manipulations). As a consequence, n-2 task-repetition costs were only small and sometimes non-existent (see Table 5) in these experiments. Our reanalysis of these data therefore provides only a first approach to a systematic investigation of episodic memory contributions to n-2 task-repetition costs.

Table 5 Overview of experiments (e) analyzed for the impact of stimulus repetitions on the n-2 task-repetition effects

As for the list-switching experiments, we assessed the size of the n-2 task-repetition costs as a function of n-2 stimulus repetitions. However, given that participants encountered multivalent stimuli whose composition changed from trial to trial, stimulus repetitions were only partial and comprised only target stimulus repetitions (ignoring stimulus position; stimuli for each category consisted of six elements, see online supplemental material for a detailed description). Thus, we regarded as stimulus repetitions those trials that used the same task-relevant stimulus in trial n-2 and trial n, regardless of the other two, task-irrelevant stimuli. Our main dependent variable was the n-2 task-repetition effect computed as the difference in log-transformed RTs between ABA and CBA trials, depending on n-2 stimulus repetitions or switches. We entered in the model the data of each participant in each experiment for each stimulus transition condition. The n-2 task-repetition effects were transformed into standardized effect sizes (Cohen’s d). We present the results of this analysis in Fig. 4. Moreover, we ran an LME using the data of all eight task-switching experiments, including n-2 stimulus repetition as one predictor of n-2 task-repetition costs. We also included n-1 task repetitions (i.e., whether direct task repetitions were possible in the experiment or not) because this variable has been found to affect the size of n-2 task-repetition costs overall (Philipp & Koch, 2006). We contrast-coded both predictors and entered them as fixed effects in the model. Table 6 presents the results of this analysis. Neither the intercept nor the n-1 task-repetition predictor yielded significant effects. Only stimulus repetition moderated n-2 task-repetition costs: As shown in Fig. 4, n-2 task-repetition costs were reduced or even turned into benefits with stimulus repetitions from trial n-2 to trial n.
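The trial coding described above might be sketched as follows. This is an illustration under stated assumptions, not the original analysis code. The function name `code_trial` and the data layout (one task label and one category-to-stimulus dictionary per trial) are hypothetical. We assume that a stimulus repetition means that the stimulus in the category relevant on trial n was identical in the display of trial n-2, whether or not it was task-relevant there, and that triplets containing a direct task repetition are excluded.

```python
def code_trial(tasks, displays, n):
    """Classify trial n by its n-2 task transition and n-2 stimulus transition.

    tasks[i] is the cued category on trial i; displays[i] maps each category
    to the stimulus shown on trial i. Returns None for trials that cannot be
    classified (first two trials, or a direct task repetition in the triplet).
    """
    if n < 2:
        return None
    task_n2, task_n1, task_n = tasks[n - 2], tasks[n - 1], tasks[n]
    if task_n2 == task_n1 or task_n1 == task_n:
        return None
    sequence = "ABA" if task_n2 == task_n else "CBA"
    # Assumption: compare the currently relevant category across the two displays.
    repeated = displays[n - 2][task_n] == displays[n][task_n]
    return sequence, ("repetition" if repeated else "switch")
```

The n-2 task-repetition effect would then be computed separately for trials coded "repetition" and trials coded "switch".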

Fig. 4

Effect sizes for the n-2 task-repetition effects (in reaction times (RTs)) in Experiments 9–16, presented separately for n-2 stimulus-repetition trials and n-2 stimulus-switch trials. Error bars represent the 95 % confidence intervals for the effect size.

Table 6 Results of the linear mixed effects model on the data of Experiments 9–16

To conclude, as for n-2 list repetition effects in declarative WM, we found episodic memory contributions to n-2 task repetition effects in procedural WM. These findings, together with the ones reported by Grange et al. (2015), strongly point to episodic contributions to n-2 task-repetition costs.

General discussion

In the task-switching literature, it has been argued that no-longer relevant task sets are inhibited to facilitate set switching (Gade et al., 2014; Mayr & Keele, 2000). The n-2 task-repetition costs have been considered as providing the most convincing evidence for task inhibition (Koch et al., 2010). In the present study, we provided initial evidence for an analogous effect when participants switch between memory lists in declarative WM. Across eight experiments, we varied memory list overlap, whether lists formed chunks in LTM, and the type of task participants performed on the memory items. These manipulations were designed to increase competition between memory lists because the degree of task competition has been found to moderate the size of n-2 task-repetition costs (see Gade et al., 2014; Koch et al., 2010, for similar conclusions). Two manipulations significantly increased the size of n-2 list-repetition costs: testing memory with recognition probes, and using chunks instead of novel lists as the memoranda.

In sum, n-2 repetition costs occur in switching between task sets in procedural WM, and in switching between memory lists in declarative WM. However, their origin is unclear. We considered two competing explanations: inhibition and episodic retrieval. In our final analysis, we showed that episodic retrieval provides a viable alternative explanation for the n-2 repetition costs for both list-switching and task-switching paradigms. In the case of list switching, when the memory list from trial n-2 is repeated in trial n, an episodic record of trial n-2 is retrieved. This facilitates access to the information used in trial n-2, that is, the tested list position, and the item bound to this position. In most cases, however, trial n requires access to another item bound to another position of the same list, and hence, retrieval of the position-item binding used in trial n-2 interferes, thereby slowing responses in trial n. The episodic retrieval account may also explain the impact of the type of task and LTM chunks on n-2 list-repetition costs. By increasing the list length (from three to four items in some experiments) as well as by increasing the number of possible probe types in the recognition task, the chance of complete episodic repetitions was reduced, thereby increasing episodic mismatches and the costs associated with such mismatches. This experimental feature could explain the differences in the size of the observed n-2 list-repetition costs across the different list-switching experiments reported here. Regarding the impact of LTM chunks on n-2 list-repetition costs, chunked information is presumably more distinctively represented in LTM, which then facilitates episodic retrieval. At the same time, however, it also seems to be more vulnerable to episodic mismatch (see Fig. 3, Exp. 6 and 7a).

Task switching yields an analogous situation to the one described above: Repeating the task of trial n-2 facilitates access to the stimulus-response binding used in trial n-2, but interferes with using another stimulus-response binding. Because n-2 stimulus repetitions are rare, or even excluded entirely in many previous experiments, the interfering effects of retrieving the episodic record of trial n-2 predominate, leading to n-2 repetition costs. Interference from episodic retrieval has been largely overlooked as one alternative explanation of n-2 task-repetition costs (but see Anderson & Levy, 2007). Recently, Grange et al. (2015) have re-evaluated episodic contributions to the size of n-2 task-repetition costs. They observed that stimulus (and response) repetitions reduced these costs, although without completely eliminating them. To explain both the reduction of n-2 repetition costs and the residual n-2 repetition costs in stimulus-repetition trials, Grange et al. argued that both inhibition and episodic traces may contribute to the size of n-2 task-repetition costs. Here we observed consistent evidence for n-2 list-repetition costs in the absence of position repetitions in only three of the eight list-switching experiments, and in the absence of stimulus repetitions in only one of the eight task-switching experiments. Hence it is unclear whether there is a need to assume inhibition in addition to episodic retrieval to explain n-2 repetition costs. What is clear from our data, though, is that episodic retrieval plays a crucial role in explaining these costs in both declarative and procedural WM.

To conclude, the parallel behavioral effects observed across list-switching and task-switching paradigms bolster the contention of analogous processing principles in declarative and procedural WM (Oberauer, 2009). Further research, however, is needed to determine the respective contribution of inhibition and episodic retrieval to these costs.