Psychological Research, Volume 69, Issue 5, pp 369–382

Attentional load and implicit sequence learning


  • D. R. Shanks
    • Department of Psychology, University College London
  • Lee A. Rowland
    • Department of Psychology, University College London
  • Mandeep S. Ranger
    • Department of Psychology, University College London
Original Article

DOI: 10.1007/s00426-004-0211-8

Cite this article as:
Shanks, D.R., Rowland, L.A. & Ranger, M.S. Psychological Research (2005) 69: 369. doi:10.1007/s00426-004-0211-8


A widely employed conceptualization of implicit learning hypothesizes that it makes minimal demands on attentional resources. This conjecture was investigated by comparing learning under single-task and dual-task conditions in the sequential reaction time (SRT) task. Participants learned probabilistic sequences, with dual-task participants additionally having to perform a counting task using stimuli that were targets in the SRT display. Both groups were then tested for sequence knowledge under single-task (Experiments 1 and 2) or dual-task (Experiment 3) conditions. Participants also completed a free generation task (Experiments 2 and 3) under inclusion or exclusion conditions to determine if sequence knowledge was conscious or unconscious in terms of its access to intentional control. The experiments revealed that the secondary task impaired sequence learning and that sequence knowledge was consciously accessible. These findings disconfirm both the notion that implicit learning is able to proceed normally under conditions of divided attention and the notion that the acquired knowledge is inaccessible to consciousness. A unitary framework for conceptualizing implicit and explicit learning is proposed.


It has been very common in recent conceptualizations of memory and learning to distinguish both behaviorally and neurally between declarative (explicit) and procedural (implicit) systems (Ashby & Ell, 2002; Squire, 1994). Explicit learning and memory are thought to depend heavily on the integrity of the hippocampus and surrounding brain structures, while implicit learning is controlled by other brain systems, such as the basal ganglia (Keele, Ivry, Mayr, Hazeltine, & Heuer, 2003).

A major challenge to memory researchers, however, is to delineate precisely what the characteristics and operating properties of these putative learning/memory systems are. This has proved to be difficult even though a number of proposals have been put forward. It has been suggested, for example, that the main distinguishing feature is whether or not a particular learning capacity is intact in the classic amnesic syndrome. Undoubtedly, certain learning abilities are spared in amnesia whilst others are impaired, yet it has been questioned whether this arises because of a separation between different forms of learning or is instead attributable to differences in the ways in which different memory tests place demands on a single underlying system (Buchner & Wippich, 2000; Kinder & Shanks, 2001, 2003; Zaki, Nosofsky, Jessup, & Unverzagt, 2003). Another possibility is that the defining hallmark that distinguishes between implicit and explicit learning is that the latter but not the former yields contents that are accessible to consciousness. Again, however, this definition has proven difficult to demonstrate beyond question (Shanks & St John, 1994; Shanks, Wilkinson, & Channon, 2003).

A third possibility—and one which is the focus of the present article—is that explicit/declarative but not implicit/procedural learning requires attentional resources. The idea is that learning may sometimes proceed independently of central resources and to that extent is hence “implicit.” It is true that attention and awareness may partially overlap, but according to this proposal it is the demands placed upon attentional resources that should be the principal focus of research attempting to delineate the essential nature of the implicit and explicit systems. The notion of an automatic process has long been conceptualized in terms of two properties, namely that it does not require limited-capacity resources (or mental effort) and that it is driven in response to stimulus input independently of control processes or selection. It is important to emphasize the distinction between these two properties: Attention as resources and attention as a selection mechanism. Our focus in this article will be on the former notion, usually studied in experiments using a divided-attention or dual-task methodology. Studies of the role of selective attention in implicit learning, using manipulations of focused attention, have not been conclusive: We briefly review some relevant work in the general discussion.

The vast bulk of research addressing the proposal that implicit learning does not require attentional resources has used the well-known sequential reaction time (SRT) task in which participants detect the location of a target that can appear in each trial in any one of several (usually 4–6) locations on a display. Participants react to the target by pressing as fast as possible a key allocated to its current location, with a compatible relationship between display locations and response keys. Several thousand trials may be presented, but critically the target follows a predictable repeating sequence rather than appearing at random. It is useful to distinguish between deterministic sequences, in which the target follows an unvarying pre-determined sequence of, say, 12 locations, and probabilistic sequences in which the target follows a noisy sequence. In the deterministic case, learning is usually indexed by testing performance in a transfer block with a different sequence: If RTs increase on the introduction of the new sequence, this establishes that participants had learned something about the training sequence and were able to use their knowledge to anticipate the target’s location in some or all of the trials. In the probabilistic case, learning can be monitored on-line by seeing whether RTs to predictable trials (those that are consistent with the generating sequence) are faster than RTs to ‘noise’ trials in which the target does not appear at the location prescribed by the generating sequence.

The SRT task has been taken to be an ideal tool for studying procedural learning as the sequential structure is entirely incidental to the participant’s task and the nature of sequence learning appears to be very “low-level” in the sense that participants develop a perceptual-motor rather than a cognitive skill. Moreover, initial studies suggested that this form of learning is both spared in amnesia (Reber & Squire, 1998) and dissociable from awareness (Willingham, Nissen, & Bullemer, 1989). However, both of these claims have been challenged: Impairments in amnesia have been reported (Curran, 1997), and most of the evidence for the dissociation of sequence learning from awareness has not stood up in attempts at replication (Perruchet & Amorim, 1992; Perruchet, Bigand, & Benoit-Gonin, 1997; Shanks et al., 2003; Wilkinson & Shanks, 2004). The suggestion that SRT learning may be independent of attentional resources, in a way that distinguishes it from typical forms of explicit learning, is therefore a highly appealing one that deserves detailed consideration.

Many studies have combined the primary SRT task with a secondary tone-counting task designed to occupy attentional resources and have asked whether SRT learning is impaired (Cohen, Ivry, & Keele, 1990; Curran & Keele, 1993; Frensch, Lin, & Buchner, 1998; Hsiao & Reber, 2001; Schvaneveldt & Gomez, 1998; Shanks & Channon, 2002; Stadler, 1995). In these studies, the usual procedure is that a high-pitched or low-pitched tone is emitted in the intertrial intervals of the SRT task and participants are required to maintain a running count of the number of high tones that they have to accurately report at the end of each SRT trial block. We will not attempt to summarize the results of this complex literature (see Hsiao & Reber, 1998; Shanks, 2003, for reviews) because it has generally been recognized that tone-counting is a less than ideal secondary task. If it interferes with implicit sequence learning (which it appears to do), then it is difficult to know whether this is attributable to its demands on attentional capacity or alternatively to some nonattentional factor such as disruption of the timing or organization of the primary task sequence (see Heuer & Schmidtke, 1996; Hsiao & Reber, 2001; Shanks, 2003; Stadler, 1995, for extensive discussion of these issues).

For this reason, alternative secondary tasks have been employed. Especially relevant here is the symbol-counting task of Jiménez and Méndez (1999, 2001), which does not introduce an additional stimulus into the task, but instead requires participants to make two decisions to each SRT target. The target is one of four symbols (×, ?, $, and o) and the primary task is to respond to the location of the target. Under single-task conditions participants ignore the identity of the symbol whereas under dual-task conditions they count the combined number of (say) ×’s and ?’s, thus requiring them to attend to both the location and the identity of each target. Jiménez and Méndez used this secondary task in a rather complex experimental procedure, which we will not describe here (we discuss it more fully in the general discussion). We have adopted it here in three experiments designed to test, under ideal conditions, the conjecture that implicit learning in the SRT task can proceed without making demands on attentional resources.

The experiments combine several features chosen to make the test as comprehensive as possible. First, we used probabilistic rather than deterministic sequences. It is commonly suggested that deterministic sequences may fail to maximally recruit implicit learning capacities because small chunks of a fixed repeating sequence can often be consciously identified. With probabilistic sequences it is assumed to be much harder consciously or intentionally to identify repeating chunks (Cleeremans & Jiménez, 1998). Hence the use of probabilistic sequences should increase our chances of detecting true implicit learning, if it exists. Second, we used a concurrent task that involves counting stimuli that are targets in the SRT display (Jiménez & Méndez, 1999). Counting stimuli under these circumstances should minimize the disruption effects caused by tasks such as tone counting (see above) and therefore provide a purer measure of the effects of reduced attention on sequence learning (we re-evaluate this assumption in the general discussion). Third, we tested sequence knowledge in our single-task and dual-task groups under identical conditions. To avoid the possibility that implicit sequence learning is equivalent under single-task and dual-task learning conditions, but that its expression is suppressed under dual-task conditions (Frensch et al., 1998), both groups were tested for sequence knowledge under single-task (Experiments 1 and 2) or dual-task (Experiment 3) conditions.

In summary, we designed our experiments to determine whether the requirement to count symbols at the same time as the primary SRT task interfered with implicit sequence learning.

Experiment 1

Participants were trained on probabilistic sequences as developed by Schvaneveldt and Gomez (1998). Such sequences disguise the repeating nature of the sequence and also allow an online measure of learning. Being able to track participants’ learning throughout the training phase of the experiment allows us to identify any differences in the rate of learning between single-task and dual-task participants.



Thirty-six participants (half men, half women, with a mean age of 25.8 years, range 19–42 years) took part in this study. The majority were undergraduates from University College London. Participants were randomly assigned to the single-task (n=17) or dual-task (n=19) group. They were each paid £3 for their participation.


The experiment was fully automated and run on a laptop PC using software written in Microsoft Visual Basic 6.0. Responses were made using a standard “QWERTY” keyboard. Four boxes were arranged in a horizontal line just below the center of the screen. Boxes were presented on a gray background; they were white in color and were 12 × 12 mm in size. A 37-mm gap separated each of the four boxes. In each target-location trial one of four possible text symbols (‘×,’ ‘?,’ ‘$,’ and ‘o’) appeared in the center of one of the boxes. The boxes will be referred to as locations 1–4 from left to right respectively. Reaction times were measured using ExacTicks software in Visual Basic.

Sequential reaction time task

The experiment followed a 2 × 2 design. The between-participants factor was the two levels of training conditions (single-task or dual-task). The within-participants factor was the two levels of target probability, probable and improbable.

During the training blocks of the SRT task, participants in the dual-task condition concurrently counted targets while responding to target location. The identity of the target was selected at random. Single-task participants simply responded to target location without any additional tasks. Following training all participants carried out a test block under single-task conditions. The RT difference between probable and improbable targets was used as an index of sequence learning; greater differences indicated a greater degree of sequence learning.
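The learning index described here can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code: the function name `difference_score` and the per-trial record layout (RT in ms, a probable/improbable flag, a correctness flag) are assumptions made for the example. Error trials are dropped before averaging, in line with the exclusion of error-trial RTs from the analyses.

```python
from statistics import mean


def difference_score(trials):
    """Return mean improbable RT minus mean probable RT (in ms).

    `trials` is an iterable of (rt_ms, is_probable, is_correct) tuples.
    This record layout is a hypothetical convention for illustration.
    Only correct responses contribute to either mean.
    """
    probable = [rt for rt, is_probable, ok in trials if ok and is_probable]
    improbable = [rt for rt, is_probable, ok in trials if ok and not is_probable]
    # A positive score means faster responding to probable targets,
    # i.e., evidence of sequence learning.
    return mean(improbable) - mean(probable)


if __name__ == "__main__":
    demo = [
        (400, True, True),
        (420, True, True),
        (480, False, True),
        (900, False, False),  # error trial, ignored
    ]
    print(difference_score(demo))
```

A larger positive score indicates a greater degree of sequence learning; a score near zero indicates that probable and improbable targets were responded to equally fast.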


Two 12-item second-order conditional (SOC) sequences were used in this experiment; these are referred to as SOC1 (2-4-2-1-3-4-1-2-3-1-4-3) and SOC2 (3-4-3-1-2-4-1-3-2-1-4-2). For both sequences, predicting an item in the sequence depends on knowing the two prior target locations. The sequences were structurally identical. They were equated with respect to location frequencies (each location occurs three times), first-order transition frequency (each location is preceded once by each of the other three locations), repetitions (no repetitions in either sequence), and reversals (once only, i.e., 2-4-2 and 3-4-3). The only difference between the sequences is in their second-order conditional structure (e.g., 2-4 is always followed by 2 in SOC1 but by 1 in SOC2). The order of training sequences was counterbalanced so that half the participants saw SOC1 as the training sequence and half SOC2. During the training blocks, target location was specified by the assigned training sequence with a probability of .85 and by the alternative sequence with a probability of .15. For example, if SOC1 was the training sequence, then the transition 3-1 was followed by a target at location 4 with a probability of .85 and by location 2 with a probability of .15. This procedure iterated in each trial and generated the next target based solely on the locations of the preceding two targets. A typical sequence might therefore be 3-2-1-3-4-1-3-4-1-2-4-2-3-1-2-3, in which the 3rd, 7th, 11th, 13th, and 15th targets appear at improbable locations (i.e., from SOC2) and the remainder at probable ones (from SOC1).
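The generation procedure just described can be illustrated with a short Python sketch. It is not the original Visual Basic software: the function names (`transition_table`, `generate_block`, `label_trials`), the seeding interface, and the choice to start each block with two consecutive items of the training sequence are assumptions made for the example. Only the two SOC sequences and the .85/.15 probabilities come from the text.

```python
import random

SOC1 = [2, 4, 2, 1, 3, 4, 1, 2, 3, 1, 4, 3]
SOC2 = [3, 4, 3, 1, 2, 4, 1, 3, 2, 1, 4, 2]


def transition_table(seq):
    """Map each pair of consecutive locations to the location that follows,
    treating the 12-item sequence as a repeating cycle."""
    n = len(seq)
    return {(seq[i], seq[(i + 1) % n]): seq[(i + 2) % n] for i in range(n)}


def generate_block(train, alt, n_trials=100, p_train=0.85, rng=None):
    """Generate one block of target locations.

    Each new target is chosen from the training sequence with probability
    `p_train`, otherwise from the alternative sequence, given only the two
    preceding locations. Returns (locations, probable_flags); the first two
    flags are None because those trials have no two-item context.
    """
    if rng is None:
        rng = random.Random()
    train_next, alt_next = transition_table(train), transition_table(alt)
    start = rng.randrange(len(train))  # random starting point (assumed form)
    locations = [train[start], train[(start + 1) % len(train)]]
    flags = [None, None]
    while len(locations) < n_trials:
        context = (locations[-2], locations[-1])
        use_train = rng.random() < p_train
        locations.append(train_next[context] if use_train else alt_next[context])
        flags.append(use_train)
    return locations, flags


def label_trials(locations, train):
    """Label each trial True (probable) or False (improbable) given the
    training sequence; the first two trials are labeled None."""
    train_next = transition_table(train)
    labels = [None, None]
    for i in range(2, len(locations)):
        context = (locations[i - 2], locations[i - 1])
        labels.append(train_next.get(context) == locations[i])
    return labels


if __name__ == "__main__":
    locs, flags = generate_block(SOC1, SOC2, rng=random.Random(1))
    print(locs[:16])
    print(flags[:16])
```

Applied to the example sequence 3-2-1-3-4-1-3-4-1-2-4-2-3-1-2-3 with SOC1 as the training sequence, `label_trials` marks the 3rd, 7th, 11th, 13th, and 15th targets as improbable. Because the two sequences disagree on every second-order transition, each target after the first two is unambiguously either probable or improbable.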


Each participant was tested individually in a quiet room. Although all instructions were presented on the screen, participants were informed that the experimenter would be outside the testing room and could be contacted if they had any queries regarding the instructions. Participants in both groups were told that they were taking part in a simple reaction-time experiment that was designed to see how fast they could become at responding to a target. They were informed about the nature of the targets (the symbols ×, o, ?, and $) and that they would appear in different locations across the screen. In each trial, participants reacted to the location of a target by pressing keys V, B, N, and M for locations 1–4 respectively. Participants responded to locations 1 and 2 with the middle and index fingers of their left hands respectively, and to locations 3 and 4 with the index and middle fingers of their right hands respectively. Participants were instructed to respond to targets as quickly as possible and it was emphasized that they should avoid making any errors.

First, all participants received ten practice trials to become familiar with the task. The SRT display appeared with the letter symbols (V, B, N, and M) appearing under the corresponding box to aid correct finger placement. The ten stimulus presentations showed each of the various symbols that participants had to respond to.

Following the practice trials, dual-task participants were instructed that they were also required to keep a running count of the combined number of “×” and “?” targets that appeared in each block, and that they would be required to type in this count at the end of each block. For single-task participants the secondary task was not mentioned. Participants then began the training phase of the experiment.

The training phase comprised nine blocks during which participants were presented with a four-choice SRT task. Each block consisted of 100 trials for a total of 900 trials. For each block the target-location trial began at a random point in the sequence. A target location trial ended when a participant made the correct key response, which resulted in the target being erased. Response latencies were measured from the onset of the target until the participant had made the correct response. On making the correct response for the target location, there was a response-stimulus interval (RSI) of 250 ms before the onset of the next target. If participants made an error this was signaled by a tone. The target remained until the correct response was made, and errors were recorded. At the end of each block dual-task participants were requested to type in their combined count of ×’s and ?’s and given feedback about their counting accuracy. If their accuracy was 100% they were commended and encouraged to maintain their high level of accuracy. If, however, it was less than 100% they were asked to pay more attention to the counting task. Following completion of each block participants were told they could have a short pause and when ready they could initiate the next block by pressing the return key.

Test block

All participants completed block 10 under single-task conditions. Dual-task participants were informed that they were no longer required to count symbols and were required only to respond as fast as possible to the location of each target. Single-task participants continued to respond to the location of each target as before.


Error rates (i.e., incorrect target localization responses) were low in this and the other experiments of this article, though higher for improbable than for probable targets (Schvaneveldt & Gomez, 1998; Shanks et al., 2003), and are not discussed further. Note that RTs in error trials were excluded from all analyses. RT data for participants trained with SOC1 and SOC2 as their probable sequence were combined in the following analyses.

Figure 1 presents the mean RTs obtained for single-task and dual-task participants for both probable and improbable targets over the training phase (blocks 1–9) and for the test block (block 10). Throughout the training phase dual-task performers responded more slowly to targets than single-task participants, signifying that the secondary task made an impact on responding. An analysis of variance (ANOVA) on the data from blocks 1–9 with block and probability as within-participants factors and group as a between-participants factor revealed main effects of block, F(8, 272) = 3.85, p < .001, group, F(1, 34) = 23.85, p < .001, and probability, F(1, 34) = 9.96, p < .01, and a block × group interaction, F(8, 272) = 2.92, p < .01. The latter reflects the fact that RTs speeded up more across blocks in the dual-task group. Figure 1 also shows that participants in the single-task group responded consistently faster to probable than improbable targets, indicative of sequence learning, while dual-task performers showed no clear tendency to make faster responses to probable targets. However, the probability × group, F(1, 34) = 1.35, p > .2, and probability × block × group, F < 1, interactions were not statistically significant. The probability × block interaction was also not reliable, F < 1.
Fig. 1

Mean reaction time (RT) for probable and improbable targets in Experiment 1 in a group (single) that performed the localization task alone and in a group (dual) that performed a concurrent symbol-counting task as well across blocks 1–9. In block 10 both groups performed the localization task alone.

When all participants performed under single-task conditions in the test block, a larger probable/improbable difference was observed in the single-task than in the dual-task group. This effect is highlighted in Fig. 2, which plots the probable/improbable difference for each group in block 10. The difference between the two groups was significant, t(34) = 2.31, p < .05. The difference score was greater than zero in the single-task group, t(16) = 3.90, p < .001, and marginally so in the dual-task group, t(18) = 1.97, p < .05 (one-tailed). Thus, sequence learning was attenuated, but not completely eliminated, under dual-task training conditions.
Fig. 2

Mean difference between RTs for improbable and probable targets in the single-task testing block in Experiment 1 (block 10, left) and Experiment 2 (block 15, right) for groups trained under single-task or dual-task conditions. Error bars depict the standard errors of the mean.

Dual-task group counting accuracy

The concurrent symbol counting task was performed with great accuracy. There were differences between participants and across blocks, but the worst level of performance produced by a single participant in the final dual-task block, block 9, was a mean counting error of 4.6%. The overall mean symbol counting error across training blocks for all dual-task participants was 3.3%. This can be taken as an indication that participants were able to cope well with the counting task.


The results of this experiment are fairly straightforward: When expressed under equivalent (single-task) conditions, sequence knowledge is greater in participants trained under single-task conditions than in those trained under dual-task conditions. That is, dividing attention at study reduces the extent of implicit sequence learning.

Experiment 2

The present experiment sought to replicate Experiment 1 and to provide evidence of the extent to which sequence knowledge is truly implicit or unconscious. On the basis that longer training might induce greater reliance on implicit processes, the training stage was lengthened to 14 blocks. To test the degree to which any sequence knowledge obtained was available to consciousness, participants completed two different free generation tests after the main SRT phase. Both used the same response keys as in the training and testing phase of the experiment. First, participants completed an inclusion free generation test in which they were requested to try to generate the sequence they had responded to in the training phase of the experiment. Then they completed an exclusion free generation test; here they were requested to generate any sequence other than the training sequence. If participants generated more triplets consistent with the training sequence under inclusion than under exclusion instructions, this may be taken as evidence of awareness of the training sequence, as participants could control their sequence knowledge (Destrebecqz & Cleeremans, 2001; Jacoby, Toth, & Yonelinas, 1993; Wilkinson & Shanks, 2004). If, on the other hand, the number of triplets generated was equivalent under inclusion and exclusion instructions, this may be taken to demonstrate unconscious knowledge, as participants would in this case be unable intentionally to control the production of their sequence knowledge.


Participants and apparatus

A further 34 participants (16 men and 22 women) took part in this study. The majority were undergraduates from University College London. Ages ranged from 18 to 34 years (mean age 21.2 years). Participants were randomly assigned to single-task (n=18) or dual-task (n=16) conditions and were each paid £3 for their participation. The apparatus was as for Experiment 1.


The sequences were the same as those in Experiment 1.


The training phase comprised 14 blocks during which participants were presented with a four-choice SRT task. Each block consisted of 100 trials, making a total of 1,400 trials. All participants completed block 15 under single-task conditions. The RSI was again 250 ms.

Free generation test

The free generation phase required participants firstly to attempt to generate and then to refrain from generating the training sequence. The free generation tests were administered in this order to aid participants’ understanding of what was required. The order of these tests has been shown to have no effect (Wilkinson & Shanks, 2004). Participants were informed that the targets followed a repeating sequence and were asked to press the keys 100 times, attempting to freely generate the training sequence that they saw in the RT phase using the same keys. They were told that each time they pressed a key, an “×” would appear in the appropriate box and that it would remain on the screen until a further key press was made. They were told not to worry if their memory for the sequence was poor, just to try to generate the sequence as best they could. The “×” moved to the corresponding location each time one of the four keys was pressed. As an incentive for accurate generation, participants were informed that the six most accurate performers would receive a £15 book token and that the higher their accuracy the greater their chance of winning.

On completion of the inclusion generation test participants were told that they would now be tested for their sequence knowledge using a different method. Again, they were requested to press the keys 100 times, as in the inclusion test, but this time attempting to freely generate a sequence that was as different as possible from the training sequence. They were instructed that if they could remember the sequence they should try to avoid generating it. Again, participants were reminded of the book token prize and encouraged to be as accurate as possible to increase their chances of winning.


Figure 3 presents the mean RTs for single-task and dual-task participants for both probable and improbable targets over the training phase (blocks 1–14) and the test block (block 15). Results across blocks 1–9 were similar to those obtained in Experiment 1. Again, the probable/improbable difference was rather more consistent in the single-task group. When all participants performed under single-task conditions (block 15, the test block), dual-task participants responded to both probable and improbable targets faster than single-task participants. Critically, there was a smaller probable/improbable difference in the dual-task group.
Fig. 3

Mean RT for probable and improbable targets in Experiment 2 in a group (single) that performed the localization task alone and in a group (dual) that performed a concurrent symbol-counting task as well across blocks 1–14. In block 15 both groups performed the localization task alone.

In order to analyze the training data a three-way ANOVA was conducted with probability and block as within-participants factors and group as the between-participants factor. The pattern of significant results was identical to that in Experiment 1. The main effects of probability, F(1, 32) = 44.57, p < .001, block, F(13, 416) = 3.69, p < .001, and group, F(1, 32) = 13.50, p < .001, were all significant. The block × group interaction was significant, F(13, 416) = 3.06, p < .001, reflecting the fact that dual-task participants speeded up more across blocks than those in the single-task group. The probability × group interaction did not reach significance, F < 1. This indicates no difference in learning between groups throughout the training phase of the experiment, despite the apparently greater variability in the dual-task group. Other interactions that failed to reach significance were block × probability, F(13, 416) = 1.31, p > .05, and probability × block × group, F(13, 416) = 1.41, p > .05.

Sequential reaction time test block

For the test block, block 15, all participants were tested on their sequence knowledge under single-task conditions. A difference score based on the difference between RTs for probable and improbable targets in block 15 was calculated for each participant (see Fig. 2). The difference score was significantly greater in single-task than dual-task participants, t(32) = 2.21, p < .05. The difference score was greater than zero both in the single-task group, t(17) = 4.19, p < .001, and in the dual-task group, t(16) = 3.20, p < .01. Thus, as in Experiment 1, sequence learning was attenuated, but not completely eliminated, under dual-task training conditions.

Dual-task group counting accuracy

The concurrent symbol counting task was again performed accurately. The worst level of performance produced by a participant throughout training was a mean counting error of 4.6 per block. The overall mean symbol counting inaccuracy across training blocks for all dual-task participants was 3.0%.

Free generation data

To establish the extent to which sequence knowledge was consciously accessible, participants were requested to create sequences of 100 key presses under both inclusion and exclusion instructions. To obtain inclusion and exclusion scores, each generated sequence was coded as 98 consecutive response triplets and the number of triplets that were consistent with the training sequence was calculated. For example, if a participant was trained with SOC1 and generated the sequence 2-1-3-2, it was coded as two triplets: 2-1-3 and 1-3-2. The triplet 2-1-3 is consistent with SOC1 but 1-3-2 is not. It is possible that some participants disregarded the free generation instructions in order to complete the experiment as quickly as possible (Wilkinson & Shanks, 2004). Such participants would have made perseverative key responses (e.g., pressing 1-2-3-4 repeatedly) throughout the tests; therefore, all the data were scrutinized, and participants who showed such trends were removed from the analysis. The final analysis excluded data from 4 participants, leaving a total of 30 participants (n=16 single-task, n=14 dual-task), though note that the statistical conclusions are identical when the entire sample is included. Figure 4 shows the average inclusion and exclusion scores for dual-task and single-task trained participants. Both dual-task and single-task trained participants generated more SOC triplets consistent with the training sequence under inclusion instructions than under exclusion instructions.
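The triplet coding described above can be sketched as follows. This is a minimal illustration, not the authors' scoring code: the function names `soc_triplets` and `count_consistent_triplets` are hypothetical, and the sketch assumes that triplets wrapping around the end of the 12-item sequence count as consistent, since the sequence repeats.

```python
SOC1 = [2, 4, 2, 1, 3, 4, 1, 2, 3, 1, 4, 3]
SOC2 = [3, 4, 3, 1, 2, 4, 1, 3, 2, 1, 4, 2]


def soc_triplets(seq):
    """Return the set of 12 triplets in a repeating SOC sequence,
    including those that wrap around its end (an assumption)."""
    n = len(seq)
    return {(seq[i], seq[(i + 1) % n], seq[(i + 2) % n]) for i in range(n)}


def count_consistent_triplets(keypresses, training_sequence):
    """Code a generated sequence as overlapping triplets and count those
    that occur in the training sequence. A run of N key presses yields
    N - 2 triplets, so 100 key presses yield 98."""
    valid = soc_triplets(training_sequence)
    triplets = zip(keypresses, keypresses[1:], keypresses[2:])
    return sum(t in valid for t in triplets)


if __name__ == "__main__":
    # The worked example from the text: 2-1-3 is consistent with SOC1,
    # 1-3-2 is not.
    print(count_consistent_triplets([2, 1, 3, 2], SOC1))
```

An inclusion score and an exclusion score are obtained by applying the same count to the sequences generated under each instruction; a higher inclusion than exclusion score indicates intentional control over sequence knowledge.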
Fig. 4

Mean number of second-order conditional (SOC) triplets generated that were consistent with the training sequence, under inclusion and exclusion instructions, for single-task and dual-task trained participants. The figure shows data for Experiment 2 (left) and Experiment 3 (right). Error bars depict the standard errors of the mean.

A two-way ANOVA conducted on these data with group (single vs. dual) as the between-participants factor and instruction (inclusion vs. exclusion) as the within-participants factor revealed a significant main effect of instruction, F(1, 28) = 8.26, p < .05. Thus, participants were able to intentionally control their sequence knowledge: They generated the training sequence to some extent when required, and could also refrain from generating it when requested. The main effect of group failed to reach significance, F < 1, and the group × instruction interaction also was not significant, F(1, 28) = 1.14. This suggests that both groups of participants had similar levels of intentional control over sequence knowledge. Together the findings indicate that participants had some conscious access to their sequence knowledge.

Experiment 3

The final experiment sought to avoid two possible criticisms of the previous experiments. First, it might be argued that testing all participants under single-task conditions is unfair on the dual-task group in that for this group—but not for the single-task learning group—there is a change of conditions between the learning and test phases. Whereas the single-task group performs the same task in the test block as in the training blocks, the dual-task group has to adjust to a different set of conditions (i.e., the absence of the secondary task) and this may introduce a confounding factor. Perhaps learning is normal under dual-task conditions, but participants find difficulty in fully expressing their sequence knowledge under the transfer conditions. To counter this possibility, we tested participants under both single-task and dual-task conditions in the present experiment.

Second, it might be objected that the length of training in the first two experiments was inadequate to allow the development of attention-independent implicit learning. We therefore extended the training stage to 4,000 SRT trials.

In blocks 10, 20, 30, and 40 of the experiment participants in the single-task group were tested under dual-task conditions. Dual-task participants performed the secondary task in all blocks except block 40 where they were tested under single-task conditions. We are thus able to compare sequence knowledge under dual-task testing conditions by analyzing RTs in blocks 10, 20, and 30, and RTs of the single-task group in block 40 with those of the dual-task group in block 39. In addition, we can compare sequence knowledge under single-task testing conditions by analyzing RTs of the single-task group in block 39 with those of the dual-task group in block 40.

Lastly, we reduced the RSI to 0 ms in this experiment. The reason for doing this is that the longer interval (250 ms) in the previous experiments may have provided an opportunity for participants to interleave the different requirements of the localization and counting tasks so as to reduce competition between them for central resources. For instance, this interval provides an opportunity for participants to decide whether the previous symbol needs counting and to update their symbol count if necessary. Setting this interval at 0 ms should eliminate this possibility. In other words the previous experiments, by allowing for such interleaving, may have underestimated the cost of the secondary task on sequence learning as they may have reduced the concurrent demands of the two tasks for a central attentional bottleneck.



Thirty-eight volunteers (19 women and 19 men) from University College London took part in this study. Their mean age was 24.6 years with a range of 17–40 years. Participants were randomly assigned to one of two experimental groups, single or dual. They were paid £6 for their time upon completion of the experimental task; there was an additional award of a £15 book token for the six participants who performed best in the free generation phase.


For all participants, the experimental procedure comprised a pre-training session in the SRT task under dual-task conditions, followed by 20 blocks of the SRT task in the morning, and then a further 20 blocks of the SRT task and a free generation test in the afternoon. The interval between the morning and afternoon sessions was inserted to relieve fatigue. The purpose of including a pre-training period under dual-task conditions was to prepare participants for the dual-task procedure used in the training period. The crucial difference between the procedures for the two groups was the relative order and frequency of trial blocks performed under dual-task conditions (see Table 1). Participants in the single-task group performed most blocks of the SRT task without a concurrent secondary task. The single-task group did, however, perform four test blocks under dual-task conditions (blocks 10, 20, 30, and 40) so that learning effects could be investigated under identical conditions at regular intervals. Conversely, participants in the dual-task group performed the SRT task whilst undertaking the concurrent secondary task in every block except the final one (block 40). It would be undesirable to subject the dual-task group to intermittent single-task test blocks as such a treatment may contaminate sequence knowledge acquired under conditions of divided attention (Shanks & Channon, 2002) by providing an opportunity for single-task learning.
Table 1

Experimental conditions for pre-training, each block of the sequential reaction time (SRT) task, and post SRT. S single-task, D dual-task, Inc inclusion, Exc exclusion

Phase            Single-task group    Dual-task group
Pre-training     D                    D
Blocks 1–9       S                    D
Block 10         D                    D
Blocks 11–19     S                    D
Block 20         D                    D
Blocks 21–29     S                    D
Block 30         D                    D
Blocks 31–38     S                    D
Block 39         S                    D
Block 40         D                    S
Post SRT         Inc and Exc          Inc and Exc

Participants had an extended break between blocks 20 and 21, which divided a morning and afternoon session

The final test phase occurred at blocks 39 and 40. For the single-task group block 39 was performed under single-task conditions and block 40 was performed under dual-task conditions; for the dual-task group this order was reversed. Whenever a switch from single-task to dual-task conditions, or vice versa, was required, participants were instructed before the block of the necessary change. This procedure allows both single-task and dual-task groups to be tested under both single-task and dual-task conditions. Table 1 shows the experimental procedure in detail.
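The block schedule described above can be written down compactly. The following Python sketch is illustrative only; the function name and the "single"/"dual" labels are assumptions introduced here, while the S and D codes follow Table 1.

```python
def block_condition(group: str, block: int) -> str:
    """Return 'S' (single-task) or 'D' (dual-task) for an SRT block.

    group: 'single' or 'dual' (hypothetical labels for the two groups).
    block: block number from 1 to 40.
    """
    if not 1 <= block <= 40:
        raise ValueError("block must be between 1 and 40")
    if group == "single":
        # Single-task training, probed under dual-task conditions every 10th block.
        return "D" if block in (10, 20, 30, 40) else "S"
    if group == "dual":
        # Dual-task training throughout, with a single-task test in the last block.
        return "S" if block == 40 else "D"
    raise ValueError("group must be 'single' or 'dual'")


if __name__ == "__main__":
    for b in (9, 10, 39, 40):
        print(b, block_condition("single", b), block_condition("dual", b))
```

Written this way, it is easy to see that the dual-task comparison at the end of training pairs block 40 of the single-task group with block 39 of the dual-task group, and the single-task comparison pairs block 39 with block 40.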

Sequence information

The sequences were the same as in Experiments 1 and 2. The RSI was reduced to 0 ms.

Free generation test phase

After completion of the SRT task participants performed a free generation test under either inclusion or exclusion instructions. In contrast to Experiment 2, order was counterbalanced so some participants were presented with the inclusion test before the exclusion one and others with the reverse order. In an effort to heighten motivation and encourage optimal performance, participants were told in advance that the top six performers in the free generation test would each receive a £15 book token.


Reaction time data

Figure 5 shows the mean RT data across all blocks of the SRT task for each condition, with probable and improbable target data plotted separately. The dramatic increases in RTs in blocks 10, 20, 30, and 40 for the single-task group are due to the switch from a single-task to a dual-task procedure. The decrease in RTs in block 40 for the dual-task group is due to the switch from a dual-task to a single-task procedure. Participants’ mean RT data were entered into a global three-way mixed-model ANOVA with the within-participants factors of block (36 levels) and target probability (two levels, probable and improbable), and the between-participants factor of task (two levels, single-task and dual-task). Only 36 levels of block were included in this analysis because the four dual-task test blocks (i.e., 10, 20, 30, and 39/40) were entered into a separate ANOVA (see below). The RTs for training blocks conducted under dual-task conditions were in all cases considerably larger than those obtained under single-task conditions, F(1, 36) = 77.80, with RTs decreasing with practice, F(35, 1,260) = 58.36. This practice effect appears to be more pronounced for the dual-task group, presumably because performance was initially hindered by the presence of the secondary task; accordingly, there was a significant block × group interaction, F(35, 1,260) = 12.16. From very early on, RTs for probable targets were lower than those for improbable targets, and this pattern was maintained across all training blocks in both groups, F(1, 36) = 168.01, apart from a few notable exceptions such as blocks 2–4 of the dual-task group. There was no probability × group interaction, suggesting that during training the RT difference between probable and improbable targets was similar across groups. This consistent difference in RT (approximately 40 ms on average) between probable and improbable targets is evidence that participants learned elements of the sequence and displayed priming effects for probable targets.
Fig. 5

Mean RT for probable and improbable targets in Experiment 3 in a group (single) that performed the localization task alone in all blocks except blocks 10, 20, 30, and 40 and in a group (dual) that performed a concurrent symbol-counting task as well as the localization task across all blocks except block 40. The increase in RTs at blocks 10, 20, 30, and 40 for the single-task group represents the change from single-task to dual-task conditions. The reduction in RTs in block 40 for the dual-task group represents the switch from dual-task to single-task conditions.

The three-way block × probability × group interaction also reached significance, F(35, 1,260) = 4.10, indicating that the development of a probability effect across blocks is more robust in the single-task group. This three-way interaction was not found under the shorter training conditions of Experiments 1 and 2.

Difference score analyses

The principal data of interest in the present study were the differences between RTs for probable and improbable targets in the test blocks. The primary test blocks were blocks 39 and 40, but because blocks 10, 20, and 30 for the single-task group, and by default the dual-task group, were performed under dual-task conditions they can provide additional data from which to analyze learning effects measured under identical conditions. The difference scores were calculated by subtracting the block mean probable RT from the block mean improbable RT for each participant and then averaging over all participants in that group. The difference scores for dual-task testing conditions are shown in Fig. 6.
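The difference score computation described above amounts to the following. This is a minimal sketch with hypothetical function names and made-up RT values; it is not the authors' analysis script.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def difference_score(probable_rts: Sequence[float],
                     improbable_rts: Sequence[float]) -> float:
    """Block mean improbable RT minus block mean probable RT for one participant."""
    return mean(improbable_rts) - mean(probable_rts)


def group_mean_difference(
    participants: List[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average the per-participant difference scores over a group.

    Each entry is (probable RTs, improbable RTs) for one participant in one block.
    """
    return mean(difference_score(prob, improb) for prob, improb in participants)


if __name__ == "__main__":
    # Hypothetical RTs in ms for two participants in a single test block.
    group = [([400, 420], [450, 470]), ([500, 500], [510, 530])]
    print(group_mean_difference(group))
```

A positive score means that responses to probable targets were faster than responses to improbable targets, which is the index of sequence learning used in the analyses that follow.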
Fig. 6

Mean difference between RTs for improbable and probable targets for both single-task and dual-task training groups in dual-task testing blocks (final refers to blocks 39 and 40 for dual-task and single-task groups respectively). Error bars depict the standard errors of the mean.

Consistent with the idea that sequence learning is impaired by a secondary task, difference scores for the single-task group were greater than for the dual-task group across all dual-task testing blocks. The final test block for the single-task group under dual-task testing was block 40, whereas for the dual-task group testing under dual-task conditions took place in block 39. With the exception of the dual-task group in block 20 all difference scores were greater than zero, providing evidence of sequence learning.

The difference score data for the dual-task test blocks were entered into a 2 × 4 ANOVA with group (single vs. dual) as a between-participants factor and block (10 vs. 20 vs. 30 vs. final test) as a within-participants factor. The ANOVA results revealed a significant main effect for group, F(1, 36) = 8.75, suggesting that the two groups differed in the degree of sequence learning. The block × group interaction was not significant, F(3, 108) = 1.49, nor was there a main effect of block, F(3, 108) = 1.53.

t tests were computed between the difference scores of each group for each of the four test blocks. The probability values reported are one-tailed to conform with the prediction that learning scores would be greater for the single-task training group. There was no statistically significant difference between scores at block 30, p = .28. However, there were reliable differences in block 10, t(36) = 1.97, p < .05, block 20, t(36) = 3.42, p < .05, and in the final test block, t(36) = 1.97, p < .05, supporting the observation that mean difference scores were generally larger for the single-task training group when tested under dual-task conditions.
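For readers who wish to see how a between-groups t statistic of this kind is obtained, the sketch below computes an independent-samples t with a pooled variance estimate. The use of the pooled (Student) form is an assumption: the text does not state which variant was used, although 36 degrees of freedom for 38 participants is what n1 + n2 − 2 gives. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Independent-samples t statistic (pooled variance) and its degrees of freedom.

    A positive t means the mean of group_a exceeds the mean of group_b.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by each group's df.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```

For a directional prediction such as the one tested here (single-task scores greater than dual-task scores), the one-tailed probability is the upper-tail area of the t distribution with the returned degrees of freedom.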

Next, it is important to analyze the difference scores for both groups under single-task test conditions. As can be seen from Fig. 5, the difference between probable and improbable RTs for the single-task group in block 39 was greater (35 ms) than that for the dual-task group in block 40 (11 ms). Both these test blocks were conducted under single-task conditions. This difference was statistically reliable, t(36) = 2.11, p < .05. Learning scores were therefore higher for the single-task training group when tested under single-task conditions, as in Experiments 1 and 2.

For the symbol-counting task all participants were highly accurate on average, with counting errors across participants ranging from 0–6% for the single-task group and 1–4% for the dual-task group.

Free generation test data

The free generation data were analyzed as in Experiment 2. The data of five participants were eliminated because they showed strong evidence of perseverative response strategies. This left 17 participants in the single-task group and 16 in the dual-task group.

Figure 4 shows that the mean triplet generation score for the single-task training group is greater under inclusion than under exclusion conditions. The same pattern is observed for the dual-task training group. These data were entered into a 2 × 2 mixed model ANOVA with group (single-task vs. dual-task training) as a between-participants factor and instructions (inclusion vs. exclusion) as a within-participants factor. There was a statistically significant main effect of instructions, F(1, 31) = 4.41, but no main effect for group, F(1, 31) = 1.05, nor was there a statistically significant group × instructions interaction, F < 1. These results indicate that to some extent participants possessed explicit knowledge of the probabilistic sequences used in this experiment.

In summary, Experiment 3 confirms that sequence learning is reduced under dual-task conditions and adds two new findings to those of the previous experiments: First, this effect is not diluted as training is extended to 4,000 trials, and second, the effect can be detected even when sequence knowledge is evaluated under dual-task conditions, thus avoiding the possible criticism of Experiments 1 and 2 that their results are attributable to transfer decrement from dual-task training to single-task testing.

Relationship between sequence learning and free generation

Together, the RT and free generation data of this and Experiment 2 are somewhat paradoxical; the RT data reveal a difference in learning between dual-task and single-task conditions, whereas both groups showed similar levels of conscious sequence knowledge according to the inclusion–exclusion difference in the free generation tasks. What is the significance of this apparent dissociation?

We suggest that the most plausible conclusion is that the magnitude of the inclusion–exclusion difference is somewhat insensitive to variations in the extent of sequence knowledge as indexed by RT difference scores. However, it is important to acknowledge another possibility, namely that the data reflect distinct learning mechanisms. Perhaps the group data result from the combination of distinct patterns from two subgroups of participants: A subgroup that has explicit knowledge (i.e., a positive inclusion–exclusion difference), and in which sequence learning is attention-demanding (i.e., an effect of the secondary task), and a subgroup that has no explicit knowledge (i.e., no inclusion–exclusion difference), and in which sequence learning is not attention-demanding (i.e., no effect of the secondary task). If that is the case, we would expect to see a much smaller difference between single-task and dual-task RT difference scores in participants who showed the smallest inclusion–exclusion difference. In fact the data do not show this: If we conduct a median split on each of the two groups on the basis of the magnitude of their inclusion–exclusion difference, and then calculate the mean RT difference score averaged across blocks 10, 20, 30, and the final block for each subgroup, we find that these scores are 67 and 13 ms for the single-task and dual-task groups, respectively, for the subgroup with the largest inclusion–exclusion difference, and 46 and 8 ms for the single-task and dual-task groups, respectively, for the subgroup with the smallest inclusion–exclusion difference. An ANOVA revealed a reliable effect of training conditions (single/dual), F(1, 29) = 7.50, p = .01, but no effect of inclusion–exclusion difference (high/low) and no interaction, F < 1 in each case. Notably, the difference in the subgroup with poor free generation performance is reliable, t(16) = 1.92, p < .05 (one-tailed). Hence, the effect of the secondary task is unrelated to the degree of conscious knowledge as indexed by the free generation test.
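The median-split step can be sketched as follows for a single training group; it would be applied once to the single-task group and once to the dual-task group. The function name and the data in the example are hypothetical, and the treatment of scores that fall exactly on the median (assigned here to the low subgroup) is an assumption, since the text does not specify it.

```python
from statistics import mean, median
from typing import List, Tuple


def median_split_means(participants: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean RT difference score for the high and low halves of one group.

    Each entry is (inclusion minus exclusion score, mean RT difference in ms).
    Returns (mean for high subgroup, mean for low subgroup).
    Raises statistics.StatisticsError if either subgroup is empty.
    """
    cut = median(ie for ie, _ in participants)
    # ASSUMPTION: scores equal to the median go to the low subgroup.
    high = [rt for ie, rt in participants if ie > cut]
    low = [rt for ie, rt in participants if ie <= cut]
    return mean(high), mean(low)


if __name__ == "__main__":
    # Made-up values for four participants in one group.
    example = [(10, 70), (8, 60), (2, 40), (1, 50)]
    print(median_split_means(example))
```

Comparing the single-task and dual-task means within the low subgroup is what tests whether the secondary-task effect survives in participants with little evidence of intentional control.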

It is important to emphasize that this dissociation between the two measures is not in itself evidence for distinct learning systems. As we have shown elsewhere (Shanks, 2005; Shanks & Perruchet, 2002; Shanks et al., 2003), a single-system model in which a common underlying variable determines the magnitude of both sequence knowledge and conscious knowledge is capable of predicting a complete lack of correlation between them if there is independent noise or error in the processes that translate that variable into each type of response.
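The logic of this argument can be illustrated with a toy simulation. The sketch below is not the model described in the cited papers; it is a deliberately simplified assumption in which one latent knowledge variable feeds two measures, each with its own independent Gaussian noise. All names and parameter values are hypothetical.

```python
import random
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_correlation(noise_sd: float, n: int = 2000, seed: int = 0) -> float:
    """Correlation between two noisy read-outs of one latent variable.

    Each simulated participant has a single knowledge value (unit variance);
    the 'RT' and 'generation' measures add independent noise with SD noise_sd.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    rt_measure = [k + rng.gauss(0, noise_sd) for k in latent]
    generation_measure = [k + rng.gauss(0, noise_sd) for k in latent]
    return pearson_r(rt_measure, generation_measure)


if __name__ == "__main__":
    for sd in (0.5, 2.0, 5.0):
        print(sd, round(simulate_correlation(sd), 3))
```

In this toy case the expected correlation is 1 / (1 + noise_sd²), so it shrinks toward zero as the independent noise grows even though both measures depend on exactly the same underlying variable. The simulation shows only this attenuation and makes no claim about the parameter values that would apply to the present data.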

Finally, another analytic method for determining attentional effects on implicit and explicit learning would involve estimating the separate contributions of these based on the assumptions of the process dissociation procedure (Jacoby et al., 1993). We have not adopted this method here because of doubts over the validity of these assumptions (Ratcliff, McKoon, & Van Zandt, 1995; Tunney & Shanks, 2003).

General discussion

This study has investigated the effects of attentional load on implicit sequence learning. The RT data suggest that learning under dual-task conditions detrimentally affects sequence learning. This is supported by the finding that RTs to probable and improbable targets differed more for single-task participants than for dual-task ones. The effect occurred when sequence knowledge was tested both under single-task and dual-task conditions. Therefore, we infer that the single-task participants demonstrated greater sequence learning.

The lower level of learning found for dual-task participants compared with single-task participants is consistent with the results of Shanks and Channon (2002), in that reduced attention had a detrimental effect on sequence learning. In that study, however, deterministic rather than noisy sequences were used and it is widely agreed that the latter are preferable to ensure that the involvement of explicit processes is minimized (Cleeremans & Jiménez, 1998). Like the results of Shanks and Channon, the present finding that learning is greater for single-task participants even when tested under dual-task conditions casts doubt on some earlier studies (e.g., Curran & Keele, 1993, Experiment 2), which reported equivalent learning effects in single-task and dual-task groups tested in this way. Overall, our findings conflict with the notion that sequence learning is able to proceed without the use of attentional resources (Cleeremans & Jiménez, 1998; Frensch, 1998; Frensch et al., 1998). Moreover, the symbol-counting task provides a purer effect of reduced attention on sequence learning as it is suggested to be free from some of the disruption effects associated with tone counting (Frensch, Buchner, & Lin, 1994; Frensch & Miner, 1994; Jiménez & Méndez, 1999; Stadler, 1995).

The free generation data revealed that both dual-task and single-task groups had a degree of intentional control over their sequence knowledge: they were able to refrain from generating information from the training sequence when asked to do so during the exclusion test. Both groups of participants were able to generate more sequence-consistent triplets under inclusion instructions than under exclusion instructions. The data from the inclusion/exclusion comparison in the free generation task challenge the traditional view that implicit learning yields an unconscious knowledge base. As indicated by the differences in triplet generation under inclusion and exclusion instructions, both groups’ sequence knowledge was, to some degree, conscious. Fuller discussion of the inclusion/exclusion comparison and its interpretation can be found in Destrebecqz and Cleeremans (2001) and in Wilkinson and Shanks (2004).

The RT results conflict with the findings of Jiménez and Méndez (1999), who reported no effect of divided attention on sequence learning. They employed a similar secondary task to that used in this study along with probabilistic sequences. Jiménez and Méndez used a finite-state grammar to generate their sequences (Cleeremans & McClelland, 1991), thus producing far noisier sequences than those in the present experiments. Why are there contrasting results between this study and that of Jiménez and Méndez (1999)? We consider several possibilities.

First, Jiménez and Méndez trained participants for much longer in the SRT task than we did in this study. Participants underwent ten training sessions (carried out at the rate of two per day) each composed of 20 blocks of 155 trials. Therefore, participants had a total of 31,000 learning trials compared to a maximum of 4,000 in our experiments. Jiménez and Méndez analyzed their data in blocks of 3,100 trials; hence the very first data point on their learning curves is about three times longer than the entire training stage of Experiment 1 and is roughly equivalent to the whole of the present Experiment 3. It is possible that with such substantial training participants simply became very efficient at task sharing; the secondary task may have become automatic, therefore allowing attentional resources to be allocated to sequence learning. Indeed, whereas the overall RT difference between single-task and dual-task groups in our experiments was never less than 100 ms, in Jiménez and Méndez’ study it was only about 40 ms by the third of their blocks of 3,100 trials, and was eliminated entirely by the end of training. In one of their experiments (Experiment 3), Jiménez and Méndez did require participants to count a different pair of symbols in each block, but that still means that the same targets were counted across the whole of the first block of 3,100 trials. Moreover, changing the targets to be counted does not eliminate the possibility that target-nonspecific counting operations could have become automatized. In contrast to the present Experiment 3, in which the RSI was 0 ms, their studies (similar to the present Experiments 1 and 2) employed an RSI of 240 ms, which provides a considerable period in which the participant may update his or her symbol count before the appearance of the next target. Indeed, Jiménez and Méndez themselves favored the conclusion (p. 255) that “participants may have learned over training about how to best interleave the different requirements of their tasks to perform them optimally.” But of course if the two tasks were reconfigured in such a way as to allow efficient timesharing, it is no longer obvious that the results speak to the issue of the attentional demands of implicit learning. By interleaving the counting and target-response tasks, participants were ensuring that the two tasks did not compete for a common resource.

A second reason for caution in interpreting Jiménez and Méndez’ results is that, unlike the present experiments, they arranged predictive relationships not only between locations, but also between the shape occurring in one trial and the location of the target in the next. They found (see also Jiménez & Méndez, 2001) that dual-task participants learned both of these sets of contingencies. However, this means that attending to the shapes provided participants with information (over and above their location) that was relevant to learning the sequence of locations. Consider a typical location sequence chunk such as 3-1-2-4-1, which might have occurred regularly for a given participant. If we denote the identity of the shapes as A–D, then this sequence might actually have been realized as 3D-1A-2B-4D-1C, where the initial 3D, for instance, means shape D appearing in location 3 (note that the same shape precedes both trials in which the target appears in location 1). It is obvious that, from the point of view of elementary learning processes, this regularity of shapes and locations creates a sequence-learning problem that is informationally quite different to (indeed richer than) the sequence presented to a single-task participant who is completely ignoring the shapes and hence only apprehending a sequence of locations. Examples of augmentation or potentiation of the learning of one contingency by another simultaneous one (Batsell, 2000; Durlach & Rescorla, 1980) raise the possibility that dual-task participants were only able to match the learning of single-task participants because the negative cost (in terms of an attention decrement for sequence learning) of having to perform the secondary task was offset by the advantage of having target shape information that in some way ‘supported’ location sequence learning. Put differently, it is not at all obvious that Jiménez and Méndez would have obtained the same results if there had been no shape-location regularities and if (as in the present experiments) the secondary task had provided no possible support for learning the primary sequence.

It has been suggested (Jiménez & Méndez, 1999) that another piece of evidence supporting the attentional-independence view of implicit sequence learning is that participants are under some circumstances as capable of learning two simultaneous sequences as they are of learning only one. Thus, Jiménez and Méndez showed that location sequence learning was of the same magnitude in the dual-task group, who also learned shape-location contingencies, as it was in the single-task group, who did not. Similar findings have been reported by Jiménez and Méndez (2001) and Mayr (1996). This is a powerful piece of evidence. However, in none of these studies, except that of Mayr, were the sequences uncorrelated, and this raises again the possibility discussed in the preceding paragraph that the two sequences may in some way have ‘collaborated’ with each other to augment learning.

In Mayr’s (1996, Experiment 2) study, a sequence of locations and a sequence of objects were apparently learned without any decrement when they were combined (but independent). Participants responded in the training stage to the identity of a target (square, triangle, or circle), which could appear in any of three locations at the vertices of an imaginary triangle on the display. For one group, there was a structured object sequence and an uncorrelated structured location sequence; for a second group, the object sequence was structured while the location sequence was random, and for a third the location sequence was structured while the object sequence was random. There was a concurrent tone-counting task for all participants, and the sequences were deterministic. In test blocks, sequence learning (of either the object or location sequence) was evaluated by replacing a structured sequence with a random one. Mayr found that location and object sequence learning were at least as large (indeed, somewhat larger) in the first group compared with their relevant controls in the remaining two groups, and concluded that the learning of two regularities can proceed without decrement. This in turn implies that there is no common limited-capacity resource that needs to be called upon by the two learning modes.

The problem with this conclusion is that it assumes that a random sequence places no demands on the learning system. An alternative interpretation of the data is that both sequence and object learning make attentional demands, and that these demands are equivalent in the experimental group for whom both sequences were structured and in the control groups in which one or other sequence was random. It may be an intrinsic property of the learning system that it always makes an effortful attempt to learn about contingencies, even when those contingencies are random. Put another way, what we need to know is how much location sequence learning—relative to the experimental group—there would be in a group in which there was only a single object and hence where no object sequence learning could possibly take place; and how much object sequence learning—relative to the experimental group—there would be in a group in which there was only a single location and hence where no location sequence learning could possibly take place. Mayr’s experiment does not provide the answer to these questions and hence does not prove that two independent sequences can be learned without cost.

Although we have argued that our data provide convincing grounds for doubting the attentional-independence claim, we should acknowledge yet further possibilities. First, it is possible that our own results came about not because the secondary task is attention-demanding, but because it disrupts the temporal structure of the task; perhaps the symbol-counting processes engaged in by participants disrupt their perception of the organization of the primary sequence (Hsiao & Reber, 2001; Rah, Reber, & Hsiao, 2000; Stadler, 1995). The problem with this suggestion is that it is hard to imagine how the attentional-independence claim can be tested without introducing a secondary task that disrupts the timing and organization of the location sequence. Proponents of the claim would need to suggest some way of testing it that avoids this eventuality. Second, perhaps there are two separate forms of learning, a fast variety (revealed in our experiments), which makes attentional demands, and a slower variety (revealed in Jiménez and Méndez’ experiments), which does not. This possibility is consistent with the data we have presented here and needs to be considered in further research.

It was noted in the Introduction that two senses of the term attention should be distinguished, namely attention as limited-capacity resources (studied here) and attention as a selection mechanism (not investigated here). Although we would argue that the case for the independence of implicit learning from general-purpose attentional resources has not been proved, it remains possible that implicit learning is—unlike explicit learning—distinguished by its independence of selectional control. Does the evidence support this idea? Briefly, the existing evidence is contradictory. Jiménez and Méndez (1999) claimed that selective attention is necessary for implicit sequence learning. An element of their findings on which we have not commented thus far is that their participants only learned the contingencies programmed between shapes and locations when they had to count the shapes, and not otherwise. Counting requires selective attention to the shape of each target, which was presumably ignored by participants who did not have to count. Hence the lack of shape-location learning in single-task participants is evidence that implicit learning can only occur for objects that come within the focus of selective attention (for similar findings in a different implicit learning task, see Jiang & Chun, 2001). On the other hand, Cock, Berry, and Buchner (2002) presented some contrasting evidence. They presented participants with a primary sequence (e.g., with a green target) and a different sequence that was to be ignored (e.g., with a red target) such that two stimuli appeared in each trial. Cock et al. argued that participants were able to learn the unattended sequence and hence concluded that selective attention is not necessary for implicit learning. Plainly, more research is needed to make sense of this conflicting pattern of findings.

In conclusion, this study has challenged both the notion that implicit sequence learning is able to proceed without attentional resources and the idea that it is accurately characterized by a lack of awareness of the products of learning. The concept of implicit learning continues to be ambiguous and requires further investigation in order to formulate an empirically justifiable definition that distinguishes it from explicit learning.


The research described here was supported by grants from the United Kingdom Economic and Social Research Council and from the Biotechnology and Biological Sciences Research Council.

Copyright information

© Springer-Verlag 2005