Introduction

One crucial aspect of motor performance is the ability to learn sequences of movements. Typically, motor sequence learning is studied using button-pressing tasks such as the serial reaction time (SRT) task or the discrete sequence production (DSP) task, in which participants are required to respond to single stimuli presented visually on a screen. However, in daily life we simultaneously encounter multiple sources of sensory information across different modalities. Whereas the effect of bimodal, congruent stimuli has been extensively explored with respect to trial-by-trial performance in simple and choice reaction time (RT) tasks (e.g., Frens et al. 1995; Giard and Peronnet 1999; Rowland and Stein 2007), far less is known about the impact of such stimulus pairs on sequence learning across trials. In the current study we explored whether congruent and temporally synchronized visual and tactile stimuli enhance learning of a sequence of actions in an SRT task.

In its basic form, the SRT task requires participants to respond fast and accurately by pressing the buttons corresponding to the locations of successively presented visual stimuli (e.g. Nissen and Bullemer 1987). Unbeknownst to them, however, stimulus presentation is structured, and reaction time (RT) decreases with practice. To differentiate sequence learning from general practice effects, a random block of stimuli is inserted at the end of the practice phase. The cost in terms of RT and/or accuracy (i.e., sequence effect) of this random block relative to its surrounding sequence blocks serves as an index for sequence learning. Often, participants are unable to express their sequence knowledge in other ways than reflected by RT and accuracy scores, and learning is said to (partly) have taken place implicitly.

The nature of the representation underlying implicit learning is still being debated. Whereas response-based learning is the dominant and best documented account in the literature (e.g., Bischoff-Grethe et al. 2004; Grafton et al. 1995; Nattkemper and Prinz 1997; Rüsseler and Rösler 2000; Willingham 1999; Willingham et al. 2000), support is also mounting for sequence learning that involves stimulus features: response-effect learning (e.g., Stöcker et al. 2003; Ziessler and Nattkemper 2001) and perceptual (location) learning (e.g., Deroost and Soetens 2006; Mayr 1996; Remillard 2003). This prompts investigation of the effects that different sensory environments have upon sequence learning (e.g., Abrahamse et al. 2008; Jiménez and Vázquez 2008; Robertson and Pascual-Leone 2001; Robertson et al. 2001). Robertson and colleagues (Robertson and Pascual-Leone 2001; Robertson et al. 2001) recognized the fact that we are continuously surrounded by multiple sources of sensory information in the real world. They explored sequence learning in an SRT task in which required responses were signaled through redundant position and color cues. They reported that, compared to either single cue condition (position or color), sequence learning was augmented with combined position and color cues.

The latter supports the notion that perceptual-motor skill acquisition can benefit from multiple sources of congruent information, at least within the visual domain. However, it remains unclear whether these findings would extend to congruent stimuli presented through different sensory modalities. It is known from simple detection and choice RT tasks that presenting congruent stimuli across modalities sometimes results in additive or even superadditive sensory interactions (e.g., Miller and Ulrich 2003; Santangelo et al. 2008; Stein and Meredith 1993), indicating that information from the different sensory sources gets integrated along the time-course of S-R processing. This integration of bimodal stimuli has been found to occur both at early and late(r) sensory-perceptual processing stages, and seems to be conditional on the spatial proximity and/or temporal synchrony of the separate stimuli (e.g., Atteveldt et al. 2007; Harrington and Peck 1998; Helbig and Ernst 2007; Teder-Sälejärvi et al. 2005; Murray et al. 2005). From the notion that sensory information plays a role in the formation of the representations underlying sequence learning (e.g., Clegg 2005; Remillard 2003), one may expect that the enriched perceptual events that follow from (integrated) bimodal stimuli produce stronger sequence representations than those obtained with single stimuli.

Recently, Abrahamse et al. (2008) introduced a new version of the SRT task in which stimuli were presented tactilely to the fingers, and learning was compared to the typical visual version of the SRT task. Sequence learning was reliably observed for both stimulus conditions, but it appeared to be better for the condition with visual stimuli. In a subsequent transfer phase, for both visual and tactile training groups we assessed transfer of sequence learning to the other modality. It seemed that transfer was perfect from tactile to visual stimuli, but only partial the other way around. As we will elaborate on below, though, these findings deserve some closer inspection because of methodological issues.

In the current study, we extended the study of Abrahamse et al. (2008) by adding a condition in which congruent visual and tactile stimuli were presented simultaneously. Hence, participants were trained either with congruent visual and tactile stimuli (bimodal training group), with visual stimuli only (visual only training group), or tactile stimuli only (tactile only training group). This allowed us to investigate the employment by the cognitive system of redundant visual and tactile stimuli, each of which has been shown to produce sequence learning when presented alone (i.e., Abrahamse et al. 2008). In a subsequent transfer phase, transfer to all three stimulus conditions (i.e. visual, tactile and bimodal transfer test) was assessed for each training group. The transfer of sequence knowledge to new conditions is one of the major tools in exploring the nature of sequence learning (Clegg et al. 1998). Thus, exploring whether sequence knowledge acquired in one stimulus condition could readily be applied to different stimulus conditions provides indications on the nature of the representation underlying sequence learning. In this respect, the transfer test to the initial training condition offered a clear baseline for transfer. Additionally, comparing across identical stimulus conditions at transfer allows controlling for effects of the training stimulus condition on just the expression of sequence knowledge: It has been shown a number of times that sequence knowledge is better expressed under some experimental conditions than others (e.g., Deroost et al. 2009; Frensch et al. 1998). Finally, and closely related to the latter, assessing performance across one or more identical stimulus conditions allows comparing performances with more or less similar baseline RTs, thereby circumventing the debate of whether differences in baseline RTs should be considered in determining the amount of sequence learning (some authors have chosen to normalize the data for baseline differences; e.g., Robertson and Pascual-Leone 2001).

We would like to stress that for both the training and transfer phase our main interest was whether the bimodal training group would benefit from the addition of tactile stimuli in comparison to the visual only training group. The bimodal training group was logically expected to show better sequence learning than the tactile only training group due to the availability of visual stimuli (since visual stimuli have been shown to produce better sequence learning than tactile stimuli only; Abrahamse et al. 2008).

As a minor aim of the current study, the transfer phase allowed us also to explore in more detail the findings and interpretations of the study by Abrahamse et al. (2008). First, in our previous study we reported better sequence learning for participants training with visual stimuli than for those training with tactile stimuli. However, we never tested both training groups simultaneously under identical stimulus conditions in the transfer phase. Therefore, we were unable to distinguish between genuine differences in sequence learning versus differences in just performance. The second observation we want to further examine is the seemingly partial transfer from visual to tactile stimuli, while transfer appeared perfect the other way around. Abrahamse et al. (2008) tested transfer by comparing between performances from the training phase and a subsequent transfer phase, thus with unequal amounts of training. Moreover, blocks in the training and transfer phase comprised unequal amounts of trials, possibly affecting the expression of sequence learning. The current study can provide a cleaner measure of transfer as both stimuli are employed in counterbalanced order during transfer, thus balanced in the amount of training.

To summarize, in the current study a first attempt was made to investigate the role of bimodal stimulus presentation in sequence learning. This acknowledges the continuous stream of multiple sensory inputs we face from the real world. We combined visual and tactile stimuli in a bimodal condition and compared sequence performance to that under single stimulus conditions. As noted above, the most interesting comparison would be between the visual only and the bimodal training groups, examining whether adding tactile stimuli to a typical visual setting is beneficial to sequence learning. Additionally, we attempted to replicate the findings by Abrahamse et al. (2008) in a more elaborate transfer design.

Method

Participants

Sixty-six undergraduates (40 women, mean age of 21 years, three left-handed) from the University of Twente (Enschede, The Netherlands) gave their informed consent to participate in the experiment in exchange for course credit. They had normal or corrected-to-normal vision. The current study was approved by the local ethics committee.

Stimulus and apparatus

Stimulus presentation, timing, and data collection were achieved using the Presentation 10.1 experimental software package on a standard Pentium© IV class PC. Visual stimuli were presented on a 17 inch Philips 107T5 display running at 1024 by 768 pixel resolution in 32 bit color, with a refresh rate of 85 Hz. From a viewing distance of approximately 60 cm (this was not strictly controlled), placeholders consisted of four white, 1.5º × 1.0º horizontally outlined rectangles with a total width of 8º visual angle, continuously presented on a black background. The target stimulus consisted of the filling in red of one of these rectangular placeholders. Vibro-tactile stimuli were delivered to the fingers by using four miniature loudspeakers, taped to the proximal phalanx of the ring and index fingers of both hands (cf. Abrahamse et al. 2008). Tactile stimuli consisted of clearly detectable 200 Hz triangle-wave vibrations, generated by the computer and amplified by two 2 × 8 W stereo amplifiers. For the bimodal SRT task condition, the visual and tactile stimuli were carefully synchronized using an oscilloscope (onset and offset differences amounted to 0 ± 5 ms). All participants had the loudspeakers attached to the fingers throughout the experiment, in order to hold experimental settings as similar as possible for all three training groups. Furthermore, participants wore headphones presenting white noise at a loudness level that prevented them from using the tones as auditory spatial cues, while a cover over their hands prevented them from seeing their hands.

Procedure

All participants were first tested on a block of randomly presented tactile stimuli, in which they were required to react as accurately as possible. Only participants reaching a criterion of 95% accuracy in this single block were allowed to continue with the main experiment. Then participants were randomly assigned to one of three groups for the training phase, in which an SRT task was performed: the visual only training group (21 participants), the tactile only training group (23 participants), or the bimodal training group (22 participants). In the former two, single stimuli (visual or tactile, respectively) were used as targets in the training phase, whereas both stimuli were presented simultaneously for the bimodal training group. Participants were required to respond with the ring and index fingers of both hands on the A, F, K, and ' keys of a QWERTY keyboard to stimuli from left to right, respectively (pilot studies indicated that using adjacent fingers increased errors in the tactile condition). A correct response was defined as the participant pressing the appropriate key within a 1.5-s time limit. Erroneous responses were signaled to the participants, after which the next stimulus was presented at a 1-s interval. Stimulus presentation always continued until a response was given. Short 30-s breaks were provided in between blocks. The training phase consisted of a pseudo-random block, 10 sequence blocks, a pseudo-random block and finally another sequence block, for a total of thirteen blocks. The increase of response time in the pseudo-random block 12, relative to the mean response time of blocks 11 and 13, was used as an index for sequence learning. During sequence blocks a 12-item second order conditional (SOC) sequence (242134123143; numbers from 1 to 4 denote stimulus locations from left to right) was repeated nine times for a total of 108 trials per block. The pseudo-random blocks consisted of a series of nine different SOC sequences, with no element and sequence repetitions allowed. Pseudo-random blocks were never repeated for the same participant. The response-to-stimulus interval (RSI) was always 210 ms.
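The SOC property of the training sequence can be checked mechanically. The sketch below is illustrative only and is not taken from the experimental software; it assumes the common definition of a 12-item SOC sequence over four locations, namely that, when the sequence is treated as cyclic, each of the 12 possible pairs of different locations occurs exactly once, so that the next location is determined by the two preceding ones. The function name `is_soc` is hypothetical.

```python
from collections import Counter


def is_soc(sequence, n_locations=4):
    """Return True if every ordered pair of different locations occurs
    exactly once in the cyclic sequence (assumed SOC definition)."""
    n = len(sequence)
    # Count each first-order transition, wrapping around at the end.
    pairs = Counter((sequence[i], sequence[(i + 1) % n]) for i in range(n))
    expected = {
        (a, b)
        for a in range(1, n_locations + 1)
        for b in range(1, n_locations + 1)
        if a != b
    }
    return set(pairs) == expected and all(c == 1 for c in pairs.values())


# Training sequence reported in the Procedure section.
TRAINING_SEQUENCE = [2, 4, 2, 1, 3, 4, 1, 2, 3, 1, 4, 3]
```

Under this definition the reported sequence 242134123143 passes the check, whereas a simple repeating pattern such as 1234 1234 1234 does not.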

After this training phase, participants were tested in a fully within-subject design for transfer to each of the three stimulus conditions, i.e. a transfer test with just visual stimuli, a transfer test with just tactile stimuli, and a transfer test with combined visual and tactile stimuli (bimodal transfer test). The order of these three transfer tests was counterbalanced across participants. For each transfer test, three blocks of stimuli were presented: a pseudo-random block, a sequence block, and another pseudo-random block. The sequence block in every transfer test involved four repetitions of the same 12-item sequence as practiced in the training phase, for a total of 48 trials (fewer trials were used than in the training phase to reduce sequence learning in the transfer phase as much as possible). The pseudo-random blocks in each transfer test now consisted of a series of four randomly picked SOC sequences, with no element and sequence repetitions allowed. Again, pseudo-random blocks were never repeated for the same participant. In all other aspects the transfer phase was identical to the training phase.

After the transfer phase, participants were tested for their awareness of the practiced sequence by the process dissociation procedure (PDP; Destrebecqz and Cleeremans 2001) task. The PDP consisted of two free generation tasks of 96 key presses, first under inclusion instructions (i.e. participants were required to reproduce as much of the experimental sequence as possible), and subsequently under exclusion instructions (i.e. participants were required to avoid the experimental sequence as much as possible). In the latter task, participants received the additional instruction that no strategy was allowed to facilitate performance (such as constantly repeating a small and unfamiliar set of key presses). For each participant the same stimuli were used in the PDP task as in the training phase. In order to enhance motivation, a €20 reward was promised for the five participants performing best on the PDP task (see Fu et al. 2007).

Results

Erroneous key presses and correct responses with RTs more than three standard deviations above the mean were excluded from analyses. This eliminated less than 5% of the data in both the training and the transfer phases. Then, for each of the remaining participants, mean RTs and error percentages (PEs) were calculated for each block in both the training and transfer phases.
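The exclusion rule can be sketched as follows. This is a minimal illustration, not the analysis code used in the study. The text does not state over which set of trials the mean and standard deviation were computed, so the sketch simply assumes they are computed over whatever list of trials is passed in (for example, one participant's trials). The function name `trim_trials` and the dictionary keys `rt` and `correct` are hypothetical.

```python
from statistics import mean, stdev


def trim_trials(trials, n_sd=3.0):
    """Drop error trials, then drop correct trials whose RT lies more
    than n_sd standard deviations above the mean of the correct RTs."""
    correct = [t for t in trials if t["correct"]]
    rts = [t["rt"] for t in correct]
    if len(rts) < 2:
        # A standard deviation cannot be estimated from fewer than two RTs.
        return correct
    cutoff = mean(rts) + n_sd * stdev(rts)
    return [t for t in correct if t["rt"] <= cutoff]


# Hypothetical data: 30 ordinary trials, one very slow trial, one error.
example = [{"rt": 390.0 if i % 2 else 410.0, "correct": True} for i in range(30)]
example.append({"rt": 2000.0, "correct": True})
example.append({"rt": 400.0, "correct": False})
kept = trim_trials(example)
```

In this made-up example the error trial and the 2000 ms trial are removed, leaving the 30 ordinary trials.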

Awareness

An awareness score was calculated for both the PDP inclusion and exclusion tasks by counting the number of 3-element chunks (which constitute the basis of an SOC sequence) corresponding to the SOC sequence used in the training phase, and dividing this number by the maximum number of correctly produced chunks of three (which is 94), in order to create an awareness index ranging from zero to one.
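The scoring described above can be sketched in a few lines. The sketch is illustrative and rests on two assumptions that the text does not spell out: overlapping windows of three key presses are counted (which yields the stated maximum of 94 for 96 key presses), and the training sequence is treated as cyclic, so chunks spanning the end and start of the sequence also count. The function names are hypothetical.

```python
def sequence_triplets(sequence):
    """All 3-element chunks of the sequence, treating it as cyclic."""
    n = len(sequence)
    return {tuple(sequence[(i + k) % n] for k in range(3)) for i in range(n)}


def awareness_score(generated, sequence):
    """Proportion of overlapping 3-element windows in the generated key
    presses that match a chunk of the training sequence (0 to 1)."""
    targets = sequence_triplets(sequence)
    windows = [tuple(generated[i:i + 3]) for i in range(len(generated) - 2)]
    hits = sum(1 for w in windows if w in targets)
    return hits / len(windows)


# Training sequence reported in the Procedure section.
TRAINING_SEQUENCE = [2, 4, 2, 1, 3, 4, 1, 2, 3, 1, 4, 3]

# Hypothetical generation data: a perfect reproduction (96 key presses)
# and a simple alternation that contains none of the trained chunks.
perfect = awareness_score(TRAINING_SEQUENCE * 8, TRAINING_SEQUENCE)
alternating = awareness_score([1, 2] * 48, TRAINING_SEQUENCE)
```

With 96 key presses there are 94 windows, so the denominator matches the maximum given in the text.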

A mixed ANOVA was performed on awareness scores for the PDP, with Task (2; inclusion versus exclusion) as within-subject variable, and Training Group (3; visual only training group, tactile only training group and bimodal training group) as between-subject variable. This produced a reliable Task main effect, F(1,63) = 12.5, p < 0.01, indicating more correctly produced chunks of three sequence elements in the inclusion (mean awareness score = 0.45) than the exclusion task (mean awareness score = 0.38). The main effect for Training Group, and the more important Task × Training Group interaction were far from significant (ps > 0.80). We then compared the inclusion and exclusion scores (collapsed across groups as there were no reliable group differences) to chance level (0.33), demonstrating that both inclusion, t(65) = 6.7, p < 0.001, and exclusion scores, t(65) = 5.8, p < 0.001, exceeded chance level. Thus, overall, there are indications of both explicit (i.e. the inclusion score exceeding the exclusion score) and implicit (both inclusion and exclusion scores exceeding chance level) sequence learning. Importantly, however, sequence awareness did not differ reliably between training groups.

Training

Blocks 2 to 11

Mean RTs were analyzed for Blocks 2 to 11 (see Fig. 1) in a mixed ANOVA with Block (10; Blocks 2 to 11) as within-subject variable and Training Group (3; visual only training group, tactile only training group and bimodal training group) as between-subject variable. This indicated reliable main effects for both Block, F(9,567) = 25.7, p < 0.001, and for Training Group, F(2,63) = 20.1, p < 0.001. There was no significant Block × Training Group interaction (p = 0.50). The main effect of Block confirmed learning during training. With regard to the Training Group main effect, subsequent post-hoc analyses (Tukey HSD) showed that the tactile only training group responded slower in general than both the visual only training group, p < 0.001, and the bimodal training group, p < 0.001, whereas there was no reliable difference between the visual only and the bimodal training groups (p = 0.98).

Fig. 1 Mean reaction times (ms) for the visual only, tactile only, and bimodal training groups in the training phase. Blocks 1 and 12 are pseudo-randomly structured, while the remaining blocks are sequential.

Similar analyses on PEs indicated that the tactile only training group produced more errors on average than the visual only training group, F(1,42) = 9.5, p < 0.01, and a strong tendency to produce more errors than the bimodal training group (p = 0.06). Across all blocks and all groups, PEs never exceeded 5%.

In conclusion, the time course of learning appeared the same for the different training groups, but participants in the tactile training group were generally slower in responding than the visual only and bimodal training groups.

Blocks 11/13 versus block 12

The critical comparison with respect to sequence learning is between the mean of Blocks 11 and 13 and the mean of Block 12 (see Fig. 1). A mixed ANOVA was performed with Block (2; mean of Blocks 11 and 13 versus Block 12) as within-subject variable and Training Group (3; visual only training group, tactile only training group and bimodal training group) as between-subject variable. Reliable effects were found for Block, F(1,63) = 190.9, p < 0.001, Training Group, F(2,63) = 20.7, p < 0.001, and the Block by Training Group interaction, F(2,63) = 3.4, p < 0.05. The main effect of block indicated reliable sequence learning overall. The main effect of Training Group was rooted in slower RTs in general for the tactile only training group than for both the visual only, F(1,42) = 21.9, p < 0.001, and the bimodal training groups, F(1,43) = 24.5, p < 0.001. Further investigation of the Block by Training Group interaction revealed larger sequence effects for both the visual only (sequence effect = 60 ms), F(1,42) = 6.5, p < 0.05, and the bimodal training groups (sequence effect = 56 ms), F(1,43) = 3.4, p < 0.05, than for the tactile only training group (sequence effect = 38 ms). There was no reliable difference in sequence effect between the visual only and bimodal training groups (p = 0.51).

Similar analyses on PEs showed that sequence learning was also reflected in PEs, F(1,63) = 35.9, p < 0.001, but no reliable differences were observed between training groups (p = 0.91). Finally, there was a tendency for the tactile only training group to produce more errors in these final three blocks of the training phase than the visual only and bimodal training groups (p = 0.06).

Overall, sequence performance during training was better with either visual or visual/tactile combined stimuli than with only tactile stimuli. Most importantly, however, the bimodal training group did not show a reliable benefit from the addition of tactile to visual stimuli.

Transfer

Transfer scores were calculated for each participant and for each transfer test (visual, tactile, bimodal) by taking the difference in RT between the sequence block and its two surrounding pseudo-random blocks (see Fig. 2). Thus, transfer scores indicate how readily sequence knowledge from the training phase can be applied across the different stimulus conditions in the transfer phase.
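As a worked illustration of this difference score, the sketch below computes the mean RT of the pseudo-random block(s) minus the mean RT of the sequence block(s). It is not the authors' analysis code. The sign convention (positive values meaning faster responding in sequence blocks) is an assumption, as are the function name `sequence_effect` and the example RTs, which are invented.

```python
from statistics import mean


def sequence_effect(random_block_rts, sequence_block_rts):
    """Mean RT of pseudo-random block(s) minus mean RT of sequence
    block(s); positive values indicate a benefit of the sequence."""
    return mean(random_block_rts) - mean(sequence_block_rts)


# Transfer test: one sequence block flanked by two pseudo-random blocks
# (hypothetical block means in ms).
transfer = sequence_effect([500.0, 520.0], [450.0])

# Training phase: pseudo-random Block 12 versus sequence Blocks 11 and 13
# (hypothetical block means in ms).
training = sequence_effect([560.0], [495.0, 505.0])
```

The same function covers both the training index (Block 12 against Blocks 11 and 13) and the transfer scores, since only the roles of the flanking blocks differ.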

Fig. 2 Mean transfer scores (ms) for the visual only, tactile only, and bimodal training groups across transfer tests, indicating the mean difference in RT between a sequence block and its two surrounding pseudo-random blocks. Error bars depict standard errors.

One-sample t-tests (test-value = 0) showed positive transfer to all three stimulus conditions for the visual only training group, t(20) > 2.9, p < 0.01, for the tactile only training group, t(22) > 3.5, p < 0.01, and for the bimodal training group, t(21) > 1.8, p < 0.05.

Then we performed a MANOVA with the three transfer scores (visual, tactile and bimodal) as dependent variables, and with Training Group (3; visual only training group, tactile only training group, bimodal training group) as a fixed factor. This produced a reliable effect for Training Group, F(6,122) = 2.5, p < 0.05. Exploring this effect in more detail, reliable differences were observed between training groups only on the bimodal transfer scores, F(2,63) = 5.7, p < 0.01, but not on the visual and tactile transfer tests (ps > 0.20). Further exploration showed that both the visual only training group, t(42) = 2.4, p < 0.05, and the bimodal training group, t(43) = 4.3, p < 0.01, showed better transfer to the bimodal transfer test than the tactile training group. There was no difference between the visual only and bimodal training groups on the bimodal transfer test (p = 0.80).

Comparable analyses with just the visual only and bimodal training groups, the main comparison of interest in this study, also showed more or less comparable sequence learning on the two remaining transfer tests (i.e., visual and tactile; p > 0.18). Thus, this strengthens the observation from the training phase that the bimodal training group did not benefit from the additional availability of the tactile stimuli when compared to the visual only training group.

As mentioned above, a second aim was replicating the findings from Abrahamse et al. (2008). Comparing the visual only and tactile only training groups across the visual and tactile transfer tests showed no reliable differences (p > 0.40). This indicates that the difference found in sequence effects during training with visual versus tactile stimuli in both the current study and in Abrahamse et al. (2008) mainly reflects performance differences, and not reduced sequence learning in the tactile training group. Finally, paired-sample t-tests between the visual and tactile transfer scores for the visual only training group showed larger sequence effects on the visual than the tactile transfer test, t(20) = 2.1, p < 0.05, whereas for the tactile only training group more or less similar sequence effects were observed for the visual and tactile transfer tests (p = 0.25). The latter findings replicate those from our previous study (Abrahamse et al. 2008).

Analyses on PEs provided no new information, as all reliable differences were in the same direction as the findings on RTs mentioned above (and thus no speed-accuracy trade-offs occurred). For the sake of brevity we decided not to report them.

Discussion

The current study aimed at exploring the impact of adding congruent tactile stimuli to a typical visual SRT task, knowing that tactile stimuli by themselves can produce reliable sequence learning (Abrahamse et al. 2008). This investigation is particularly interesting as sequence learning in the real world is likely to be guided by multiple sources of sensory information. From the notion that stimulus information has a significant role in sequence learning (e.g., Clegg 2005; Remillard 2003) we predicted that congruent bimodal stimuli would enhance the strength of sequential representations. However, no indications were observed here that the combination of tactile and visual stimuli affected the amount and/or nature (i.e. explicit versus implicit) of sequence learning, as compared to single visual stimuli. Performance on the SRT task was highly comparable for the bimodal and the visual only training groups, even when assessed under identical stimulus conditions in the transfer phase. Moreover, no differences were observed on the PDP task, indicating that the groups did not differ significantly in sequence awareness either.

It has been shown several times that stimulus information plays a role in sequence learning, at least under some conditions (e.g., Clegg 2005; Remillard 2003). This prompted investigation of the effects of multiple congruent stimuli in sequence learning, an issue touched upon before only by Robertson and colleagues (i.e., Robertson and Pascual-Leone 2001; Robertson et al. 2001). They observed that sequence learning was enhanced in a condition with congruent cues (i.e., location and color) relative to single cue conditions. Why, then, did sequence learning not benefit from combined visual and tactile stimuli in the current study? One could argue that the visual/tactile combination did not enable sufficient integration of the two sources because of the spatial disparity between cued locations. In other words, it could be that participants were unable to effectively divide their attention across both the visual and tactile stimulus locations, therefore strategically selecting the visual stimuli to focus on (due to visual dominance). This can explain why sequence learning in the typical visual setting did not benefit from the addition of tactile stimuli, as well as accounting for the differential findings of Robertson and colleagues. However, we believe that some notions need consideration in light of this possibility.

Tactile stimuli were presented directly to the fingers, near the response locations. It seems hard to believe that attention was not focused on these locations. Moreover, tactile stimuli are highly salient, and therefore unlikely to be fully ignored. More importantly, Cock et al. (2002) simultaneously presented two stimuli at different locations of a horizontally outlined array, only one of which was task-relevant (indicated by the color). Presentation of both stimuli followed independent sequences. Despite the spatial disparity of stimuli, participants learned the sequence of locations of the task-irrelevant stimulus (as indicated by negative priming effects when this stimulus was made task-relevant in a transfer phase). This indicates either that spatial attention is not a strict prerequisite for sequence learning, or that spatial attention can be effectively divided across different locations. Finally, because of their high temporal synchrony, one could expect the visual and tactile stimuli to become integrated as one percept, regardless of their spatial disparity. This may very well enable sufficient processing of both stimuli. Indeed, it is known from simple detection RT tasks that integration of stimuli can occur on the sole basis of temporal synchrony (e.g., Murray et al. 2005). So, even though spatial disparity may be a logical and fertile issue to explore in future research, we would like to postulate two additional explanations for the absence of any benefit of the addition of tactile stimuli.

First, it may be that the tactile stimuli are so strongly S-R compatible (i.e., they are presented directly to the finger to respond with) that they need no elaborated processing on the level of stimulus features (including stimulus location). Hence, they may only produce substantial processing at response-based stages that are shared with the S-R processing for the visual stimuli, and not at any stages related to sequence learning that are not already engaged by the visual stimuli.

Second, it may be that visual and tactile sequence learning (partly) develop in different sensory modules of information processing that independently enable speeding up of responses. If that is the case, then the relative speed of processing within each module becomes relevant: if one of the modules is much slower than the other, then it can add little or no benefit to a much faster working module. Clearly, in the current study that may have been the case, as tactile stimuli by themselves generally produced much larger response latencies than the visual stimuli. This notion would be in line with a recent race model proposed for sequence learning in the DSP task (Verwey 2003), in which it is indeed proposed that different modules exist for sequence learning that all race each other in producing the next response. So, whereas the current study provides a start in exploring the effects of congruent bimodal stimuli on sequence learning, further research is needed to determine the underlying mechanisms in more detail.

The current findings also relate to some other issues that deserve to be discussed here briefly. It was observed that sequence performance for the visual only and the tactile only training groups was more or less similar when compared under same stimulus conditions in the visual and tactile transfer tests (see below for a possible explanation on why this was not the case in the bimodal transfer test). Thus, in contrast to the claim by Abrahamse et al. (2008) that visual stimuli produce better sequence learning than tactile stimuli (as appeared to be the case also in the training phase of the current study), the current study seems to indicate that the smaller sequence effect for the tactile stimuli mainly reflects impaired sequence performance, rather than sequence learning (for similar ideas, see Deroost et al. 2009; Frensch et al. 1998; Hoffmann and Koch 1997). In other words, sequence learning is expressed differentially with visual and tactile stimuli. This may be explained by taking into consideration a short-cut model of sequence performance. It has been suggested that sequence knowledge may work to (partly) circumvent or facilitate processing stages by priming the next response. More specifically, a clear candidate would be the response selection stage (e.g., Clegg 2005; Pashler and Bayliss 1991). As tactile stimuli in the current study were more S-R compatible than visual stimuli (the latter needing a more demanding spatial translation, as the former are directly mapped to the fingers to respond with), they may require less demanding response selection processing than their visual counterparts. Thus, if sequence knowledge serves (among others) to circumvent or facilitate response selection, more benefit can be taken of this sequence knowledge with visual than tactile stimuli. This would explain the performance differences observed in the current study.

Another observation from Abrahamse et al. (2008) that was tested here in a more elaborate transfer design was the seemingly partial transfer from visual to tactile stimuli, and the perfect transfer the other way around. These findings were replicated in the current study, but their interpretation needs some consideration. Abrahamse et al. claimed that the partial transfer from visual to tactile stimuli indicated a modality-specific component of sequence learning in the typical visual SRT task. This remains a plausible interpretation, and it strengthens the notion from Abrahamse et al. (2008) that sequence learning cannot easily be explained by pure response location learning (e.g., Willingham et al. 2000) and that stimulus information has a role, too. However, in line with the idea discussed above that the benefit of sequence knowledge may be larger for visual than for tactile stimuli (due to the differences in S-R compatibility), the smaller sequence effect of the visual only training group in the tactile transfer test than in the visual transfer test could also simply reflect performance differences. This issue motivates further research.

We believe it is important to note here that, in line with earlier studies (e.g., Deroost et al. 2009; Frensch et al. 1998), the current study indicates that sequence effects cannot always readily be taken as a clean index of the amount of sequence learning, but rather reflect a combination of the amount of sequence learning and the task-dependent constraints on expressing this knowledge. Therefore, comparisons of sequence learning across different task variations should be made with the necessary caution.

Finally, it was observed that the tactile only training group could not transfer its sequence knowledge to the bimodal transfer test as well as the visual only and bimodal training groups. This probably does not reflect differences in the amount of sequence learning, as sequence learning was comparable between the training groups on the two further transfer tests (i.e., the visual and tactile transfer tests). Therefore, it seems that the participants who trained with tactile stimuli were unable to fully express their sequence knowledge in the bimodal stimulus condition. This might be due to a conflict in selective attentional processing. Typically, the visual stimuli are easier to process than the tactile stimuli, and are therefore probably preferentially selected by naïve participants. However, during training the tactile only training group became highly familiar with responding to the tactile stimuli, which may have produced a selection conflict in the bimodal transfer test. It has been suggested before that certain task changes may affect participants’ sense of control, causing them to (temporarily) suspend all ongoing automatic processes (e.g., Abrahamse and Verwey 2008). The conflict arising in the bimodal transfer test, then, may have led participants from the tactile only training group to partly suspend the expression of implicit learning and return to controlled stimulus-response processing. However, we agree that this issue needs more exploration.

Overall, the current study is another step towards an ecologically more valid approach to the SRT task, in line with other recent studies (e.g., Chambaron et al. 2006; Jiménez and Vázquez 2008; Witt and Willingham 2006). By comparing visual stimuli only, tactile stimuli only, and a combination of congruent tactile and visual stimuli, it partly replicated and extended earlier findings from Abrahamse et al. (2008). Most importantly, it showed that a combination of congruent tactile and visual stimuli does not produce better sequence awareness, sequence learning or sequence performance than single visual stimuli. Additionally, contrary to what was claimed by Abrahamse et al. (2008), it seems to be the expression of sequence learning, rather than sequence learning itself, that is impaired with single tactile stimuli compared to single visual stimuli.