Remembering a series of instructions or action commands is a common requirement across a range of different educational and practical settings. Recent research on following instructions (FI) has suggested that this ability draws heavily on working memory (Gathercole, Durling, Evans, Jeffcock, & Stone, 2008; Jaroslawska, Gathercole, & Holmes, 2018; Jaroslawska, Gathercole, Logie, & Holmes, 2016a; Yang, Allen, & Gathercole, 2016; Yang, Gathercole, & Allen, 2014). In these studies, participants typically receive verbal instructions involving series of operations to be performed on objects or card shapes, such as “Touch the yellow bag and then spin the blue rule,” and then recall the actions in serial order. During encoding, observing actions performed by the experimenter (Addis & Schacter, 2008; Lui et al., 2017; Wojcik, Allen, Brown, & Souchay, 2011), watching on-screen demonstrations (Yang, Allen, Yu, & Chan, 2015) and physically performing the actions oneself (Addis & Schacter, 2008; Allen & Waterman, 2015; Jaroslawska, Gathercole, Allen, & Holmes, 2016b; Lui et al., 2017) all facilitate recall. During retrieval, recall by physical action substantially improves recall accuracy relative to spoken repetition (Allen & Waterman, 2015; Gathercole et al., 2008; Jaroslawska, Gathercole, Allen, et al., 2016a; Koriat, Ben-Zur, & Nussbaum, 1990; Yang et al., 2014). These action-based effects have been found in FI studies with healthy populations (Allen & Waterman, 2015; Gathercole et al., 2008; Waterman et al., 2017; Yang et al., 2014; Yang et al., 2016) and clinical groups with lower working memory capacity (Charlesworth, Allen, Morson, Burn, & Souchay, 2014; Lui et al., 2017; Wojcik et al., 2011; Yang, Allen, Holmes, & Chan, 2017). Such effects indicate a superiority of action/visual-motor-based representations relative to verbal-only storage.

To date, studies of memory for instructions have focused on strict forward serial recall, and none have explored how it may vary with recall direction. Instructions and action sequences do not necessarily always need to be followed in their precise original order, and often can be reordered or reversed (for example, housework, gardening, or retracing of an individual’s action steps in order to identify an earlier error). This ability to flexibly arrange serial actions can be important in learning new skills. Reordering may involve holding items and their original order in temporary memory storage and then rearranging the order of the items according to requirement. This is exactly the function of working memory, a flexible mental workspace for storing and manipulating information to achieve goals (Baddeley, 2012; Cowan, 2001). Exploring how individuals are able to rearrange instruction sequences for recall, and how this varies across different encoding and retrieval contexts, will extend understanding of FI ability beyond the current focus on forward serial recall and provide new insights into how this ability might be optimized in different conditions. It will also contribute to understanding of serial ordering mechanisms more generally.

Most studies investigating effects of reordering have focused on verbal or visuospatial sequences of individual items. For verbal sequences, higher scores are often obtained in forward than backward recall (Baker, Tehan, & Tehan, 2012; Haberlandt, Lawrence, Krohn, Bower, & Thomas, 2005; St Clair-Thompson, 2010; St Clair-Thompson & Allen, 2013; Wilde & Strauss, 2002), though some studies have observed similar performance levels in the two types of recall (Anderson, Bothell, Lebiere, & Matessa, 1998; Bireta et al., 2010; Guerard & Saint-Aubin, 2012; Li & Lewandowsky, 1995; Thomas, Milner, & Haberlandt, 2003). Forward recall may represent a relatively simple, verbal-oriented sequential task that involves rehearsal (and refreshing) from the first item to later items. In contrast, backward recall requires storage of all the items as well as manipulation of order, a set of processes that may draw more on executive control resources (Sun et al., 2005) and visuospatial imaging (Li & Lewandowsky, 1995; St Clair-Thompson & Allen, 2013). Forwards recall (typically using digits) is often used as a measure of simple verbal storage while backward recall is considered a measure of complex working memory ability (e.g., Berry, Waterman, Baddeley, Hitch, & Allen, 2018; St Clair-Thompson, 2010, for a brief review). In terms of retrieval strategy, forward recall involves a retrieval process moving from first to last item in equal steps (i.e., a single-step strategy) while backward recall may involve repeated covert forward recall to the desired items (i.e., a multiple-scan strategy; Thomas et al., 2003). While research has tended to focus on the negative effects of reversed order recall on overall accuracy, performance may also benefit later sequence items by reducing interference and loss suffered under forward recall, emerging as a large recency effect (e.g., Anderson et al., 1998; Brown, Neath, & Chater, 2007; Page & Norris, 1998; Thomas et al., 2003; Vandierendonck, Kemps, Fastame, & Szmalec, 2004).

In contrast to the pattern of reduced recall accuracy that is typically observed in verbal experiments, visuospatial studies have consistently produced equivalent performance in forward and backward recall of the sequence of spatial locations (Cornoldi & Mammarella, 2008; Vandierendonck et al., 2004, Experiment 3; Wilde & Strauss, 2002). Indeed, direct comparison between modality types has indicated higher overall accuracy in forward than in backward digit recall, in contrast to similar performance levels in forward and backward recall of spatial locations using the Corsi block-tapping task (Kessels, Berg, Ruis, & Brands, 2008).

Unlike digits, words, or spatial locations, action representations are multimodal and contain information from different domains (e.g., verbal, visuospatial, and motor), thus the component cognitive processes may be more complex. Memorizing an action sequence involves representing these multidimensional action chunks in order, which is likely to be more complex than pure verbal or spatial sequence. According to Keele, Ivry, Mayr, and Hazeltine, (2003), complex sequences are maintained by a unidimensional system in charge of implicit learning within dimensions as well as a multidimensional system that connects different modality domains. Goal-directed selective attention facilitates the integration of sequential information from various dimensions. This view corresponds to the multimodal nature of action representations, with action chunks possibly retained within the episodic buffer, a temporary modality-general store in working memory (Allen & Waterman, 2015; Baddeley, 2000; T.-x. Yang et al., 2016). Besides domain-specific content, a novel action sequence also requires storage of serial order. Evidence has suggested that order information is formed during action planning before execution, represented as a primacy gradient of action features (Averbeck, Chafee, Crowe, & Georgopoulos, 2002; Fournier, Gallimore, Feiszli, & Logan, 2014). However, it remains unknown whether and how representations of complex instructions would be affected by the direction of recall. We were also interested in how the emergence of action-based effects might be mediated by recall direction.

In sum, there is currently no understanding of how memory for instruction and action is affected by the requirement to reverse recall direction, and it remains to be explored how the direction of recall interacts with action-related benefits. In exploring such benefits, we focused on the enacted-recall advantage (i.e., recalling instructions by physical enactment over oral repetition) and the demonstration advantage (i.e., superior memory performance following demonstrated than spoken instructions). We report three experiments examining these issues. In the first experiment, we investigated the influence of recall direction (forward vs. backward) and recall modality (verbal and enacted recall) on memory for spoken instructions. In the second experiment, we examined these effects in demonstrated instructions using the same design. Finally, Experiment 3 replicated these manipulations using a within-subjects design. We also, for the first time, report examination of response timings (in Experiment 3), to cast further light on the temporal processes involved in following instructions, and the impact of action-based manipulations and recall direction. While previous studies have primarily attributed action-based effects to factors at the encoding stage (Koriat et al., 1990), it remains unclear whether it is also associated with benefits at recall, through a faster and more efficient retrieval process. Serial memory research has previously used temporal measures to reveal the distinctive retrieval processes of forward and backward digit recall (Anderson et al., 1998; Haberlandt et al., 2005; Thomas et al., 2003); including such measures in the present study should also advance our understanding of the retrieval process of instruction sequences and associated action-based effects.

Experiment 1

In this experiment, we aimed to investigate the effect of recall direction (forward vs. backward) and recall modality (verbal vs. enacted recall) on following spoken instructions, using the following instruction span task. Based on previous studies (Allen & Waterman, 2015; Gathercole et al., 2008; Jaroslawska, Gathercole, Allen, et al., 2016b; Koriat et al., 1990; Yang et al., 2014), we predicted superior performance in enacted recall compared to verbal recall (i.e., enacted-recall advantage).

What might be predicted for the effects of recall direction? On the face of it, a simple prediction might be that backward recall results in reduced accuracy, reflecting the more complex nature of this task. In line with the modality differences that have previously been observed (e.g., Allen & Waterman, 2015; Gathercole et al., 2008; Jaroslawska, Gathercole, Allen, et al., 2016b; Yang et al., 2014), this might particularly emerge for verbal recall, while the nonverbal nature of enacted recall may remain unaffected by recall direction (Cornoldi & Mammarella, 2008; Vandierendonck et al., 2004, Experiment 3; Wilde & Strauss, 2002). However, a different pattern of results is also possible. The present manipulation of direction is necessarily based on forward or backward recall of whole action chunks (e.g., “pick up the red pencil”), rather than reversed recall of the individual elements. It is possible that reversal of chunk order is less demanding than backward recall of every word or digit, and therefore that performance decrements relative to forward recall will not be observed.

Method

Participants

Twenty-four native Mandarin Chinese speakers (18 females and six males) with a mean age of 22.67 years (SD = 3.21), and education of 15.71 years (SD = 2.17) participated in the experiment. All of them were right-handed. None of them had hearing problems, or a history of neurological or psychiatric diseases. The study was approved by the ethical committee of the Institute of Psychology, Chinese Academy of Sciences.

Design and procedure

This study applied a 2 (recall direction: forward vs. backward recall) × 2 (recall modality: verbal vs. enacted recall) within-subjects design. Each condition was presented in a separate block and participants completed the four conditions in counterbalanced order. The dependent variable was recall accuracy, represented by the number of action–object pairs correctly recalled in the correct serial position (summing across trials in each condition, gives a score range of zero to 126).

The instruction span task, including the materials and basic procedure, was adapted from the spoken-instruction subtest as used in a previous study (Yang et al., 2015). In the instruction span task, participants listened to the instructions and then recalled them according to the type of condition (see Fig. 1).

Fig. 1
figure 1

The procedure and task setting in Experiment 1 and 2. The laptop for displaying demonstration videos is placed in the middle front of the table

The instructions contained a series of actions associated with a set of objects. These objects were placed on the table in two rows: the front row included six smaller objects from left to right (a white eraser, a yellow ruler, a blue ruler, a green eraser, a red pencil, and a black pencil) and the back row included six containers from left to right (a white basket, a yellow basket, a blue folder, a green folder, a red bag and a black bag). There were five types of actions (touch, push, drag, spin, pick up . . . put it into . . . ).

The span task involved six blocks of trials, with each block containing six trials involving sequences with the same number of action–object pairs. The sequences in the first block contained only one action–object pair (e.g., “Spin the green eraser”), the second block contained two action–object pairs (e.g., “Pick up the red pencil and put it into the white bag”), the third block contained three pairs (e.g., “First, pick up the green eraser and put it into the blue folder, then touch the yellow bag”), the fourth block contained four pairs (e.g., “First, touch the blue folder, and then push the yellow basket; finally, pick up the black pencil and put it into the red bag), and so forth. The task included four parallel instructional lists. Spoken instructions were recorded by a native Chinese female speaker at a moderate speed of approximately 350 ms per word, and the durations for the presentation of sequences of one to six actions were 3, 5, 8, 11, 13, and 16 seconds, respectively. In each condition, a participant sat at a desk facing the objects. The speakers were placed at another desk away from the participant where the experimenter sat and controlled the delivery of instructions. A video camera was set up behind the participants to record the entire experiment. The experimenter first introduced the task, and this was followed by a practice of object naming and operations. Participants were told not to repeat the instructions aloud, touch, operate, or move the objects during encoding. The experimenter informed the participant of the task condition and then played the to-be-remembered instruction sequence through speakers. A blank screen would appear at the end of the trial, and the participants then recalled the instructions according to the type of the condition.

In the forward recall conditions, participants were required to repeat the instructions in the same serial order as the presentation, whereas they recalled the instructions (verbally or by enactment) in the opposite order in the backward recall conditions. The reversal of order was performed on action–object chunks, with word sequence within a chunk being unchanged. The “pick up. . . put it into. . .” were concatenated actions and cannot be separated in backward recall, but they were still scored as two actions for all recall conditions. In each condition, participants started from a one-chunk instruction and progressed to the next span if they correctly recalled four out of six trials at a given sequence length (with any skipped trials considered to be correct); otherwise, the test of this condition ended.

Results

Descriptive results of recall accuracy are displayed in Fig. 2. A 2 × 2 (Recall Direction × Recall Modality) ANOVA indicated a significant main effect of recall modality, F(1, 23) = 51.36, MSE = 143.78, p < .001, ηp2 = 0.69, emerging as superior performance with enacted recall relative to verbal recall (i.e., an enacted-recall advantage). The main effect of recall direction was not significant, F(1, 23) = 1.50, MSE = 283.61, p = .233, ηp2 = 0.06, but there was a significant interaction between recall direction and recall modality, F(1, 23) = 7.01, MSE = 135.52, p = .014, ηp2 = 0.23 ). This interaction was mainly driven by a superiority for backward over forward recall in verbal recall conditions (p = .002, ηp2 = 0.35), in contrast to similar performance of forward and backward recall in enacted recall conditions (p = .686, ηp2 < 0.01). The enacted-recall advantage was larger in forward recall (p< .001, ηp2 = 0.70) than for backward recall conditions (p = .004, ηp2 = 0.30).

Fig. 2
figure 2

Recall accuracy (number of action-object pairs) with error bars showing standard errors as a function of recall modality and recall direction in Experiment 1 and Experiment 2. Exp = Experiment

Discussion

As predicted, participants achieved higher recall accuracy when enacting rather than verbally recalling instruction sequences, replicating previous findings (Allen & Waterman, 2015; T. Yang et al., 2014; T.-x. Yang et al., 2016; T.-x. Yang et al., 2015).

Similar performance levels were found for forward and backward recall of instructions. However, this only held true for enacted recall conditions, while backward recall actually improved memory of spoken instructions when they were repeated orally. The differential effects of backward recall on verbal and enacted recall reduced the enacted-recall advantage in the forward recall conditions. This improved verbal recall performance when reversing direction of recall, and the reduction in the enacted-recall advantage associated with this, may have occurred for two distinct reasons. Firstly, enactment appears to enable superior recall of later sequence items, relative to verbal recall (Allen & Waterman, 2015; T. Yang, 2011). As backward recall also results in enhanced retrieval of the most recently presented items (Anderson et al., 1998; Ritchie et al., 2015; Vandierendonck et al., 2004), this should aid their verbal retrieval, and therefore reduce any benefits of enacted recall. Secondly, backward recall may be more likely to engage visuospatial coding as a way of supporting the process of order reversal (Li & Lewandowsky, 1995; St Clair-Thompson & Allen, 2013). In this experiment, as it may be comparatively easy to invert spatial locations (Cornoldi & Mammarella, 2008; Vandierendonck et al., 2004, Experiment 3; Wilde & Strauss, 2002), people may be encouraged to draw more on the visual display of objects laid out in front of them to support reversal of the sequence. This greater engagement with visuospatial coding would then provide a particular boost for verbal recall of spoken instructions. When individuals prepare for enacted recall, in contrast, they may form an action-based presentation that similarly relies on visuospatial strategies for both forward and backward recall, which could help explain their similar performance levels in forward and backward recall.

Experiment 2

The first experiment produced an enacted-recall benefit and a backward recall advantage in verbal recall. These effects may be at least partly associated with the availability and utility of visuospatial coding. Experiment 2 therefore tested these effects in a more visuospatial-oriented presentation form (i.e., video clips of demonstrated actions) using the same experimental design, to examine the impacts of recall direction (forward vs. backward) and recall modality (verbal vs. enacted recall) on memory performance.

Based on previous findings (Yang et al., 2015), we predicted an enacted-recall advantage in forward recall following demonstrated instructions, though this may be reduced in magnitude relative to that observed in Experiment 1 (using spoken instructions). Compared with spoken instructions, demonstration is more likely to engage visual, spatial, and motor components of working memory, which may more easily allow sequential reversal compared with verbal sequences. Moreover, backward recall of spatial locations has been shown to have equivalent accuracy to forward recall (Berch, Krikorian, & Huha, 1998; Vandierendonck et al., 2004). Thus, we predicted that reversed ordering of serial operations of objects in different locations may not be very difficult, thus resulting in similar levels of performance for forward and backward recall condition. We also collapsed the data of Experiments 1 and 2 to explore whether the effects of backward recall and enacted-recall advantage would vary with the presentation modality (spoken vs. demonstration instructions).

Method

Participants

Twenty-four native Mandarin Chinese speakers (14 females and 10 males) with a mean age of 22.46 years (SD = 3.47) and 15.79 years of education (SD = 2.52) participated in the experiment. None of the participants attended Experiment 1. All were right-handed, and none had hearing problems, or a history of neurological or psychiatric diseases.

Design and procedure

The instruction span task was adapted from the demonstration instruction subtest in a previous study (Yang et al., 2015). The materials and procedures resembled Experiment 1, except that the demonstrations of instructions were silent video clips involving series of actions upon objects. The durations of instructions of each span were similar to those in spoken instructions.

Results

Descriptive results of recall accuracy are displayed in Fig. 2. The 2 × 2 (Recall Direction × Recall Modality) ANOVA showed a significant enacted-recall advantage, F(1, 23) = 15.30, MSE = 219.62, p = .001, ηp2 = 0.40. The main effect of recall direction, F(1, 23) = 0.24, MSE = 265.55, p = .630, ηp2 = 0.01, and its interaction with recall modality, F(1, 23) = 0.07, MSE = 195.67, p = .795, ηp2 < 0.01, were not significant.

A cross-experiment analysis was conducted to examine the effect of presentation modality (spoken vs. demonstration). The 2 × 2 × 2 (Presentation Modality × Recall Direction × Recall Modality) ANOVA indicated superior performance of demonstration than of spoken instructions, F(1, 46) = 9.62, p = .003, ηp2 = 0.17. Presentation modality did not interact with recall direction, F(1, 46) = 0.29, p = .592, ηp2 < 0.01, or with recall modality, F(1, 46) = 2.15, p = .149, ηp2 = 0.05. However, there was a nonsignificant trend for a three-way interaction, F(1, 46) = 3.59, p = .064, ηp2 = 0.07, implying different interactive effects of recall direction and recall modality in spoken and demonstrated instructions.

Discussion

As in Experiment 1, and consistent with the starting hypothesis, the enacted-recall advantage was obtained in demonstrated instructions and the magnitude (ηp2 = 0.40) was smaller compared with spoken instructions (ηp2 = 0.69). Moreover, cross-experiment analysis indicated superior memory following demonstrated than spoken instructions, replicating previous findings (Lui et al., 2017; Yang et al., 2017; Yang et al., 2015).

In contrast to Experiment 1, and consistent with our hypothesis, direction of recall had no effect on overall performance, on verbal recall, or on the enacted-recall advantage. The similar performance levels in forward and backward recall using visual demonstration is somewhat similar to the finding of equivalent performance of forward and backward recall of spatial locations (Berch et al., 1998; Vandierendonck et al., 2004). Our findings thus extend this literature to the recall of serial demonstrated actions in space. Moreover, there was a marginal nonsignificant trend toward a three-way interaction in the cross-experiment analysis, emerging as improved memory performance in backward verbal recall of spoken instructions, in contrast to similar performance of forward and backward recall in other conditions.

While these findings are novel and intriguing, they require replication. Moreover, Experiments 1 and 2 may be contaminated by several methodological problems. First, following previous work in this area (e.g., Gathercole et al., 2008), the instructions included concatenated action (“pick up . . . put it into . . .”), which cannot be independently reversed in the backward recall conditions. Secondly, the span procedure means the number of trials carried out at each sequence length varied between conditions. Thirdly, conjunction words such as “and,” “then,” and “finally” in spoken instructions but not in demonstrated instructions may provide additional information of serial order. In addition, while the cross-experiment analysis suggests a trend for differential effects of recall direction on spoken and demonstrated instructions, this requires replication before any confident claims can be made. These issues were addressed in Experiment 3, while also taking advantage of the use of set sequence length across conditions by incorporating additional analyses of response timings and serial position.

Experiment 3

This experiment had two primary purposes. First, the effects of backward recall on following instructions required replication using a more rigorous and valid task paradigm. In the updated instruction task, the length of instructions was fixed to four action–object pairings. The number of trials for four-pair sequence was also increased to improve the power for analysis. The concatenated action (“pick up . . . put it into . . .”) was removed and replaced with two separate actions (“flip” and “lift”). The conjunction words in spoken instructions (e.g., “and,” “then,” “finally”) were also removed. Second, the cognitive mechanisms of following instructions and the effects of recall direction and modality were examined with a particular focus on the retrieval process. Specifically, reaction-time measures including preparation and recall duration were collected. Previous studies on serial recall of words indicated reaction time to be useful in revealing retrieval process and testing the assumptions of different serial order models (Anderson et al., 1998; Haberlandt et al., 2005; Thomas et al., 2003). In the context of following instructions, preparation duration reveals the retrieval strategies employed in different conditions, and recall duration represents efficiency of output production. Both types of reaction-time measures may also partly reflect strength and accessibility of memory representations. In addition, serial position curves were analyzed as functions of presentation modality, recall modality, and recall direction. These were explored to test the hypothesis that backward verbal recall of spoken instructions increases recency effects and mitigates against decay/output interference during retrieval.

Method

Participants

Twenty-four native Mandarin Chinese speakers (14 females and 10 males) with a mean age of 22.08 years (SD = 2.87) and education of 16.08 years (SD = 2.32) participated in the experiment. None of the participants attended Experiments 1 and 2.

Design and procedure

A 2 (presentation modality: spoken vs. demonstration) × 2 (recall direction: forward vs. backward recall) × 2 (recall modality: verbal vs. enacted recall) within-subjects design was applied. The main dependent variable was recall accuracy (represented by total number of action–object pairs), which ranged from zero to 40 for each condition/instructional list. The reaction-time analysis included preparation duration and recall duration. Preparation duration measures the time between the offset of the last sound/action of an instructional sequence and the first attempt to recall the instructions (the onset of the first output sound by participant in verbal recall, and the start of movement toward the objects in enacted recall). Recall duration was the time between the first attempt to recall and the end of the recalled sequence (i.e., the offset of the last spoken sound by participants in verbal recall, and the hand leaving the last object in enacted recall). Reaction time data was derived from the videos of participants’ behaviors during testing, and was independently coded by two raters (T-X.Y. and Q. Z.) using Observant XT software. The final reaction times included in the analysis was based on the mean of the two raters’ estimates.

The task was similar to the instruction span task employed in Experiments 1 and 2, except for the following changes. First, the instruction sequences were fixed to four action–object pairings. Second, the concatenated action “pick up . . . and put it into . . . ” was removed as it cannot be reversed in the backward recall conditions. The action “touch” was replaced by the more distinctive action “tap,” and two novel actions “flip” and “lift” were added (Allen & Waterman, 2015). There were in total six types of actions, namely, “push,” “pull,” “spin,” “flip,” “tap,” and “lift.” The objects were the same as those used in Experiments 1 and 2. Thus, a typical instruction trial involved a sequence such as “spin the white rubber, tap the blue ruler, push the white basket, pull the green folder.” There was no repetition of actions or objects within a single trial. Eight sets of sequences were generated, and each contained two practice trials and 10 test trials. Set-condition combinations were counterbalanced, and all participants completed the eight conditions in counterbalanced order. Spoken and demonstration instructions were presented in the same way as in Experiments 1 and 2.

Results

Recall accuracy

Descriptive results are displayed in Fig. 3a. The 2 × 2 × 2 (Presentation Modality ×Recall Direction × Recall Modality) ANOVA showed a significant main effect of presentation modality, F(1, 23) = 29.70, MSE = 16.96, p < .001, ηp2 = 0.56, with superior memory for demonstrated relative to spoken instructions. There was also a significant enacted-recall advantage, F(1, 23) = 41.36, MSE = 20.66, p < .001, ηp2 = 0.64. The main effect of recall direction was not significant, F(1, 23) = 0.21, MSE = 15.44, p = .650, ηp2 = 0.01. There was a significant interaction between recall direction and recall modality, F(1, 23) = 10.69, MSE = 8.62, p = .003, ηp2 = 0.32. Simple effect analyses indicated an advantage for recalling in backward than forward direction in verbal recall (p = .018, ηp2 = 0.22, in contrast to similar performance levels for across recall directions for enacted recall (p = .156, ηp2 = 0.09). Presentation modality did not interact with recall direction, F(1, 23) = 2.82, MSE = 24.44, p = .107, ηp2 = 0.11) or recall modality, F(1, 23) = 0.16, MSE = 17.14, p = .692, ηp2 = 0.01. The three-way interaction was marginally nonsignificant, F(1, 23) = 3.95, MSE = 15.10, p = .059, ηp2 = 0.15.

Fig. 3
figure 3

Recall accuracy (number of action-object pairs, a), preparation duration (b), recall duration (c) as a function of presentation modality, recall modality, and recall direction in Experiment 3 (error bars are standard errors)

Although this three-way interaction did not reach the traditional p < .05 cutoff for statistical significance, a medium effect size (ηp2 = 0.15) was observable. Moreover, this trend closely follows the similar trend that was observed in the cross-experiment analysis combining Experiments 1 and 2, with backward recall appearing to only boost verbal recall of spoken instructions, and having no effect on memory performance following demonstrated instructions. In order to compare the present findings with those in Experiments 1 and 2, two 2 × 2 (Recall Direction × Recall Modality) ANOVAs were conducted for spoken and demonstration instructions, respectively. For spoken instructions, there was a significant enacted-recall advantage, F(1, 23) = 33.25, MSE = 14.35, p < .001, ηp2 = 0.59. The main effect of recall direction was not significant, F(1, 23) = 2.09, MSE = 24.39, p = .161, ηp2 = 0.08, but there was a significant interaction between recall direction and recall modality, F(1, 23) = 13.69, MSE = 10.96, p = .001, ηp2 = 0.37. Simple effect analyses indicated a superiority for backward over forward recall in verbal recall conditions (p = .002, ηp2 = 0.34) in contrast to similar performance of forward and backward recall in enacted recall conditions (p = .424, ηp2 = 0.03). The enacted-recall advantage was significant and larger in forward recall conditions (p < .001, ηp2 = 0.67) compared with backward recall conditions (p = .069, ηp2 = 0.14). These results closely replicate the findings of Experiment 1. For demonstrated instructions, there was also a significant enacted-recall advantage, F(1, 23) = 16.21, MSE = 23.45, p = .001, ηp2 = 0.41. The main effect of recall direction and its interaction with recall modality were not significant, F(1, 23) = 1.36, MSE = 15.49, p = .255, ηp2 = 0.06; F(1, 23) = 0.14, MSE = 12.76, p = .714, ηp2 < 0.01, respectively, thus closely replicating the findings from Experiment 2.

Serial position curve

For each position, the proportion of correct responses was calculated, ranging from zero to one. The serial position curves as function of presentation modality, recall direction and recall modality are presented in Fig. 4.

Fig. 4
figure 4

Serial positions of actions of Span 4 as functions of presentation modality, recall direction and recall modality in Experiment 3 (error bars are standard errors)

A 2 × 2 × 2 × 4 (Presentation Modality × Recall Direction × Recall Modality × Position) was conducted. For the sake of brevity, we only focus here on effects relating to serial position. There was a significant main effect of position, F(2.35, 53.97) = 23.66, MSE = 0.02, p < .001, ηp2 = 0.51. Position interacted with presentation modality, F(3, 69) = 9.40, MSE = 0.02, p < .001, ηp2 = 0.29. Simple effect analyses indicated that a demonstration advantage was present at the first three serial positions (all p values < .05; ηp2 were between 0.40 and 0.51), but was absent at the last position (p = .885, ηp2 < 0.01). Position interacted significantly with recall direction, F(1.47, 33.76) = 69.10, MSE = 0.09, p < .001, ηp2 = 0.75, but not with recall modality, F(3, 69) = 1.62, MSE = 0.02, p = .193, ηp2 = 0.07. Serial position effects in forward and backward recall indicated opposite patterns, with a large primacy effect in forward recall in contrast to a large recency effect in backward recall. In terms of output positions, simple effect analyses indicated superior memory in backward recall than forward recall for the first two output items (corresponding to the last two presented items in backward recall). There was a significant three-way interaction between position, recall modality, and recall direction, F(3, 69) = 11.12, MSE = 0.02, p < .001, ηp2 = 0.33, with an increasing enacted-recall advantage emerging from first to last positions in the forward recall conditions, and the opposite pattern in the backward recall conditions. There were no other three-way or four-way interactions (all p values > .05).

Reaction time

Descriptive results are displayed in Figs. 3b–c. The average interrater percentage of agreement for each condition was calculated. For a given behavior, 100-ms difference in duration length between two raters was defined as disagreement. For a disagreed behavior, both raters read the rating standard and again rated the behavior independently, which helped some behaviors to reach agreement while others remained in disagreement. After this correction, the average Cohen’s kappa coefficients across participants on duration per sequence for each condition were all above 0.98.

For preparation time, a 2 × 2 × 2 (Presentation Modality × Recall Direction × Recall Modality) ANOVA indicated a significant main effect of recall modality, with shorter preparation time for enacted recall than verbal recall, F(1, 23) = 34.97, MSE = 3.08, p < .001, ηp2 = 0.60. There was also a significant main effect of recall direction, with shorter preparation time for backward than forward recall, F(1, 23) = 6.64, MSE = 0.46, p = .017, ηp2 = 0.22. The main effect of presentation modality was marginally nonsignificant, with a trend of shorter preparation time for demonstration than spoken instructions, F(1, 23) = 3.58, MSE = 1.24, p = .071, ηp2 = 0.14. There was a significant interaction between presentation modality and recall direction, F(1, 23) = 34.08, MSE = 0.25, p < .001, ηp2 = 0.60. Simple effect analyses indicated longer preparation duration for spoken than demonstrated instructions in forward recall conditions (p < .001, ηp2 = 0.44) in contrast to similar preparation time for the two presentation modalities in backward recall conditions (p = .513, ηp2 = 0.02). The preparation duration was longer for forward recall than backward recall in spoken instructions (p < .001, ηp2 = 0.49), whereas it was similar for forward and backward recall in demonstration instructions (p = .080, ηp2 = 0.13). There were no other two-way or three-way interactions (all p values > .05).

For recall duration, there was a main effect of presentation modality, with shorter recall duration for demonstration than spoken instructions, F(1, 23) = 4.39, MSE = 2.12, p = .047, ηp2 = 0.16. There was a significant reduction in recall duration for enacted recall than verbal recall conditions, F(1, 23) = 29.64, MSE = 11.27, p < .001, ηp2 = 0.56. The main effect of recall direction was not significant, F(1, 23) = 0.40, MSE = 3.89, p = .532, ηp2 = 0.02, but it interacted significantly with recall modality, F(1, 23) = 10.92, MSE = 1.17, p = .003, ηp2 = 0.32. Simple effect analyses indicated that recall duration was shorter for forward than backward recall in enacted recall conditions (p = .047, ηp2 = 0.16) but not in verbal recall conditions (p = .299, ηp2 = 0.05). Recall duration was reduced in enacted relative to verbal recall for both forward (p < .001, ηp2 = 0.61) and backward recall (p < .001, ηp2 = 0.45). There was a significant interaction between presentation modality and recall direction, F(1, 23) = 8.54, MSE = 2.85, p = .008, ηp2 = 0.27, with longer recall duration following spoken than demonstrated presentation in forward recall conditions (p = .006, ηp2 = 0.28) but similar duration for the two presentation modalities in backward recall conditions (p = .276, ηp2 = 0.05), and longer duration in backward than forward recall in demonstration conditions (p = .003, ηp2 = 0.32) in contrast to similar duration in spoken instructions (p = .255, ηp2 = 0.06). Presentation modality also interacted with recall modality, F(1, 23) = 18.11, MSE = 2.43, p < .001, ηp2 = 0.44. Recall duration was reduced following demonstrated compared to spoken instructions in enacted recall conditions (p < .001, ηp2= 0.49) but was similar for the two presentation modalities in verbal recall conditions (p = .118, ηp2 = 0.10); and recall duration was shorter with enacted than with verbal recall in both spoken (p < .001, ηp2 = 0.45) and demonstrated presentation conditions (p < .001, ηp2 = 0.57). There was no three-way interaction, F(1, 23) < 0.01, MSE = 3.42, p = .957, ηp2 < 0.01.

Discussion

This experiment replicated the findings in Experiments 1 and 2 using a within-subjects design and an instruction task with fixed length sequences. Firstly, a backward recall benefit was observed for verbal but not enacted recall of spoken instructions (as in Experiment 1), while this pattern was not apparent for demonstrated presentation (as in Experiment 2). More generally, boosts in performance were observed following demonstrated (vs. spoken) presentation, and enacted (vs. verbal) recall. A second purpose of the present experiment was to reveal new insights concerning the cognitive process underlying FI by examining reaction time and serial position curves. These outcomes are discussed together with the findings from Experiments 1 and 2 in the next section.

General discussion

Three experiments were conducted to examine the effect of backward recall on following instructions and its interactions with presentation modality and recall modality. Experiment 1 examined the effect of backward recall on spoken instructions and found it significantly improved performance in verbal recall but had no effect on memory in enacted recall conditions. In contrast, Experiment 2 found that recall direction did not influence memory performance following demonstrated instructions. Experiment 3 replicated the findings from Experiments 1 and 2, and also generated serial position and response timing data with additional implications for FI performance and mechanisms of serial order.

First, we discuss the benefit of backward recall on oral repetition of spoken instructions. The backward recall benefit (observed in Experiments 1 and 3) was assumed to arise primarily from a larger recency effect, relative to forward recall. This possibility was supported by an examination of serial position curves in Experiment 3; as can be seen from the serial position profiles in verbal recall conditions (see Fig. 4), and direct comparison of forward and backward verbal recall conditions in terms of output positions, backward recall improved recall of the final two presented items/first two output items. This supports the notion that the last two items/first two output items in backward recall of digit sequence have a special status (Ritchie et al., 2015). Ritchie et al. (2015) used two memory theories to explain this. According to the scale independent memory, perception, and learning (SIMPLE) model (Brown et al., 2007), recently presented items are more discriminable in their temporal order, thus increasing their distinctiveness and leading to superior memory compared to early-presented items. In Cowan’s (1999) short-term memory model, focused attention can maximally activate the last two items, making them directly accessible in memory (see also Hu, Hitch, Baddeley, Zhang, & Allen, 2014). In addition, as can be observed in Fig. 4, the recency effect in backward recall is somewhat larger following spoken instructions (compared with demonstration conditions), indicating an additional benefit for backward recall of spoken instructions. This benefit may be associated with echoic memory, an ability to retain acoustic information after the sound has disappeared, which is apparent for auditory rather than for visual information (Watkins & Watkins, 1980). Since echoic memory is short lived, people tend to start recall immediately before it fades away, as supported by the reduced preparation time in backward than forward verbal recall of spoken instructions. Backward recall also reduced the duration of verbal recall compared with forward recall. This contrasts with increased recall time during backward recall of words observed by Thomas et al. (2003), interpreted as reflecting covert repeated cycles of forward recall. These findings suggest that, at least in the present experimental context, participants are not engaging in covert forward recall and then “peeling off” the final item in the backwards conditions and may instead be directly working backwards through the sequence using the single-step strategy. Indeed, this would fit with the notion that they are able to benefit from an echoic representation of the final sequence item.

The present results therefore indicate that participants can benefit from availability of an enhanced recency effect when reversing the recall sequence, possibly reflecting increased temporal discriminability and retrieval from the focus of attention. Furthermore, at least for verbal recall of spoken instructions, they may also be able to utilize a short-lived acoustic/echoic memory trace relating to the end-sequence item. However, this account does not explain why other items in the sequence are not recalled with reduced accuracy when reversal is required, as is often observed in other explorations of verbal serial recall (Baker et al., 2012; Haberlandt et al., 2005; St Clair-Thompson, 2010; St Clair-Thompson & Allen, 2013; Wilde & Strauss, 2002). Within the current task context, the presence of objects laid out in front of the participant throughout the task may have increased the likelihood of scaffolding backward recall via visuospatial support (e.g., drawing a path of objects in terms of action sequences). It should be noted that visuospatial processing might also be engaged in reversing purely verbal tasks, such as representing alphabets/digits visually in the mind’s eye (Li & Lewandowsky, 1995; St Clair-Thompson & Allen, 2013). Compared with representing verbal materials in visuospatial working memory, an action–object path may be easier to memorize and maintain in visuospatial form. Thus, the FI paradigm may afford a greater degree of support for reversed retrieval, enabling backward recall to be no worse than forwards, overall.

However, it should also be noted that our manipulation of recall direction was operationalized at the action chunk level rather than the individual word level. It may be that component elements within sequences are particularly difficult to reverse, as this requires the break up and reversal of local associative links between words. In contrast, reversing the order of larger chunks (as in the present study) does not require this, and enables intact production of within-chunk associative structure. It would be informative for future research to directly contrast reordering of individual items versus chunks, and how this might be mediated by the nature of within-chunk and between-chunk structure. On this note, the ordering of action chunks implemented in this study was rather arbitrary, as the chunks were connected by temporal order rather than consequentiality. Reversal of a more meaningful or well-established action sequence may be more difficult, as this would conflict with preexisting knowledge or long-term memory for sequence structure.

In contrast to spoken instructions, backward recall had no effect on memory for demonstrated instructions and enacted recall of spoken instructions. Overall, reversed order of a demonstrated action sequence had no costs or benefits on memory accuracy and preparation time, and only slightly extended the duration of enactment. This contrasts with a number of previous studies using other types of verbal materials with verbal recall (Baker et al., 2012; Haberlandt et al., 2005; St Clair-Thompson, 2010; St Clair-Thompson & Allen, 2013; Wilde & Strauss, 2002), but is largely in concordance with earlier studies of spatial sequence memory (Cornoldi & Mammarella, 2008; Vandierendonck et al., 2004, Experiment 3; Wilde & Strauss, 2002), suggesting some similarities between the serial memory of spatial locations and actions. In fact, the instruction task involves manipulation of objects dispersed in different locations, and previous studies have indicated that eye tracking of the path around the visual environment was a particularly frequent strategy in this task (T.-x. Yang et al., 2016). Therefore, reversal of action sequences partly involves reversal of path configurations, which plays an important role in the backward recall of spatial locations (Berch et al., 1998). Provided the whole pattern of path configuration can be held in memory, reversal of the sequence may be relatively straightforward, a conjecture supported by the mirrored serial position curves in forward and backward enacted recall of demonstrated actions.

Second, the observation of superior memory performance using demonstrated compared with spoken instructions was again replicated (Lui et al., 2017; T.-x. Yang et al., 2017; T.-x. Yang et al., 2015). Moreover, the present study showed that the demonstration advantage occurred irrespective of recall modality and recall direction, suggesting the advantage may arise mainly during encoding. It is speculated that demonstration involves automatic visuospatial and motor encoding, and strengthens bindings between serial position, spatial location, action, and object, leading to a more integrated and robust representation. The overall serial position curves reveal that the demonstration advantages were only present for the first three actions in the sequence. On the face of it, this might imply that the integrated representation provided by demonstration may be particularly useful for maintenance of early and middle in the sequence. However, the absence of a demonstration advantage at the last serial position is likely associated with the large recency effect in the conditions of backward recall and enacted recall of spoken instructions (see Fig. 4). Indeed, when comparing spoken and demonstrated conditions at each serial position in the forward verbal recall condition, demonstration is superior at all positions (all p values < .01). Thus, demonstrated presentation facilitates recall across the entire sequence, unless additional changes to the task context are introduced. In terms of reaction time, demonstration reduced the preparation time for forward (verbal and enacted) recall, implying easier and faster access to the integrated action-based representation that may be provided by demonstration. This may because the tightened bindings of elements in action sequence in demonstrations provided more retrieval routes to the full action-object representation, compared with verbal instruction.

Among all conditions, forward enacted recall of demonstration resulted in the highest levels of memory performance and shortest recall durations. Compared with the other conditions, imitating demonstrated action sequences is intuitive, straightforward, and rapid, and does not involve some of the complex cognitive processes integral to other conditions (e.g., active action planning, order reversal, and input–output modality transformation). Despite these advantages, direct imitation of action sequences still suffered from memory decay/output interference. As shown in the serial position curves (see Fig. 4), enacted recall of demonstrated instructions displayed a clear primacy gradient, which may reflect later sequence items being lost through decay and output interference during performance of earlier sequence positions. This pattern is consistent with the primacy model (Page & Norris, 1998), and the notion that a novel action sequence is represented by a primacy gradient of action representations/parallel feature activations that may be stored in a short-term motor buffer before execution (Averbeck et al., 2002; Fournier et al., 2014; Jaroslawska et al., 2018).

Third, across three experiments, an advantage in recall accuracy for enacted over verbal recall was found, replicating a growing body of findings (Allen & Waterman, 2015; Gathercole et al., 2008; Koriat et al., 1990; Lui et al., 2017; T. Yang et al., 2014; Yang et al., 2017) and indicating it to be a highly robust finding. Serial position curves indicate that the first item was recalled equally well for both recall types, but memory performance for subsequent items dropped more rapidly when oral repetition was required (as in Allen & Waterman, 2015). This rapid loss of later sequence items may reflect faster rates of decay and/or larger interference effects during verbal output. Moreover, a similar enacted-recall advantage was also observed on the two indices of response time. Examination of preparatory interval and response duration (at Span 4 in Experiment 3) indicated that performance is more temporally efficient when recall is enacted, relative to verbal in nature. This is likely to reflect superior performance in the enacted-recall condition overall, as participants will be quicker to start responding and to complete their response sequence when they have a more well-defined and robust representation to draw upon. In addition, the shorter duration of recall observed for enacted recall may also partly reflect the speed/efficiency with which this response type can be implemented. It will be valuable for future work to explore the extent to which enacted recall effects are attributable to encoding-based, storage-based, and retrieval-based processes.

These results are compatible with views of working memory that emphasize how different processing components might be engaged to supplement performance, especially when to-be-remembered item sequences exceed any one component’s storage capacity (e.g., Logie, 2011). While they might be recruited separately, the combination of these different forms of coding might also be drawn together and held in multimodal form, for example, in the episodic buffer within the multicomponent framework (e.g., Baddeley, 2012). This is also consistent with Keele et al.’s (2003) view on learning complex sequences, which involves a within-dimensional learning process as well as a cross-dimensional integrative process. Taking an alternate perspective, the findings also fit with the notion that working memory “capacity” dynamically varies with changes in task, material, and an individual’s repertoire (e.g., Macken, Taylor, & Jones, 2015). Within this framework, recall direction and recall modality would map onto task context, and presentation modality to the materials dimension. While the present study was not explicitly designed to differentiate between different models of working memory, it adds to a growing body of FI research indicating how varying forms of input, representation, and output can combine to impact on performance in a multidimensional, flexible, and dynamic manner.

While a number of useful insights have emerged, the study has two limitations. First, the findings of reaction time and serial position were based on action sequences with fixed length (i.e., four actions), and whether the findings also apply to variable sequence lengths remains to be investigated. For instance, reversing longer action sequences may be more difficult and show decreased performance. Second, the complex nature of the following instructions task involved operational interactions with objects in different spatial locations and thus required serial memory of action-object-location bindings. This has theoretical implications for understanding how complex integrative processing is achieved in working memory for the purposes of further action, and practical importance given the ubiquity of instruction sequences across a range of settings (e.g., the classroom). Nevertheless, future work on recall direction might aim to break down the contributory cognitive subcomponents by using simple tasks focusing on, for example, object memory, or motor movement.

In conclusion, we investigated the impact of recall direction, along with presentation and recall modality, on serial memory for sequences of instructions. We observed an advantage for backward verbal recall following spoken instructions in terms of memory accuracy, preparation, and recall times. In contrast, similar performance levels were obtained in forward and backward recall of demonstrated instructions and enacted recall of spoken instructions, indicating little cost in reversing orders of action-based representations. In addition, the present study replicated previously observed beneficial effects of demonstration and enacted recall and suggested that these action-based impacts are also associated with efficiency and effectiveness during retrieval. These novel findings provide the first evidence concerning backward recall of action sequences and deepen our understanding of the temporal dynamics of working memory for action and instruction.