As dancers turn and leap across the stage, they move in and out of multiple body configurations. An observer of this body movement will simultaneously process both form-based detail of body shape and motion-based detail of the temporal order in which these shapes appear. Visual perception research has demonstrated that such processing of body form and of body motion differs. That is, although an action consists of a sequence of multiple body postures, action perception differs from posture perception. In the present article, we question whether a similar dissociation between actions and postures occurs in visual memory. To date, this question has not been addressed. In part, this may be because visual memory research has focused primarily on static, nonbiological stimuli, but also because research on human action has typically been directed toward perception and production rather than memory. The following sections first review the literature considering visual memory for static and dynamic stimuli, and then review the literature considering visual perception of human action and form. Together, these research areas inform two experiments comparing memory for dynamic actions and static postures alone (Exp. 1) and in the context of other dynamic and static visual tasks (Exp. 2). We conclude that, as in perception, actions and postures differ in visual memory, relying on partially dissociable networks.

Working memory: Static and dynamic processing?

Working memory (WM) is defined as the active portion of the long-term neural network within which stimuli are temporarily maintained and manipulated for retrieval (Oberauer, 2009). Recent models of WM have built on the current understandings of visual perception (Jonides et al., 2008), visual attention (Oberauer & Bialkova, 2009), and long-term memory (Cowan et al., 2005; Zimmer, 2008). These models diverge from the concept of WM as containing modality-specific stores, toward the conceptualization of WM as an extension of the perceptual process (Cowan, 2010; Jones, Hughes, & Macken, 2006; Oberauer, 2009; Zimmer, 2008). Thus, it is suggested that the way in which stimuli are initially processed will heavily influence the way in which the stimuli are remembered and retrieved (Engle, 2010; Slotnick & Thakral, 2011; Wager & Smith, 2003; Wilson, 2001). For example, Jones, Farrand, Stuart, and Morris (1995) showed that memory for both visual and verbal stimuli is impaired by a range of distractor tasks irrespective of modality (e.g., both irrelevant speech and spatial tapping) as long as both the primary and intervening tasks involved change over time; that is, the tasks were dynamic. This example and others (see, e.g., Depoorter & Vandierendonck, 2009; Stevanovski & Jolicœur, 2007) are suggestive of “interference-by-process” (Hughes & Jones, 2005). The form of processing needed to perceive and encode the items has as much influence on retention of items as does the modality of item content.

Notable, then, is the observation that whereas some stimuli elicit primarily static (form-based) encoding, such as remembering a configuration of shapes, other stimuli elicit dynamic (temporal-based) encoding, such as remembering the sequential order in which the shapes appeared (Logie, 2011). Typically, when investigating visual WM (VWM), visual and spatial distinctions are made (Luck, 2008). However, the tasks used to target VWM have tended to require static, form-based processing—including, for example, visual pattern tasks in which the participant retains a static pattern of colored squares (Della Sala, Gray, Baddeley, Allamano, & Wilson, 1999; Luck, 2008). By contrast, the tasks used to target spatial WM have tended to require dynamic, temporal-based processing—including, for example, the Corsi-block task, in which the participant retains a sequence of visually identified spatial locations (Berch, Krikorian, & Huha, 1998; Vandierendonck & Szmalec, 2011). These “static–visual” and “dynamic–spatial” associations are not often controlled (Pickering, Gathercole, Hall, & Lloyd, 2001). Therefore, it is plausible that the visuospatial dissociations that have been reported may reflect static–dynamic processing dissociations (Luck, 2008). Confounded in this way, it is difficult to determine which characteristic—visual or static versus spatial or dynamic—is key to retention in WM.

Pickering et al. (2001) proposed that static and dynamic (rather than visual and spatial) VWM processes are separable, suggesting that the two processes may not mature at the same rate, leading to different developmental trajectories. The accuracy of VWM for static and dynamic versions of a visual task was tested with children ranging from 5 to 10 years old. In the static version of the task, the entire visual stimulus was visible at the start of a trial. In the dynamic version of the task, the stimulus became visible across the duration of the trial. Although both tasks were targeting visual WM, accuracy differed on the basis of whether the initial processing was static and form-based or dynamic and motion-based. Specifically, whereas accurate performance on the static, form-based task increased significantly with age, accurate performance on the dynamic, motion-based task did not. Although the 10-year-olds were more accurate than the 5-year-olds in both the dynamic and static tasks, the slope of the increase from 5 to 10 years old was much steeper for the static than for the dynamic task. This result suggests that memory for visual form is dissociable from, and takes longer to develop than, memory for visual motion. Static–dynamic dissociations have also been reported in clinical populations. For example, a sample of patients diagnosed with schizophrenia showed impaired performance on a static, form-based task, as compared with nonschizophrenic controls, in the absence of impaired performance on a dynamic, motion-based task (Cocchi et al., 2007).

Although only a few studies have disentangled visual form and motion in VWM, those that have done this suggest that the retention of visual stimuli is in part determined by the processing network that was dominant when the stimuli were observed—in other words, whether the stimuli were primarily form- or motion-based. When considering memory for body actions, we would expect that the type of processing undertaken during action observation would influence the means by which the actions would be retained.

Memory for body movement: Form plus motion processing?

Although it is relatively simple to determine whether a non-body-based visual task (such as the Corsi-block task) will require static or dynamic processing, it is relatively difficult to determine what form of processing will be required in retaining human actions. This is because human body movement consists of many successive changes in body form across space and time (Adshead-Lansdale, 1988). The perception and recognition of body movement might, then, require both static–form-based and dynamic–motion-based processing. Indeed, whereas biological motion research with point light (PL) stimuli has demonstrated that the perception of coherent body movement can occur in the absence of form cues (Dittrich, 1993), one’s implicit knowledge of the unchanging structure of the human form assists action perception when motion alone is not a useful cue (Blake & Shiffrar, 2007). Typically, whereas static PL stimuli appear to show a random array of dots, dynamic PL stimuli elicit a perception of biological motion, despite no overt body form being present. This form-from-motion effect tends to suggest that the presence of form is not necessary for action perception. Yet, Beintema and Lappe (2002) demonstrated that when the pure motion of PL markers is rendered useless to participants (because the positions of the markers are randomized along the body and changed unpredictably from frame to frame), participants are still reliably able to discriminate between actions. When motion is not useful, knowledge of the human form appears to help structure the perception of different human actions. Similarly, two sequentially presented static pictures of the whole body are sufficient for motion perception, despite the fact that no motion at all is present in the stimulus (e.g., Shiffrar & Freyd, 1993). Thus, both form and motion play important roles in the perception of action, and both are likely to be important in memory for movement.

Neuroimaging and computational research has also suggested that the perception of motion and the perception of body form may initially occur via separable neural pathways that later converge to facilitate a coherent dynamic action percept (Giese & Poggio, 2003). For example, whereas the observation of body posture appears to be associated with extrastriate and fusiform body areas, the observation of action is primarily associated with motion-related areas of the cortex, including posterior superior temporal sulcus (pSTS; Allison, Puce, & McCarthy, 2000; Downing, Peelen, Wiggett, & Tew, 2006; Grossman & Blake, 2002; Peelen & Downing, 2007; Peuskens, Vanrie, Verfaillie, & Orban, 2005; Urgesi, Berlucchi, & Aglioti, 2004). However, some activity is invariably noted within form-processing regions during action observation, suggesting that both motion and form processes do contribute to action perception (e.g., Grossman & Blake, 2002; Peelen, Wiggett, & Downing, 2006). If the processing format influences retention in VWM, it could be expected that, whereas memory for action and form will largely differ, some retention of form will also occur after action observation. This is to say, after observing a dynamic body movement, the observer will have both snapshot memory of the embodied forms and dynamic memory of the forms in motion. In Experiment 1, a change detection paradigm with manipulations of study format (movement, posture) and test format (movement, posture) was used to investigate whether participants could correctly identify an action on the basis of only a single static image. If so, this suggests that when observing dynamic, dance-like actions, static images of posture are encoded and support recognition.

Further to this, if movement perception results in the activation of both form and motion pathways, then disrupting one processing pathway might not have deleterious effects on memory for the action. That is, the recognition of actions on the basis of only form information may be possible when motion information is unavailable or the processing pathway has been blocked (i.e., by being redirected to another task). In this case, action recognition may rely on snapshots of body posture that are available from the form-processing pathway. In Experiment 2, an interference paradigm strategically disrupted static and dynamic processing of dance-like actions in both posture and movement formats. Although body movement involves both form and motion cues, the perception of body movement most prominently activates the visual motion-processing pathway. By contrast, the perception of body postures typically activates the static, form-based pathway, with little input from the motion pathway. If similar processes are active in WM and in perception, the recognition of body movements should be selectively impaired by a motion-based interference task, whereas the recognition of postures should be selectively impaired by a form-based interference task.

Experiment 1

Aim, design, and hypotheses

The aim of Experiment 1 was to determine whether snapshot images of static body posture can be used for action recognition. Dance-like actions were used as an example of complex body movement, involving many changes in body postures across time and space, without being directed toward external objects. A 2 × 2 within-subjects design was used, with manipulations of study format (movement, posture) and test format (movement, posture). Therefore, recognition was compared in four conditions: (1) movement study format to movement test format (movement–movement), (2) movement study format to posture test format (movement–posture), (3) posture study format to posture test format (posture–posture), and (4) posture study format to movement test format (posture–movement). Accuracy (in terms of hit and false alarm rates) and reaction times (RTs) were the dependent variables.

During the study phase, novice (nondancer) participants were presented with sequences of three dance-like actions as either movements (complete dynamic action) or postures (snapshot images of postures; see Fig. 1). During the test phase, participants decided whether a test item showed an old or a new action, in conditions in which the item format (movement or posture) was the same as, or different from, the study format. Specifically, participants were instructed to ignore item format (i.e., whether the item was a movement or a posture) and decide whether the test item depicted the same action as one of the study items. For example, if a study sequence comprised Movement A, Movement B, and Movement C, and the test item was Posture A, the correct response would be “old,” despite the change from movement at study to posture at test.

Fig. 1

Experiments 1 and 2: Dance-like action stimuli. Posture stimuli are shown

First, we hypothesized that recognition would be best when the study and test formats matched. Hence, recognition of movement stimuli should be most accurate in the movement–movement condition, in which memory for movements is probed by a complete movement at test. Likewise, recognition of action postures should be best in the posture–posture condition, in which memory for posture is probed by a posture at test. Since movement stimuli potentially contain more information overall that can be used to support recognition, accuracy was hypothesized to be greater in the movement–movement condition than in the posture–posture condition.

Second, in both the movement–movement and posture–posture conditions the format of the test stimulus exactly matches the format of the study stimulus, meaning that the processing pathways active during perception are reactivated at test. Recognition rates in these conditions should be high. In the movement–posture and posture–movement conditions, however, the study stimulus must be transformed in VWM to enable an accurate response to the test stimulus. When a movement is observed, both form and motion pathways should be active. Therefore, accurate posture recognition after encoding of movements during study (movement–posture condition) should be possible, since body posture will have been encoded. However, accuracy should still be poorer than in the movement–movement condition.

By contrast, when a single posture is observed at study, only form pathways should be active. Therefore, since movement information is not encoded, movement recognition after the encoding of postures during study (posture–movement condition) should be poor relative to the posture–posture condition. Finally, to provide an accurate response in the posture–movement condition, participants would need to search the movements at test for the posture stimuli that had been encoded. Therefore, in the posture–movement condition, RTs should be slow relative to the movement–movement condition.

Method

Participants

A group of 35 students enrolled in first-year psychology at the University of Western Sydney (M age = 22.0 years, SD = 6.59 years; 32 female, three male) volunteered in return for course credit. All of the participants were naive to the task. Five participants reported some nonprofessional dance experience, primarily involving classes taken during childhood or casual fitness classes (M = 9.8 years, SD = 6.57 years). Four of these participants had some nonprofessional experience specifically with ballet (M = 8.25 years, SD = 6.44 years). Data screening revealed no significant effect of participant dance expertise, and so these participants were retained in the sample. All participants reported normal or corrected-to-normal vision.

Materials and equipment

Dance-like action stimuli consisted of a set of ten video-recorded ballet items performed by an experienced male ballet dancer, used previously by Calvo-Merino, Glaser, Grèzes, Passingham, and Haggard (2005). The raw dance-like items were 3 s in length and involved one performer executing one ballet item. The same male dancer performed all ten movement items wearing fitted black clothing, against a dark blue backdrop. All movements began from roughly the right hand side of the screen (stage left) and travelled toward the left (stage right). An additional two movement items were available but not used in the stimuli set, as the dancer stayed centered within the frame and did not travel right to left. For the dynamic movement items, the raw action items were played in complete format (25 frames per second) for 3 s. For the static items, a single frame capturing the peak of the item was shown on screen for 3 s. The static frame was chosen as the one that represented the critical moment of the action, for example at the height of the jump or full extension of the leg. This posture represented a maximal state of deviation from the start and end position of the body in the dynamic phrase. The postures occurred at a different frame and time point for each item, occurring on average at around 2 s. Each item was presented in the center of the computer screen, surrounded by a black background.

The experiment was run on Lenovo PCs with DMDX software (Forster & Forster, 2003). Responses were made on the computer’s internal keyboard.

Procedure

Participants signed informed consent forms conforming to ethical standards (HREC H9302). Detailed instructions were then given regarding the task. Participants were told that they would observe sets of three dance-like actions presented either as static “photographs” or dynamic “movies,” and that they were to try to remember what action was being performed, regardless of the format in which it appeared.

Each trial progressed in the following way: A blank screen showing the word STUDY appeared. Subsequently, three dance-like actions appeared sequentially in the center of the screen. On any given trial, all of the items were either static postures or dynamic movements at study (i.e., three movement items or three posture items were shown at study, never a combination of the two formats). The word TEST was then displayed in the center of the screen, followed by the test item. Participants responded “old” or “new” by using the computer’s shift keys, which were labeled accordingly. Response was time limited, with a response period of 5 s in total, including the 3-s duration of the test item. Participants were instructed to observe the study items and to respond “as soon as they knew,” so that responses could be made at any point during the item. After the response was recorded and the trial completed, a new trial began (see Fig. 2).

Fig. 2

Experiment 1, trial sequence. A posture–posture trial is shown. In the study phase, three postures are shown sequentially. In the test phase, participants indicate “old” or “new” to a posture that was or was not in the study sequence

Eight practice trials (one for each study–test combination in old and new trials) with corrective feedback were completed before the task began. A set of capoeira movements, matched for kinematics and limb displacement to the ballet items (Calvo-Merino et al., 2005), were used for the practice trials. In total, 40 experimental trials were presented, with ten trials for each study–test combination. The trials were counterbalanced across participants. Within each set of ten trials, five trials contained a “new” action at test, whereas five trials contained an “old” item at test. At the end of the experiment, participants were debriefed and asked to note any strategies they had attempted to use to help them remember the dance-like actions. The experiment took approximately 35 min to complete.
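The trial structure described above (four study–test combinations, ten trials each, half “old” and half “new”) can be illustrated with a minimal sketch. This is not the DMDX script used in the experiment; the function name `build_trials`, the tuple layout, and the use of a seeded shuffle in place of the counterbalancing scheme are all assumptions made for illustration.

```python
import itertools
import random

FORMATS = ("movement", "posture")


def build_trials(trials_per_condition=10, seed=0):
    """Return a shuffled list of (study_format, test_format, probe) tuples.

    Hypothetical helper: the original counterbalancing procedure is not
    specified in detail, so a seeded shuffle stands in for it here.
    """
    rng = random.Random(seed)
    trials = []
    # Cross study format with test format to get the four conditions.
    for study, test in itertools.product(FORMATS, FORMATS):
        half = trials_per_condition // 2
        probes = ["old"] * half + ["new"] * (trials_per_condition - half)
        trials.extend((study, test, probe) for probe in probes)
    rng.shuffle(trials)
    return trials


trials = build_trials()
print(len(trials))  # total number of experimental trials
```

With the default arguments, the list contains 40 trials, and each study–test combination contributes five “old” and five “new” probes.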

Data processing and exclusion criteria

The data were categorized into hits (old item correctly identified as old), correct rejections (new items correctly identified as new), misses (old items incorrectly identified as new), and false alarms (new items incorrectly identified as old) for each study–test category. Accuracy was calculated as the proportion of correct responses: (hits + correct rejections)/(hits + misses + correct rejections + false alarms), where chance was equal to .5.
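As a worked illustration of the accuracy formula above, the following sketch computes the proportion correct from the four response counts. The function name and the example counts are hypothetical and are not taken from the data set.

```python
def accuracy(hits, misses, correct_rejections, false_alarms):
    """Proportion correct: (hits + CR) / (hits + misses + CR + FA)."""
    total = hits + misses + correct_rejections + false_alarms
    if total == 0:
        raise ValueError("no scored responses in this condition")
    return (hits + correct_rejections) / total


# Hypothetical participant in one study-test condition (10 trials):
# 4 hits and 1 miss on "old" trials, 4 correct rejections and
# 1 false alarm on "new" trials.
print(accuracy(hits=4, misses=1, correct_rejections=4, false_alarms=1))
```

Because “old” and “new” trials are equally frequent, a participant who always gives the same response would score .5 on this measure, which is why chance is set at .5.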

RTs were calculated both from the onset of the test item (oRT) and from the offset of the test posture (pRT). For the pRT measure, the point in time at which the test posture occurred in the dynamic sequence was subtracted from the oRT. Using this measure, we could determine whether participants were responding before, after, or at the time that the test posture occurred in the dynamic action. Negative pRTs would indicate responses occurring before the test posture, whereas positive pRTs would indicate responses occurring after the test posture.
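The pRT measure described above is a simple subtraction, sketched below. The function name and the example times are hypothetical; in the experiment, the posture time point differed for each item.

```python
def posture_rt(onset_rt_ms, posture_time_ms):
    """Return pRT: response time relative to the test posture.

    onset_rt_ms     -- RT measured from test-item onset (oRT)
    posture_time_ms -- time at which the test posture occurs in the item
    Negative values mean the response came before the posture appeared.
    """
    return onset_rt_ms - posture_time_ms


# Hypothetical values: a posture occurring 2,000 ms into the item.
print(posture_rt(2800, 2000))  # response after the posture
print(posture_rt(1500, 2000))  # response before the posture
```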

The criteria for exclusion, based on results from pilot testing, stated that participants would be excluded if they had (a) more than six missed responses (i.e., no response made) across the experiment in total (representing 15% of total trials), and/or (b) more than two missed responses in two study–test conditions, and/or (c) one or more conditions in which the false alarm rate exceeded the hit rate. On the basis of these criteria, five (out of 35) participants were excluded from the analysis (three for criterion c and two for criterion b), leaving N = 30 (M = 22.5 years, SD = 6.97 years; 27 female, three male).
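The three exclusion criteria can be expressed as a short check, sketched below. The function name, the dictionary-based inputs, and the condition labels are assumptions for illustration. Criterion (b) is read here as "more than two missed responses in each of at least two conditions," which is one interpretation of the wording above.

```python
def should_exclude(missed, hit_rate, fa_rate):
    """Apply exclusion criteria (a)-(c) to one participant.

    Each argument is a dict keyed by study-test condition.
    Hypothetical helper; criterion (b) follows the interpretation
    stated in the accompanying text.
    """
    # (a) more than six missed responses across the whole experiment
    crit_a = sum(missed.values()) > 6
    # (b) more than two missed responses in at least two conditions
    crit_b = sum(1 for m in missed.values() if m > 2) >= 2
    # (c) false alarm rate exceeds hit rate in any condition
    crit_c = any(fa_rate[c] > hit_rate[c] for c in hit_rate)
    return crit_a or crit_b or crit_c


conditions = ("movement-movement", "movement-posture",
              "posture-posture", "posture-movement")
hits = {c: 0.8 for c in conditions}
fas = {c: 0.2 for c in conditions}
missed = {c: 0 for c in conditions}
print(should_exclude(missed, hits, fas))  # participant retained
```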

Results and discussion

Accuracy

Table 1 shows accuracy rates across the study–test conditions. We performed a 2 (study format: movement, posture) × 2 (test format: movement, posture) repeated measures analysis of variance (ANOVA), with planned comparisons to test the specific hypotheses (Tabachnick, Fidell, & Osterlind, 2001). With alpha set at .05, the Study Format × Test Format interaction was significant, F(1, 29) = 37.67, p < .001, ηp² = .57. As predicted, accuracy was greatest when the study and test formats were congruent, and weakened when they were not. Three planned comparisons clarified this interaction.
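For readers who wish to check the effect size, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity F·df1 / (F·df1 + df2). The sketch below applies it to the interaction reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Study Format x Test Format interaction for accuracy: F(1, 29) = 37.67
print(round(partial_eta_squared(37.67, 1, 29), 2))
```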

Table 1 Experiment 1: Accuracy for postures and movements

First, accuracy in the posture–posture condition (M = .87, SD = .13) was significantly greater than accuracy in the movement–movement condition (M = .80, SD = .14), t(29) = 2.13, p = .04, d = 0.52. This is somewhat surprising, since the movement stimuli arguably contained more detail that might be used for recognition purposes than did the single posture stimuli. However, in a limited-capacity VWM system, memory for movements has often been found to be best at around two items (Cortese & Rossi-Arnaud, 2010; Smyth & Pendleton, 1989; Wood, 2007). The extra detail afforded in the movement, as compared to the posture, stimuli may exceed VWM capacity, serving as a distraction rather than a benefit for memory. Alternatively, since the posture stimuli remained unchanging on the screen for 3 s, participants may have had a better chance to process and store details of the represented action.

Second, since both form and action pathways are active during the perception of body movement, we predicted that a movement item might be accurately recognized on the basis of only a static snapshot of the goal posture at test, but that recognition would be best when the entire movement stimulus was available. This hypothesis was supported, with accuracy in the movement–movement condition being significantly greater than accuracy in the movement–posture condition (M = .71, SD = .13), t(29) = 2.92, p = .002, d = 0.66. With specific reference to the hit and false alarm rates, it is notable that accuracy in correctly rejecting “new” items was comparable across the movement–movement and movement–posture conditions (false alarm rates of .22 and .28, respectively). However, the recognition of “old” items from static posture was poor as compared with recognition in the movement–movement condition (hit rates of .83 and .71, respectively). Participants were able to correctly dismiss new items equally in the two conditions, but they could better recognize previously seen items when the complete movement stimulus was available.

Finally, as hypothesized, accuracy in the posture–posture condition was significantly greater than accuracy in the posture–movement condition (M = .70, SD = .14), t(29) = 5.89, p < .001, d = 1.2 (Fig. 3). When postures were observed, no motion was encoded, and therefore recognition in the posture–movement condition required a search of the test item for the correct static posture.

Fig. 3

Experiment 1: Mean accuracy for movements and postures (maximum = 1, chance = .5). Error bars refer to standard errors of the means

Reaction time

Figure 4 shows the average RTs from the onset of the test item (oRT). A 2 × 2 repeated measures ANOVA produced a significant Study Format × Test Format interaction, F(1, 29) = 17.86, p < .001, ηp² = .38. Specifically, the oRT in the movement–posture condition (M = 1,675.97 ms, SD = 463) was not significantly different from the oRT in the posture–posture condition (M = 1,590.49 ms, SD = 408.15), t(29) = 1.49, p = .15, r = .26. This indicates no difference in RTs to postures at test, despite the differences in the format of the study sequence. However, the oRT in the movement–movement condition (M = 2,838.75 ms, SD = 524.10) was significantly faster than the oRT in the posture–movement condition (M = 3,222.17 ms, SD = 682.75), t(29) = 4.38, p < .001, r = .63. As predicted, the time taken to respond to a test item in the movement format was significantly slower after observing postures, relative to movements, at study.
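The effect size r that accompanies each t test can be approximated from the t value and its degrees of freedom. The sketch below assumes the common conversion r = √(t² / (t² + df)); the text does not state which formula was used, so this is offered only as a plausibility check. The function name is hypothetical.

```python
import math


def r_from_t(t_value, df):
    """Effect size r from a t statistic, assuming r = sqrt(t^2 / (t^2 + df))."""
    t_squared = t_value ** 2
    return math.sqrt(t_squared / (t_squared + df))


# Movement-movement vs. posture-movement oRT comparison: t(29) = 4.38
print(round(r_from_t(4.38, 29), 2))
```

Because the reported t values are themselves rounded, values recovered this way may differ from the reported r in the second decimal place.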

Fig. 4

Experiment 1: Mean onset reaction times (oRTs) for movements and postures. Test items had a duration of 3 s. oRTs were constrained to a maximum of 5 s. Error bars refer to standard errors of the means

RTs were also analyzed in relation to the test posture (pRT). Since the test posture captures the goal of the movement in this experiment, it is possible that this posture was used to make recognition judgments even when dynamic movement stimuli were observed. If so, in the movement–movement condition, responses should have been close in time to the appearance of the posture. Figure 5 shows that when the test item was a movement (movement–movement and posture–movement conditions), the response was made around 1 s after the onset of the test posture within the action (M = 909.98 ms and 1,324.73 ms, respectively). In the movement–movement condition, the RT was significantly different from the point in time at which the test posture occurred within the action, t(29) = 9.51, p < .001, r = .87. The frame capturing the test posture was always approximately 2 s into the item. This result suggests that the participants were not responding to recognition of the posture, but rather observed the entire dynamic item before making a response.

Fig. 5

Experiment 1: Mean posture reaction times (pRTs) for the movement–movement and posture–movement conditions. Positive pRTs indicate responses occurring after the test posture, and negative pRTs indicate responses occurring before the test posture. Error bars refer to standard errors of the means

In summary, Experiment 1 shows that when body movements are observed, static posture information is encoded, but form information may not be the primary source of movement recognition. Recognition of a movement was most accurate when a complete movement item, rather than the test frame capturing the goal posture, was seen again at test. Additionally, the pRT measure demonstrates that participants observed the entire movement item before responding. This result suggests that, whereas it is possible to recognize an action on the basis of only static posture, it is unlikely that this is the primary manner in which an action is temporarily retained.

Experiment 2

Pickering et al. (2001) hypothesized that the recognition of form (posture) may rely on a VWM mechanism that is dissociable from VWM for the recognition of motion (action). In Experiment 1, the movement items were primarily encoded as “motion-based” stimuli, although some posture-based information was stored. As such, VWM for movements should be more impaired by a secondary task that disrupts dynamic, motion-based processing than by a secondary task that disrupts static, form-based processing. Conversely, motion-based processing plays little role in VWM for postures, and therefore VWM for postures should be more impaired by a secondary task that disrupts form-based processing than by a secondary task that disrupts motion-based processing. In Experiment 2, we investigated this hypothesis by using an interference paradigm to selectively disrupt form- or motion-based VWM.

Aim, design, and hypotheses

The aim of Experiment 2 was to dissociate form and motion processing in VWM for postures and movements. As in Experiment 1, participants observed sequences of three movements or postures and judged whether a single test item was old or new. To determine whether different WM systems are recruited, static and dynamic visual interference tasks occurred between study and test. A digital version of the Corsi-block task was used for the interference conditions. Corsi blocks are a set of blue squares that may be selected as targets by the appearance of a black circle. The participant’s task is to remember the pattern of the target blocks. This task was classed as static if all of the targets were presented simultaneously, and dynamic if the targets appeared sequentially across the trial. To consider the effects of interference on memory for movements and postures, only congruent movement–movement and posture–posture trials were included. Therefore, a 3 (interference: control, static, dynamic) × 2 (item format: movement, posture) mixed design was implemented, with between-subjects manipulation of the Interference factor. Accuracy (proportions correct) was the dependent variable.

We hypothesized that (1) if body movements are retained primarily via a dynamic, motion-based VWM, then recognition should be impaired by the dynamic, but not the static, interference task; and (2) if body postures are retained primarily by a static, form-based VWM process, then recognition of body postures should be impaired by the static, but not the dynamic, interference task. The effects of the static and dynamic interference tasks on recognition were compared against a no-interference, control condition.

Method

Participants

A group of 60 students enrolled in first-year psychology at the University of Western Sydney volunteered in return for course credit (M age = 25.6 years, SD = 8.3 years; 43 female, 17 male; 20 in each interference group). All participants were naive to the task. Ten participants reported some nonprofessional dance experience (M = 8.6 years, SD = 4.1 years), six of whom had some prior experience specifically with ballet (M = 8.16 years, SD = 7.19 years). The data screening revealed no significant effect of participant dance expertise, and these participants were retained in the sample. All of the participants reported normal or corrected-to-normal vision.

Materials and equipment

The dance-like action stimuli were the same ones used in Experiment 1. The stimuli for the intervening tasks consisted of two digital versions of the Corsi-block task. Both Corsi-block task manipulations consisted of a white rectangle featuring ten blue squares at irregular locations. In the static Corsi-block task (static task), five black circles appeared simultaneously on five different squares and remained in place for 5 s. After 5 s, the entire static task disappeared from the screen. Alternatively, for the dynamic Corsi-block task (dynamic task), a single black circle appeared sequentially at five different locations. This task was chosen to replicate the dynamic task involving “flashing” matrix targets that had been used by Pickering et al. (2001). In the dynamic task, each black circle had an 800-ms duration and was separated from the appearance of the next black circle by a 30-ms interstimulus interval (ISI). This is within the range of ISIs typically shown to elicit a percept of apparent motion (Shaw, Flascher, & Mace, 1995). The path length of five targets was chosen to fall within, but not exceed, VWM capacity (Cowan, 2010) and to allow the same patterns to be presented in the dynamic and static conditions (larger path sizes would require repetition across blocks, which could not be represented effectively in the static task). Ten Corsi-block patterns were chosen and displayed randomly throughout the experiment. To ensure that the only difference between the static and dynamic task conditions was the simultaneous or sequential appearance of the circles, the same patterns were used for both. Both tasks lasted 5 s in total.

Procedure

The primary (actions) task progressed as in Experiment 1. On each trial in the static and dynamic interference conditions, participants would observe a set of three dance-like actions (postures or movements), observe the interference task stimulus, and then respond to a test item. On any given trial, the test item could be either a dance-like action or a location from the Corsi-block task. The participants were not aware of which test item (dance-like action or Corsi-block location) would be shown at test until the test item was displayed, ensuring that correct performance required memory of both tasks until the test phase (Cowan & Morey, 2007). In the control (no-interference) condition, the test item was always a dance-like action. As in Experiment 1, half of the trials had a correct response of “old,” and half of “new.” For dance-like actions, “old” and “new” were defined as in Experiment 1. For the Corsi-block location, “old” referred to a location that had been occupied during the interference task, whereas “new” referred to a location that had not been occupied during the interference task (see Fig. 6).

Fig. 6

Experiment 2: Trial progression. Static interference trials are shown, with posture study format and posture (panel A) or location (panel B) test items. In the study phase (9 s), three key-frame images are shown sequentially. In the interference phase (5 s), the Corsi-block stimulus appears, with five targets marked by black circles. In the test, participants indicate “old” or “new” to a posture or a Corsi-block location that was or was not in the study sequence.

Before completing the task, participants undertook practice trials with corrective feedback to ensure that the instructions were clear and that the task was understood for each possible study–test condition. In total, ten trials per stimulus condition were completed, allowing for five “new” and five “old” trials. In the interference conditions, an extra ten trials per study–interference condition (e.g., dynamic task with movement study items and dynamic task with posture study items) were completed, also allowing for testing of five “new” and five “old” Corsi-block locations per condition. The experiment took approximately 30 min to complete.

Results and discussion

Accuracy values across conditions are shown in Table 2, and the mean proportions correct are depicted in Fig. 7. We hypothesized that if dissociable static and dynamic VWM systems are used to retain postures and actions, respectively, then a double dissociation should be observed for accuracy. Specifically, the recognition of movement items should be impaired by dynamic, but not by static, interference (relative to control), whereas the recognition of postures should be impaired by static, but not by dynamic, interference (relative to control).

Table 2 Experiment 2: Accuracy for postures and movements across interference conditions
Fig. 7

Experiment 2: Mean accuracy for movements and postures in control (no interference), dynamic, and static interference conditions (maximum = 1, chance = .5). Error bars refer to standard errors of the means

Four planned comparisons with alpha maintained at .05 tested these prespecified hypotheses. As expected, the mean accuracy for recognition of movement stimuli in the control (no-interference) condition (M = .80, SD = .13) was significantly greater than the mean accuracy for recognition of movement stimuli in the dynamic interference condition (M = .72, SD = .15), p = .04, d = 0.58, but did not differ from the mean accuracy for recognition of movement stimuli in the static interference condition (M = .79, SD = .15), p = .89, d = 0.04. That is, in support of the hypothesis, the recognition of dynamic stimuli was significantly impaired by a dynamic visual interference task, but was not impaired by a static visual interference task.
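The effect sizes for the movement comparisons can be approximately recovered from the reported means and standard deviations. The sketch below assumes the conventional pooled-standard-deviation form of Cohen's d for two independent groups of equal size (20 per interference group); the article does not state which formula was used, and the function name `cohens_d` is illustrative.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two independent groups of equal size.

    With equal n, the pooled SD is the square root of the mean of the
    two variances (assumed formula; not stated in the article).
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Movement recognition: control vs. dynamic interference
d_dynamic = cohens_d(0.80, 0.13, 0.72, 0.15)
# Movement recognition: control vs. static interference
d_static = cohens_d(0.80, 0.13, 0.79, 0.15)

print(round(d_dynamic, 2), round(d_static, 2))
```

Under these assumptions the sketch gives values of about 0.57 and 0.07, close to but not identical with the reported 0.58 and 0.04. The gap is presumably due to the inputs here being rounded to two decimals.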

However, contrary to our hypotheses, the mean accuracy for recognition of posture stimuli in the control (no-interference) condition (M = .88, SD = .09) was significantly greater than the mean accuracy for recognition of postures in both the static interference condition (M = .79, SD = .15), p = .01, r = .74, and the dynamic interference condition (M = .80, SD = .12), p = .01, r = .83. That is, only partial support was found for the hypothesis that recognition of static stimuli would be impaired by performance of a static interference task (supported), but not by a dynamic visual interference task (not supported).

Scores on the intervening static and dynamic Corsi-block tasks were calculated, to ensure that one task was not more difficult than the other. The mean accuracy on the static Corsi-block task (M = .54, SD = .26) was lower than, but not significantly different from, the mean accuracy on the dynamic Corsi-block task (M = .63, SD = .33), p = .31. Therefore, differences in accuracy in the primary recognition task cannot be attributed to differences in difficulty of the intervening tasks.

In summary, the results of Experiment 2 showed partial support for the hypothesized dissociation between VWM for static and dynamic visual tasks. Accuracy was similar on the dynamic and static Corsi-block tasks, indicating that the two tasks were of comparable difficulty. The dynamic Corsi-block task impaired the accurate recognition of actions, relative to control, whereas the static Corsi-block task did not. That is, consistent with our hypotheses, memory for actions is impaired by a secondary task involving change over visual space, but is not impaired by a secondary task in which only static visual form is to be encoded. The complementary situation, however, was not entirely supported. As predicted, accurate recognition of postures was impaired by the static task; however, the recognition of postures was also impaired by the dynamic task.

The lack of a double dissociation in the results of Experiment 2 makes it difficult to definitively conclude that dissociable dynamic and static VWM networks retain representations of actions and postures. Consideration of the hit rate values shows that, for movements, recognition accuracy was poorest in the dynamic interference condition (hit rate of .66). Conversely, for postures, recognition accuracy was poorest in the static interference condition (hit rate of .79). This is consistent with our predictions. Therefore, at least for the recognition of previously seen items, the data indicate some dissociation between dynamic and static processing. However, this result is qualified by the false alarm rates. Both the dynamic and static interference tasks led to increased false alarm rates for posture stimuli relative to control (from .14 to .24 in both cases). This unexpected increase in the false alarm rates is what disrupts the predicted response pattern overall.
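The way false alarms feed into overall accuracy can be made concrete with a short calculation. This is a minimal sketch, assuming that proportion correct is the average of the hit rate and the correct-rejection rate, which follows from half of the trials being “old” and half “new.” The function name `proportion_correct` is hypothetical, and holding the hit rate fixed at .79 is purely illustrative, not a value reported for the control condition.

```python
def proportion_correct(hit_rate, false_alarm_rate, p_old=0.5):
    """Overall accuracy from hit and false alarm rates.

    p_old is the proportion of "old" trials (0.5 in this design).
    """
    correct_rejection_rate = 1 - false_alarm_rate
    return p_old * hit_rate + (1 - p_old) * correct_rejection_rate


# Illustrative only: hold the hit rate fixed and change the false alarm
# rate from .14 (control) to .24 (either interference condition).
hit = 0.79
drop = proportion_correct(hit, 0.14) - proportion_correct(hit, 0.24)
print(round(drop, 2))
```

With equal numbers of old and new trials, a .10 rise in the false alarm rate lowers proportion correct by .05 even when the hit rate does not change. This is why the posture results can depart from the predicted pattern although the hit rates follow it.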

General discussion

In the present experiments, we investigated VWM for actions, considering (1) whether VWM for body movement incorporates memory for body postures (form) and (2) whether memory for body posture and body movement relies on dissociable form- and motion-processing networks. To the best of our knowledge, this is one of the first studies to explicitly compare static and dynamic versions of the same task on memory for action stimuli. The two behavioral experiments reported here indicate that, as in perception, dance-like body actions are likely to be primarily stored as dynamic movements in VWM, although some posture-based information is encoded and available to aid recognition. Two results support this conclusion. First, recognition of body movement based only on posture was possible, but poor relative to recognition based on the entire movement stimulus. Second, when the rehearsal of movement items was disrupted by a form-based task, recognition accuracy was unimpaired relative to control conditions. That is, form-based interference did not impair memory for movements. These results are discussed in turn.

Dance-like actions are more than just postures in VWM

In Experiment 1, participants observed spans of three movements or postures and made “old” or “new” recognition responses when a single action or posture was shown at test. Neuroimaging and behavioral research has identified distinct neural regions involved in the perception of human body form and human body movement (Downing et al., 2006). Although body movement involves multiple changes in body posture over time and space, observation of body movement appears to primarily activate motion-selective neurons in pSTS rather than the body-form-selective neurons in the extrastriate body area. Interestingly, no attempt has been made to determine whether similar processing dissociations continue to operate in VWM. If the perception of body form differs from the perception of body movement, then it is plausible that memory for body form will also differ from memory for body movement. This is particularly so, given recent conceptualizations of WM not as a series of modality-specific stores or workspaces, but as an extension of the perceptual process and a bridge to long-term memory networks.

If a similar posture-versus-motion dissociation is evident in VWM, then recognition of actions on the basis of a posture alone should be poor, relative to recognition of actions in complete, dynamic format. This hypothesis was supported: Whereas participants could recognize the action on the basis of posture at above-chance levels, recognition accuracy was significantly poorer than recognition in the movement–movement condition. In the present experiments, postures did not seem to be encoded preferentially when observing dance-like actions. With dance expertise (e.g., professional training in ballet), this effect may differ. Since ballet deliberately makes use of goal postures, ballet experts may show enhanced ability to recognize actions on the basis of images of goal posture alone.

The evidence for separation of postures and movements in VWM in Experiment 1 is consistent with fMRI research indicating less activity in posture-processing regions (e.g., extrastriate and fusiform body areas) when continuous movement is observed. When body movement involves many sequential changes in body posture that are biologically plausible, it is perceived as a coherent action sequence and motion-processing regions are recruited. However, if the same changes in body posture occur in a non-biologically-plausible way (i.e., because they are viewed out of order), a coherent action is not perceived, and posture-processing regions are recruited instead (Downing et al., 2006). In research with primates, Jellema and Perrett (2003) demonstrated that pSTS also responds to discrete static images of body posture if the postures follow one another in an appropriate temporal sequence (i.e., each displays the next posture within a single action). In the present experiments, the movements were continuous and biologically plausible. The results of Experiment 1 indicated that identifying a posture as an instance of a studied movement item was a difficult, but achievable, task. Therefore, consistent with the neuroimaging results, the present experiments suggest that even though body form is encoded during movement observation (allowing the posture to support movement recognition), static snapshots of body form are not retained in VWM in preference to complete dynamic movements.

This result is interesting for studies of motor skill learning, for example in dance, in which a teacher will often demonstrate movement phrases and expect that they be remembered by the learner. Experiment 1 indicates that, although the key forms in the movement may be retained, the learner will be much better at identifying an instance of the movement if it is shown again in complete format than if individual movement parts, including key postures, are shown. Transitions between postures, rather than postures alone, should be emphasized in visual learning of actions (cf. Opacic, Stevens, & Tillmann, 2009).

Dynamic motion processing and static form processing differ in VWM

In Experiment 2, we tested whether actions and postures are different classes of stimuli, encoded and retained by different VWM mechanisms. If so, then this provides an explanation as to why the extraction of postures from encoded movements in the movement–posture condition of Experiment 1 was a difficult and cognitively demanding task, leading to the reduced rates of accuracy observed.

Experiment 2 compared memory for postures and actions in an interference paradigm to examine whether distinct VWM networks exist for the retention of actions and postures. A double dissociation between static and dynamic interference tasks was expected to emerge in the recognition of static (postures) and dynamic (movements) items at test. Specifically, we predicted that actions would be impaired by dynamic, but not static, interference, and that, conversely, postures would be impaired by static, but not dynamic, interference. Partial support was found for the hypotheses, with dissociation between the dynamic and static interference tasks found for recognition of actions, but not for recognition of postures.

The results of Experiment 2 provide some evidence for distinct dynamic and static VWM processes and, although it was not predicted, the observed pattern of interference is not entirely inconsistent with the prior literature. This is because the dynamic interference task is not purely dynamic, but also involves form-based processing. The results obtained from experiments using a dynamic task similar to that used here have previously been interpreted inconsistently with regard to the task feature that is thought to have been primarily encoded. For example, some have argued that the dynamic tracking of sequential locations across time and space is a defining factor of the task (e.g., Cocchi et al., 2007; Pickering et al., 2001). Others—for example, Zimmer and Liesefeld (2011)—have argued that it is not just the dynamic tracking of locations that is stored in memory, but also the aggregation of final landing locations, into a complete static spatial map (see also Vandierendonck & Szmalec, 2011). Essentially, the dynamic task used in Experiment 2 may have required not only encoding of a dynamic pattern, but also a static representation of its final form. If so, both the dynamic and static versions of the Corsi-block task should impair memory for postures, because both tasks require form-based memory. Conversely, dynamic information was only extracted from the dynamic task, with the static task having no explicit motion or requirement for dynamic tracking. Therefore, if action stimuli that have been studied dynamically are stored dynamically in memory, then they should only be impaired by the dynamic task. This interpretation is speculative, but it provides an explanation for why memory for the posture stimuli was impaired by both the dynamic and static interference tasks, whereas memory for movements was impaired by the dynamic task alone.

It is possible that this effect (the encoding of both form and motion from the dynamic task) was heightened by the use of a multistatic “dynamic” interference task, instead of one that was smoothly moving. However, given that the test phase required memory for particular locations, this same effect would likely persist, even with a task that did involve smooth movement. In either case, participants would be likely to continue to track individual locations and to store these locations as a static spatial map. The key here is that the spatial map of visual form could be created immediately in the static interference condition, since all of the targets were identified at the beginning of the trial. In contrast, the spatial map of visual form had to be updated continually in a dynamic manner during the dynamic task, since the targets were sequentially revealed. Thus, the comparison between the static and dynamic elements of the two interference tasks is crucial in the present experiment. That the static Corsi-block task did not impair memory for movements is a further indication that overall form is not the primary feature extracted from a movement. Consistent with Experiment 1, when actions are observed in the study phase of an experiment, movement is encoded rather than the individual postures that make up the movement.

Despite the lack of a double dissociation between the static and dynamic tasks in Experiment 2, we have some evidence to suggest that body actions may be different from body postures in VWM. This finding is in accord with neuroimaging findings (e.g., Downing et al., 2006) and computational modeling research (e.g., Giese & Poggio, 2003) that has posited dissociable structures for the perception of human bodies and human body movement. In VWM, as in perception, body form and body movement rely on dissociable processing systems.

Future research might extend the results reported here by investigating potential differences between static and dynamic VWM for objects and for human body movements. An assumption made here was that similar VWM resources are used to retain movement and form when they are exhibited in dance-like actions and in the abstract movement of visual objects. There is evidence to the contrary (e.g., Umla-Runge, Zimmer, Krick, & Reith, 2011; Wood, 2007), but much of this research is complicated by the use of static–visual-based or dynamic–spatial-based tasks; that is, the categories are confounded. In the present study, we disentangled these categories and showed impaired recognition of body movement stimuli after encoding visual motion and impaired recognition of body postures after encoding visual form. An experiment specifically manipulating both tasks with body-based stimuli would further clarify the results.

Conclusions

The experiments reported here demonstrate that the perceptual dissociation between body form and movement is also evident in VWM. Furthermore, they show that whereas images of body posture may assist recognition, body form is not the sole means by which dance-like actions are retained in VWM. The present results suggest that, for an observer, particularly one who is inexperienced with the type of movement being observed, dance-like actions are retained as dynamic items in VWM.