Cognitive systems cannot function without memory, and the visual system is no exception. Visual memory comes in many forms, since each stage of visual perception appears to come with its own memory type: from brief iconic memory (Phillips, 1974; Sperling, 1960), via intermediate fragile memory (Potter, 1993; Sligte, Wokke, Tesselaar, Scholte, & Lamme, 2011), to more stable visual short-term (Baddeley, 1986; Wilken & Ma, 2004) and long-term (Konkle, Brady, Alvarez, & Oliva, 2010; Wiseman & Neisser, 1974) memories.

As was pointed out recently by Brady, Konkle, and Alvarez (2011), classic visual perception research has mainly focused on the differences in the nature of representation within the different modules and layers of the visual system, whereas visual memory research has largely focused on finding laws and mechanisms that generalize across specific representational content (e.g., Anderson, Vogel, & Awh, 2011; Cowan, 2001; Nobre et al., 2004; Vogel, Woodman, & Luck, 2006; Woodman & Vogel, 2005). However, given that visual perception delivers the material for visual memory, and given that visual memory appears to heavily recruit sensory areas when trying to maintain visual information (Chelazzi, Miller, Duncan, & Desimone, 1993; Harrison & Tong, 2009; Serences, Ester, Vogel, & Awh, 2009), it would be surprising if the efficacy of visual memory did not depend, at least to some extent, on the specific representational nature of the content. There is now abundant evidence for such links. For example, the capacity of visual working memory depends on the visual complexity of the remembered material (Alvarez & Cavanagh, 2004; Eng, Chen, & Jiang, 2005; Xu & Chun, 2006). Conversely, having to remember more objects comes at the expense of the resolution of the remembered representation (Bays, Catalao, & Husain, 2009; Wilken & Ma, 2004; Zhang & Luck, 2008). Furthermore, memory capacity for individual visual features appears to mildly benefit when these features belong to the same perceptual object, as opposed to different objects (Delvenne & Bruyer, 2004; Luck & Vogel, 1997; Olson & Jiang, 2002; Xu, 2002a, 2002b). Memory for features and locations also benefits from preservation of the overall spatial structure of the memorized features (Jiang, Chun, & Olson, 2004; Jiang, Olson, & Chun, 2000), as well as from grouping those features into coherent spatial layouts (Phillips, 1974; Sanocki, Sellers, Mittelstadt, & Sulman, 2010; Woodman, Vecera, & Luck, 2003; Xu & Chun, 2007). 
Finally, not only the spatial, but also the feature context may affect memory for an individual item (Alvarez, 2011; Brady & Alvarez, 2011; Brady et al., 2011).

Spatiotemporal context

In the present study, we investigated the effects of the spatiotemporal context on visual working memory. Unlike in the laboratory, where visual objects usually appear as single abrupt instances on a computer screen, in the everyday world, objects typically have a spatiotemporal history. They often gradually move in and out of view depending on whether the object itself moves, another moving object occludes it, or the observer moves. Nowadays, this behavior even extends to virtual objects as presented on smart phones and tablet computers. To help keep a stable percept of the world, it has been proposed that the visual system maintains indices of objects across space and time. These spatiotemporally stable object representations are referred to as object files (Kahneman, Treisman, & Gibbs, 1992; see Pylyshyn, 2001, for a similar framework). In the classic type of experiment (Gordon & Irwin, 1996, 2000; Henderson & Anes, 1994; Kahneman et al., 1992; Noles, Scholl, & Mitroff, 2005), observers preview two stimuli (typically, letters or pictures), each appearing in a box. After the stimuli disappear, the boxes move to new locations. A target stimulus then appears in one of them, and participants indicate whether it matches one of the previewed stimuli. Even though the boxes are irrelevant to the task, observers’ response times (RTs) are reduced for detecting matches if the target also appears in the same box again. Apparently, observers automatically keep track of the boxes and their associated properties (in this case, the identity of the contained stimulus) across space and time. Consistent with this, Yi et al. (2008) found that the spatiotemporal history of an object (in their case, a picture of a face) affected face-specific brain activity in the right fusiform face area. A trial consisted of two consecutive events, each involving the emergence of a face from behind either one of two pillars positioned at the sides of the display. 
The task was to press a button whenever a face was upside down, which occurred on about one in seven trials. Importantly, the second face could be either the same as or different from the first face, and it could appear from behind the same or the other pillar. Yi et al. found that brain activity decreased when the face was the same in both events. This was expected, since it is common for neurons to show signs of habituation. Importantly, however, this habituation effect was stronger when the same face had also reemerged from the same pillar as it had just disappeared behind. In other words, the neural coding of the face depended, in part, on its spatiotemporal history, despite the fact that the movement was completely irrelevant to the task and despite the fact that motion is computed in rather different brain regions.

Recently, we have shown that the spatiotemporal history of a display can also affect visual search through that display (Schreij & Olivers, 2009, 2013a, 2013b). We presented observers with a search array in which observers looked for a diamond among circles (or vice versa). The array was presented inside a display panel that emerged, in its entirety, from behind one of two walls that flanked the screen (as inspired by the Yi et al., 2008, study). After search, the panel disappeared again behind the wall it came from. The next search display could then appear from behind the same wall or from behind the wall on the other side of the screen. When it appeared from the same side, there was a clear additional speeding of search for repeated target locations or repeated target features, indicating that a repetition of the panel’s motion triggered an attentional bias toward the just-selected target properties.

All these object file results require an explanation in terms of memory, since some object property is stored in relation to the spatiotemporal context of the object. So far, these memory effects have been mainly interpreted as reflecting some automatic and implicit process in which the spatiotemporal context primes the retrieval of the original representation. This then leads to shorter RTs. For example, in our dynamic visual search displays (Schreij & Olivers, 2013a), the target repetition benefits on RTs as induced by the spatiotemporal history of the display object were just as great regardless of whether the target was quite likely to be repeated (50 %) or not so likely (16 %), consistent with little strategic control over the effect.

However, RTs do not reveal whether the benefit reflects a stronger underlying memory or some response decision process. Response decisions may benefit when the target stimulus and the context happen to point in the same direction and thus each independently prime the same response. Put differently, at least part of the RT effect may be caused by a context-induced response bias: Observers may be more inclined to respond “same” to a target when it also appears in the same box, without the box necessarily aiding in retrieving or boosting the memory (for similar arguments, see Biederman & Cooper, 1992; and see Schreij & Olivers, 2009, 2013a, 2013b, for ways to untangle such effects). This leaves open the question of whether the memory itself is affected by spatiotemporal context, that is, whether a memory is more or less likely to be lost given such a context. For this, we need an explicit memory task.

There is one study that investigated explicit visual memory in relation to object files. Hollingworth and Rasmussen (2010) presented observers with four boxes in a pseudorandom spatial layout, each containing one uniquely colored patch. Participants were required to remember the colors. After a while, the patches disappeared, while the boxes remained. The boxes then moved around the display such that they exchanged positions, after which the boxes were filled with colors again. The participants had to decide whether one of the colors had changed relative to the memory display. Crucially, the test colors could be positioned in their original locations (as if the boxes had never moved), in their updated positions (thus following the motion of the boxes), or in noncorresponding positions (as if the boxes had been randomly shuffled). Memory accuracy was superior for test items presented in the original positions, despite the intermediate movement. This is consistent with the finding that changing the spatial structure of an array of items affects the memory for the items’ features, such as color (Jiang et al., 2000; Wood, 2009, 2011). Importantly, in one of Hollingworth and Rasmussen’s experiments, overall accuracy for the updated positions condition was better than for the noncorresponding positions condition, indicating that observers had kept track of the boxes. Hollingworth and Rasmussen concluded that visual working memory is partly aided by the spatiotemporal correspondence between memory and test display, consistent with object file theory. However, an alternative explanation is that, given that disruptions of the spatial configuration can substantially affect memory, observers deliberately chose to track the boxes as much as they could, because there would be a considerable chance (of 1 in 3) that the colors would end up in the eventual configuration. 
This way they would be able to anticipate the potential spatial configuration of the test display, without there being much to lose.

In the present set of experiments, we provide additional evidence that the spatiotemporal context affects explicit visual memory for color. In contrast to Hollingworth and Rasmussen (2010), in our task, the spatial locations and configuration of the memory items always remained exactly the same from memory to test display, regardless of condition. What was varied was the spatiotemporal history of the entire display panel containing the items. The procedure is illustrated in Fig. 1. Observers first saw a display panel emerge from behind either one of two walls positioned on the left and right of the screen. The panel contained a number of colored disks, which the participants were instructed to remember. The panel then moved back behind the wall it had originally come from. Then a test panel emerged, for which the participants had to decide whether one of the original colors had changed or not. Importantly, the test display could emerge from the same side as the study display or from the other side. The task was unspeeded, and we measured change detection sensitivity (d′), which was dissociated from any response biases (c) (Tanner & Swets, 1954). If the spatiotemporal context contributes to the memory representation, we expect to see a benefit for repeated motion trajectories, as compared with different motion trajectories.

Fig. 1

Example trial of Experiment 1. A display containing four, six, or eight colored disks emerged from behind a wall. It stopped for 1,000 ms in the center of the screen, after which it moved back where it came from. After 1,500 ms in which nothing happened, the test display emerged either from the same side (as here, in this example) or from the other side. The participant then decided whether one of the colors had changed relative to the first display (here, not)

Experiment 1

Method

Participants

Twelve VU University students (5 male, 2 left-handed; age = 20–26 years, average = 23.6 years) participated for money (€8 an hour) or course credits. They had self-reported normal or corrected-to-normal acuity and color vision. One participant was replaced because of overall close-to-chance performance (55 %). Sample size was based on common numbers for attention/working memory experiments. We acknowledge that this sample size may be considered small. However, we believe that the replication in Experiments 2 and 3 (based on larger sample sizes) reduces the likelihood of false positives here.

Apparatus

The experiment was run on a Pentium 4 PC. The stimuli were presented on a 19-in. (approximately 35° of visual angle) Iiyama Vision Master Pro 454 CRT screen with loudspeakers, with a refresh rate of 120 Hz and with a resolution of 1,024 × 768 pixels. The “J” and “N” keys on a US keyboard were used to register the responses of the participants. Stimulus presentation and response recording were done in E-Prime 1.2 (Psychological Software Tools, 2003) running under Microsoft Windows XP. The experiment was executed in a dimly lit and soundproof room, in which participants were seated at a distance of approximately 75 cm from the screen.
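As a rough check on the reported screen size, the visual angle subtended by an extent viewed head-on can be computed from its physical size and the viewing distance. The sketch below assumes that the nominal 19-in. figure refers to the screen diagonal and that the quoted angle applies to that diagonal; neither is stated explicitly in the text. The function name `visual_angle_deg` is illustrative only.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (in degrees) of an extent of size_cm,
    centered on the line of sight, viewed from distance_cm."""
    return math.degrees(2 * math.atan((size_cm / 2) / distance_cm))


# Assumption: 19 in. is the nominal screen diagonal (1 in. = 2.54 cm).
diagonal_cm = 19 * 2.54
viewing_distance_cm = 75.0  # as reported in the Apparatus section

angle = visual_angle_deg(diagonal_cm, viewing_distance_cm)
print(f"Diagonal subtends about {angle:.1f} degrees")
```

Under these assumptions the diagonal comes out between 35° and 36°, which is in line with the "approximately 35°" given above.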

Stimuli

Images of a wall (7.36° of visual angle wide) were positioned on the left and the right sides of the display, stretching from the top to the bottom of the screen. Behind these walls, there was an evenly colored gray background (4.9 cd/m²). Two square panels containing the memory arrays were placed behind the walls on either side. The contents of these displays would be occluded by the walls and, thus, not visible to the participant, until one of them would slide to the middle of the screen. A small edge of the panel would, however, remain visible to clearly provide the impression of a panel being behind the wall. The displays were 512 × 384 pixels and had a black background and a white (39.0 cd/m²) border of 0.07° width. To generate an impression of depth in the display and to enhance the perception of a real object, a thin shadow was drawn behind it at the right and bottom sides (3.2 cd/m²). Within this display, the memory elements were randomly positioned on any of eight evenly spaced locations on an imaginary circle with a radius of 14.2° of visual angle from a central white fixation cross. These elements were colored disks with a visual angle of 3.1°. The colors were randomly chosen from red, green, blue, yellow, pink, purple, orange, turquoise, gray, and white, with the restriction that there could be no disks of the same color within a display. The test display was the same as the memory display, except that one randomly chosen item could have changed color. If so, the new color was randomly picked from the remaining colors, such that it did not match any of the other colors in the display.
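The sampling constraints on the memory and test displays can be illustrated with a short sketch. It is not the original E-Prime implementation; the function names and the representation of a display as a mapping from location index to color name are assumptions made for illustration.

```python
import random

COLORS = ["red", "green", "blue", "yellow", "pink", "purple",
          "orange", "turquoise", "gray", "white"]
N_LOCATIONS = 8  # evenly spaced positions on the imaginary circle


def make_memory_display(set_size, rng):
    """Pick distinct locations and distinct colors for the memory array."""
    locations = rng.sample(range(N_LOCATIONS), set_size)
    colors = rng.sample(COLORS, set_size)  # sampling without replacement
    return dict(zip(locations, colors))


def make_test_display(memory, change, rng):
    """Copy the memory display; on change trials, recolor one random item
    with a color that is not present anywhere in the memory display."""
    test = dict(memory)
    if change:
        location = rng.choice(sorted(test))
        unused = [c for c in COLORS if c not in memory.values()]
        test[location] = rng.choice(unused)
    return test


rng = random.Random(1)
memory = make_memory_display(6, rng)
test = make_test_display(memory, change=True, rng=rng)
print(memory)
print(test)
```

Drawing the new color from the unused colors guarantees both that the changed item really differs from its original color and that no color is duplicated in the test display.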

Design and procedure

The trial started with the presentation of an exclamation mark at the center of the screen for 250 ms, to alert participants to the upcoming memory display. After a 500-ms blank, a memory display moved in from behind either the left or the right wall and took 150 ms to move to the center of the screen, where it stopped for 1,000 ms. Memory set size was varied between four, six, and eight disks. Participants were instructed to memorize as many colors as they could from this display. The display then moved back to where it came from, at the same speed as it had arrived. After 1,500 ms, the test display appeared, again with the same speed. The fast speed of emergence and disappearance ensured that observers could not make an eye movement toward the display before it had stopped in the center or pursue it afterward. The test display had a question mark as the central fixation point, so that observers now knew that they had to respond. The crucial manipulation was motion direction: The test display could appear from the same direction as the study display or a different direction. Either the array of colors in the test display was then entirely the same as in the memory display, or one of the colors had changed. The instruction was to press “J” for “yes, a color has changed” (“J” stands for “ja,” which means yes in Dutch) or “N” for “no, no color has changed” (“nee” in Dutch). A high-pitch or low-pitch feedback sound signaled that the response was correct or incorrect, respectively. Responses were not speeded. After the response, the test display moved back to where it came from, and there was a 2,000-ms pause until the next trial started. The experiment started with a 24-trial practice block, after which there were 10 blocks of 24 trials each. In total, there were 20 trials per combination of motion direction, set size, and response alternative (same/different), all randomly mixed within blocks. 
At the end of each block, participants received feedback on their overall accuracy. Between blocks, participants could take breaks whenever they felt it was necessary. The experiment lasted approximately 35 min.
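The trial counts follow from crossing the three factors. The sketch below builds such a trial list, assuming that every block contained each of the 12 factor combinations exactly twice; the text states only that conditions were randomly mixed within blocks and that each combination occurred 20 times in total, so the per-block balance is an assumption.

```python
import itertools
import random

SET_SIZES = (4, 6, 8)
MOTION_DIRECTIONS = ("same", "different")
CHANGE_PRESENT = (True, False)
N_BLOCKS = 10
TRIALS_PER_BLOCK = 24


def make_trial_list(rng):
    """Return a list of blocks, each a shuffled list of
    (set_size, motion_direction, change_present) tuples."""
    cells = list(itertools.product(SET_SIZES, MOTION_DIRECTIONS, CHANGE_PRESENT))
    repetitions = TRIALS_PER_BLOCK // len(cells)  # 24 // 12 = 2
    blocks = []
    for _ in range(N_BLOCKS):
        block = cells * repetitions
        rng.shuffle(block)  # random mixing within the block
        blocks.append(block)
    return blocks


blocks = make_trial_list(random.Random(0))
all_trials = [trial for block in blocks for trial in block]
print(len(all_trials))                  # 240 experimental trials
print(all_trials.count((6, "same", True)))  # 20 per combination
```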

Results and discussion

Overall accuracy was 88 % for set size 4, 77 % for set size 6, and 69 % for set size 8. Following Pashler (1988), among others, we computed d′ and c in order to separate sensitivity to color changes from any response biases (in accordance with signal detection theory; Tanner & Swets, 1954). We deemed this important because the motion direction (being same or different) itself might invoke a “same” or “different” response (cf. Biederman & Cooper, 1991; Dill & Fahle, 1998). The top panel of Fig. 2 shows d′ for when the memory and test display came from the same side, as compared with different sides, for each set size. A repeated measures ANOVA with the same factors (set size, motion direction) revealed a significant overall decline in performance with memory set size, F(2, 22) = 42.44, p < .001, ηp² = .794. Moreover, there was a significant interaction with motion direction, F(2, 22) = 3.90, p < .05, ηp² = .262. As can be seen from Fig. 2, performance showed a marked improvement for same-side memory tests at set size 6. A t-test confirmed this. The difference between same- and different-side tests was reliable for set size 6, t(11) = 3.35, p < .01 (which is also reliable under Bonferroni correction for multiple comparisons), whereas it was not for set sizes 4 and 8 (ts < 1.21). The bottom panel of Fig. 2 shows the response bias, c. A repeated measures ANOVA with, again, the factors set size and motion direction revealed a bias against change, which rose significantly with set size, F(2, 22) = 10.59, p = .001, ηp² = .491. There was no effect involving side (Fs < 1).
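For readers unfamiliar with the two signal detection measures, the following sketch shows one common way to derive d′ and c from response counts. Here a "hit" is a "change" response on a change trial and a "false alarm" is a "change" response on a no-change trial. The log-linear correction for extreme rates is an assumption of this sketch; the text does not say how rates of 0 or 1 were handled.

```python
from statistics import NormalDist


def dprime_and_c(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) from raw response counts.

    A log-linear correction (add 0.5 to each count, 1 to each total)
    keeps the z-transform finite; this is an assumed choice.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    # Positive c: few "change" responses overall, i.e., a bias toward "same".
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for one participant in one condition (20 trials each).
d, c = dprime_and_c(hits=12, misses=8, false_alarms=3, correct_rejections=17)
print(f"d' = {d:.2f}, c = {c:.2f}")
```

With this sign convention, a positive c corresponds to a bias toward "same" responses, matching the convention used for the bias plots.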

Fig. 2

Memory performance (sensitivity d′) and response bias (c) as a function of set size and display dynamics in Experiment 1. A positive bias means one toward “same” responses. Error bars reflect 95 % repeated measures confidence intervals (Cousineau, 2005; Morey, 2008)
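The repeated measures error bars cited here can be sketched as follows: each participant's scores are first centered on the grand mean (Cousineau, 2005), and the resulting standard error is then scaled by √(M/(M − 1)), with M the number of within-subject conditions (Morey, 2008). The code is a minimal illustration with hypothetical data, not the authors' analysis script; the critical t value is passed in by the caller rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def within_subject_ci(data, t_crit):
    """data: one list per participant, one score per condition.
    Returns the CI half-width per condition (Cousineau-Morey)."""
    n_subjects = len(data)
    n_conditions = len(data[0])
    grand_mean = mean(v for row in data for v in row)
    # Cousineau normalization: remove between-participant differences.
    normalized = [[v - mean(row) + grand_mean for v in row] for row in data]
    # Morey correction for the bias introduced by the normalization.
    correction = sqrt(n_conditions / (n_conditions - 1))
    return [t_crit * correction * stdev(column) / sqrt(n_subjects)
            for column in zip(*normalized)]


# Hypothetical d' scores: 4 participants x 2 conditions.
scores = [[1.0, 1.4], [2.1, 2.3], [0.8, 1.5], [1.7, 1.9]]
# t_crit is the two-tailed 95 % critical value for n - 1 = 3 df.
print(within_subject_ci(scores, t_crit=3.182))
```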

The findings reveal an effect of the spatiotemporal history of a display on the memory for color. A test display that was consistent with the memory display in terms of spatiotemporal trajectory yielded better memory performance than when the spatiotemporal history suggested a different display object. This points to an episodic binding of the color memory to the motion context of the entire display. The effect was quite selective, since it occurred only for the intermediate set size (6). One way to explain this is by assuming respective ceiling and floor effects at the other set sizes, leaving little room for further improvement or, respectively, decline in performance. Note, however, that performance at set size 4 was still around 88 %, and at set size 8, it was around 69 %, which does suggest some distance to the respective absolute ceiling (100 %) and floor (50 %). Nevertheless, the absolute scale boundaries may not be the effective floor or ceiling for the mechanism of interest (i.e., the mechanism affected by the spatiotemporal history). Overall, memory performance likely also reflects a number of other processes, such as initial encoding, maintenance, and retrieval, but also response selection errors. Each of these may cause performance to differ from floor or ceiling and, thus, limit the range of effect for the other mechanisms at play (Rouder, Morey, Morey, & Cowan, 2011). For example, if, for the sake of argument, we assume 10 % response selection errors, this leaves 90 % as the ceiling for memory performance. Another possibility, as suggested by one of the reviewers, is that the observed pattern rather reflects changes in observers’ strategy depending on perceived difficulty, within the context of the experiment. Under this account, observers encode spatiotemporal context only at intermediate levels of difficulty. 
For example, for easy displays, there may be little incentive to spend the extra effort, whereas the more difficult displays may demand so much effort that little remains to also encode and store the spatiotemporal context. In any case, it seemed prudent to replicate the effect. Both Experiments 2 and 3 served this purpose.

A second issue with effects being present only at set size 6 is that six is not really at or near the limits of visual working memory. Visual working memory capacity is typically estimated at three to four items on average (Cowan, 2001; Vogel, Woodman, & Luck, 2001). Most of the benefit would therefore be expected near that limit. Instead, in the present experiment, performance was, overall, quite good at set size 4, with no effect of spatiotemporal history. One potential reason for this is that performance was not based on visual working memory alone. Even though displays were presented only briefly, observers may have made use of verbal codes for at least some of the to-be-remembered colors. In fact, the use of verbal codes may also have contributed to the absence of any spatiotemporal history effects at set size 4, if we assume that spatiotemporal history of the visual display object has less of an effect on semantic than on visual representations (Gordon & Irwin, 1996). On the other hand, it is possible that the effect of spatiotemporal history is driven by verbal codes instead and is not related to visual memory.

A final issue is that the memory improvement may be due not to the spatiotemporal history as such, but to the information available during the motion itself. Note that the colors did not appear abruptly but moved into sight from behind one of the flanking walls. Thus, each of the colors followed a certain motion path. Given that observers have a memory for motion (Blake, Cepeda, & Hiris, 1997; Magnussen & Greenlee, 1992; Zaksas & Pasternak, 2006), the movement of the colors per se could provide additional information, rather than the fact that the colors were part of the same moving object.

To address these issues, we ran the same experiment again, but with some important alterations. For one, we added a verbal suppression task. This should at least reduce verbal coding and move the memory representation more toward the visual domain. Since the use of additional verbal memory would now be discouraged, we also decided to reduce the set sizes. Second, during the actual motion of the displays, the colors of the disks were not visible, since they remained gray. Only when the disks arrived at their final position did they assume their colors. This precludes an explanation in terms of memory for color motion.

Experiment 2

Method

Participants

Twenty-two VU University students (6 male, 1 left-handed; age = 17–28 years, average = 22.0 years) participated for money (€8 an hour) or course credits. They had self-reported normal or corrected-to-normal acuity and color vision. A sample size of 20 was planned because of expected additional noise due to the additional tasks. Twenty-two participants were run to anticipate potential dropouts. In the end, 1 participant was dropped from further analyses because of overall close-to-chance performance (56 %), leaving 21 in total.

Apparatus, stimuli, design, and procedure

The experiment was the same as Experiment 1, except for the following changes. A verbal suppression task was added. To this end, each trial started with the presentation of two random digits between 0 and 8. Nine and 7 were excluded because these involve two syllables in Dutch (“zeven,” “negen”). The digits were shown in black (on a gray background) in standard Courier font (size 24), at the center of the screen, for 300 ms, followed by a 1,000-ms blank. Participants were instructed to repeat them out loud throughout the trial, in a soft tone of voice and at the pace of the ticking of an old clock (i.e., approximately two digits per second). Speech was recorded on each trial (as participants were told beforehand), and we checked afterward whether participants complied with the verbal suppression task by sampling a range of trials for each participant. Because of the additional task and because verbal memory should be suppressed, we expected the memory task to become more difficult. We therefore chose to lower the set sizes from 4, 6, and 8 to 3, 4, and 6. Finally, during the actual motion of the displays, the disks were gray, instead of colored. Only once the display had arrived at its central position did each of the disks assume its color. Due to the gray being nearly equiluminant with most of the colors and due to the speed of the motion, this procedure was not noticeable if one did not know about it (as was informally tested on a few observers).

Results and discussion

Overall accuracy was 90 % for set size 3, 86 % for set size 4, and 74 % for set size 6. Figure 3 shows d′ for when the memory and test display came from the same side, as compared with different sides, for each set size. A repeated measures ANOVA with the same factors revealed a significant overall decline in performance with increases in memory set size, F(2, 40) = 110.684, p < .001, ηp² = .847. Also in accordance with Experiment 1, there was a trend toward an interaction with motion direction, which is significant under a one-tailed test, F(2, 40) = 2.52, p < .05 (p = .093 when tested two-tailed), ηp² = .112. As can be seen from the top panel of Fig. 3, performance again showed an improvement for same-side memory tests at the intermediate set size. A t-test confirmed this. The difference between same- and different-side tests was reliable for set size 4, t(20) = 2.11, p < .05, whereas it was not for set sizes 3 and 6 (ts < 1). The bottom panel of Fig. 3 shows the response bias, c. Overall, there was a bias against change, which rose significantly with increases in set size, F(2, 40) = 7.73, p = .001, ηp² = .279, with all other effects being unreliable (Fs < 1).
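The reported effect sizes can be related to the F statistics through the standard identity for partial eta squared, F · df_effect / (F · df_effect + df_error). The short sketch below applies it to the three F values reported for Experiment 2; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for Experiment 2.
print(round(partial_eta_squared(110.684, 2, 40), 3))  # set size effect on d'
print(round(partial_eta_squared(2.52, 2, 40), 3))     # set size x motion direction
print(round(partial_eta_squared(7.73, 2, 40), 3))     # set size effect on c
```

The three values round to .847, .112, and .279, matching the effect sizes given in the text.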

Fig. 3

Memory performance (sensitivity d′) and response bias (c) as a function of set size and display dynamics in Experiment 2. A positive bias means one toward “same” responses. Error bars reflect 95 % repeated measures confidence intervals (Cousineau, 2005; Morey, 2008)

Experiment 2 saw the addition of a verbal suppression task, yet the same overall pattern of results was obtained as in Experiment 1. This suggests that verbal recoding was not a major factor in determining whether memory benefited from a coherent spatiotemporal trajectory (although we cannot entirely exclude the possibility that verbal memory contributed to the effect). We point out that, again, the benefit was there only for the intermediate set size (here, 4). As was suggested in the discussion of Experiment 1, it is at the intermediate set size where performance has the most room to vary. Moreover, four is the presumed limit of visual working memory and may, thus, be the most sensitive to improvements.

The results suggest that the benefits of spatiotemporal coherence for visual memory are not due to specific memory for moving colors. Here, all disks were gray during the movement toward the center of the display, and thus no memory for the movement of colors could be created.

What remains is the question of what exactly the color memory is bound to. There appear to be at least two possibilities. One possibility is that the color memory is tied to the display object and, thus, “moves” with the object wherever it goes. A second possibility is that the color memory is bound to the original trajectory of the display object; that is, it is tied to where the object has come from rather than where it is going. Experiment 3 tested these possibilities.

Experiment 3

To test whether the effect of spatiotemporal context on color memory was object based (i.e., tied to the display object) or direction based (i.e., tied to the original trajectory), Experiment 3 largely followed the procedure of Experiment 1 but now included three main conditions. In the same-object–same-motion-direction condition, the display object would appear from, and disappear again behind, the same wall in the memory as well as the test phase of the trial. In the same-object–different-motion-direction condition, the display would appear from one side during the memory phase, then move on to a different side, and reappear from that new side in the test phase. This way, it was the same object (in terms of its spatiotemporal trajectory), but it reemerged from a different direction than on the first encounter. To allow for a sufficient number of directions to move, displays could move not only left and right, but also up and down. For this purpose, pictures of a brick wall were also placed at the top and bottom of the screen (see Fig. 4 for an example). These conditions were then compared with a different-object condition, in which the test display always appeared from a different side than the memory display. If the color representations are tied to a proper object file, we should see a benefit regardless of where the object moves to, as long as it moves in a spatiotemporally coherent fashion. However, if the memory is tied to the specific motion trajectory, retrieval should benefit only if the display object is also coming from the same direction again.

Fig. 4

Illustration of the dynamics in a trial of Experiment 3. Here, a memory display moves in from behind the wall at the top. Displays could move behind and emerge from all four sides

Method

All methods were the same as in Experiment 1, except for the following. Sixteen VU University students participated. Originally, 12 were planned, in accordance with Experiment 1. However, the results for the same-object–different-motion-direction condition turned out to be ambiguous, and 4 more participants were planned (after which any effects in this condition had disappeared). Nine were male, and age ranged from 19 to 28 (average = 24) years. We failed to register handedness for this experiment. Set size was limited to 6, since this was the set size for which Experiment 1 revealed an effect. There were now three main conditions, as explained above in the main text. On each trial, the memory display would move out from behind one of what were now four walls (left, right, top, bottom of the screen). After study (1,000 ms), the display then had an equal probability of disappearing behind any of the free walls (excluding the one behind which the other display object was waiting). This meant that it had a 1 in 3 chance of returning to the wall it came from and a 2 in 3 chance of moving on to a different wall. Consequently, when the test appeared on the same object (as suggested by the motion trajectory), which was the case on 50 % of the trials, it arrived from a different direction on two thirds of those trials and from the same direction on one third of those trials. On the other 50 % of the trials, the test appeared on the other object. We chose this division of trials so that there was no inherent bias by design against different objects or toward same objects coming from the same side. The experiment started with an instruction on the main task and the different motion paths that a display could follow. Subsequently, there were 16 practice trials, followed by eight blocks of 32 trials each, resulting in 256 trials in total. Of these, there were 128 trials in the different-object condition and 128 in the same-object condition. 
In the latter condition, after study, the display object could move toward any of the unoccupied sides (randomly determined), including going back to where it came from (and thus reemerge from the same side). This resulted in, on average, 84 trials in the same-object–different-direction condition and, on average, 44 trials in the same-object–same-direction condition. As before, one of the colors could change between memory and test displays (“yes” response, 50 %), or they all remained the same (“no” response, 50 %).
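
The trial arithmetic described above can be made explicit with a minimal sketch. It assumes that the exit wall is drawn uniformly from the three free walls, as the text states; the function name and dictionary keys are hypothetical and are not taken from the experiment software.

```python
from fractions import Fraction


def expected_trial_counts(total_trials=256, n_free_walls=3):
    """Expected trials per condition under the design described in the Method.

    Assumptions (labels are illustrative only):
    - half of the trials test the same object, half the other object;
    - on same-object trials the exit wall is chosen uniformly among the
      free walls, exactly one of which is the wall of origin.
    """
    same_object = total_trials * Fraction(1, 2)
    different_object = total_trials - same_object

    # Probability that the panel returns to, and reemerges from, its origin wall.
    p_same_direction = Fraction(1, n_free_walls)

    same_direction = same_object * p_same_direction
    different_direction = same_object - same_direction
    return {
        "different_object": different_object,
        "same_object_different_direction": different_direction,
        "same_object_same_direction": same_direction,
    }


if __name__ == "__main__":
    for condition, count in expected_trial_counts().items():
        print(f"{condition}: {float(count):.1f}")
```

Under these assumptions the expected counts are 128, about 85, and about 43 trials, respectively. Because the exit wall was determined randomly on each trial, the averages actually obtained (84 and 44) can deviate slightly from these expectations.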

Results and discussion

Overall, accuracy was 71 % for the different-object condition, 71 % for the same-object–different-motion-direction condition, and 75 % for the same-object–same-motion-direction condition. Figure 5 shows sensitivity d′ for each of these conditions. A one-way ANOVA on d′ revealed a significant effect of condition, F(2, 30) = 41.8, p < .02, ηp² = .243. Separate t-tests traced the source of this effect to differences between the same-object–same-direction condition and the different-object condition, t(15) = 2.54, p < .05, as well as between the same-object–same-direction condition and the same-object–different-direction condition, t(15) = 2.17, p < .05. There was no difference between the same-object–different-direction condition and the different-object condition, t < 1. Memory performance was clearly better when the test array appeared on the same object and came from the same side. There appeared to be a slight overall response bias against responding change, and this appeared to increase with object sameness (i.e., as for the previous experiments, it was strongest for the same-object–same-side condition, c = 0.209, with c = 0.160 and c = 0.076 for the same-object–different-direction and different-object conditions, respectively), but this was far from reliable, F = 1.1. In sum, the findings replicate the main finding of Experiments 1 and 2, namely, a relative memory benefit when spatiotemporal context is repeated. Furthermore, the results indicate that this benefit is quite specific to the display object coming from the same direction again. A change in motion direction that is, nevertheless, consistent with the display being the same object resulted in no better memory than when the motion suggested a different object altogether. This indicates that the memory was not so much object bound as tied to an earlier motion trajectory.
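
For readers less familiar with the signal detection measures reported above, the following sketch shows how sensitivity (d′) and criterion (c) are conventionally derived from change detection responses under the equal-variance Gaussian model. The text does not state how these values were computed, so this is an illustration rather than the analysis actually used. In particular, the log-linear correction (adding 0.5 to each cell) is an assumption introduced here to avoid infinite z scores, and the function name is hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for a yes/no change detection task.

    A "hit" is a "change" response on a change trial; a "false alarm" is a
    "change" response on a no-change trial.
    """
    # Assumed log-linear correction: keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF

    d_prime = z(hit_rate) - z(fa_rate)
    # Positive c indicates a bias against responding "change".
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the experiment.
    d_prime, criterion = sdt_measures(20, 20, 5, 35)
    print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

With this sign convention, the positive c values reported above correspond to a tendency to respond "no change."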

Fig. 5

Sensitivity (d′) for each of the three different motion trajectories in Experiment 3. Error bars reflect the 95 % repeated measures confidence intervals (Cousineau, 2005; Morey, 2008) for the difference relative to the different-object condition (the confidence interval for the difference between the same-object–same-side and same-object–different-side conditions was virtually identical)

General discussion

The present work adds to the evidence that the mnemonic representations of visual features do not stand on their own but are embedded in a larger episodic experience that includes locations, other features, or even actions (Brady & Alvarez, 2010; Chun & Jiang, 1998; Hommel, 1998; as also goes for nonvisual information, Godden & Baddeley, 1975). More specifically, it adds to the evidence that visual memory for an object’s features is modulated by, or bound to, the spatiotemporal history of that object (e.g., Gordon & Irwin, 1996, 2000; Henderson & Anes, 1994; Kahneman et al., 1992; Noles et al., 2005). Together with the findings of Hollingworth and Rasmussen (2010), our findings support the idea that spatiotemporal history affects memory for color. Whereas their study showed that people can track the color of individual items that exchange position during the delay period, we show here that the spatiotemporal context affects memory for entire color arrays that themselves retain a constant spatial configuration from memory to test display. Three experiments showed that change detection for a multiple color array was better when the test array emerged from the same direction as the memory array. Experiment 2 showed that this effect also occurred under conditions of verbal suppression and when all color information was removed during the motion. Experiment 3 furthermore suggested that color memory was tied to the original trajectory and not to the display object per se, since benefits occurred only when the dynamics of the display were exactly repeated from memory to test.

Previous work has shown that visual memory benefits from a consistent spatial context, in that memory is best for objects that can be found at the same location as where they were studied (Foster & Kahn, 1985; Hollingworth, 2007; Jiang et al., 2000; Olson & Marshuetz, 2005). Furthermore, such effects of spatial changes may depend on the reference frame. For example, Simons and Wang (1998; Wang & Simons, 1999) conducted experiments in which participants were required to remember an array of objects presented on a circular table. Between the memory display and the memory test, the table could turn to a new position. At the same time, the observer could follow the table, walk to the new position without the table moving, or remain at his or her old position. Memory performance suffered most when the table turned independently of the observer, whereas there was relatively little deterioration when the observers themselves changed viewpoint. This suggests that the spatial context is updated as long as it is the observer who actively moves, and not the environment. In our experiments, the spatial layout of the color array always remained intact from one view to the next, and all dynamics occurred in the display, rather than through the observer. Instead, memory performance depended on the specific history of the display, rather than the current viewpoint and layout. This indicates that at least some history of the environment is taken into account (even though the observer’s own movement may still provide a better memory for spatiotemporal changes).

Object-based memory

Several studies have suggested that there is a benefit for encoding multiple features from the same object, as compared with the same number of features across different objects, or, in a similar vein, that there are no additional costs for remembering more than one feature as long as it belongs to the same object (Delvenne & Bruyer, 2004, 2006; Fougnie, Asplund, & Marois, 2010; Luck & Vogel, 1997; Olson & Jiang, 2002; Quinlan & Cohen, 2011; Xu, 2002a; Xu & Chun, 2007; although see Fougnie & Alvarez, 2011; Wheeler & Treisman, 2002). In these types of studies, “same object” has been operationalized as features presented simultaneously at the same location. In contrast, in our displays, the different colors all occupied different locations. Nevertheless, a similar mechanism may still provide an explanation for the present findings, except that, in this instance, objecthood was defined by the spatiotemporal history of the entire panel (i.e., the presumed temporally stable object file). The different colors can then be seen as different parts of that panel object.

However, the finding in Experiment 3 that the memory improvement did not follow the display object when it moved on to (and then emerged from) a different position than it originally came from is difficult to reconcile with a pure object file account. The core of object file theory is that object identity or feature information remains intact as long as the spatiotemporal continuity of the object is not compromised. In Experiment 3, the motion trajectory of the same-object–different-side condition was entirely compatible with a single object performing two types of motion in sequence, yet there was no memory benefit relative to the different-object condition. There is little comparable evidence in the literature from the more classic object file paradigm, in which context-specific letter priming effects on RTs are measured. Those tasks typically use just a single motion trajectory, where the box visibly travels from A to B between memory and test displays. A handful of studies have suggested that the object file effect survives trajectory changes (Mitroff, Scholl, & Wynn, 2004, 2005), but such changes were relatively mild and occurred on screen, whereas in our study, the direction change involved a complete reversal while the panel was largely behind an occluder. Object file effects have been shown to survive occlusion, but so far this has been investigated without direction changes (Flombaum & Scholl, 2006; Scholl & Pylyshyn, 1999).

A more straightforward comparison can be made with our previous work on the effects of spatiotemporal history on visual search (Schreij & Olivers, 2009, 2013a, 2013b). In those studies, we used the exact same moving panels, but instead of an explicit visual memory task, participants performed a visual search task. We repeatedly found that the dynamics of the display modulated intertrial location and feature priming of the target, such that when the motion trajectory of the panel was suggestive of object constancy, search was facilitated when the target location or feature repeated. Interestingly, this spatiotemporally driven priming effect also occurred in the same-object–different-direction condition, which was the exact same condition in terms of kinematics as the one used in Experiment 3 here. It thus seems that the representations used by explicit memory (as measured through memory accuracy scores) and those used in more implicit, priming-like memory (as measured through RTs on a task that is, in principle, unrelated) may be dissociated with respect to the extent to which they are modulated by spatiotemporal context. At a general level, this fits with the conclusion reached earlier by Mitroff and Scholl (2005) that object file effects do not always follow what people consciously perceive. In that study, observers watched ambiguous motion displays in which two boxes could be perceived either as bouncing or as crossing over. The results showed that the priming exerted by the pretrajectory stimuli on posttrajectory targets followed the bounce trajectory, even when observers consciously perceived the boxes as crossing over. Our results also suggest a dissociation, since we found full object-specific effects (even across trajectory changes) for implicit memory for visual search targets, but not for explicit memory in the present experiments.

One possible explanation for this dissociation may be the capacity limit of visual working memory in combination with the necessity to track the display objects. In the present memory task, observers had to explicitly remember up to eight colors. This may leave relatively few resources for actively tracking the display object when it follows the inherently more complex motion trajectory in the same-object–different-direction condition. Conversely, tracking the object along a more complex trajectory may impair explicit memory for the colors. Indeed, research has indicated that visual working memory suffers from the requirement to track moving objects (Fougnie & Marois, 2006, 2009; Saiki, 2003). This would explain why the same condition still allows for implicit priming effects, which are thought to be less reliant on limited resources.

Another possibility is that explicit color memory is tied not to an object representation (in this case, the panel and its spatiotemporally coherent path), but to the motion direction per se. Hommel and colleagues (Keizer, Colzato, & Hommel, 2008; Keizer, Nieuwenhuis, et al., 2008) have found evidence of the episodic binding of faces and houses to motion direction. For example, seeing a face move in the same direction as a house moved on the previous trial activated not only the fusiform face area, but also the parahippocampal place area sensitive to houses. The findings of Yi et al. (2008), as treated in the introduction, may be interpreted in the same way. They found that neural adaptation to a specific face (as measured by a decrease in activity in the fusiform face area) is stronger when the face reappears from behind the same pillar again as it just disappeared behind. In this case, too, it followed the same motion trajectory.

A final possibility is that the color memory is assigned not to the object or the motion trajectory, but to the actual source location of the display object as is indicated by the trajectory. For example, if the memory panel moved in from the right wall, this trajectory may cause the position behind the wall to be marked as the location of origin for the panel of colors. Given that tying a memory to a specific location or viewpoint aids recall (Bower, 1970; Jiang et al., 2000; Wang & Simons, 1999), the colors may be retrospectively attributed to their location of origin. When the panel emerges from the same direction again, the colors may then be retrieved from that same location. This would explain the absence of any same-object benefits when the panel emerges from a different location than the original one. Future studies will be needed to dissociate these possibilities. For example, lateralized ERP components such as the contralateral delay activity (Vogel & Machizawa, 2004) might be used to reveal to what extent observers maintain left- and right-lateralized representations during the encoding, maintenance, and test phases of these dynamic displays. Furthermore, we may expect distinctive hippocampal activity to arise when memory panels are emerging from a specific location, as opposed to when they are presented in the standard way, by abrupt onset at the center of the screen. The hippocampus is thought to provide the spatiotemporal (i.e., episodic) context of events (Burgess, Maguire, & O'Keefe, 2002; O'Keefe & Dostrovsky, 1971) and support single-trial learning of such spatiotemporal relationships (Rolls & Kesner, 2006).

Regardless of the exact mechanism, the present study shows that spatiotemporal history is an integral part of visual memory and should thus be taken into account when making claims about memory capacity.