In the literature on visual memory, researchers have reached consensus that there are two systems that maintain visual information in an active state for relatively brief durations: iconic memory and visual short-term memory (VSTM).Footnote 1 Iconic memory is the persistence of sensory processing following a visual event and is characterized by high-capacity storage and rapid decay (e.g., Averbach & Coriell, 1961; Sperling, 1960). Iconic memory has been divided into two subsystems (Coltheart, 1980). Visible persistence lasts for only 80–100 ms after the onset of a stimulus event (Di Lollo, 1980) and produces the phenomenology of a persisting visible image. Informational persistence is not visible but supports the retention of sensory information for 300–500 ms after stimulus offset (Irwin & Yeomans, 1986). Only a small subset of the information available in iconic memory can be consolidated into the more durable VSTM. VSTM has a highly limited capacity of 3–4 items (e.g., Luck & Vogel, 1997), is abstracted away from precise sensory features of the original stimulus (Irwin, 1991; Phillips, 1974), can retain items for multiple seconds (Vogel, Woodman, & Luck, 2001), and is significantly more resistant to visual interference than is iconic memory (Pashler, 1988; Phillips, 1974).

These long-standing assumptions were recently challenged by Sligte, Scholte, and Lamme (2008), who suggested that VSTM itself can be divided into two stages: a relatively early, fragile, high-capacity stage (that is nevertheless distinct from iconic memory) and a relatively late, durable, limited-capacity stage (equivalent to traditional models of VSTM). In a change-detection task similar to that depicted in Fig. 1a, participants in Sligte et al. viewed a memory array of oriented bars, followed by a retention interval and test item (same or different orientation). During the retention interval, a spatial cue indicated the location of the to-be-tested item. The cue appeared 1,000 ms after the offset of the memory array, well beyond the range of iconic memory. This type of retention-interval cuing (also known as retro-cuing) has been shown to generate modest improvements in change-detection accuracy (e.g., Griffin & Nobre, 2003). Because the cuing effects are typically small, and an iconic image of the memory array is no longer available at the time of the cue presentation (1,000 ms after the offset of the memory array), prominent accounts of these cuing effects suggest that attention operates over representations that have already been consolidated into limited-capacity VSTM. Specifically, attention protects the cued representation from passive decay and/or interference by other uncued items stored in VSTM (Matsukura, Luck, & Vecera, 2007; for similar interpretations and/or results, see Griffin & Nobre, 2003; Makovski & Jiang, 2007; Makovski, Sussman, & Jiang, 2008).

Fig. 1

a The trial event sequence of Experiment 1A. The fixation, bar stimuli, and cues were presented in black on a gray background. b Dot cues used in Experiment 1B. c Illustration of the color square arrangement used in Experiment 2. Different fill patterns represent different colors. Note that, for illustrative purposes, the stimuli are drawn much larger than they appeared in the actual computer display.

In contrast, Sligte et al. (2008) proposed that a retention-interval cue allowed access to a newly discovered stage of VSTM, one that is early, fragile, and of very high capacity. This proposal was based on evidence that the cue generated change-detection performance consistent with the retention of 16 items from an array of 32 items, as compared with traditional VSTM capacity estimates of 3–4 items (e.g., Luck & Vogel, 1997). In addition, access was disrupted by a pattern mask, suggesting that the memory representation was susceptible to interference unless transformed into a more durable form by attention (see also Makovski & Jiang, 2007).

Sligte et al.’s (2008) division of VSTM depends centrally on the claim of a high-capacity memory available when the retention-interval cue appeared (1,000 ms after the removal of the memory array). Capacity is the primary dimension on which visual memory systems have been distinguished, and a limited capacity of no more than a handful of objects is the most prominent feature of traditional models of VSTM. However, there are several reasons to be cautious about interpreting the results of Sligte et al. as indicating a high-capacity VSTM stage.

First, the initial estimate of 16-item capacity was likely to have been inflated by the fact that adjacent items were often arranged in a co-linear fashion. Figure 2 illustrates a typical 32-item display used in Sligte et al.’s (2008) Experiment 1. Alignment between elements generates multiple larger figures that can be encoded as single units in memory (e.g., Hollingworth, Hyun, & Zhang, 2005). When Sligte et al. eliminated this type of inter-item alignment and introduced four orientations instead of two (Experiment 3), the estimated capacity declined dramatically from 16 items to 5.5 items. Estimated capacity in subsequent experiments controlling bar alignment never exceeded 7 items. Thus, a more plausible estimate of the memory capacity observed by Sligte et al. is 5–7 items. As compared with the standard VSTM capacity estimates of 3–4 items, a 5-to-7 item capacity provides only limited support for a qualitatively different, high-capacity form of VSTM. Second, Sligte et al.’s experiments using two possible orientations may have allowed participants to segregate the stimulus array into two groups (a vertical bar group and a horizontal bar group). A useful strategy would have been to remember, for example, the locations of the vertical items. Because the orientations of all array items could have been encoded by remembering the locations of half of them, such a strategy could have inflated capacity estimates by up to a factor of two. Third, participants in Sligte et al.’s study were given an unusually long practice session, completing 3 h of practice on the day before the experiment session. Finally, participants were given the opportunity to repeat individual blocks of trials in the experiment session when their performance fell short of their expectations (I. G. Sligte, personal communication, December 10, 2010).

Fig. 2

Sample memory stimuli from Experiment 1 of Sligte et al. (2008). The fixation dot was red in Sligte et al.

In the present study, we sought to determine the strength of the evidence supporting a high-capacity form of VSTM with four experiments. In Experiments 1–3, we replicated and extended Sligte et al.’s (2008) method but were unable to replicate their results suggesting a high-capacity stage in VSTM. In Experiment 4, we examined the role of extended practice in generating the original results of Sligte et al.

Experiments 1–3

In Experiments 1A and 1B, we replicated and modified Sligte et al.’s (2008) method. Our experiments were based on Experiment 3 of Sligte et al., in which they used a set size of 8 items and reported a memory capacity of 5.5 items. The events on each trial are displayed in Fig. 1a. In Experiment 2, we reduced the number of possible orientations to two, to test whether the elevated levels of change-detection performance observed by Sligte et al. (2008) were produced, at least in part, by segregation of the memory array into two perceptual groups. In Experiment 3, we extended the test of VSTM capacity to color memory, which is the feature most commonly studied in the literature on VSTM.

In all four experiments, there was one significant modification of Sligte et al.’s (2008) original method (see also Landman, Spekreijse, & Lamme, 2003). In Sligte et al.’s experiments, cues were valid on all trials. To provide a baseline measure of change-detection performance, half of the trials in the present experiments contained neutral cues, and the other half contained valid cues. This design allowed us to confirm that participants were indeed using the cuing information to select the cued item. It also enabled us to measure the magnitude of the cuing effect. If a cue at a delay of 1,000 ms allows participants to access an early, high-capacity stage of VSTM, we should observe memory performance consistent with high-capacity estimates in the valid cue condition, as in Sligte et al. The valid cue condition, therefore, is similar to the partial-report procedure used by Sperling (1960). In contrast, a neutral cue does not allow selective access to the proposed high-capacity stage, and participants must rely on the limited-capacity VSTM. The neutral cue condition is, therefore, similar to the whole-report procedure of Sperling. If a high-capacity VSTM representation is available at a delay of 1,000 ms, we should observe a large cuing advantage for the valid cue condition over the neutral cue condition.



In each of the experiments, 16 University of Iowa undergraduates (18–30 years of age) participated for course credit or payment. All reported having normal or corrected-to-normal vision. Each participant completed only one experiment. To ensure that capacity was not underestimated by the inclusion of participants who did not understand the task or did not follow instructions, participants who failed to perform the task significantly above chance were replaced (1 in Experiment 1B and 2 in Experiment 3). Note that this replacement procedure can only raise mean accuracy, so it is conservative with respect to our failure to replicate the high-capacity results of Sligte et al. (2008).


Stimuli appeared on a gray background (11.04 cd/m2) with a continuously visible black fixation dot (0 cd/m2) with a radius of 0.11º. The memory stimuli were presented at eight locations evenly spaced around an imaginary circle, with a radius of 5.2º, centered at fixation (Fig. 1a).
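The display geometry described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' E-Prime implementation: the function names, the starting angle of the first location, and the use of the 80-cm viewing distance (reported in the apparatus description) to convert eccentricity to centimeters are assumptions made for the example.

```python
import math

def ring_positions(n_items=8, radius_deg=5.2, start_angle_deg=0.0):
    """Return (x, y) offsets from fixation, in degrees of visual angle.

    start_angle_deg is an assumption: the text says only that the eight
    locations were evenly spaced around an imaginary circle.
    """
    step = 360.0 / n_items
    positions = []
    for i in range(n_items):
        theta = math.radians(start_angle_deg + i * step)
        positions.append((radius_deg * math.cos(theta),
                          radius_deg * math.sin(theta)))
    return positions

def eccentricity_cm(angle_deg, viewing_distance_cm=80.0):
    """Convert an eccentricity in degrees to on-screen distance in cm."""
    return viewing_distance_cm * math.tan(math.radians(angle_deg))

positions = ring_positions()
ring_radius_cm = eccentricity_cm(5.2)
```

Converting further to pixels would require the physical width of the viewable screen area, which the text does not report, so the sketch stops at centimeters.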

Experiments 1A and 1B

The bar stimuli (1.93º × 0.21º) were presented in one of four orientations (vertical, horizontal, ˗45º, and 45º). The orientation of each bar stimulus was chosen randomly. In Experiment 1A, the cue was a 1.0º × 0.07º thin black line pointing from the central fixation point to one of the eight memory locations. On neutral cue trials, all eight lines were displayed. Because the cue was an oriented line that might have interfered with the change-detection task (orientation discrimination), in Experiment 1B (Fig. 1b), a dot cue (0.07º radius) that did not overlap with the features of the memory array stimuli was used. The dot cue appeared at the same location as the outer end of the line cue. Neutral cue trials presented all eight dots. Because no difference in the cuing effect on change-detection performance was observed between Experiments 1A and 1B, subsequent experiments used the line cue. The test display consisted of a single oriented bar in the location of one of the memory array stimuli.

Experiment 2

The stimuli were the same as those in Experiment 1A, except that only two orientations were used (vertical, horizontal). Two orientations were used to facilitate the perceptual grouping of multiple items by shared orientation.

Experiment 3

Eight color squares subtended 1.93º × 1.93º each and were presented in the same locations as the oriented bars (Fig. 1c). Each color was selected randomly without replacement from a set of 10 easily discriminable colors: violet (x = .306, y = .147, 2.39 cd/m2), red (x = .665, y = .314, 8.27 cd/m2), blue (x = .150, y = .080, 4.38 cd/m2), green (x = .315, y = .600, 12.91 cd/m2), yellow (x = .485, y = .462, 27.37 cd/m2), black (0 cd/m2), brown (x = .498, y = .440, 8.893 cd/m2), pink (x = .310, y = .0219, 23.36 cd/m2), orange (x = .604, y = .335, 11.95 cd/m2), and light blue (x = .228, y = .314, 35.16 cd/m2).
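As a rough illustration of the color-selection rule, the sketch below draws eight colors without replacement from the ten named colors and keeps track of the two that were not used (the procedure draws the changed color on different trials from those two). The function name and the seeded random generator are hypothetical choices for the example; the original experiment was run in E-Prime.

```python
import random

# The ten color names listed in the text (chromaticity values omitted).
COLORS = ["violet", "red", "blue", "green", "yellow",
          "black", "brown", "pink", "orange", "light blue"]

def sample_color_trial(rng, set_size=8):
    """Pick set_size distinct colors; return them plus the unused colors."""
    array_colors = rng.sample(COLORS, set_size)  # sampling without replacement
    unused = [c for c in COLORS if c not in array_colors]
    return array_colors, unused

rng = random.Random(0)  # seed chosen only to make the example repeatable
array_colors, unused = sample_color_trial(rng)
```

Because the set size is 8 and the palette has 10 colors, exactly two colors are always left over, which guarantees that a changed test color never matches any item in the memory array.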


Stimuli were displayed on a 17-in. CRT monitor with a resolution of 800 × 600 pixels at a viewing distance of 80 cm. Manual responses were collected by a button box. The experiment was controlled by E-Prime software. Eye position was observed via a close-up video image of the participant’s right eye. The experimenter monitored eye movements, and trials with eye movements were excluded from the analysis.


Each trial began with visual presentation of four digits. Participants repeated the digits aloud (at least 2 digits/sec) throughout the trial to suppress verbal encoding of the memory stimuli. There was a 500-ms delay before the main trial events.

The sequence of events closely matched those of Sligte et al. (2008, Experiment 3, context-absent condition). The memory array appeared for 250 ms, followed by a blank delay of 1,000 ms. The cue then appeared for 500 ms, followed by another delay of 500 ms. Finally, the test item remained on the computer screen until the participant responded. On validly-cued trials, the cue indicated the location of the item that would be tested. On neutrally-cued trials, the test probe was equally likely to appear at any of the locations. On same trials, the test item had the same orientation (same color in Experiment 3) as the memory-array item at that location. On different trials in the orientation experiments, the orientation of the test item was selected randomly from the other three possible orientations (Experiment 1) or was changed to the other possible orientation (Experiment 2). On different trials of the color experiment, the color of the test item was selected randomly from the two colors that were not used in the memory array. The participants made an unspeeded button response to indicate same or different.
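The timing of the main trial events can be written down as a simple timeline. The sketch below is illustrative only: the event names and the `Event` class are hypothetical, and the digit-load display and the 500-ms delay that precede the memory array are left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    name: str
    duration_ms: Optional[int]  # None means "until the participant responds"

TRIAL_EVENTS = [
    Event("memory_array", 250),
    Event("blank_delay", 1000),
    Event("cue", 500),
    Event("post_cue_delay", 500),
    Event("test_item", None),
]

def event_onsets(events):
    """Return onset times (ms) relative to memory-array onset."""
    onsets = {}
    t = 0
    for event in events:
        onsets[event.name] = t
        if event.duration_ms is None:
            break  # an open-ended event ends the fixed timeline
        t += event.duration_ms
    return onsets

ONSETS = event_onsets(TRIAL_EVENTS)
array_offset = ONSETS["memory_array"] + 250
cue_delay_after_offset = ONSETS["cue"] - array_offset
```

Laid out this way, the cue onset falls 1,000 ms after memory-array offset, and the test item appears a further 1,000 ms after cue onset.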

At the beginning of the experiment, the participants were given both written and verbal instructions. After 8 practice trials, they completed a main session of 340 trials, 85 trials in each of the four conditions created by the 2 (cue: valid, neutral) × 2 (change: same, different) design, randomly intermixed.
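The 2 × 2 design with 85 trials per cell can be sketched as a shuffled trial list. This is a minimal example under stated assumptions: the dictionary keys, the function name, and the simple full shuffle are illustrative, since the text says only that the conditions were randomly intermixed.

```python
import random
from itertools import product

def build_trial_list(reps_per_cell=85, seed=None):
    """Build a randomly intermixed list of cue x change trials."""
    cells = list(product(["valid", "neutral"], ["same", "different"]))
    trials = [{"cue": cue, "change": change}
              for cue, change in cells
              for _ in range(reps_per_cell)]
    random.Random(seed).shuffle(trials)  # seed is only for repeatability
    return trials

trials = build_trial_list(seed=1)
n_valid_same = sum(1 for t in trials
                   if t["cue"] == "valid" and t["change"] == "same")
```

With 85 repetitions in each of the four cells, the list contains the 340 main-session trials, half validly cued and half neutrally cued.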

Results and discussion

Figure 3 illustrates mean accuracy (percentage correct, collapsed across same and different trials) and K (estimated number of items held in VSTM; Cowan, 2001) for validly- and neutrally-cued trials. Across the experiments, fewer than 1% of the trials were removed due to eye movements.

Fig. 3

a Change-detection accuracy (% correct) as a function of cue type (valid vs. neutral) and experiment (Experiments 1–3). Underlined percentages represent the size of the cuing effect (accuracy of validly-cued trials minus accuracy of neutrally-cued trials). b Estimated capacity (Cowan, 2001) as a function of cue type (valid vs. neutral) and experiment (Experiments 1–3). For this and all subsequent figures, error bars represent 95% within-subjects confidence intervals (Loftus & Masson, 1994)
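For readers unfamiliar with the K measure, the sketch below shows the single-probe formula as it is commonly written following Cowan (2001): K = N × (hit rate − false alarm rate), where N is the set size. The text cites the measure but does not print the formula, so the function names here are illustrative, and the helper that works from percentage correct assumes equal numbers of same and different trials. The input values are arbitrary examples, not results from the experiments.

```python
def cowan_k(hit_rate, false_alarm_rate, set_size):
    """Cowan's K for single-probe change detection."""
    return set_size * (hit_rate - false_alarm_rate)

def k_from_accuracy(accuracy, set_size):
    """K implied by proportion correct with equal same/different trials.

    accuracy = (H + (1 - FA)) / 2, so H - FA = 2 * accuracy - 1.
    """
    return set_size * (2.0 * accuracy - 1.0)

# Arbitrary illustrative inputs (not data from the study):
k_eight = k_from_accuracy(0.75, 8)
k_thirty_two = k_from_accuracy(0.75, 32)
```

Under this formula, the same proportion correct maps onto very different K values depending on set size: 75% correct corresponds to 4 items at a set size of 8 but to 16 items at a set size of 32. Note also that the shortcut from accuracy is exact only when same and different trials contribute equally, which may not hold once trials are excluded for eye movements.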

Across the four experiments, mean change-detection accuracy was higher for validly-cued trials than for neutrally-cued trials. An analysis of variance (ANOVA) with a within-subjects factor of cue type (valid vs. neutral) and a between-subjects factor of experiment led to a significant main effect of cue type, F(1, 60) = 149.37, p < .001. The cuing effect was significant in each experiment (all ps < .001). In addition, overall accuracy was higher for color discrimination (Experiment 3) than for orientation discrimination (Experiment 1A), F(1, 30) = 8.14, p = .008. Cowan’s K yielded the same pattern of results.

Could a relatively large cuing effect, consistent with a high-capacity representation, be available only after a considerable amount of practice with using the cue? To test this possibility, the size of the cuing effect was compared between the first and second halves of trials for each experiment. No differences in the magnitude of the cuing benefit were observed (all ps > .3), suggesting that the size of the cuing effect was relatively stable across the experimental session.

The data revealed a modest increase in performance with a valid cue. There was no indication that a valid cue allowed the participants to access a memory system with a qualitatively different capacity from that of traditional, limited-capacity VSTM. Across the valid cue conditions of the three orientation experiments, capacity estimates varied from 3.2 to 4.1 items, exactly within the typical range of VSTM capacity.Footnote 2 These modest cuing benefits can be explained by selective processes occurring within the limited-capacity VSTM architecture itself (Matsukura et al., 2007).

We found no evidence that the ability to segregate orientations into two perceptual groups improved change-detection accuracy. Accuracy was numerically lower in the two-orientation experiment (Experiment 2) than in the four-orientation experiments (Experiments 1A and 1B). Although grouping based on direct alignment of stimuli into larger figures clearly contributed to the very high estimates of capacity in Experiment 1 of Sligte et al. (2008; see Fig. 2), it does not appear that grouping by orientation contributed to their modestly elevated capacity estimates of 5–7 items in subsequent experiments.

Experiment 4

What, then, caused the participants in Sligte et al. (2008) to achieve relatively high levels of change-detection performance? As was discussed in the introduction, these participants received 3 h of practice before the experimental session. In addition, they were able to repeat blocks of trials on which they were not satisfied with their performance. The latter feature could have elevated change-detection performance, and this aspect of Sligte et al.’s method reduces confidence in their estimates of capacity. However, this effect of block repetition is difficult to assess, since we do not know how often their participants chose to repeat a block.

Probing the effect of practice is more tractable. If the levels of performance observed by Sligte et al. (2008) were present only after extensive practice, comparison between their results and those of traditional VSTM studies would be problematic, since the latter typically have assessed VSTM after minimal practice. Note that improvement with practice in a change-detection task need not reflect a change in basic memory capacity per se. Practice could influence the efficiency by which perceptual features are extracted from the display, the efficiency of item encoding in VSTM (e.g., by limiting coding to task-relevant features, by encoding composite features consisting of multiple items, or by encoding statistical summary information), the efficiency of retrieval and comparison processes at the time of test, and the efficient use of long-term memory (e.g., Hollingworth, 2004).

To examine the effect of practice on change-detection performance, in Experiment 4, 2 participants performed a longer session of orientation change-detection trials. To more closely replicate the training conditions of Sligte et al. (2008), each trial contained a valid cue.


The stimuli and procedure in Experiment 4 were identical to those in Experiment 2, except that each trial contained a valid cue. The 2 naïve participants completed 10 blocks of 64 trials, which lasted approximately 80 min.

Results and discussion

Fewer than 3% of the trials were removed from the analysis due to eye movements. There was a significant increase in change-detection accuracy with practice, F(1, 9) = 3.18, p = .04 (Fig. 4). Accuracy steadily improved throughout the session, with a drop in performance late in the session that might have been attributable to fatigue. Change-detection accuracy in the last few blocks approached a K of 5.5 items retained, which matches very closely with the estimate obtained by Sligte et al. (2008, Experiment 3).

Fig. 4

a Change-detection accuracy (% correct) and b estimated capacity (Cowan, 2001) as a function of Experiment 4 block

General discussion

The results of four experiments failed to corroborate Sligte et al.’s (2008) finding of an early, high-capacity stage of VSTM. Consistent with previous studies (e.g., Griffin & Nobre, 2003; Matsukura et al., 2007), in Experiments 1–3, a valid cue presented beyond the range of iconic memory led to higher change-detection performance, as compared with a neutral cue. However, the size of the cuing effect was modest and can be explained by selective attention mechanisms operating within the limited-capacity VSTM architecture itself (Matsukura et al., 2007). These findings provide no compelling evidence that participants were able to access a distinct, high-capacity form of VSTM. In particular, absolute estimates of the number of items retained in the valid cue condition fell squarely within traditional VSTM capacity estimates of 3–4 items (e.g., Luck & Vogel, 1997).

Sligte et al.’s (2008) most conspicuous evidence for a high-capacity form of VSTM came from the 16-item capacity observed in their Experiment 1. However, this result was almost certainly caused by the fact that individual items were aligned to form larger figural groups (see Fig. 2). When such alignment was eliminated, Sligte et al.’s estimates fell to no more than 5–7 items retained. These moderately elevated levels of change-detection performance were likely to have been achieved by extensive practice. In Experiment 4, we demonstrated a consistent improvement in change-detection accuracy over the course of 640 trials, from an initial estimate of approximately 3 items retained to a maximum estimate of 5–6 items retained. This amount of training was approximately half of what the participants in Sligte et al. received as practice, and our participants were not allowed to repeat blocks on which they were dissatisfied with their performance. Thus, the K estimates of 5–7 items in Sligte et al. were likely to have been due to an extensive, 3-h practice session. Note again that an increase in the estimate of the number of items retained with practice does not necessarily suggest an increase in capacity per se. Changes in the efficiency of perceptual processing, memory encoding, maintenance, comparison processes, and involvement of long-term memory could produce precisely the same effect without any direct influence on the capacity of the system.Footnote 3

A related issue is the source of the cuing effect observed in the present and other studies. An increase in change-detection accuracy on validly-cued trials might be interpreted as an increase in the capacity of the system or the involvement of an additional system (such as a qualitatively different representation for the attended object, e.g., Landman et al., 2003; Sligte et al., 2008). Although possible, neither is necessary to account for the cuing effect during VSTM maintenance. According to the protection account proposed by Matsukura et al. (2007), attention is selectively oriented to a particular item within the limited-capacity VSTM architecture itself. Specifically, attention protects the cued item from passive decay or interference by other uncued items stored in VSTM. Similarly, attention may shield the cued item from perceptual-level interference such as processing of the test display (e.g., Makovski & Jiang, 2007; Makovski et al., 2008). In this view, the cuing benefit can be explained by preferential retention of the cued item relative to other uncued items, with the overall capacity of the system remaining constant (see the invalid cuing cost observed in Griffin & Nobre, 2003).

Accepting the existence of a new form of visual memory requires extensive and unambiguous evidence. On the basis of the present results, we see no compelling reason to modify the traditional model of VSTM as constituting a single, limited-capacity system. Although cuing benefits were observed in the present experiments, the magnitude of those benefits was modest at best, providing no evidence for a qualitatively different form of VSTM. Cuing benefits of this magnitude are consistent with the operation of selective attention within the limited-capacity VSTM system itself (Matsukura et al., 2007).