Does visual short-term memory have a high-capacity stage?
Visual short-term memory (VSTM) has long been considered a durable, limited-capacity system for the brief retention of visual information. However, recent work by Sligte et al. (PLoS ONE 3:e1699, 2008) reported that, relatively soon after the removal of a memory array, a cue allowed participants to access a fragile, high-capacity stage of VSTM that is distinct from iconic memory. In the present study, we examined whether this division into stages is warranted by attempting to corroborate the existence of an early, high-capacity form of VSTM. The results of four experiments did not support Sligte et al.’s claim: we obtained no evidence for VSTM retention exceeding traditional estimates of capacity. However, performance approaching that observed by Sligte et al. can be achieved through extensive practice, which offers a clear explanation for their findings. Our evidence favors the standard view of VSTM as a limited-capacity system that maintains a few object representations in a relatively durable form.
Keywords: Attention and memory; Visual selective attention; Visual working memory
In the literature on visual memory, researchers have reached consensus that there are two systems that maintain visual information in an active state for relatively brief durations: iconic memory and visual short-term memory (VSTM).¹ Iconic memory is the persistence of sensory processing following a visual event and is characterized by high-capacity storage and rapid decay (e.g., Averbach & Coriell, 1961; Sperling, 1960). Iconic memory has been divided into two subsystems (Coltheart, 1980). Visible persistence lasts for only 80–100 ms after the onset of a stimulus event (Di Lollo, 1980) and produces the phenomenology of a persisting visible image. Informational persistence is not visible but supports the retention of sensory information for 300–500 ms after stimulus offset (Irwin & Yeomans, 1986). Only a small subset of the information available in iconic memory can be consolidated into the more durable VSTM. VSTM has a highly limited capacity of 3–4 items (e.g., Luck & Vogel, 1997), is abstracted away from precise sensory features of the original stimulus (Irwin, 1991; Phillips, 1974), can retain items for multiple seconds (Vogel, Woodman, & Luck, 2001), and is significantly more resistant to visual interference than is iconic memory (Pashler, 1988; Phillips, 1974).
In contrast, Sligte et al. (2008) proposed that a retention-interval cue allowed access to a newly discovered stage of VSTM, one that is early, fragile, and of very high capacity. This proposal was based on evidence that the cue generated change-detection performance consistent with the retention of 16 items from an array of 32 items, as compared with traditional VSTM capacity estimates of 3–4 items (e.g., Luck & Vogel, 1997). In addition, access was disrupted by a pattern mask, suggesting that the memory representation was susceptible to interference unless transformed into a more durable form by attention (see also Makovski & Jiang, 2007).
Sligte et al.’s (2008) division of VSTM depends centrally on the claim of a high-capacity memory available when the retention-interval cue appeared (1,000 ms after the removal of the memory array). Capacity is the primary dimension on which visual memory systems have been distinguished, and a limited capacity of no more than a handful of objects is the most prominent feature of traditional models of VSTM. However, there are several reasons to be cautious about interpreting the results of Sligte et al. as indicating a high-capacity VSTM stage.
In the present study, we sought to determine the strength of the evidence supporting a high-capacity form of VSTM with four experiments. In Experiments 1–3, we replicated and extended Sligte et al.’s (2008) method but were unable to replicate their results suggesting a high-capacity stage in VSTM. In Experiment 4, we examined the role of extended practice in generating the original results of Sligte et al.
In Experiments 1A and 1B, we replicated and modified Sligte et al.’s (2008) method. Our experiments were based on Experiment 3 of Sligte et al., in which they used a set size of 8 items and reported a memory capacity of 5.5 items. The events on each trial are displayed in Fig. 1a. In Experiment 2, we reduced the number of possible orientations to two, to test whether the elevated levels of change-detection performance observed by Sligte et al. (2008) were produced, at least in part, by segregation of the memory array into two perceptual groups. In Experiment 3, we extended the test of VSTM capacity to color memory, which is the feature most commonly studied in the literature on VSTM.
In Experiments 1–3, we made one significant modification to Sligte et al.’s (2008) original method (see also Landman, Spekreijse, & Lamme, 2003). In Sligte et al.’s experiments, cues were valid on all trials. To provide a baseline measure of change-detection performance, half of the trials in the present experiments contained neutral cues, and the other half contained valid cues. This design allowed us to confirm that participants were indeed using the cue to select the cued item, and it enabled us to measure the magnitude of the cuing effect. If a cue at a delay of 1,000 ms allows participants to access an early, high-capacity stage of VSTM, we should observe memory performance consistent with high-capacity estimates in the valid cue condition, as in Sligte et al. The valid cue condition is therefore similar to the partial-report procedure used by Sperling (1960). In contrast, a neutral cue does not allow selective access to the proposed high-capacity stage, so participants must rely on limited-capacity VSTM; the neutral cue condition is therefore similar to Sperling’s whole-report procedure. If a high-capacity VSTM representation is available at a delay of 1,000 ms, we should observe a large cuing advantage for the valid cue condition over the neutral cue condition.
In each of Experiments 1A, 1B, 2, and 3, 16 University of Iowa undergraduates (18–30 years of age) participated for course credit or payment. All reported normal or corrected-to-normal vision, and each participant completed only one experiment. To ensure that capacity was not underestimated because of participants who did not understand the task or did not follow instructions, participants who failed to perform significantly above chance were replaced (1 in Experiment 1B and 2 in Experiment 3). Note that this procedure necessarily raises mean accuracy; it is therefore conservative, given that we failed to replicate the high-capacity results of Sligte et al. (2008).
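The text does not state which test defined performance "significantly above chance." A minimal sketch, assuming a one-sided exact binomial test against a 50% guessing rate for same/different responses at α = .05 over the 340 main trials; the function names and the criterion are hypothetical and are not taken from the authors’ procedure:

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p): the one-sided p-value of k correct in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def above_chance(n_correct: int, n_trials: int, alpha: float = 0.05, chance: float = 0.5) -> bool:
    """Hypothetical screening rule: True if accuracy exceeds chance at the one-sided level alpha."""
    return binomial_upper_tail(n_correct, n_trials, chance) < alpha


def min_correct_above_chance(n_trials: int, alpha: float = 0.05, chance: float = 0.5) -> int:
    """Smallest number of correct responses that passes the hypothetical screening rule."""
    for k in range(n_trials + 1):
        if above_chance(k, n_trials, alpha, chance):
            return k
    raise ValueError("no count reaches significance")


if __name__ == "__main__":
    n = 340  # main-session trials per participant
    k_min = min_correct_above_chance(n)
    print(f"need at least {k_min} of {n} correct ({k_min / n:.1%})")
```

Under these assumptions the cutoff sits only a few percentage points above 50%, so a rule of this kind removes only participants who are responding essentially at chance.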
Stimuli appeared on a gray background (11.04 cd/m²) with a continuously visible black fixation dot (0 cd/m²) with a radius of 0.11°. The memory stimuli were presented at eight locations evenly spaced around an imaginary circle with a radius of 5.2°, centered at fixation (Fig. 1a).
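The layout can be expressed in a few lines. This is a minimal sketch in degrees of visual angle with fixation at the origin; the starting angle of the first location is a hypothetical parameter (the actual arrangement is shown in Fig. 1a), and no conversion to screen pixels is attempted because the physical screen width is not reported:

```python
import math

RADIUS_DEG = 5.2  # eccentricity of each memory item (from the text)
N_LOCATIONS = 8   # number of evenly spaced locations (from the text)


def item_positions(radius: float = RADIUS_DEG, n: int = N_LOCATIONS,
                   start_angle_deg: float = 0.0) -> list[tuple[float, float]]:
    """Centres (x, y) of the memory items in degrees of visual angle, fixation at (0, 0).

    start_angle_deg is hypothetical: 0 puts the first item directly to the
    right of fixation.
    """
    step = 360.0 / n
    return [
        (radius * math.cos(math.radians(start_angle_deg + k * step)),
         radius * math.sin(math.radians(start_angle_deg + k * step)))
        for k in range(n)
    ]


def neighbour_spacing(radius: float = RADIUS_DEG, n: int = N_LOCATIONS) -> float:
    """Chord length between adjacent item centres: 2 * r * sin(pi / n)."""
    return 2 * radius * math.sin(math.pi / n)


if __name__ == "__main__":
    for x, y in item_positions():
        print(f"({x:+.2f}, {y:+.2f})")
    print(f"adjacent spacing: {neighbour_spacing():.2f} deg")
```

With r = 5.2° and eight positions, adjacent centres are about 3.98° apart, roughly twice the 1.93° length of the bar stimuli, so neighbouring items do not overlap.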
Experiments 1A and 1B
The bar stimuli (1.93° × 0.21°) were presented in one of four orientations (vertical, horizontal, −45°, and 45°). The orientation of each bar stimulus was chosen randomly. In Experiment 1A, the cue was a thin black line (1.0° × 0.07°) pointing from the central fixation point to one of the eight memory locations. On neutral cue trials, all eight lines were displayed. Because the line cue was itself oriented and might have interfered with the change-detection task (orientation discrimination), Experiment 1B (Fig. 1b) used a dot cue (0.07° radius) that did not overlap with the features of the memory array stimuli. The dot cue appeared at the same location as the outer end of the line cue, and neutral cue trials presented all eight dots. Because the cuing effect on change-detection performance did not differ between Experiments 1A and 1B, subsequent experiments used the line cue. The test display consisted of a single oriented bar at the location of one of the memory array stimuli.
Experiment 2
The stimuli were the same as those in Experiment 1A, except that only two orientations (vertical and horizontal) were used, to facilitate the perceptual grouping of multiple items by shared orientation.
Experiment 3
Eight colored squares, each subtending 1.93° × 1.93°, were presented in the same locations as the oriented bars (Fig. 1c). Each color was selected randomly without replacement from a set of 10 easily discriminable colors: violet (x = .306, y = .147, 2.39 cd/m²), red (x = .665, y = .314, 8.27 cd/m²), blue (x = .150, y = .080, 4.38 cd/m²), green (x = .315, y = .600, 12.91 cd/m²), yellow (x = .485, y = .462, 27.37 cd/m²), black (0 cd/m²), brown (x = .498, y = .440, 8.893 cd/m²), pink (x = .310, y = .0219, 23.36 cd/m²), orange (x = .604, y = .335, 11.95 cd/m²), and light blue (x = .228, y = .314, 35.16 cd/m²).
Stimuli were displayed on a 17-in. CRT monitor with a resolution of 800 × 600 pixels at a viewing distance of 80 cm. Manual responses were collected by a button box. The experiment was controlled by E-Prime software. Eye position was observed via a close-up video image of the participant’s right eye. The experimenter monitored eye movements, and trials with eye movements were excluded from the analysis.
Each trial began with visual presentation of four digits. Participants repeated the digits aloud (at least 2 digits/sec) throughout the trial to suppress verbal encoding of the memory stimuli. There was a 500-ms delay before the main trial events.
The sequence of events closely matched that of Sligte et al. (2008, Experiment 3, context-absent condition). The memory array appeared for 250 ms, followed by a blank delay of 1,000 ms. The cue then appeared for 500 ms, followed by another delay of 500 ms. Finally, the test item remained on the screen until the participant responded. On validly cued trials, the cue indicated the location of the item that would be tested. On neutrally cued trials, the test item was equally likely to appear at any of the eight locations. On same trials, the test item had the same orientation (the same color in Experiment 3) as the memory-array item at that location. On different trials in the orientation experiments, the orientation of the test item was selected randomly from the other three possible orientations (Experiments 1A and 1B) or was changed to the other possible orientation (Experiment 2). On different trials in the color experiment, the color of the test item was selected randomly from the two colors that had not been used in the memory array. Participants made an unspeeded button response to indicate same or different.
At the beginning of the experiment, the participants were given both written and verbal instructions. After 8 practice trials, they completed a main session of 340 trials, 85 trials in each of the four conditions created by the 2 (cue: valid, neutral) × 2 (change: same, different) design, randomly intermixed.
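A minimal sketch of how a session with this design could be generated, assuming Python’s random module for randomization. It covers only the orientation experiments and encodes the event timeline, the 2 × 2 design with 85 trials per cell, and the rule for choosing the test orientation on different trials. The data structure and names (Trial, make_trial, make_session) are hypothetical, not the E-Prime implementation used in the study:

```python
import random
from dataclasses import dataclass

# Event durations (ms) from the procedure; the test display stays up until response.
TIMELINE_MS = (("memory array", 250), ("blank delay", 1000), ("cue", 500), ("post-cue delay", 500))
N_LOCATIONS = 8
TRIALS_PER_CELL = 85
FOUR_ORIENTATIONS = ("vertical", "horizontal", "-45", "+45")  # Experiments 1A and 1B
TWO_ORIENTATIONS = ("vertical", "horizontal")                 # Experiment 2


@dataclass
class Trial:
    cue: str               # "valid" or "neutral"
    change: bool           # True on different trials
    memory: list[str]      # orientation at each of the eight locations
    cued: list[int]        # one location (valid cue) or all eight (neutral cue)
    tested: int            # location of the test item
    test_orientation: str


def make_trial(cue: str, change: bool, orientations, rng: random.Random) -> Trial:
    memory = [rng.choice(orientations) for _ in range(N_LOCATIONS)]  # each bar chosen randomly
    tested = rng.randrange(N_LOCATIONS)  # every location equally likely to be tested
    cued = [tested] if cue == "valid" else list(range(N_LOCATIONS))
    original = memory[tested]
    # Different trials: pick one of the other orientations (the only other one when two are used).
    test = rng.choice([o for o in orientations if o != original]) if change else original
    return Trial(cue, change, memory, cued, tested, test)


def make_session(orientations=FOUR_ORIENTATIONS, seed=None) -> list[Trial]:
    rng = random.Random(seed)
    trials = [make_trial(cue, change, orientations, rng)
              for cue in ("valid", "neutral")
              for change in (False, True)
              for _ in range(TRIALS_PER_CELL)]
    rng.shuffle(trials)  # randomly intermix the four conditions
    return trials
```

Passing TWO_ORIENTATIONS reproduces the Experiment 2 rule, because the list of alternatives on a different trial then contains exactly one orientation.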
Results and discussion
Across the four experiments, mean change-detection accuracy was higher on validly cued trials than on neutrally cued trials. An analysis of variance (ANOVA) with a within-subjects factor of cue type (valid vs. neutral) and a between-subjects factor of experiment revealed a significant main effect of cue type, F(1, 60) = 149.37, p < .001. The cuing effect was significant in each experiment (all ps < .001). In addition, overall accuracy was higher for color discrimination (Experiment 3) than for orientation discrimination (Experiment 1A), F(1, 30) = 8.14, p = .008. Cowan’s K yielded the same pattern of results.
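The text does not spell out the formula. For single-probe change detection, Cowan’s K is commonly computed as K = N × (H − FA), where N is the set size, H is the hit rate (change trials answered "different"), and FA is the false-alarm rate (same trials answered "different"). Because same and different trials were equally frequent in this design, the same quantity can be written as K = N × (2 × accuracy − 1). The sketch below assumes that standard form; the counts in the example are hypothetical, not data from these experiments:

```python
def cowans_k(hit_rate: float, false_alarm_rate: float, set_size: int = 8) -> float:
    """Cowan's K for single-probe change detection: K = N * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)


def cowans_k_from_counts(hits: int, n_change: int, false_alarms: int, n_same: int,
                         set_size: int = 8) -> float:
    """Compute K from raw response counts in one cue condition."""
    return cowans_k(hits / n_change, false_alarms / n_same, set_size)


def cowans_k_from_accuracy(accuracy: float, set_size: int = 8) -> float:
    """Equivalent form when same and different trials are equally frequent: K = N * (2 * acc - 1)."""
    return set_size * (2 * accuracy - 1)


# Hypothetical counts for one participant in one cue condition (85 change, 85 same trials)
k = cowans_k_from_counts(hits=68, n_change=85, false_alarms=17, n_same=85)
print(f"K = {k:.2f}")  # H = .80, FA = .20, so K = 8 * .60 = 4.80
```

In this model K = 0 corresponds to chance responding and K cannot exceed the set size N.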
Could a relatively large cuing effect, consistent with a high-capacity representation, emerge only after considerable practice using the cue? To test this possibility, we compared the size of the cuing effect between the first and second halves of trials in each experiment. The magnitude of the cuing benefit did not differ between halves (all ps > .3), suggesting that the cuing effect was relatively stable across the experimental session.
The data revealed a modest increase in performance with a valid cue. There was no indication that a valid cue allowed participants to access a memory system with a capacity qualitatively different from that of traditional, limited-capacity VSTM. Across the valid cue conditions of the three orientation experiments, capacity estimates ranged from 3.2 to 4.1 items, squarely within the typical range of VSTM capacity.² These modest cuing benefits can be explained by selective processes operating within the limited-capacity VSTM architecture itself (Matsukura et al., 2007).
We found no evidence that the opportunity to segregate orientations into two perceptual groups improved change-detection accuracy: accuracy was numerically lower in the two-orientation experiment (Experiment 2) than in the four-orientation experiments (Experiments 1A and 1B). Although grouping based on the alignment of stimuli into larger figures clearly contributed to the very high capacity estimates in Experiment 1 of Sligte et al. (2008; see Fig. 2), grouping by orientation does not appear to have contributed to their modestly elevated estimates of 5–7 items in subsequent experiments.
What, then, allowed the participants in Sligte et al. (2008) to achieve relatively high levels of change-detection performance? As was discussed in the introduction, these participants received 3 h of practice before the experimental session. In addition, they were allowed to repeat blocks of trials with which they were not satisfied. This latter feature could have inflated change-detection performance, and it reduces confidence in their capacity estimates. However, its effect is difficult to assess, since we do not know how often their participants chose to repeat a block.
Probing the effect of practice is more tractable. If the levels of performance observed by Sligte et al. (2008) emerged only after extensive practice, comparing their results with those of traditional VSTM studies would be problematic, since the latter have typically assessed VSTM after minimal practice. Note that improvement with practice in a change-detection task need not reflect a change in basic memory capacity per se. Practice could influence the efficiency with which perceptual features are extracted from the display, the efficiency of item encoding in VSTM (e.g., by limiting coding to task-relevant features, by encoding composite features consisting of multiple items, or by encoding statistical summary information), the efficiency of retrieval and comparison processes at the time of test, and the efficient use of long-term memory (e.g., Hollingworth, 2004).
To examine the effect of practice on change-detection performance, in Experiment 4, two participants performed a longer session of orientation change-detection trials. To replicate the training conditions of Sligte et al. (2008) more closely, every trial contained a valid cue.
Results and discussion
The results of four experiments failed to corroborate Sligte et al.’s (2008) finding of an early, high-capacity stage of VSTM. Consistent with previous studies (e.g., Griffin & Nobre, 2003; Matsukura et al., 2007), in Experiments 1–3, a valid cue presented beyond the range of iconic memory led to higher change-detection performance, as compared with a neutral cue. However, the size of the cuing effect was modest and can be explained by selective attention mechanisms operating within the limited-capacity VSTM architecture itself (Matsukura et al., 2007). These findings provide no compelling evidence that participants were able to access a distinct, high-capacity form of VSTM. In particular, absolute estimates of the number of items retained in the valid cue condition fell squarely within traditional VSTM capacity estimates of 3–4 items (e.g., Luck & Vogel, 1997).
Sligte et al.’s (2008) most conspicuous evidence for a high-capacity form of VSTM came from the 16-item capacity observed in their Experiment 1. However, this result was almost certainly caused by the alignment of individual items into larger figural groups (see Fig. 2). When such alignment was eliminated, Sligte et al.’s estimates fell to no more than 5–7 items. These moderately elevated levels of change-detection performance were likely achieved through extensive practice. In Experiment 4, we demonstrated a consistent improvement in change-detection accuracy over the course of 640 trials, from an initial estimate of approximately 3 items retained to a maximum estimate of 5–6 items retained. This amount of training was approximately half of the practice that the participants in Sligte et al. received, and our participants were not allowed to repeat blocks with which they were dissatisfied. Thus, the K estimates of 5–7 items in Sligte et al. were likely due to their extensive, 3-h practice session. Note again that an increase in the estimated number of items retained with practice does not necessarily indicate an increase in capacity per se. Changes in the efficiency of perceptual processing, memory encoding, maintenance, and comparison processes, and in the involvement of long-term memory, could produce precisely the same effect without any direct influence on the capacity of the system.³
A related issue is the source of the cuing effect observed in the present and other studies. An increase in change-detection accuracy on validly cued trials might be interpreted as an increase in the capacity of the system or as the involvement of an additional system (such as a qualitatively different representation for the attended object; e.g., Landman et al., 2003; Sligte et al., 2008). Although possible, neither interpretation is necessary to account for the cuing effect during VSTM maintenance. According to the protection account proposed by Matsukura et al. (2007), attention is selectively oriented to a particular item within the limited-capacity VSTM architecture itself. Specifically, attention protects the cued item from passive decay or from interference by other, uncued items stored in VSTM. Similarly, attention may shield the cued item from perceptual-level interference, such as processing of the test display (e.g., Makovski & Jiang, 2007; Makovski et al., 2008). In this view, the cuing benefit can be explained by preferential retention of the cued item relative to the uncued items, with the overall capacity of the system remaining constant (see the invalid cuing cost observed in Griffin & Nobre, 2003).
Accepting the existence of a new form of visual memory requires extensive and unambiguous evidence. On the basis of the present results, we see no compelling reason to modify the traditional model of VSTM as constituting a single, limited-capacity system. Although cuing benefits were observed in the present experiments, the magnitude of those benefits was modest at best, providing no evidence for a qualitatively different form of VSTM. Cuing benefits of this magnitude are consistent with the operation of selective attention within the limited-capacity VSTM system itself (Matsukura et al., 2007).
¹ We consider VSTM and visual working memory to be the same set of processes.
² Cowan’s K may overestimate the number of items retained on validly cued trials. In a standard change-detection task without any cuing manipulation (e.g., Luck & Vogel, 1997), the assumption that participants encode and maintain as many items as possible from the memory array holds, so K can be reliably estimated. With a retention-interval cue, however, this assumption is violated, because participants have a strong incentive to preferentially retain the cued item and forget the uncued items. Thus, computing K from change-detection performance on the cued item can overestimate the extent to which uncued items were retained and, thereby, overestimate the total number of items retained.
³ It is not uncommon for changes in K to be interpreted as changes in the capacity of the system, even though K is simply derived from accuracy, and accuracy can be influenced by many factors unrelated to capacity, such as those listed in the main text. Consider a simple example of chunking. For a display of eight digits, K might be estimated as 5 for a display such as “97141627” but as 8 for the same digits organized as two familiar years, “17761492.” The difference in K is driven by the efficiency of encoding and/or retention, not necessarily by a change in the capacity of the system. Thus, the common practice of reporting K as a measure of capacity can be problematic: K reflects the number of items retained, but the number of items retained does not always provide a pure measure of capacity.
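The overestimation argument can be made concrete with the high-threshold model that underlies Cowan’s K: the probed item is in memory with probability p, and otherwise the participant guesses "different" at rate g. Then H = p + (1 − p)g and FA = (1 − p)g, so K = N × p no matter how many uncued items survive. On validly cued trials the probed item is always the cued one, so p is the probability that the cued item is held. The sketch below uses hypothetical parameter values and function names to compare that estimate with the number of items actually held when uncued items are dropped after the cue:

```python
def expected_k(p_probed: float, guess_rate: float, set_size: int = 8) -> float:
    """Expected Cowan's K under a high-threshold model.

    The probed item is in memory with probability p_probed; otherwise the
    participant guesses "different" with probability guess_rate.
    """
    hit_rate = p_probed + (1 - p_probed) * guess_rate
    false_alarm_rate = (1 - p_probed) * guess_rate
    return set_size * (hit_rate - false_alarm_rate)  # algebraically equal to set_size * p_probed


def items_held(p_cued: float, p_uncued: float, set_size: int = 8) -> float:
    """Expected number of items actually held at test: the cued item plus N - 1 uncued items."""
    return p_cued + (set_size - 1) * p_uncued


# Hypothetical case: the cued item survives half the time, uncued items are dropped entirely
k_estimate = expected_k(p_probed=0.5, guess_rate=0.5)
held = items_held(p_cued=0.5, p_uncued=0.0)
print(f"K estimate = {k_estimate:.1f}, items actually held = {held:.1f}")
# K estimate = 4.0, items actually held = 0.5
```

The K estimate depends only on how well the cued item is retained, which is why it can credit the participant with uncued items that were never kept.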
This research was funded by the National Institutes of Health (R01EY017356). We thank Ilja Sligte for sharing his data and sample stimuli. We also thank Shaun Vecera, Joshua Cosman, Weiwei Zhang, Ilja Sligte, and two anonymous reviewers for their helpful comments. Correspondence concerning this article should be addressed to Michi Matsukura, Department of Psychology, University of Iowa, Iowa City, IA 52242. E-mail should be sent to firstname.lastname@example.org.