One way to measure the discriminatory ability of VLTM is to utilize a paradigm often referred to as the old/similar/new judgment. Participants first view and encode images of objects, typically while doing an incidental encoding cover task (e.g., would this object fit in a shoe box?). At test, participants serially view objects that were present in the encoding task (old images), objects similar but not identical to ones in the encoding task (similar images), and completely new objects that were not present during encoding (new images). Participants then judge whether the images they see are old, similar, or new (see Fig. 3).
The primary purpose of this task is to evaluate the discriminatory ability of VLTM, and how memories may acquire different levels of discriminability. For example, correctly classifying a similar item likely requires an observer to have more specific details in memory (i.e., recollection-based) than correctly identifying an old item, which could be accomplished using a gist-based or familiarity-based process (Kensinger, Garoff-Eaton, & Schacter, 2006; Kim & Yassa, 2013; Schurgin & Flombaum, 2017; Schurgin, Reagh, Yassa, & Flombaum, 2013; Stark, Yassa, Lacy, & Stark, 2013).
A potential limitation of old/similar/new judgments is that it is not clear what constitutes a false alarm for similar items. In contrast, when using a simpler old/new procedure, researchers can obtain an unbiased measure of memory discriminability (d') by taking the difference between the normalized proportion of hits (correctly classifying an old item as old) and the normalized proportion of false alarms (incorrectly classifying a new item as old; Green & Swets, 1966). However, what constitutes a false alarm for a similar judgment? Is it classifying an old item as similar? Or a new item as similar? Or a similar item as old? Without a clear understanding of what constitutes a false alarm, it is difficult to correct responses to similar items for potential response biases. While different analyses have attempted to address this issue, research suggests analyzing old/similar/new responses using a signal-detection-based framework (d_a) may provide an accurate, unbiased measure of memory performance for similar items (see Loiotile & Courtney, 2015).
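To make the computation concrete, the sketch below (in Python, with hypothetical response counts) implements the standard d' calculation described above; the log-linear correction for extreme hit or false-alarm rates is one common convention, not necessarily the one used in any particular study.

```python
# Minimal sketch of the d' computation described above, using
# hypothetical hit/false-alarm counts from an old/new test.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Unbiased discriminability: z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction guards against rates of exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical counts: 80 old items (70 judged old), 80 new items (12 judged old).
print(d_prime(hits=70, misses=10, false_alarms=12, correct_rejections=68))
```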
How does emotion affect visual memory?
One study that utilized this method sought to investigate how negative emotional context may affect the likelihood of remembering an item’s specific visual details. Participants first completed an incidental encoding task, where they were exposed to hundreds of images of real-world objects (for 250 ms or 500 ms) and had to judge whether each object would fit in a shoebox. Half of the images were rated as negative and arousing, and the other half were rated as neutral. Two days after incidental encoding, participants were given a surprise test in which they viewed old images (exactly the same as at encoding), similar images (similar but not identical), and completely new images, and were told to classify them accordingly (old/similar/new; Kensinger et al., 2006).
At test, it was observed that for old items, negative emotional context led to an increase in correct classification. This was true for items presented for both 250 ms and 500 ms, and the effect grew stronger as encoding time increased. However, for similar items there was no main effect of emotional context or encoding time. On the basis of the main effect of emotional context for old images, the researchers concluded that negatively arousing content increased the likelihood that visual details of an object would be remembered (Kensinger et al., 2006). However, given the lack of an effect for similar images, it would be more accurate to conclude that emotion may have enhanced certain aspects of visual memory (i.e., memory for old images).
How do familiarity and recollection contribute to responses?
This method has also been used to investigate the relative contributions that familiarity and recollection processes may make to behavioral responses of old, similar, and new judgments. In the study, participants completed a task with two phases. In the first phase, participants saw 128 images of real-world objects on a computer screen for 2 seconds each and were asked to report whether each object was an “indoor” or “outdoor” object. In the second phase, participants were given a surprise test where they viewed old images (exactly the same as at encoding), similar images (similar but not identical), and completely new images. They were told to classify the images as either old, similar, or new. Additionally, after indicating what category an image belonged to, participants were instructed to indicate whether they “remember” seeing the same image in the study session or whether they just “know” that they have seen the same image without any conscious recollection of its original presentation (Kim & Yassa, 2013). It is theorized that “remember” judgments reflect recollection-based processes, whereas “know” judgments reflect familiarity-based processes, although critics have argued that these judgments are not true indices of these processes but instead reflect subjective states of awareness or differences in confidence (Yonelinas, 2002).
As expected, accuracy at test differed markedly for classifying old (70% correct), similar (53% correct), and new (74% correct) images. When these responses were analyzed according to whether a participant reported they “remembered” (recollection based) or “knew” (familiarity based), a few interesting patterns emerged. When judging old items correctly, observers made primarily “remember” responses, suggesting that correct classification of old items was primarily driven by recollection. For similar items, there was a slight tendency to report “remember” rather than “know,” whether the item was judged as old or similar. This suggests that observers can classify similar items with or without recollection, and that incorrectly identifying similar items (i.e., misclassifying them as old) is not simply driven by familiarity signals (Kim & Yassa, 2013).
Source localization judgments
One approach to evaluate the strength of memory and further distinguish potential errors is to use a paradigm combining old/new judgments with source localization. At encoding, participants view images of objects, typically presented in one of four quadrants of the display. Then, at test, participants are shown old, similar, and new images in the center of the screen, which they must classify as old (previously seen images) or new (similar or new images), and indicate which of the four quadrants the object originally appeared in (Cansino, Maquet, Dolan, & Rugg, 2002; Reagh & Yassa, 2014; see Fig. 4).
The underlying assumption of source localization manipulations is that more episodic information is retrieved on trials when the source judgment was successful than on trials when it was not. This is similar to the logic behind the remember/know procedure discussed previously. When an observer makes a correct classification and source judgment, the assumption is that this indicates a recollection-like memory, whereas if an observer makes a correct classification of an image but an incorrect source judgment, this may indicate a familiarity-based memory. However, unlike the remember/know procedure, source localization judgments do not rely on the observer’s own introspection in order to classify memory quality. Thus, the aim of the paradigm is to evaluate what percentage of responses using old/new judgments may rely on memories that contain more or less information, and what brain areas may be involved in these processes.
What brain areas and behaviors are involved in source judgments?
Cansino et al. (2002) were interested in using this method to investigate what brain regions may be involved in different memory processes beyond simple item recognition tasks. To accomplish this, participants first viewed images of real-world objects and were asked to judge whether the objects were natural or artificial. Critically, each image was presented in one of four quadrants of the display. After completing the task, participants were administered a surprise test where previously shown images (old images) were mixed with completely new images. These images were shown in the center of the screen, and participants had to judge whether each image was old or new. They were instructed to press a single key if an image was new; if an image was old, they indicated which position the image had been presented in during encoding using one of four keys. If a participant did not know which quadrant an old image originated from, they were instructed to guess.
At test, observers correctly identified 87% of previously seen (old) items. However, only 60.7% of old items were recognized with a correct source judgment, whereas 26.3% were recognized with an incorrect source judgment. This suggests that even when classifying old and new objects in a typical retrieval task, memories for these items likely contain additional information beyond simply categorical or familiarity-based knowledge. Additionally, fMRI data collected during the task showed that recognizing an old object with a correct versus incorrect source judgment was associated with greater activity in the right hippocampus and left prefrontal cortex (Cansino et al., 2002). This suggests that memories containing more information may elicit greater memory signals and decision-making coordination.
Potential neural correlates of “what” and “where” memory
Source localization judgments have also been used to explore possible dissociations between object (what) and spatial (where) memories and their potential neural correlates. In the experiment, participants first completed an encoding task where images of real-world objects were presented in one of 31 possible locations on the screen for 3 seconds each. They were instructed to first judge whether the object was an indoor or outdoor object, and then whether the object appeared on the left or right relative to the center of the screen. Afterward, participants were given a surprise test with four possible trial types: repeated images (old images in the same location), lure images (similar images in the original object’s location), spatial lure images (old images in a slightly different location), or new images (not shown during encoding). Participants were instructed to indicate whether an image showed no change, an object change, a location change, or was new (Reagh & Yassa, 2014).
Behaviorally, there was no difference in lure discrimination whether it was an object trial (i.e., similar image) or spatial trial (i.e., old image in slightly different location). This effect was consistent across both high-similarity and low-similarity stimuli. Neuroimaging data were also collected via fMRI and demonstrated unique differences based on lure type. It was observed that the lateral entorhinal cortex (LEC) was more engaged during object lure discrimination than during spatial lure discrimination, whereas the opposite pattern was observed in the medial entorhinal cortex (MEC). Additionally, the perirhinal cortex (PRC) was more active during correct rejections of object than spatial lures, whereas the parahippocampal cortex (PHC) was more active during correct rejections of spatial than object lures. Regardless of lure type, the dentate gyrus (DG) and subregion CA3 demonstrated greater activity during lure discrimination. Overall, this suggests two parallel but interacting networks in the hippocampus and related regions for managing object identity and spatial interference (Reagh & Yassa, 2014).
Two-alternative forced-choice test (2AFC)
A paradigm referred to as the two-alternative forced-choice (2AFC) test has been used primarily to study the capacity of visual episodic long-term memory. In a typical test, observers see two objects on the screen: one that they have seen before and another that they have not encountered previously. The other object may be completely novel (old–new comparison) or a similar lure (old–similar comparison; Brady, Konkle, Alvarez, & Oliva, 2008; Brady, Konkle, Oliva, & Alvarez, 2009; Konkle, Brady, Alvarez, & Oliva, 2010b). The logic of this test is that it can tap into even “weak” memories that other methods may fail to reveal, and a 2AFC judgment is comparatively easy because it requires only a binary choice between two alternatives.
Typically, 2AFC is conceptualized using a signal-detection-theory framework (Green & Swets, 1966; Loiotile & Courtney, 2015). The logic is that observers have a memory representation that creates a normally distributed signal in a “memory strength” space. At test, when observers are shown an old item, that item elicits a normally distributed memory-match signal, which due to noise and other factors will vary in strength. If an observer were simply shown the old object alone, then depending on the decision criterion this could result in incorrectly identifying the old object as new. However, giving observers a foil in the 2AFC task, whether that foil is a completely new or similar-looking object, provides a second normally distributed signal to assist in the comparison process. This second signal should be centered around a lower memory-match strength (i.e., zero) than that of the old image. As a result, observers can simply pick the item with the larger memory-match signal to identify the old image (Macmillan & Creelman, 2004; see Fig. 5). This framework demonstrates from a modeling perspective why 2AFC tasks should be easier and can reveal memory for items that might otherwise fail to be remembered or classified correctly in other types of memory tasks: it is always easier to pick the max of two things. Moreover, this performance advantage is of a fixed, predictable size (under standard equal-variance signal detection assumptions, 2AFC sensitivity is √2 times that measured with a yes/no test), suggesting that 2AFC taps into the same underlying memory signal as old/new testing procedures (Macmillan & Creelman, 2004).
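A small Monte Carlo sketch, assuming the equal-variance Gaussian signals described above, illustrates why the max rule makes 2AFC easier than an unbiased yes/no judgment drawn from the same memory-strength distributions; all numbers are illustrative.

```python
# Monte Carlo sketch of the signal detection account of 2AFC:
# old items draw a memory-match signal from N(d', 1), foils from N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
d_prime, n_trials = 1.0, 100_000

old = rng.normal(d_prime, 1, n_trials)   # signal elicited by the old item
foil = rng.normal(0, 1, n_trials)        # signal elicited by the new/similar foil

# 2AFC: pick the item with the larger memory-match signal.
acc_2afc = np.mean(old > foil)

# Yes/no: a single item is judged "old" if its signal exceeds an unbiased criterion.
criterion = d_prime / 2
acc_yes_no = 0.5 * np.mean(old > criterion) + 0.5 * np.mean(foil <= criterion)

print(acc_2afc, acc_yes_no)  # 2AFC accuracy is reliably higher (~Phi(d'/sqrt(2)) vs ~Phi(d'/2))
```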
In addition to traditional 2AFC tasks, which pair an old item with a new or similar-looking item, researchers have also expanded the number of options at test, creating 3AFC- and 4AFC-type tasks. Generally, these tasks add multiple similar-looking lures at test in order to further evaluate the ability to discriminate between different kinds of lures. A potential limitation of 3AFC and 4AFC tasks is that adding lures may create interference and increase task difficulty, for example by adding decision noise (Holdstock et al., 2002). Additionally, varying the kinds of lures available at test (such as providing two similar lures, or a similar lure and a completely new image) creates conditions where the information available to an observer is not equivalent (Guerin, Robbins, Gilmore, & Schacter, 2012). This means performance across different testing conditions cannot be directly compared.
The capacity of VLTM
Brady et al. (2008) sought to investigate the capacity of VLTM using a 2AFC method. Participants were presented 2,500 images of real-world objects for 3 seconds each and told to remember all the details of each image. After completing this study portion, participants were given a 2AFC task in which they saw two images on the screen. One was an image encountered in the study session, whereas the other was either a novel image, a different exemplar of an object they had previously encountered, or an image of a previously encountered object in a new state (i.e., changed orientation). Participants were instructed to indicate which of the two images they had previously encountered. Overall, performance was quite high, with the best accuracy for novel comparisons (92% correct), replicating previous work by Standing (1973) showing remarkably high performance for novel test comparisons even when 10,000 items were encoded into VLTM (see also Shepard, 1967). However, quite surprisingly, performance for state and exemplar comparisons was also extremely accurate (87%–88% correct; Brady et al., 2008). These results demonstrate that even given very brief exposure to images, humans are able to remember thousands of objects (seemingly with no limit) with extremely high accuracy. Furthermore, they suggest that humans’ VLTM representations contain the visual information necessary to make difficult state and exemplar comparisons, beyond simply categorical or semantic knowledge of previous encounters.
Similar results have been found not just for objects but for visual memory of scenes as well. Konkle et al. (2010a) demonstrated that after studying thousands of images of scenes, participants were able to recognize 96% of the previously seen images in a novel comparison test. They also observed that performance for test comparisons with a similar foil was quite high, with participants 84% correct even when they had studied four exemplars from the same scene category (a potential source of interference) during encoding. This suggests that the incredibly large capacity and high-fidelity representations observed in visual long-term memory are not isolated to a specific stimulus class (i.e., objects or scenes), but rather appear to be general properties of the system.
Given the logic of 2AFC discussed previously, one could assume that if participants had been given a single image of an object at test and told to discriminate whether it was old or new, performance would have been worse (despite the same underlying memory strength). Thus, the results discussed above are best described as a potential upper bound of VLTM performance. Under different testing procedures, performance will likely differ. However, these results still demonstrate that under potentially ideal testing conditions, visual long-term memory not only has a massive capacity but also contains visually rich and detailed representations.
Delayed estimation (continuous report)
In order to estimate the fidelity of VLTM (i.e., the amount of information in memory), experiments have utilized a delayed estimation paradigm. At encoding, participants observe objects rendered in unique colors. Then, at a subsequent test, observers see grayscale versions of the objects they observed previously and use a color wheel to indicate each object’s original color. By taking the error in degrees between the response and the true value, researchers can create a distribution of long-term color memory responses and measure the standard deviation of that distribution to understand the fidelity of the representation (Brady, Konkle, Gill, Oliva, & Alvarez, 2013).
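The analysis amounts to wrapping each response error onto the color wheel and summarizing the spread of the resulting distribution. A minimal sketch, assuming responses and targets are angles in degrees on a 360-degree wheel (the data here are hypothetical):

```python
# Sketch of the delayed-estimation analysis described above: responses and
# true colors are angles on a 360-degree wheel; precision is summarized by
# the spread of the wrapped error distribution (hypothetical data).
import numpy as np

def wrapped_error(response_deg, true_deg):
    """Signed error in degrees, wrapped to the range [-180, 180)."""
    return (np.asarray(response_deg) - np.asarray(true_deg) + 180) % 360 - 180

def circular_sd(errors_deg):
    """Circular standard deviation (degrees) of the error distribution."""
    rad = np.deg2rad(errors_deg)
    r = np.abs(np.mean(np.exp(1j * rad)))          # mean resultant length
    return np.rad2deg(np.sqrt(-2 * np.log(r)))

responses = np.array([12.0, 355.0, 40.0, 188.0])   # reported colors (degrees)
targets   = np.array([10.0, 350.0, 20.0, 200.0])   # original colors (degrees)
errors = wrapped_error(responses, targets)
print(errors, circular_sd(errors))
```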
The precision of information in VLTM representations
Brady et al. (2013) used this method to understand the precision of color memory representations across VWM and VLTM. In the study, researchers gave participants two separate tasks. In the VWM condition, participants saw three real-world objects simultaneously for 3 seconds, arranged in a circle around fixation, and were instructed to remember the color of all the objects. After a 1-second delay, one of the objects reappeared in grayscale, and participants could adjust the color of the image using their mouse and were told to click when the color matched the original. In the VLTM condition, participants first underwent a study block, viewing images sequentially for 1 second each with a 1-second blank interval between images. As in the VWM condition, participants were instructed to remember the color of each object.
After the study block, the color of each item was tested one at a time in a randomly chosen sequence, and participants reported the image’s color using the same response mechanism as in the short-term memory condition. The precision of participants’ memory representations was determined by calculating the distribution of the degree error of each response in color space (with larger mistakes reflected in greater degree error). In the VWM condition, at Set Size 3 (and above), participants’ precision was 17.8 degrees, which did not significantly differ from the precision observed in the VLTM condition, 19.3 degrees (Brady et al., 2013). Therefore, it appears the precision of color representations in VWM and VLTM has equivalent limits, suggesting the two systems may share or be constrained by similar processes.
Interactions between VLTM and perception
Delayed estimation has also been used to investigate the potential role long-term memories may play in biasing new perceptual information. In a series of experiments by Fan, Hutchinson, and Turk-Browne (2016), participants completed a set of initial exposure trials, where they encountered unique shapes embedded in specific colors. Each shape was shown for half a second, and after a brief delay (1.5 seconds) an achromatic version of the shape reappeared, and participants reported the color of the image using their mouse. During the initial exposure session, participants encountered the same shape three times on separate trials (in random order), always in the same color. As a result, each unique shape was associated with a specific color in long-term memory.
After completing the initial exposure trials, participants were then given final test trials. These final test trials were similar, except now each shape was shown in an unrelated color, and participants had to judge the appearance of the new colors. Researchers found that participants’ responses for the final test trials were best characterized as a mixture of the original and current-color representations, suggesting participants had anchored their responses to their representations in long-term memory. Moreover, this anchoring effect increased when perceptual input became more degraded (for example, by shortening the stimulus presentation during the final test trials). These results demonstrate that while perceptual judgments do reflect the current state of the environment, they can be affected by previous experience and long-term memory (Fan et al., 2016).
Core concepts of VWM
As previously discussed, VWM is typically thought of as the interface of multiple processes including perception, short-term memory, and attention (Baddeley & Hitch, 1974; Cowan, 2008). Due to this conception, researchers have generally described the function of VWM as supporting complex cognitive behaviors that require temporarily storing and manipulating information in order to produce actions (Baddeley, 2003; Ma et al., 2014). In particular, a large body of research over the past decade has focused on the capacity limitations of VWM. As a result, many of the models proposed to explain VWM have focused on this limitation.
Fixed slot model
When trying to understand the inherent capacity limitations of VWM, a particularly influential model has been the fixed slot model. It suggests that VWM can only store a discrete number of integrated object representations (see Fig. 6). This model grew out of a highly influential study conducted by Luck and Vogel (1997), who used a change-detection task to quantify VWM capacity. In the task, participants were instructed to remember an array consisting of items defined by a single feature or a conjunction of features (color, orientation, etc.). After a brief delay (900 ms), a test array was presented that was either identical to the previous array or differed in terms of a single feature. Participants were instructed to indicate whether a change had occurred. Accuracy was assessed as a function of the number of items in the stimulus array in order to determine how many items could be accurately maintained in VWM.
In a series of experiments, Luck and Vogel (1997) gave participants change-detection tasks that varied the number of colored squares presented in an array (one to 12). They observed that performance was at ceiling for arrays of one to three items and then declined systematically as set size increased from four to 12 items. Overall, the average K value (an estimate of VWM capacity) among participants was around three or four items. This finding formed the foundation of the slot model, which posits that individuals can only store between three and four objects in VWM.
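The passage does not specify which estimator was used, but a common formula for converting change-detection accuracy into a capacity estimate is Cowan's K; the sketch below uses it with hypothetical hit and false-alarm rates (whole-display designs are sometimes analyzed with Pashler's variant instead).

```python
# One common way to convert change-detection performance into a capacity
# estimate is Cowan's K = N * (hit rate - false-alarm rate). The formula
# used in any particular study may differ (e.g., Pashler's K for
# whole-display tests), so treat this as an illustrative sketch.
def cowan_k(set_size, hit_rate, false_alarm_rate):
    """Estimated number of items held in VWM for a given set size."""
    return set_size * (hit_rate - false_alarm_rate)

# Hypothetical data: near-ceiling detection at set size 3, lower at set size 8.
print(cowan_k(3, hit_rate=0.97, false_alarm_rate=0.03))  # ~2.8 items
print(cowan_k(8, hit_rate=0.55, false_alarm_rate=0.10))  # ~3.6 items
```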
In addition to experiments consisting of arrays of single features, Luck and Vogel (1997) also presented participants with arrays consisting of a conjunction of features (i.e., lines differing in orientation and color). Participants completed a change-detection task, but the researchers varied whether participants had to remember a single feature or a conjunction of features. For example, participants would see an array consisting of lines of different orientations and colors. In the color condition, only a color change could occur, and participants were instructed to look for a color change. In the orientation condition, only an orientation change could occur, and participants were instructed to look for an orientation change. And in the conjunction condition, either a color or orientation change could occur, and participants were instructed to remember both features of each item. Thus, in the conjunction condition, participants had to remember eight features but only four integrated objects. If VWM storage capacity is limited by individual features (e.g., color, orientation), then performance should decline at lower set sizes in the conjunction compared to the single feature conditions. However, if VWM storage capacity is limited by integrated objects (e.g., one red, horizontal line), then the same pattern of results should be observed throughout all three conditions.
Consistent with the latter case, they observed that VWM capacities were the same for single feature and conjunction items. Altogether, this provided the basis for the fixed slot model that VWM capacity was constrained by slots of ~3–4 integrated objects. While further research has expanded upon these initial findings, it is also important to note that several experiments have failed to replicate the critical conjunction experiments. These follow-up studies have observed that VWM capacity is, in fact, reduced as feature load increases, irrespective of the number of objects (Fougnie, Asplund, & Marois, 2010; Olson & Jiang, 2002; Wheeler & Treisman, 2002). To explain performance for conjunction conditions, researchers must consider both feature load (i.e., number of features to be remembered) and object load (i.e., number of objects in the display; Hardman & Cowan, 2015). Taken together, the current consensus in the field is that there is a cost to maintaining integrated features in VWM.
A key component of this model is that these VWM slots are considered all or nothing—an observer either remembers every object with the same fidelity (i.e., amount of information) within the capacity limit or fails to remember the object completely. This all-or-nothing component is potentially problematic, as it suggests that an observer has the same amount of information per item whether they viewed single or multiple items. What if an observer sees two encoding displays in a typical change-detection task: one with a single image of an apple and another with four images of very similar looking apples? In each test array, there is either no change, or one of the objects is replaced with another very similar looking image of an apple. According to the fixed slot model, the precision of information available to the observer is the same in both conditions. It does not account for the potential interference four very similar objects may have in terms of their representations in VWM, or how they may affect the decision-making process when an observer decides whether a change has occurred.
Continuous resource model
Another model used to explain apparent limitations in VWM is referred to as the continuous resource model. Unlike the fixed slot model, which defines capacity as being constrained by all-or-nothing slots of integrated objects, the continuous resource model conceptualizes VWM capacity as information based and limited by a finite resource (see Fig. 6). Furthermore, this finite resource can vary unevenly across different items in a display. This unequal division of resources across representations can differ due to a variety of factors, such as top-down goals (e.g., attention) or the total information load of the display (i.e., set size; Bays & Husain, 2008; Wilken & Ma, 2004).
Support for the continuous resource model came from Wilken and Ma (2004), who developed the continuous report method as a way of measuring the fidelity, or amount of information, contained in VWM representations. In the continuous report paradigm, participants are instructed to remember an array consisting of items defined by a single feature (color, orientation, etc.). After a brief delay (1.5 seconds), a square cue appears centered at the location of one of the previously presented items. At the same time, a test probe is displayed in the center of the screen, which allows for continuous report of the probed feature. For example, in the color condition, a color wheel containing all possible color values appears in the center of the screen, and participants indicate the color of the probed item by clicking a color on the wheel with the mouse. Responses are then scored as the degree error from the true color value, and a distribution can be constructed from a participant’s responses. The standard deviation (SD) of this distribution can then be used to estimate a participant’s precision of color information in VWM.
In a series of experiments, Wilken and Ma (2004) varied the set size of displays as well as the type of feature being probed in memory. Regardless of the type of feature probed, they observed that as set size increased, the precision of VWM representations decreased. However, even at large set sizes, the distribution of responses was still centered around the true value of an item, and although the SD was large, performance remained well above chance. This led the researchers to conclude that individuals could store a continuous amount of information in VWM, but the precision with which an individual item was represented varied as a function of the total information load of the display (i.e., set size). When set size is greater than four items, observers are able to store more than four items in memory; however, fewer resources are available to allocate to each item. Given the constraints of a typical change-detection task, an item may still be represented in VWM, but without the amount of information necessary to make a successful comparison. This suggests that the ~3–4 object limit proposed by the fixed slot model may simply be a behavioral artifact of the tasks used to assess such limits.
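To make the contrast with the fixed slot model explicit, the toy simulation below generates response errors under each account; the particular noise parameters and the assumption that resource precision scales with the square root of set size are purely illustrative, not taken from Wilken and Ma (2004).

```python
# Toy simulation contrasting the two accounts' predictions for delayed
# estimation (all parameter values are hypothetical illustrations).
import numpy as np

rng = np.random.default_rng(1)

def simulate_errors(set_size, n_trials=20_000, k_slots=4, base_sd=12.0):
    # Continuous resource: every item is stored, but precision degrades with
    # set size (here SD scales with sqrt(set size), purely for illustration).
    resource_sd = base_sd * np.sqrt(set_size)
    resource = rng.normal(0, resource_sd, n_trials)

    # Fixed slots: items within the limit are stored with constant precision;
    # beyond the limit the probed item is sometimes not stored (random guess).
    p_in_memory = min(1.0, k_slots / set_size)
    stored = rng.random(n_trials) < p_in_memory
    slots = np.where(stored,
                     rng.normal(0, base_sd, n_trials),
                     rng.uniform(-180, 180, n_trials))
    return resource, slots

for n in (2, 4, 8):
    resource, slots = simulate_errors(n)
    print(n, round(np.std(resource), 1), round(np.std(slots), 1))
```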
A potential limitation of the research used to support the continuous resource model is that variable precision has not been shown for holistic representations of objects. Studies in support of this model typically probe memory along a single feature dimension (e.g., color), even when observers are viewing images of real-world objects. It remains possible that representations of objects in VWM rely on different kinds of information, some of which may be variable and others not. For example, when an observer sees a single image of a real-world object, their representation of that object may contain categorical knowledge (e.g., teddy bear) in addition to other knowledge such as color (e.g., brown). The color information available to the observer may vary as a function of set size, but it is likely that categorical knowledge does not: observers either know the categorical identity of an object or they do not.
Work by Schurgin and Flombaum (2015, 2018) provides evidence this may be the case. In their task, participants saw two images of real-world objects in a display, which were then briefly masked, and participants then had to make a 2AFC judgment between a previously seen object and a completely new object (indicating which of the two was the old object). Critically, image noise was added to the stimuli at test by randomly scrambling up to 75% of the pixels in each image. They observed that VWM performance was unaffected by noise at test, with 0% noise and 75% noise yielding the same level of performance. The continuous resource model predicts that noise at test should make comparisons in memory harder, as observers would be comparing a noisy internal representation to a noisy external stimulus. Thus, when noise is added at test, performance should decrease. In contrast, the fixed slot model would predict no performance decrease: observers would have no noise in their internal representation, which would allow them to manage noise at test when making a comparison. Altogether, this work suggests that while memory resolution could vary along a continuous resource for a single feature, such as color, the same may not be true along all dimensions, such as holistic object representations.
Flexible slot model
Contrasting evidence exists that may support either the fixed slot or the continuous resource model. The flexible slot model provides a middle ground between the two, proposing that VWM is constrained to a maximum of ~3–4 representations, but that this capacity may also be limited by the information load of the display. In short, VWM is constrained by slots, but there is flexibility within the system for distributing limited resources across these slots.
One study interpreted by some to support the flexible slot model was conducted by Alvarez and Cavanagh (2004). They utilized a typical change-detection task but varied the information load of displays by changing the type of stimuli presented. On each trial, one to 15 objects were presented for 500 ms, followed by a brief delay (900 ms), and then a test array. On half the trials, one of the objects changed identity, and on the other half, the displays were identical. Participants were instructed to indicate whether one of the objects had changed. Critically, each trial contained stimuli from a single stimulus class, and the classes differed in visual complexity: line drawings, shaded cubes, random polygons, Chinese characters, and colored squares.
If VWM is limited by a fixed number of representations (i.e., slots), then performance should be equivalent across all stimulus categories, but if VWM capacity is limited by information load, then capacity estimates should vary across stimulus categories. After converting responses to K estimates of VWM capacity, Alvarez and Cavanagh (2004) observed that capacity estimates varied across stimulus classes, ranging from 1.6 for shaded cubes to 4.4 for colored squares. This provided support that VWM is limited in its number of representations (~4 objects), but also limited by the amount of information (i.e., stimulus complexity) being remembered. However, while some may interpret these results as support for the flexible slot model, a limited number of representations in VWM is not necessarily the same as objects being stored in a slot-like way.
There is disagreement as to whether these differences reflect storage limitations, thereby supporting a flexible slot model, or rather reflect comparison errors made during the decision-making process. It could be that items with higher visual complexity are also more similar to one another, which would lead to greater errors at test even though overall memory capacity for the items is the same. Awh et al. (2007) investigated this possibility using Alvarez and Cavanagh’s (2004) method and stimuli, but with one critical modification: categorical change. When a change occurred at test, it could either be a within-category change (as in Alvarez & Cavanagh; i.e., a Chinese character replaced with a different Chinese character) or a between-category change (i.e., a Chinese character replaced with a line drawing). They observed that for within-category changes, performance varied across stimulus categories, consistent with Alvarez and Cavanagh. Conversely, for between-category changes, they observed no difference in performance across stimulus categories. This was taken to suggest that variable performance across different kinds of stimuli may be due to differences in similarity, and not information load (consistent with the fixed slot model). However, recent research has found that these between-category changes are primarily detected via global ensemble or texture representations. When objects are clustered in a display by type, individuals can discriminate based on a change in clustering (i.e., using an ensemble or texture representation) rather than on individual item memory (Brady & Alvarez, 2015). These results suggest the need for more flexible models of VWM that integrate a role for spatial ensemble representations.
Considerations affecting VWM beyond capacity
An important consideration for our theoretical understanding of working memory is that there are large individual differences in performance, specifically in relation to capacity. Previous research has found that capacity estimates of VWM vary substantially across individuals, ranging from 1.5 to 5.0 objects (Vogel et al., 2001; Brady et al., 2011), and that these individual differences in capacity correlate strongly with broad measures of cognitive function, such as academic performance (Alloway & Alloway, 2010) and fluid intelligence (Fukuda, Vogel, Mayr, & Awh, 2010). These individual differences in capacity and their relationships with other cognitive functions likely arise because VWM is the combination of multiple processes, including short-term memory and executive control mechanisms, many of which vary in performance across individuals (Baddeley, 2003; Conway, Kane, & Engle, 2003).
In the context of the models discussed above, each could be easily amended to account for individual differences. Whether VWM capacity is limited by a fixed or continuous resource, either could conceivably vary across individuals. However, understanding and explaining these individual differences remains key in theoretical discussions of working memory, as they could be the result of potentially different sources. For example, it could be that individual differences in capacity are the result of individual differences across a resource (whether fixed or continuous). Alternatively, these capacity limitations could arise from other components contributing to working memory performance, such as executive function or attention (Baddeley, 2003; Conway et al., 2003). Indeed, research has found potential limitations or biases in VWM can arise from the combination of different sources of information, such as the memory of an item in combination with categorical information (Huttenlocher, Hedges, & Duncan, 1991), nonuniformities in attention (Schurgin & Flombaum, 2014), or ensemble information (Brady & Alvarez, 2011). As a result, understanding the source of individual differences in working memory capacity remains critical to our overall understanding of how capacity limitations arise.
In an effort to explain the capacity limitations of VWM, all the models discussed above also make an implicit assumption that the fidelity of VWM representations is directly affected by the number of objects to be remembered. However, recent research has demonstrated that even with a fixed number of items in a display there is variability across trials in the precision of VWM (Bae, Olkkonen, Allred, Wilson, & Flombaum, 2014; Fougnie, Suchow, & Alvarez, 2012; Van den Berg, Shin, Chou, George, & Ma, 2012). This suggests there are not only differences in visual working memory limits across individuals but that the quality of working memory representations varies within an individual as well. Moreover, this variability cannot be explained by fluctuations in attention or capacity resources (i.e., slots, continuous resource, etc.; Fougnie et al., 2012).
To better account for these results, researchers have proposed different kinds of variable precision models. These models account for random fluctuations in encoding precision that tend to occur from trial to trial and have been shown to fit data better than traditional fixed slot or continuous resource models (Fougnie et al., 2012; Van den Berg et al., 2012). However, it is important to note that variable precision does not necessarily favor either a slot or continuous resource account of working memory capacity (Van den Berg & Ma, 2017). As a result, the role of variability affecting VWM fidelity should be treated as separate from the source of capacity limitations described by the models above.
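A brief sketch of the core idea, under the assumption that encoding precision fluctuates randomly from trial to trial: pooling across trials then yields an error distribution with heavier tails than a single fixed-precision Gaussian, which is the signature that variable precision models are designed to capture (all parameters here are hypothetical).

```python
# Sketch of the variable-precision idea: precision fluctuates from trial to
# trial, so the pooled error distribution is heavier-tailed than a single
# Gaussian would predict (hypothetical parameters).
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(2)
n_trials = 100_000

# Fixed precision: every trial has the same encoding SD.
fixed_errors = rng.normal(0, 15.0, n_trials)

# Variable precision: each trial's SD is itself drawn from a distribution.
trial_sd = rng.gamma(shape=2.0, scale=7.5, size=n_trials)   # mean SD = 15
variable_errors = rng.normal(0, trial_sd)

# Excess kurtosis > 0 indicates the heavy tails that variable-precision models fit.
print(kurtosis(fixed_errors), kurtosis(variable_errors))
```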