Introduction

Visual working memory (VWM) is a fundamental cognitive construct that is associated with a number of factors, including educational achievement, fluid intelligence, and top-down attentional control (Bengson & Mangun, 2011; Cowan, 2005; Vogel, McCollough, & Machizawa, 2005; Kane, Bleckley, Conway, & Engle, 2001; Conway, Cowan, & Bunting, 2001). Despite extensive study of the relationship between VWM and other cognitive processes, the exact mechanism by which capacity limits manifest is an issue of continuing debate (Franconeri, Alvarez, & Cavanagh, 2013; Cusack, Lehmann, Veldsman, & Mitchell, 2009; Luck & Vogel, 2013; Zhang & Luck, 2008; Bleckley et al., 2003). Estimates of VWM capacity (denoted K max) are often assumed to reflect the whole capacity of the memory system (i.e., the number of “slots” or the amount of some representational resource). However, performance on working memory tasks is limited by other factors as well, such as the ability to avoid mind wandering (McVay & Kane, 2009; Mrazek, Smallwood, Franklin, Chin, Baird, & Schooler, 2012), variations in encoding strategy (Cusack, Lehmann, Veldsman, & Mitchell, 2009), and the effectiveness of attentional filtering (Cowan & Morey, 2006). Consequently, individual differences in estimated capacity may not reflect differences in the amount of representational medium but may instead, or in addition, reflect variations in these other factors. To distinguish between actual and estimated storage capacity, we will use K max to denote an individual’s actual storage capacity (which is a purely theoretical construct), and we will use K̂ max to denote an estimate of this storage capacity in a given task.

Attentional filtering is the best-studied factor that influences K̂ max (Cowan & Morey, 2006; Vogel & Machizawa, 2004; Kane et al., 2001). There are two ways that filtering may be important. First, irrelevant information must be filtered out to maximize the amount of relevant information stored in VWM; indeed, physiological measures have shown that low-K̂ max individuals store more irrelevant information than high-K̂ max individuals (Vogel, McCollough, & Machizawa, 2005; McNab & Klingberg, 2008). Second, when the set size of the to-be-encoded array exceeds K max, attempting to store all of the items may cause interference that leads to inefficient storage. Consistent with this possibility, several studies have observed a drop in K̂ max at higher set sizes (e.g., Cusack et al., 2009), especially among low-K̂ max individuals and people with schizophrenia (Gold, Fuller, Robinson, McMahon, Braun, & Luck, 2006; Fukuda, Woodman, & Vogel, 2015; but see Morey & Cowan, 2005 and Saults & Cowan, 2007 for studies that failed to observe this decline in K̂ max). An explicit assumption of prior work (Cusack et al., 2009; Linke, Vicente-Grabovetsky, Mitchell, & Cusack, 2011) is that this latter kind of filtering is under strategic control and that some individuals achieve high K̂ max scores because they realize that, when faced with a supracapacity array, the optimal strategy is to select only a subset of the array for VWM encoding. In contrast, other individuals exhibit low K̂ max scores because they adopt a suboptimal strategy of attempting to encode everything into VWM (Cusack et al., 2009; Linke et al., 2011).

In the present study, we tested the effect of these strategic factors on estimates of K max directly at varying set-sizes by instructing participants explicitly to: (1) try to remember the entire display regardless of set-size; (2) focus on a subset of the display when capacity is exceeded; or (3) simply “do your best” (which served as a control condition). We predicted that trying to remember the entire display should yield decreased performance at higher set sizes whereas focusing on a subset of the display should yield increased performance at higher set-sizes.

To preview the results, we found exactly the opposite: relative to the do-your-best condition, performance at the higher set sizes was increased in participants who were instructed to remember the entire display and decreased in participants who were instructed to focus on a subset of the items. Some possible explanations will be described in the Discussion.

Methods

Participants

A group of 168 undergraduate students at the University of California, Davis participated in the experiment in exchange for course credit. Informed consent was obtained, and all participants had normal or corrected-to-normal visual acuity and normal color vision. Four participants were removed from the sample due to confusion about the stimulus-response mapping. Each participant was assigned to one of three instruction groups. The assigned instruction rotated systematically as participants were recruited so that each condition was equally likely to be tested at a given point in the academic term.

Stimuli and procedure

All stimuli were presented on a 19-inch CRT monitor with a gray background. Figure 1 depicts the stimulus sequence for the change-detection task. For a given trial, arrays of four, six, or eight colored squares (0.95° × 0.95°) were presented with the color of each square selected randomly from a set of seven colors: blue, red, violet, yellow, black, white and green (these colors were selected randomly with replacement for all set-sizes, with the constraint that no color appeared more than twice in a display). The squares were presented within a 13° × 9° region, and each square was at least 4° from the neighboring squares.
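As an illustration only, the color-selection rule described above can be sketched in Python. The function names, the rejection-sampling approach, and the choice not to re-check the repetition constraint on the test array are assumptions made for this sketch; the text does not describe the software used to generate the displays.

```python
import random
from collections import Counter

# The seven colors listed in the text.
COLORS = ["blue", "red", "violet", "yellow", "black", "white", "green"]


def sample_colors(set_size, rng=random, max_repeats=2):
    """Draw colors with replacement, rejecting any display in which
    a color appears more than `max_repeats` times."""
    if set_size > max_repeats * len(COLORS):
        raise ValueError("set size is too large for the repetition constraint")
    while True:
        colors = [rng.choice(COLORS) for _ in range(set_size)]
        if max(Counter(colors).values()) <= max_repeats:
            return colors


def make_test_array(sample, change, rng=random):
    """Return a copy of the sample array; on change trials, one square
    is replaced with a color different from its original color.
    Assumption: the repetition constraint is not re-checked here."""
    test = list(sample)
    if change:
        idx = rng.randrange(len(test))
        test[idx] = rng.choice([c for c in COLORS if c != sample[idx]])
    return test
```

For example, `sample_colors(8)` returns a list of eight color names in which no color occurs more than twice, and `make_test_array(sample, change=True)` differs from `sample` at exactly one position.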

Fig. 1. Example of the stimulus sequence used in the change-detection task (not to scale).

Participants completed five blocks of 45 trials each. On each trial, a sample array of either four, six or eight squares was presented for 100 ms (Fig. 1). After a 900-ms delay, a test array was presented for 2000 ms, and participants were instructed to indicate via an unspeeded button press whether the two arrays were the same or different. The sample and test arrays were identical except that a single square was replaced with a square of a different color on 50 % of the trials.

Each participant was given standard task instructions (based on Luck & Vogel, 1997), except that the final sentence was varied across groups. A control group was told: “Do your best and try to get as many trials correct as possible” (N = 51). A remember-all group was told: “Try to remember the entire display, no matter how many items are present” (N = 56). A remember-subset group was told: “If you can’t remember the entire array, focus on a subset and try to remember them well”.

Data analysis

VWM performance was quantified for each combination of instruction type and set size using Pashler’s K formula (Pashler, 1988), which is the appropriate formula for this variant of the change detection paradigm (Rouder, Morey, Morey, & Cowan, 2011). We use K̂ to refer to the estimated number of items’ worth of information stored by a given participant at a given set size, whereas K̂ max can be measured only for set sizes at or above an individual’s capacity. K̂ values were analyzed using a 3 × 3 analysis of variance (ANOVA) with instruction type as a between-subjects factor and set size as a within-subjects factor.
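For readers who wish to recompute capacity estimates from hit and false alarm rates, Pashler's (1988) formula for whole-display change detection is K = N × (H − FA) / (1 − FA), where N is the set size, H is the hit rate, and FA is the false alarm rate. A minimal Python sketch follows; the function name and the example rates are hypothetical and are not taken from the present data.

```python
def pashler_k(set_size, hit_rate, false_alarm_rate):
    """Pashler's (1988) capacity estimate for whole-display change detection:
    K = N * (H - FA) / (1 - FA)."""
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= false_alarm_rate <= 1.0):
        raise ValueError("rates must lie between 0 and 1")
    if false_alarm_rate == 1.0:
        # The formula is undefined when every no-change trial is a false alarm.
        raise ValueError("K is undefined when the false alarm rate is 1")
    return set_size * (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)


# Hypothetical rates for one participant at set size 6 (not from the paper).
k_hat = pashler_k(6, hit_rate=0.70, false_alarm_rate=0.10)
```

With these hypothetical rates, the estimate is 6 × 0.60 / 0.90 = 4 items.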

There is considerable controversy about whether VWM is best conceived as a set of fixed-resolution, slot-like representations or a flexible pool of resources (see review by Luck & Vogel, 2013). Most previous research on individual differences in VWM capacity has quantified performance with the K measure of capacity (or some variant of it), which assumes slot-like representations, and we have therefore used this same measure. However, the fundamental conclusions of the present study do not depend on this conceptualization of VWM. Moreover, we also provide the hit rates and false alarm rates so that interested readers can compute alternative measures of capacity.

Results

Figure 2 illustrates the effect of instruction type on K̂ as a function of set size; the raw hit rates and false alarm rates are provided in Table 1. The remember-all group had substantially higher mean K̂ values than the control group at set sizes six and eight, whereas the performance of the remember-subset group was nearly identical to that of the control group. These observations were supported by the ANOVA, which yielded a main effect of instruction type, F (2, 160) = 6.590, P = .002, η2 = .076, and a main effect of set-size, F (2, 160) = 49.78, P < .001, η2 = .236. Although the differences among instruction groups were numerically much larger at set sizes six and eight than at set size four, the set-size × instruction-level interaction did not reach significance, F (4, 322) = 1.855, P = .118, η2 = .023.
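The text does not state which variant of η2 is reported. Assuming partial eta squared, an effect size can be recovered from an F ratio and its degrees of freedom as η2p = (F × df1) / (F × df1 + df2). The short Python sketch below applies this conversion to the instruction-type main effect and to the interaction; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # proportional to SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Main effect of instruction type, F(2, 160) = 6.590
eta_instruction = partial_eta_squared(6.590, 2, 160)

# Set-size x instruction interaction, F(4, 322) = 1.855
eta_interaction = partial_eta_squared(1.855, 4, 322)
```

Under this assumption, both values round to the effect sizes reported above for these two tests (.076 and .023).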

Fig. 2. Estimates of visual working memory capacity (K̂) as a function of set size and instruction type. Error bars show the standard error of the mean.

Table 1 Mean hit rates and false alarm rates

To decompose the main effect of instruction type, we collapsed each participant’s K̂ scores across all set sizes and compared the remember-all and remember-subset groups with the control group using independent-samples t tests. K̂ was significantly greater in the remember-all group than in the control group, t (105) = 3.289, P = .001, d = .63, but there was no significant difference between the remember-subset group and the control group, t (111) = .052, P = .959, d = .02.

Traditional null hypothesis statistical testing does not make it possible to conclude that the remember-subset and control instructions lead to equivalent performance. However, it is possible to convert a t value into a Bayes factor, which indicates the relative likelihood of the null hypothesis versus the alternative hypothesis (Rouder et al., 2009). When we converted the t values to Bayes factors (using the calculator at http://pcl.missouri.edu/bayesfactor), we found that the null hypothesis of no difference between the remember-subset group and the control group was 6.9 times more likely to account for the data than the alternative hypothesis of a difference. For the comparison of the remember-all group with the control group, in contrast, the Bayes factor indicated that the alternative hypothesis was 13.2 times more likely to account for the data than the null hypothesis. Thus, we can conclude with substantial confidence that instructing participants to “focus on a subset and try to remember them well” does not lead to better performance than an instruction to “do your best.” In contrast, instructions to “remember the entire display, no matter how many items are present” led to enhanced performance.
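To make the conversion from a t value to a Bayes factor concrete, the following Python sketch implements the two-sample JZS Bayes factor described by Rouder et al. (2009), using only the standard library. Several points are assumptions: the prior scale is fixed at r = 1, the integral over g is rewritten with the substitution g = 1/z² (z standard normal) and evaluated with Simpson's rule, and the function name and integration settings are hypothetical. Because the settings of the online calculator are not stated in the text, the output of this sketch need not match the reported values exactly.

```python
import math


def jzs_bf01(t, n1, n2, steps=4000, z_max=10.0):
    """Two-sample JZS Bayes factor (null over alternative), prior scale r = 1.
    `steps` must be even because composite Simpson's rule is used."""
    nu = n1 + n2 - 2                  # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)       # effective sample size
    null_like = (1 + t**2 / nu) ** (-(nu + 1) / 2)

    def integrand(z):
        # With g = 1/z^2, the factor 1 / (1 + n_eff * g) equals z^2 / (z^2 + n_eff).
        shrink = z * z / (z * z + n_eff)
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        # The factor 2 folds the negative half of the normal onto z >= 0.
        return (2 * density * math.sqrt(shrink)
                * (1 + t**2 * shrink / nu) ** (-(nu + 1) / 2))

    h = z_max / steps
    total = integrand(0.0) + integrand(z_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    alt_like = total * h / 3
    return null_like / alt_like


# Remember-all (N = 56) versus control (N = 51), t(105) = 3.289
bf10_remember_all = 1.0 / jzs_bf01(3.289, 51, 56)
```

Values of `jzs_bf01` above 1 favor the null hypothesis, and values below 1 (equivalently, a reciprocal above 1) favor the alternative.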

Discussion

These findings provide clear evidence that instructional manipulations can influence estimated VWM capacity. Along with the substantial evidence that individual differences in filtering ability explain a significant proportion of the across-subject variation in K̂ max (Cowan & Morey, 2006; McNab & Klingberg, 2008), the present study makes it clear that performance in simple VWM tasks can be influenced by factors other than the amount of representational medium (whether conceived as a set of fixed-resolution slots or a flexible pool of resources). Thus, studies of individual differences in estimated VWM capacity must be careful about assuming that they have measured the amount of representational medium (i.e., that K̂ max is actually a good estimate of K max). For example, studies concerning the genetic basis of working memory have concluded that estimates of WM capacity across the lifespan are determined by genetic factors that are also predictive of activity within the parietal cortex (Heck et al., 2014) and that performance on working memory tasks is almost entirely genetic in origin (Friedman et al., 2008). The present results, which show that varying one sentence of instruction can significantly impact K̂ max, are not easily reconciled with a view of working memory performance as reflecting an innate, inflexible cognitive capacity. The present results are, however, consistent with the finding that estimates of working memory capacity can be increased by training (Jaeggi, Buschkuehl, Jonides, & Perrig, 2008) and decreased by stress (Arnsten, 1998) and sleep deprivation (Ilkowska & Engle, 2010).

Prior work using similar tasks has shown that K̂ max declines when capacity is exceeded (Cusack et al., 2009; Matsuyoshi, Osaka, & Osaka, 2014), especially in low-K̂ max participants (Fukuda, Woodman, & Vogel, 2015). An explicit assumption in prior work is that a decline in K̂ max at higher set sizes is due to a maladaptive encoding strategy in which participants try to remember the entire display regardless of set size (Cusack et al., 2009; Linke et al., 2011). We therefore expected that performance would be impaired at higher set sizes if we explicitly instructed participants to follow this strategy and that performance would be enhanced if we encouraged the assumed-to-be optimal strategy of focusing on a subset of the items in supra-capacity arrays. However, we found exactly the opposite, with no benefit in the remember-subset group and enhanced performance in the remember-all group (relative to the “do your best” control group).

Although it is conceivable that some different variant of the remember-subset instructions would lead to enhanced performance, the present results provide no support for the hypothesis that encoding a subset will lead to better performance than attempting to encode the entire array. Moreover, the fact that the remember-all instructions led to superior performance compared to do-your-best instructions provides strong evidence against the hypothesis that attempting to encode the entire array is a maladaptive “default encoding” strategy that typically leads to a decline in K̂ max at higher set sizes. In fact, the finding of equivalent performance in the remember-subset and control groups suggests that the default strategy is to focus on a subset of the items once capacity limits are reached at higher set sizes.

The present results may appear to conflict with a study by Zhang and Luck (2011), which reported no effect of strategic manipulations on the quality or quantity of representations in VWM. However, this previous study examined whether participants could trade quality for quantity, increasing K̂ max by storing less precise representations. That is very different from asking whether focusing on a larger or smaller subset of the array would impact K̂ max, which was the goal of the present study. Indeed, the Zhang and Luck (2011) results suggest that the benefit observed in the remember-all condition of the present study is unlikely to reflect the storage of a larger number of lower-precision representations. However, it would be useful for future research to directly test this explanation of the present results.

A more likely explanation of the improved performance in the remember-all group is that this instruction may have encouraged participants to form a representation of the statistics of the overall array (Brady, Konkle, & Alvarez, 2011) in parallel with representations of the individual objects. This could have allowed participants to detect changes either by noticing that an individual object had changed color or by noticing that the overall scene statistics had changed. Future research could test this by using a task that cannot be influenced by ensemble representations and assessing whether this eliminates the advantage of the remember-all instructions.

Another possible explanation is that the remember-all condition leads to increased arousal or vigilance and therefore a reduction in mind wandering (see Mrazek et al., 2012). If this were true, we would expect improved performance across all set sizes. However, we saw little or no effect of the remember-all instruction at set size four (see Fig. 2). Moreover, there is no obvious reason why the remember-all instructions would lead to greater arousal or vigilance than do-your-best instructions.

Yet another possibility is that the remember-all instruction leads to a chunking strategy, in which similar colors are stored together. For example, when participants are instructed to remember every item, this may cause them to notice that two items in an array have the same color, and this might help them store the information more efficiently. Note, however, that the sample array was presented for only 100 ms, which minimizes the opportunity for elaborate encoding strategies.

No matter what the explanation turns out to be, the present results demonstrate that attempting to encode the entire array is not a maladaptive strategy, as might be expected from the idea that working memory performance is limited by failures of filtering. Indeed, attempting to encode the entire array may actually be the best strategy, at least under the very standard conditions of the present study.

In summary, the present findings add to the existing literature in four critical ways. First, these results highlight the sensitivity of VWM to subtle variations in instructions, revealing the practical importance of the choice of instructions for future working memory research. Second, the strategy of trying to remember the entire display beyond capacity limits at higher set sizes does not appear to produce decreases in K̂ max. If anything, instructing participants to adopt this strategy increases K̂ max, at least under the conditions used here. Third, the default strategy employed by individuals in a change-detection task may be to focus on a subset of the items when capacity limits are reached, as demonstrated by the equivalent performance in the do-your-best and remember-subset conditions. Finally, these data suggest that estimates of working memory capacity are at least in part determined by task-dependent and flexible strategic factors rather than inflexible and innate limitations.