Only a fraction of the massive visual input that humans face in virtually every real-life situation can be selected and used for further cognitive operations (Broadbent, 1958). Because the visual input itself is constantly changing (e.g., due to eye movements), keeping past percepts in visual working memory (VWM; Baddeley, 1986) is essential for the use of this information in subsequent cognitive operations. VWM capacity is known to be extremely limited: For instance, memory performance for three to four simple colored shapes is relatively high, but performance drops when more of these items have to be memorized (Alvarez & Cavanagh, 2004; Cowan, 2001; Luck & Vogel, 1997).

However, these capacity limitations can be partly overcome by exploiting certain regularities in the sensory input (often referred to as chunking in the literature on verbal working memory; Chase & Simon, 1973; Cowan, 2001; Miller, 1956; Simon, 1974): When visual stimuli form regular and predictable ensembles, they can be grouped into larger unitary representations, leading to increased memory performance (Brady & Tenenbaum, 2013). For example, VWM is enhanced when individual stimuli can be grouped by Gestalt principles or by forming illusory contours (Anderson, Vogel, & Awh, 2013; Peterson & Berryhill, 2013; Woodman, Vecera, & Luck, 2003; Xu, 2006; Xu & Chun, 2007). Similarly, VWM is enhanced when participants learn to associate stimuli through arbitrary spatial contingencies (Brady, Konkle, & Alvarez, 2009; Olson, Jiang, & Moore, 2005). For example, Brady et al. found enhanced VWM performance for displays of two-colored disks when the displays contained disks with learned, predictable color combinations. Thus, as the participants became more familiar with the color conjunctions, VWM performance increased, as if the two colors had been grouped into a single VWM representation. Together, these findings show that stimulus regularities can be exploited to form more efficient VWM representations.

Most studies investigating the influence of stimulus regularities on VWM capacity have used simple stimuli such as colored disks. Importantly, our everyday environments contain many statistical regularities that might similarly be exploited for efficient cognitive processing. For example, the multitude of complex objects in real-world scenes commonly appear in regular and predictable locations relative to other objects. Taking such regularities into account may allow for more effective information processing (Bar, 2004; Chun, 2000). However, only a few studies have investigated visual memory for naturalistic objects in real-world contexts. Early work suggested that memory for spatial relations among objects is better when these objects are embedded in meaningful scenes (Mandler & Johnson, 1976). More recent evidence has indicated that object representations are bound to spatial locations within scenes (or object arrays; Hollingworth, 2006, 2007), thereby helping to generate elaborate episodic representations of visual scenes (Hollingworth & Henderson, 2002). These findings suggest that rather than being stored independently of each other, objects are stored in memory in relation to their environment.

Whereas little is known about the effect of real-world regularities among naturalistic objects on VWM, a larger body of research has studied how object regularities influence visual perception. Some studies have explicitly focused on the relations among pairs of objects, comparing regular spatial arrangements (e.g., a hammer ready to hit a nail) to irregular configurations (e.g., a nail being positioned at the wrong end of a hammer). These studies have demonstrated more efficient visual perception for objects that are regularly positioned (Green & Hummel, 2006; Gronau & Shachar, 2014; Riddoch, Humphreys, Edwards, Baker, & Willson, 2003). These perceptual consequences of real-world regularities among objects may be related to differences in the encoding of regular and irregular object configurations in visual cortex (Gronau, Neta, & Bar, 2008; Kim & Biederman, 2011; Roberts & Humphreys, 2010). Indeed, we have recently found that when pairs of objects are positioned according to typically experienced, regular spatial configurations, attentional competition among individual objects is reduced at both a behavioral and a neural level (Kaiser, Stein, & Peelen, 2014). This indicates that objects appearing in such regular spatial configurations can be grouped to reduce the number of competing stimuli and to allow for more efficient processing.

Object grouping based on real-world regularities may not only facilitate visual perception, but could also represent a powerful mechanism to overcome the capacity limitations of VWM. To test this hypothesis, we measured VWM performance in a delayed change-detection task, in which participants had to memorize multiple objects that were presented in pairs. The objects of each pair were placed either in their typical, regular configurations (e.g., a lamp over a dining table or a mirror above a bathroom sink) or in irregular configurations, with their positions interchanged. We hypothesized that the visual system groups objects on the basis of their typical real-world configurations, leading to enhanced VWM performance for regularly positioned objects.

Experiment 1

In Experiment 1, we investigated whether real-world regularities enhance VWM performance. To do so, we compared performance in a delayed change-detection task between object pairs presented in their typical, regular configuration and in a condition in which that regularity was disrupted by interchanging object positions (Fig. 1).

Fig. 1

Enhanced visual working memory (VWM) performance for objects that are positioned according to real-world regularities, relative to irregularly positioned objects. (a) A single trial sequence. Participants had to rehearse five digits aloud while performing the change-detection task. (b) Example displays of change trials from the two-pair condition in Experiment 1. Pair positions were exchanged on every trial, and the changes were always exemplar-level changes (e.g., one mirror changed to another mirror) that included both objects of the changing pair. (c) Regular object pairs led to higher change-detection sensitivity than irregular pairs. Standard errors reflect within-subjects SEMs (Cousineau, 2005)

Methods

Participants

A group of 38 healthy adults (31 female, seven male; mean age 23.6 years, SD = 4.6) participated. All of them had normal or corrected-to-normal vision and received money or course credits for their participation.

Stimuli and apparatus

We used a set of 12 pairs of everyday objects that have a typical spatial configuration in the vertical direction (see Footnote 1). To manipulate regularity, pairs could be placed in their typical configuration (“regular” condition) or vertically reversed (“irregular” condition). For each single-object category, we collected two different exemplars, leading to four possible pairs per category. All images were matched for luminance and contrast using the SHINE toolbox (Willenbockel et al., 2010). Single objects subtended a visual angle of approximately 3°. Images were displayed on a 17-in. CRT monitor (1,024 × 768 resolution, 75-Hz refresh rate). Stimulus presentation was controlled using the Psychophysics Toolbox (Brainard, 1997).

Procedure

Participants performed a VWM task with concurrent verbal suppression (see, e.g., Jackson & Raymond, 2008; Luck & Vogel, 1997; Fig. 1a). At the beginning of each trial, a string of five digits was presented for 1,400 ms. Participants had to rehearse these digits aloud throughout the whole trial. After the digit presentation and a subsequent 1,000-ms blank interval, a display of two (left and right of fixation) or three (triangular around fixation) object pairs was shown for 2,000 ms (Fig. 1b). All of the pairs in a display were configured in either a regular or an irregular way. After a retention interval of 1,000 ms, the display appeared again for 2,000 ms. In the second display, all pairs appeared at different locations than on the first presentation. This was done to prevent same–different decisions being made on the basis of shape-outline differences only. On 50 % of the trials, the second display contained the same objects as the first one. On the other 50 % of trials, one of the object pairs was changed: These were exemplar-level changes (e.g., a lamp changing into another lamp and a table changing into another table), and both objects of the pair changed (see Fig. 1b). After the second display, participants were first required to report whether there was a change in the objects, and then they had to type in the digits of the verbal suppression task. Participants were informed that both responses should be made as accurately as possible, without speed pressure. The experiment consisted of a total of 192 trials, with the two set-size conditions (two vs. three object pairs) and the two configuration conditions (regular vs. irregular object pairs) randomly intermixed, leading to 48 trials (24 change trials, 24 no-change trials) per condition.
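The trial structure described above (2 set sizes × 2 configurations × 24 change and 24 no-change trials, randomly intermixed) can be illustrated with a short sketch. This is not the original experiment code, which ran under the Psychophysics Toolbox; the function name `build_trials`, the dictionary keys, and the `seed` parameter are hypothetical and chosen only for illustration.

```python
import random
from itertools import product


def build_trials(seed=None):
    """Return a shuffled list of 192 trial descriptions for Experiment 1.

    Hypothetical sketch: field names and seeding are assumptions,
    not taken from the original study materials.
    """
    rng = random.Random(seed)
    trials = [
        {"set_size": set_size, "configuration": configuration, "change": change}
        # 2 set sizes x 2 configurations x change/no-change = 8 cells
        for set_size, configuration, change in product(
            (2, 3), ("regular", "irregular"), (True, False)
        )
        # 24 repetitions per cell gives 48 trials per set-size x configuration condition
        for _ in range(24)
    ]
    rng.shuffle(trials)  # conditions are randomly intermixed
    return trials


trials = build_trials(seed=1)
print(len(trials))
```

Fully crossing the factors before shuffling guarantees the stated counts per condition, which a purely random draw on each trial would not.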

Results and discussion

To test the influence of pair configurations on VWM performance, we computed d-prime scores [d′ = Z(hit rate) – Z(false alarm rate)] as a measure of change-detection sensitivity. Trials with incorrect responses in the verbal suppression task (M = 6.1 %, SD = 7.2) were excluded. A two-way repeated measures analysis of variance (ANOVA) on the mean d-prime scores, with the factors Pair Configuration (regular vs. irregular object pairs) and Set Size (two vs. three object pairs) revealed a main effect of pair configuration, with higher change-detection sensitivity for regularly than for irregularly configured pairs, F(1, 37) = 4.70, p = .037 (Fig. 1c), and a main effect of set size, reflecting better performance in the smaller than in the larger set-size condition, F(1, 37) = 82.12, p < .001. The interaction was not significant, F(1, 37) = 1.90, p = .185. Importantly, these differences in change-detection sensitivity were not related to performance differences in the verbal suppression task: A two-way ANOVA on mean accuracies in the verbal suppression task did not reveal any significant main effects or interaction, all Fs(1, 37) < 1.04, all ps > .316. Thus, verbal memory strategies cannot account for the effect of pair configuration on VWM. These results demonstrate that VWM is enhanced when pairs of objects are positioned according to commonly experienced, regular configurations, as compared to pairs in which this configuration is disrupted.
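As a minimal sketch of the sensitivity measure defined above, d′ = Z(hit rate) – Z(false alarm rate) can be computed from the four response counts of a condition. The function name `d_prime` is hypothetical, and the log-linear correction (adding 0.5 to each count) is an assumption introduced here to keep the z-transform finite when a rate is exactly 0 or 1; the text does not state whether or how extreme rates were corrected.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Change-detection sensitivity: d' = Z(hit rate) - Z(false-alarm rate)."""
    # Assumption (not from the source): log-linear correction, so that
    # rates of exactly 0 or 1 do not yield infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Example with 24 change and 24 no-change trials in one condition
print(d_prime(hits=20, misses=4, false_alarms=4, correct_rejections=20))
```

A participant who responds "change" equally often on change and no-change trials obtains d′ = 0, regardless of response bias.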

Experiment 2

The results from Experiment 1 indicated that VWM is influenced by real-world regularities. However, this effect could partly reflect low-level Gestalt grouping, which is known to enhance VWM (Anderson et al., 2013; Peterson & Berryhill, 2013; Woodman et al., 2003; Xu, 2006; Xu & Chun, 2007). Although we carefully selected our stimuli to avoid the regular and irregular object pairs differing along low-level dimensions, in Experiment 2 we included a control condition to directly rule out any influence of such putative low-level differences. For this control condition, all object pairs were inverted, that is, rotated by 180°. Inversion disrupts the typical object configuration, while preserving all low-level stimulus properties. Furthermore, although the abstract spatial relationship among the objects is unaffected by inversion, the pairs no longer follow typical real-world viewing conditions. Thus, if the VWM effect in Experiment 1 reflected the impact of real-world regularities rather than low-level grouping, inversion should abolish the effect.

Methods

Participants

In all, 38 healthy adults (28 female, ten male; mean age 24.4 years, SD = 4.5) participated, of whom 11 had also participated in Experiment 1.

Stimuli and apparatus

The apparatus, stimuli, and setup were identical to those aspects of Experiment 1. In addition to varying the pair configuration, we also manipulated orientation by presenting the original pairs or inverted versions, in which the same pairs were presented upside-down (Fig. 2a).

Fig. 2

The VWM regularity effect is abolished for inverted object pairs. (a) An example display from the regular inverted condition. (b) Whereas in Experiment 2 the results from Experiment 1 were replicated (in the “original” condition), no effect of pair configuration emerged in the inverted condition. Standard errors reflect within-subjects SEMs (Cousineau, 2005)

Procedure

We used the same design as in Experiment 1, except that we now used only displays with three object pairs and added the inverted condition. This again led to a total of 192 trials (48 per condition), with the two configuration (regular vs. irregular) and the two orientation (original vs. inverted) conditions being randomly intermixed.

Results and discussion

Trials with incorrect responses in the verbal suppression task (6.1 %, SD = 5.2) were excluded from the analysis. A two-way repeated measures ANOVA on mean d-prime scores, with the factors Pair Configuration (regular vs. irregular) and Orientation (original vs. inverted), yielded a significant interaction, F(1, 37) = 4.16, p = .049, but no significant main effects, both Fs(1, 37) < 1.59, both ps > .216 (Fig. 2b). For the original pairs, sensitivity was significantly higher for the regular than for the irregular configuration, t(37) = 2.40, p = .021. This VWM benefit for regular pairs was abolished by inversion: For the inverted pairs, we observed no significant difference in sensitivity between the two pair configurations, t < 1. Performance in the verbal suppression task did not differ between conditions, all Fs(1, 37) < 1.39, all ps > .246. These results show that the VWM effect obtained in Experiment 1 cannot be explained by low-level differences between the regular and irregular pairs.

Experiment 3

The first two experiments demonstrated a VWM benefit for regularly positioned objects in comparison to irregularly positioned objects. It is possible, however, that this benefit reflected more efficient perceptual processing of regular configurations: Perhaps an encoding time of 2 s (Fig. 1a) was sufficient to perceptually encode the regular but not the irregular displays. We conducted Experiment 3 to experimentally rule out the possibility that the VWM benefit observed here was fully due to perceptual limitations. Experiment 3 therefore included a condition in which participants were given 4 s to encode the displays. If the effect of object configuration was primarily due to better perceptual encoding of the regular displays, we would expect to find a decreased effect for the 4-s encoding condition, because limitations in perceptual encoding would be reduced. Alternatively, if the benefit for regular displays reflected more efficient VWM storage or retrieval, the effect should be independent of encoding time.

Methods

Participants

A total of 38 healthy adults (25 female, 13 male; mean age 23.3 years, SD = 4.9) participated, of whom four had also participated in Experiment 1, and five had participated in Experiment 2.

Stimuli and apparatus

The apparatus, stimuli, and setup were identical to those aspects of Experiment 1, but here we additionally manipulated the encoding time (i.e., the time for which the first display was presented).

Procedure

We used the same design as in Experiment 1, except that we now used only displays with three object pairs and added a condition in which the first display was presented for 4 s. This again led to a total of 192 trials (48 per condition), with the two configuration (regular vs. irregular) and the two encoding time (2 vs. 4 s) conditions being randomly intermixed.

Results and discussion

Trials with incorrect responses in the verbal suppression task (6.6 %, SD = 6.1) were excluded from the analysis. A two-way repeated measures ANOVA on mean d-prime scores, with the factors Pair Configuration (regular vs. irregular) and Encoding Time (2 vs. 4 s), yielded a significant main effect of configuration, with higher sensitivity for the regular than for the irregular configuration, F(1, 37) = 4.73, p = .036. Although the longer encoding time led to better overall performance, F(1, 37) = 10.95, p = .002, the effect of configuration was independent of encoding time, as was indicated by a nonsignificant interaction (see Footnote 2), F(1, 37) < 1, p = .832 (Fig. 3). Performance in the verbal suppression task did not differ significantly between conditions, all Fs(1, 37) < 1.36, all ps > .250. Because the VWM benefit for regularly positioned objects was independent of the available encoding time, it seems unlikely that differential perceptual processing of regular and irregular configurations could fully account for the effect.

Fig. 3

Longer encoding times do not reduce the VWM regularity effect. With the 2- and 4-s encoding durations, we found equally pronounced effects of object configuration, indicating that the effect does not depend on perceptual limitations. Standard errors reflect within-subjects SEMs (Cousineau, 2005)
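The within-subjects SEMs mentioned in the figure captions follow the normalization proposed by Cousineau (2005): each participant's scores are centered on that participant's own mean and shifted to the grand mean before the standard error is computed per condition. The sketch below illustrates that idea only; the function name `within_subject_sem` and the list-of-lists input layout are assumptions, and no further bias correction is applied because the text does not mention one.

```python
from math import sqrt
from statistics import mean, stdev


def within_subject_sem(scores):
    """Per-condition SEM after Cousineau-style normalization.

    scores: one list per participant, holding one value per condition
    (hypothetical layout chosen for this illustration).
    """
    grand_mean = mean(value for row in scores for value in row)
    # Remove between-participant differences in overall performance:
    # subtract each participant's mean, then add the grand mean back.
    normalized = [
        [value - mean(row) + grand_mean for value in row] for row in scores
    ]
    n_participants = len(scores)
    # SEM per condition (column) on the normalized scores
    return [stdev(column) / sqrt(n_participants) for column in zip(*normalized)]


# Three participants, two conditions (e.g., regular vs. irregular d' scores)
print(within_subject_sem([[2.1, 1.8], [1.5, 1.3], [2.6, 2.0]]))
```

When participants differ only in overall performance level and show identical condition differences, the normalized scores coincide and the resulting SEMs are zero, which is why these error bars reflect the consistency of the condition effect rather than between-participant spread.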

General discussion

In our study, we investigated the influence of real-world object regularities on VWM performance by using a delayed change-detection paradigm with concurrent verbal suppression. We found that VWM performance was enhanced when pairs of objects were positioned according to such regularities, in comparison to an irregular positioning of the same objects. Crucially, this effect of regularity was significantly reduced when the object pairs were inverted. Another control experiment (see the supplementary materials) ruled out that these findings were due only to the typical position of individual objects relative to any other object, independent of this other object’s identity: Rather, both congruent object identities and regular relative positioning within a pair are required in order to give rise to the VWM effect. Thus, neither verbal memory strategies nor low-level grouping processes nor the position of individual objects alone could account for the results. These results complement previous findings of better performance in perceptual tasks for regularly positioned objects (e.g., Gronau & Shachar, 2014; Kaiser et al., 2014). Because the VWM effect was statistically independent of encoding time, our results are unlikely to solely reflect improved perception of regularly positioned objects. Rather, they indicate that real-world regularities are additionally associated with more efficient storage of objects in VWM. Although previous work had revealed the influence of associations among simple artificial stimuli on VWM (Brady et al., 2009; Olson et al., 2005), our findings show that lifelong experience with specific spatial configurations of real-world objects similarly facilitates VWM performance.

Although our change-detection paradigm directly tested VWM, it is important to stress that the VWM benefit for regularly positioned objects can only emerge by the additional recruitment of long-term memory processes. Stored knowledge can provide a conceptual link that allows for higher-quality VWM representations by offering elaborated and structured coding frames. Such effects can be seen in enhanced VWM capacity for objects of expertise (Curby, Glazek, & Gauthier, 2009; Scolari, Vogel, & Awh, 2008), in contrast to dramatically reduced capacity for artificial stimuli that do not belong to separate categories (Olsson & Poom, 2005). Because VWM can be enhanced when the stimuli match predefined templates, but can also be harmed when the stimuli violate these templates, it is worth noting that our results could in principle reflect either a VWM benefit for regularly positioned objects or a VWM cost for irregularly positioned objects (see the supplementary materials for further elaboration of this point).

Assuming that relational knowledge stored in long-term memory provides efficient schemata for organizing information in VWM, this process could operate at all different stages of the memory process (i.e., encoding, maintenance, and retrieval). Although we have provided evidence that the effect observed here is not merely perceptual in nature, our findings do not directly address the question of at which stage of the memory process the benefit for object regularities arises. We would expect that each of these stages can benefit from more effective information representation to some extent, but future work will be necessary to pinpoint their exact contributions to the overall effect.

Interestingly, our study provides evidence that grouping influences VWM even if the tested stimulus dimension is different from the stimulus dimension underlying the grouping of items. Previous VWM studies investigating the effect of grouping have typically used the same dimension for inducing the grouping of items and for testing memory performance. For example, Anderson et al. (2013) induced perceptual grouping by illusory contours that depended on the rotation of Pac-Man-like inducers and found that VWM for the rotation of single inducers improved when an illusory contour was formed. Similarly, in studies that have investigated the grouping of items on the basis of color, VWM performance was measured by color judgments (Brady et al., 2009; Peterson & Berryhill, 2013). By contrast, in the present study, memory performance was assessed through exemplar-level object discrimination, whereas the grouping was based on spatial relations and, importantly, on the object-category level. Thus, such high-level grouping of objects based on spatial–relational knowledge impacts VWM even when it is unrelated to the specific task.

The more effective VWM representation of objects that follow real-world regularities can be highly useful in natural perception. Indeed, memory for objects within natural scenes has been shown to be more effective than predicted by classical VWM models (Hollingworth, 2006; Hollingworth & Henderson, 2002). Because real-world spatial regularities appear at multiple levels and include a multitude of objects, the amount of grouping in natural scenes can be very high. This grouping of complex objects according to spatial–relational knowledge might thus represent a powerful mechanism of enhancing VWM in natural visual environments.