Introduction

Working memory (WM) has been defined as a system that maintains information for a brief period so that it can be used for various cognitive functions (Baddeley, 2003). The strongly limited capacity of WM in particular has attracted the attention of researchers (Brady et al., 2011). WM capacity is an important topic, considering that it is highly correlated with several major cognitive abilities, such as reading comprehension, fluid intelligence, and executive function (Daneman & Carpenter, 1980; Fukuda et al., 2010; Miyake et al., 2001). Estimates of WM capacity range from just four to seven items (Cowan, 2001, 2005). Fortunately, this limit is less restrictive than it seems, because items can consist of combinations of multiple individual elements, referred to as chunks (Miller, 1956). Chunking can dramatically increase the amount of information maintained in memory (Chen & Cowan, 2005; Gobet et al., 2001; Miller, 1956).

A compelling demonstration of chunking information into visual objects was given by Luck and Vogel (1997). These authors measured visual WM capacity for single features and feature conjunctions using the change detection paradigm. They asked participants to memorize the colors in an array of stimuli, and manipulated the set size (i.e., the number of colored squares). After a brief interval, a second array of stimuli was presented and participants were asked to report whether the two sets of stimuli were identical. A series of experiments repeated this manipulation for objects with multiple feature conjunctions (e.g., a line segment with a particular color and orientation). Participants were able to detect a change in sets of up to four objects, no matter how many features were presented in each object. Luck and Vogel concluded that the appropriate unit of working memory capacity is the integrated object rather than the single feature. However, this object-based account of visual WM capacity has been criticized in several subsequent studies, and several factors have been found to constrain the object benefit.

The object benefit for spatially separated features

One of the criticisms raised against Luck and Vogel’s (1997) account was that their study did not fully establish whether the benefit arises from combining multiple features into an object, or from the overlapping location of the features. Features occupied the same location in all of their experiments. For example, when colored oriented bars were used, color and orientation naturally overlapped. It is important to address the role of shared location when considering the object benefit, because working memory has a limited capacity for the number of spatial locations in particular (Jonides et al., 1993; McCarthy et al., 1994), and object location can be encoded automatically regardless of task relevance, suggesting it has a certain primacy (Dell’Acqua et al., 2010; Eimer & Kiss, 2010; Elsley & Parmentier, 2015; Kuo et al., 2009; Olson & Marshuetz, 2005). Attentional feature integration is also thought to be mediated by spatial location (e.g., Treisman & Zhang, 2006), and several studies have similarly confirmed the significant role of location in the integration of visual features in memory (Hollingworth, 2007; Saiki, 2016; Schneegans & Bays, 2017; Udale et al., 2017; Wang et al., 2016).

The question is, thus, whether the memory benefits observed by Luck and Vogel (1997) might (partially) reflect visuospatial integration. To address this, Xu (2002a, 2002b) investigated the encoding of color and orientation features of multipart objects. Xu had participants perform a change detection task in three display conditions, which differed in the spatial relation between two features of an object. In one condition, the two task-relevant features were located in the same part of the object. In another, the two features were located in different parts of the same object, and in the last condition, the features were located in different objects. The results indicated that color and orientation features located in the same part of an object were better encoded in visual WM than features located in different parts of an object. It was also reported that two spatially separated features of the same object were still encoded better than two features from different objects. Thus, despite a decline in performance when features appeared in different locations, the study demonstrated that object-based encoding benefits can be obtained even for spatially separated parts of the same object (Xu, 2002a, 2002b).

Although Xu (2002a, 2002b) observed a memory advantage for features that were part of the same object but appeared in distinct spatial locations, compared to features that were part of different objects, the spatial proximity of between- and within-object features was not equated. For instance, the spatial positions of features located in different parts of an object were much closer together than the spatial positions of features located in different objects. For that reason, the question is whether the benefit in the former case can truly be attributed to the object-based presentation of the features alone, or whether the recall advantage arose (also) from spatial proximity between those features. A further study by Xu (2006) addressed this issue by independently manipulating both the distance and the connectedness between the object parts. Memory was better for features that were closer and directly connected compared to those that were further apart and unconnected, indicating that both spatial proximity and feature connectedness play a role in the object benefit in visual WM.

Feature dimensionality and independent feature stores

Apart from the role of spatial overlap in the object benefit reported by Luck and Vogel (1997), the possible role of featural overlap has also been scrutinized. In their original study, Luck and Vogel (1997) conducted one experiment in which the object features came from the same feature dimension (color–color) and, somewhat surprisingly, again found that retaining the color conjunctions involved no additional cost compared to maintaining the same number of objects with single colors. However, several authors failed to observe such benefits from same-dimension conjunctions and argued that features from the same dimension cannot be stored as integrated objects in visual WM (Delvenne & Bruyer, 2004; Parra et al., 2011; Wheeler & Treisman, 2002; Xu, 2002a, 2002b). Conversely, an object benefit for retaining features from different dimensions has been replicated in a considerable number of studies (Delvenne & Bruyer, 2004; Olson & Jiang, 2002; Parra et al., 2011; Riggs et al., 2011; Vogel et al., 2001; Wang et al., 2017; Xu, 2002a, 2002b).

This possible difference in memory performance between feature conjunctions within and between feature dimensions is predicted by the influential feature integration theory (Treisman & Gelade, 1980). Feature integration theory (FIT) assumes that features are first registered in memory separately from their corresponding spatial locations. In a subsequent step, the features that share the same location form a unitary object. Importantly, in the theory, each feature type (such as color or orientation) has its own pool of memory resources, which allows parallel processing of features that lie on different dimensions. Owing to this mechanism, multidimensional features can be encoded in memory without interference (see also Allport, 1971). By contrast, combining features of the same dimension into a single object would lead to competition for their shared memory resource, and result in a decline in the total number of features that can be recalled, contrary to the evidence originally presented by Luck and Vogel (1997). FIT furthermore predicts that attention mediates these processes, because it assumes that focal attention is required to maintain integrated object representations in visual WM once the features are registered to a location. Consequently, misbinding of features across multiple objects can occur as a result of attentional distraction, or due to the absence of sustained attention (Rensink, 2000; Wheeler & Treisman, 2002; but see also Gao et al., 2010; Yin et al., 2012).

The object benefit and the role of attention

In view of the possible role of attention, it is important to assess whether the object benefit for visual memory arises during stimulus encoding under the influence of attention, or at a later stage of information processing or maintenance in visual WM. A considerable number of studies that investigated selective attention mechanisms provide compelling evidence for object-based components of visual attention. For instance, the study by Duncan (1984) tested the role of object-based attention in the perceptual processing of visual information by presenting participants with two overlapping objects, a box and a line. Each object consisted of two features, which varied between trials, and participants were asked to report either one or two features of the objects. The study found that reporting two features of the same object was no more difficult than reporting just a single feature, whereas reporting two features was harder if they belonged to separate objects rather than to the same object. Because the spatial separation of features between and within the objects was equal, this difficulty can be interpreted as a cost of switching attention between objects. Duncan concluded that directing attention to a part of an object activated the rest of the object as well, and that all parts of the object were processed as a whole.

Similarly, Egly et al. (1994) provided another demonstration of object-based attention using a cueing paradigm. They used a luminance detection task and showed two rectangular outlines to participants. In each trial, initial attention was manipulated by presenting a cue at one end of one of the two rectangular stimuli. In the majority of trials the cue was valid, indicating that the upcoming target square would appear at the same end of the cued rectangle. In the remaining trials, the cue was invalid and the target appeared either at the opposite end of the cued rectangle, or at the end of the uncued rectangle at an equivalent distance from the cue. Invalid cues required the participants to relocate their attention from the cue location to the target location, and the study focused on the response latency in invalid cue trials, reflecting the time cost of attention shifts within and between objects. The authors observed that the cost of switching attention between objects was larger than the cost of shifting attention within an object, demonstrating an object-specific benefit of attention. Overall, these studies suggest that attention can select visual information based on objects. It is natural to assume that object-based information can consequently be better encoded and recalled.

The functional relationship between object-based attention and memory has also been exemplified in interference tasks. For instance, Matsukura and Vecera (2009) gave participants an attention task to perform while they were concurrently retaining either an object or a location memory. The results indicated that concurrent object-based attention tasks interfered more with object memory than with spatial memory, a finding interpreted to mean that some forms of object-based selection and object-based memory might be handled by the same mechanism. Similarly, another study by Barnes et al. (2001), using a dual-task paradigm, found that object-based advantages for selective attention decreased while participants maintained an object memory, but that no interference occurred while they maintained a verbal or spatial memory.

Another significant role of attention for WM is that attention can prioritize items during encoding, while items in memory may also bias attention. It has been assumed that attention assists information entry into visual working memory (Bays & Husain, 2008; Schmidt et al., 2002), and can, thus, increase the probability of information being maintained in visual working memory for further processing. This attentional prioritization of WM items was observed not only when directing attention to the target item before it appeared (pre-cueing), but also afterwards (retro-cueing), and during the appearance of stimuli (Griffin & Nobre, 2003; Landman et al., 2003; Matsukura et al., 2007; Pertzov et al., 2013; Schmidt et al., 2002).

Furthermore, memory maintenance may be enhanced by retro-cueing, which may protect a memory representation from decay or interference (Makovski et al., 2008; Souza et al., 2016). Pre-cueing similarly not only prioritizes encoding the target item into WM, but also facilitates the maintenance of its memory representation (Ravizza et al., 2016; Schmidt et al., 2002). Therefore, considering the relationship between attention and visual WM, as well as the object-based theory of attention, in which attention directed to a part of an object automatically extends to the whole object, it is natural to conclude that object-based information can be both better encoded and recalled when a part of the object is prioritized.

The present study

In the present study, we aimed to further elucidate the nature of the object benefit in visual WM. We examined whether multiple pieces of visual information (i.e., visual features) that appear within the same object are maintained more efficiently in WM than features that are part of separate objects. In the first experiment, we assessed the strength of the object effect for pairs of orientation features presented at equidistant locations within and across simple objects made up of first-order contours. In the second experiment, we examined the object benefit for pairs of features from different dimensions (color and orientation). In the third experiment, we tested potential object benefits that may arise from changes in the location of the object and its features on each trial, when the object is no longer the only spatial reference point in the display. Finally, in the fourth experiment, we examined whether the object benefit is affected by strategic use of information that is related to the object. Across these experiments, we expected that an object benefit should exist even for equally spaced features; that the object benefit should be larger for non-interfering features, that is, those from different dimensions; that the size of the object effect may diminish when attention to the objects themselves (i.e., the contours) is reduced; and that strategic usage of information during visual processing may contribute to the effect, but not fully account for it.

Experiment 1

This experiment aimed to determine whether the presentation of orientation features within simple contour shapes benefits the encoding, maintenance, or recall of representations in WM. To this end, we used three oriented grating stimuli, presented at equidistant locations. Two of these were displayed as part of one object by embedding them together in a gray ellipse shape, while the third was embedded in a gray circular shape on its own. One of the stimuli in the large object was colored red, indicating that this stimulus would always be the first target. The second target was one of the two remaining stimuli. Although participants knew which stimulus would be the first target during encoding, they did not know which of the remaining stimuli would be the second target and thus had to memorize all three stimuli to be successful on both responses. This created two experimental conditions, depending on whether the second target stimulus was inside the same object as the first target stimulus, or outside it, in the separate object. We expected better memory performance on the second target when it was in the same object as the prioritized (first) item. This hypothesis was motivated by the idea that attention prioritizes objects and that this benefit would improve the quality of encoding for features belonging to the same object. Likewise, attending to the first target, which was always in the large object, could confer memory benefits on features within the same object via the spread of attention.

Method

Participants

Twenty-five first-year psychology students (21 females, mean age = 19.2 years, range = 18–20) with normal or corrected-to-normal vision were recruited from the University of Groningen. Prior to the experiment, participants signed an informed consent form, and they were naive to the purpose of the study. All students received course credit for their participation. The study was approved by the Ethical Committee Psychology (approval number 1920-S-0071) and was conducted in accordance with the Declaration of Helsinki (2008).

Apparatus and stimuli

The experiment was programmed in Matlab (version 2017, 64bit), using the Psychtoolbox extension (Brainard, 1997; Kleiner et al., 2007; Pelli, 1997). Stimuli were displayed on a 27" LCD monitor using a standard desktop computer. The screen was set at a refresh rate of 100 Hz and a resolution of 1920 by 1080 pixels in 16-bit color. The participants were tested in an experimental room with uncontrolled but normal interior lighting, and they were seated at about 60 cm viewing distance from the monitor. All behavioral responses were collected via a standard USB keyboard.

The stimuli were three sine wave gratings (radius 2.1° of visual angle, 1 cycle/°, and 50% contrast). The center point of each grating was placed equidistantly from the others on the circumference of an invisible circle (3.74° radius), so that the gratings formed the corner points of an invisible equilateral triangle in the center of the screen. On each trial, the invisible triangle was rotated around the invisible circle by an angle between 1 and 360 degrees (in steps of 1 degree); across trials, these rotations were presented in random order without repetition. One of the gratings was presented in red (RGB = [200 128 128], luminance 185 cd/m2) to mark the first target grating. The other two gratings were monochrome (RGB = [128 128 128], luminance 160 cd/m2). The red grating and one randomly selected monochrome grating were enclosed by an oval shape whose width and height were 14.49° and 4.83°, respectively. The third, remaining grating was enclosed by a circular shape with a diameter of 4.15°. Both shapes were gray (RGB = [100 100 100], luminance 127 cd/m2) and outlined by a 0.09°-wide black contour. This ensured that any differences in memory performance could not be attributed to contrast or luminance. In addition to the stimuli presented in the memory display, two gray circles with a black outline were later presented at the corresponding target locations as feedback circles. They were the same size as the targets and rendered in the same gray as the shapes.

The orientation of each grating was independently chosen at random from the range 0°–180°, in steps of 0.56° (180°/total number of trials), without repetition between trials. Throughout the entire trial, a black dot with a white edge (radius 0.83°) was presented in the center of the screen as a fixation point, and participants were instructed to maintain their gaze on the fixation dot. All stimuli were displayed against a uniform light gray background (RGB = [128 128 128], luminance 160 cd/m2), which was maintained during the entire experiment. Note that all luminance and color values are approximations based on the Psychtoolbox texture function and the Memtoolbox color wheel RGB values.
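
To make the stimulus geometry concrete, the following sketch illustrates how the grating positions and orientations could be generated. This is an illustrative Python reimplementation, not the original MATLAB/Psychtoolbox code; names such as grating_centers are hypothetical, and it assumes that the 320 per-trial rotations are drawn without repetition from the 360 possible 1° steps.

```python
import numpy as np

def grating_centers(rotation_deg, radius_deg=3.74):
    """Centers of the three gratings: equidistant points (120 deg apart) on an
    invisible circle of the given radius, rotated by rotation_deg on this trial."""
    angles = np.deg2rad(rotation_deg + np.array([0.0, 120.0, 240.0]))
    return np.column_stack((radius_deg * np.cos(angles),
                            radius_deg * np.sin(angles)))  # x, y in deg of visual angle

n_trials = 320
rng = np.random.default_rng()

# Per-trial rotation of the invisible triangle: 1-360 deg in 1-deg steps,
# presented in random order without repetition (320 of the 360 values are used).
rotations = rng.permutation(np.arange(1, 361))[:n_trials]

# Grating orientations: 0-180 deg in steps of 180/n_trials (about 0.56 deg),
# chosen independently per grating and without repetition across trials.
orientation_pool = np.arange(0.0, 180.0, 180.0 / n_trials)
orientations = np.column_stack([rng.permutation(orientation_pool) for _ in range(3)])

centers_trial_1 = grating_centers(rotations[0])  # 3 x 2 array for the first trial
```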

Procedure

The sequence of a trial is shown in Fig. 1. Each trial started with the presentation of a fixation dot in the center of the screen for 700 ms. Then, three grating orientations were presented in the memory display for 500 ms. The red grating was always presented inside the flat oval object shape, together with one of the two other gratings. The task required the participant to memorize all orientations for the subsequent memory recall test. Participants were instructed that the orientation of the red grating would always be tested first, and that the orientation of one of the monochrome gratings would be tested second. After presentation of the gratings, there was a 750-ms delay period, after which the first response probe, with a random orientation, was presented at the location of the corresponding target stimulus. To match the orientation of the probe to the memorized orientation, participants used the keyboard: pressing the ‘C’ key rotated the probe clockwise, pressing the ‘M’ key rotated it counterclockwise, and pressing the space bar submitted the response. The response probe stayed on the screen until a response was given. Once the first response was submitted, there was another 750-ms delay period. Following the delay, the second response probe, for one of the other gratings, was presented at the location of the corresponding target stimulus. Participants were thus asked to reproduce the orientation of the grating that had been presented at the location of this second probe in the memory display.
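
As a minimal sketch of this adjustment procedure (again in Python rather than the original MATLAB code; the rotation step per key press is an assumption, as it is not specified above), the probe orientation could be updated as follows:

```python
def adjust_probe(start_orientation, key_presses, step=1.0):
    """Rotate the probe with 'c' (clockwise) and 'm' (counterclockwise) key
    presses until a space submits the response; the orientation is wrapped
    into the 0-180 deg range. The step size per press is an assumption."""
    orientation = start_orientation % 180.0
    for key in key_presses:
        if key == 'c':
            orientation = (orientation + step) % 180.0
        elif key == 'm':
            orientation = (orientation - step) % 180.0
        elif key == ' ':  # space bar submits the response
            break
    return orientation

# Example: start at 10 deg, press 'c' three times, then submit with space.
print(adjust_probe(10.0, ['c', 'c', 'c', ' ']))  # -> 13.0
```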

Fig. 1
figure 1

An example of the single-trial sequence used in Experiment 1. Three oriented gratings were presented to participants to memorize. Subsequently, two of them were sequentially probed. The red grating and one of the gray gratings were always enclosed by an oval shaped object, and the third one was enclosed in a circular shaped object. The orientation of the red-colored grating was always tested first and one of the other gratings was tested second. After the second response, a feedback screen consisting of two lines for each stimulus was shown. White lines depicted the actual orientation of the gratings and red lines depicted the participant’s response

Trials were equally divided between the two object conditions: in half of the trials the second target was inside the object containing the first target (T2 in), and in the other half the second target was outside the object containing the first target (T2 out). The trial types were presented in randomized order. After the second response and a brief 200-ms interval, a feedback screen was shown for 300 ms. The feedback screen consisted of two lines for each grating stimulus, presented inside circles at the corresponding locations. One of these lines was white, representing the actual orientation, and the other line was red, showing the participant's response. The distance between these lines thus indicated exactly how much the response deviated from the correct orientation of each grating.

The experiment consisted of 320 trials, divided into 20 blocks of 16 trials. At the end of every block, participants could see their average accuracy (based on the proportion of responses within 20° of the true target value) for the first and second targets separately (the latter independent of its location), both for each completed block and as the overall average across all completed blocks. Participants were given the opportunity to take a short break between blocks. Prior to the experimental trials, all participants performed 4 practice blocks of 8 trials each (32 trials in total) to become familiar with the task.

Data analysis

Two sets of analyses were performed. First, we calculated the average response accuracy for each participant. Responses with an absolute deviation of 20° or less from the target were defined as correct. This criterion served as a basis for the preliminary analyses of overall differences between object conditions, and was also used to give participants feedback on their performance at the end of each block. We then used Bayesian paired-samples t-tests to compare the average accuracy for the two object conditions, which corresponded to having the second target inside versus outside the object containing the first target. These tests were conducted for the first and second response separately. The analyses were performed in JASP, version 0.16.3 (JASP Team, 2022). The approach of Wetzels et al. (2011) was used for the interpretation of Bayes factors (BF). According to this interpretation, BF10 values between 1 and 3 were classified as anecdotal evidence, between 3 and 10 as substantial evidence, between 10 and 30 as strong evidence, between 30 and 100 as very strong evidence, and above 100 as decisive evidence in favor of the alternative hypothesis. BF10 values between 0.33 and 1 were classified as anecdotal evidence, between 0.1 and 0.33 as substantial evidence, between 0.03 and 0.1 as strong evidence, between 0.01 and 0.03 as very strong evidence, and below 0.01 as decisive evidence in favor of the null hypothesis.
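
A minimal sketch of this accuracy criterion (in Python rather than the MATLAB/JASP pipeline used here; function names are illustrative) wraps the response error into the ±90° range of the orientation space and scores a response as correct when its absolute deviation is at most 20°:

```python
import numpy as np

def orientation_error(response_deg, target_deg):
    """Signed angular deviation in the 180-deg orientation space,
    wrapped into [-90, 90), where 0 means a perfect reproduction."""
    return (np.asarray(response_deg) - np.asarray(target_deg) + 90.0) % 180.0 - 90.0

def mean_accuracy(responses_deg, targets_deg, criterion_deg=20.0):
    """Proportion of responses within `criterion_deg` of the target orientation."""
    errors = orientation_error(responses_deg, targets_deg)
    return float(np.mean(np.abs(errors) <= criterion_deg))

# Example: 175 deg vs. 5 deg differ by only 10 deg in orientation space (correct);
# 100 deg vs. 60 deg differ by 40 deg (incorrect), so accuracy is 0.5.
print(mean_accuracy([175.0, 100.0], [5.0, 60.0]))  # -> 0.5
```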

The second analysis was based on the Standard Mixture model described by Zhang and Luck (2008). The Standard Mixture model assumes that participants give two possible types of responses: responses based on a memory of the target, with probability PM, or pure guesses, with probability 1 − PM. When participants do have a memory of the target, their responses vary around the target value, and the standard deviation of this error distribution is referred to as memory precision (σ). Before performing the analysis, we computed the angular deviation between each response and the true orientation of the target; these deviations fall between ±90°, with 0° representing the true orientation of the target. To implement the Standard Mixture model, we used the CatContModel package (Version 0.7.0; Hardman et al., 2017). This model is hierarchical in nature, and individual participants are treated as samples from the population. Two steps were taken to implement the model-based analysis. First, multiple models were compared to determine the best fitting model. We used both the full and reduced models, which differ in which model parameters are held constant across conditions. In the full model, both PM and σ were allowed to vary between object conditions. In the second and third models, only one of the parameters was kept constant across object conditions (PM and σ, respectively), and in the last model both parameters were kept constant. The model fits of the CatContModel variants were compared using the Watanabe–Akaike Information Criterion (WAIC), which is considered the most appropriate fit statistic for hierarchical Bayesian models (Hardman et al., 2017). WAIC is based on the overall likelihood of the model and includes a penalty term for the effective number of free parameters (Gelman et al., 2014); smaller WAIC scores indicate better model fit. All parameter values were estimated with Bayesian Markov Chain Monte Carlo (MCMC) methods. The data of all conditions and participants were fitted simultaneously, and 11,000 iterations (with 1,000 burn-in) were run for each model. After selecting the best model in this first step, we performed hypothesis tests comparing the parameters estimated by that model. The model statistics are based on Bayesian tests that are conceptually equivalent to ANOVAs (see Ricker & Hardman, 2017). We obtained subject-level parameter estimates from the posterior chains of the best model’s parameters. For the interpretation of the Bayes factors (BF), we again followed the approach of Wetzels et al. (2011). The “ggplot2” package (Wickham, 2016) was used for the visualization of the analysis results.
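
To make the Standard Mixture model concrete, the sketch below writes out its likelihood for the wrapped response errors and fits PM and σ by maximum likelihood for a single participant and condition. This is a simplified, non-hierarchical Python illustration, under the assumption that the memory-based error component is approximately normal within the ±90° range; the analyses reported here were instead run with the hierarchical Bayesian implementation of the CatContModel R package, so the two approaches will not give identical estimates.

```python
import numpy as np
from scipy.optimize import minimize

def mixture_negloglik(params, errors_deg):
    """Negative log-likelihood of the Standard Mixture model: with probability
    p_m the error comes from a distribution centred on the target with SD sigma
    (approximated here by a normal density), and with probability 1 - p_m the
    response is a uniform guess over the 180-deg response range."""
    p_m, sigma = params
    errors = np.asarray(errors_deg)
    memory_pdf = np.exp(-0.5 * (errors / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    guess_pdf = 1.0 / 180.0
    return -np.sum(np.log(p_m * memory_pdf + (1.0 - p_m) * guess_pdf))

def fit_mixture(errors_deg):
    """Maximum-likelihood estimates of (p_m, sigma) for one set of errors."""
    result = minimize(mixture_negloglik, x0=[0.5, 15.0], args=(errors_deg,),
                      bounds=[(0.01, 0.99), (1.0, 90.0)])
    return result.x

# Example with simulated errors: 70% memory-based responses (SD = 12 deg)
# and 30% uniform guesses over [-90, 90).
rng = np.random.default_rng(1)
errors = np.concatenate([rng.normal(0.0, 12.0, 700), rng.uniform(-90.0, 90.0, 300)])
p_m_hat, sigma_hat = fit_mixture(errors)
print(round(p_m_hat, 2), round(sigma_hat, 1))  # roughly 0.7 and 12
```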

Results

First, we analyzed the average accuracy for both targets. Overall accuracy for the first target was 84.2%, while that for the second target was much lower, at 32.1%. Bayesian paired-samples t-tests were conducted to compare the average accuracy for the two object conditions. For the first target, accuracy averaged 84.5% in the T2-in condition and 84% in the T2-out condition. The statistical test revealed substantial evidence in favor of the null hypothesis (BF10 = 0.275), indicating that the first response was equally accurate in both conditions. For the second response, average accuracy was 34.9% in the T2-in condition and dropped to 29.2% in the T2-out condition (as shown in Fig. 2a). The test revealed decisive evidence in favor of the alternative hypothesis for the effect of Object (BF10 > 100).

Fig. 2
figure 2

T2 performance in Experiment 1. a The bar graph shows T2 accuracy for the two object conditions. Error bars depict 95% confidence intervals. b Probability density plot of the T2 error distribution, shown separately for each object condition and participant, which are indicated by different colors. c Probability of having T2 in memory (PM) based on the best model predictions for all participants and object conditions. Each of the colored circles represents the mean for an individual participant and gray lines connect each participant’s mean across the object conditions. The black circle represents the mean across participants and error bars show the standard error of the mean. The thick horizontal lines in the colored boxes represent median and quartile values. d Precision of the T2 memory representation (σ) based on the best model predictions. Asterisks denote evidence against the null hypothesis (* moderate evidence, ** strong evidence, *** very strong evidence, and **** decisive evidence)

Second, we performed the mixture model analysis. For the first response, the Constant σ and PM across Object model was selected as the best model, since it had the lowest WAIC value in the model comparisons. Table 1 shows all model fits, where a smaller WAIC indicates a better fit. This means that allowing either the probability of recall or the memory precision parameter to vary did not improve the model. For the second response, the Full model was the best model, indicating that both probability of recall and memory precision varied across object conditions. Figure 2c and d show the parameter plots for the second response. Bayesian analysis further provided decisive evidence that recall probability was higher when the second target was inside the same object as the first target (BF10 > 150). This result indicates that presenting the two targets within the same object increased the probability that the second target was present in working memory; however, it did not reliably affect the precision of the memory representation (BF10 = 0.737).

Table 1 WAIC for all tested models in experiment 1

Experiment 2

In Experiment 1, visual WM (VWM) was examined for objects defined by a single feature dimension (orientation), to test whether these features can be maintained more efficiently as a result of the information being bound in a single object. In Experiment 2, we implemented the same experimental procedure for objects containing features from two different dimensions (color and orientation), to investigate whether feature conjunctions from different dimensions can further enhance memory performance, owing to the use of independent memory stores for each feature dimension, as proposed in FIT (Treisman & Gelade, 1980).

Method

Participants

Twenty-eight new psychology students (19 females, mean age = 21.6 years, range = 19–32) participated in the experiment.

Apparatus, stimuli, design, and procedure

The experimental setup was identical to that of Experiment 1, except for the changes detailed below. The first tested stimulus was changed from an orientation grating to a colored circle, whose color had to be reproduced, as shown in Fig. 3. The other stimuli were the same as in Experiment 1. In each trial, the first target color was randomly selected from 180 possible RGB values drawn from the color wheel of Memtoolbox (Suchow et al., 2013). As indicated, these displayed colors were an approximation of the true color wheel.

Fig. 3
figure 3

An example of the single-trial sequence used in Experiment 2. The red-colored grating was changed to a colored circle and participants were instructed to memorize one color and two orientation features. The color was always tested first, and one of the orientations second

In the first response display, a color circle (4.5° radius and 0.35°-thick outline) was presented around the probe location, with its fill displaying a random color at onset. All colors were evenly distributed along one half of the color wheel, and the other half was its flipped mirror image, so that both ends of the probe line for the color response pointed to the same color and the color response had the same angular range as the orientation probe. The initial color of the probe was randomly chosen on each trial, and participants were asked to rotate the probe to the desired color on the color wheel. The orientation of the color wheel was randomized on each trial. The total number of trials was increased to 360, to present all possible stimulus locations and to sample all orientations/colors evenly.

Results

The average accuracy for the first target was 92.4% and, similar to the results of Experiment 1, the average accuracy for the second target (41.9%) was considerably lower. Bayesian paired-samples t-tests revealed anecdotal evidence that the two object conditions did not differ for the first response (BF10 = 0.451); average accuracy was 92% in the T2-in and 92.8% in the T2-out condition. For the second response, accuracy averaged 45.1% in the T2-in condition and decreased to 38.8% in the T2-out condition (Fig. 4a). The test showed anecdotal evidence for a difference between the two object conditions (BF10 = 1.451).

Fig. 4
figure 4

T2 performance in Experiment 2. a T2 accuracy for both object conditions b Probability density plot of the T2 error distribution for all participants and object conditions. c Probability of having T2 in memory (PM), d Precision of the T2 memory representation (σ), based on the best model prediction

For the mixture model analyses, the data of the first response were fitted best by the Constant σ and PM across Object model (Table 2). On the other hand, the Full model was the best fitting model for the second response, which contains effects of the object both on probability of recall and on memory precision. Bayesian analysis provided decisive evidence for an object effect: when the second target was inside the same object as the first target, its recall probability increased (BF10 > 100), while its memory precision decreased (BF10 > 100). This result implies that object-based presentation of two targets increases the probability of the second target being maintained in working memory (Fig. 4c). However, the memory precision of the second target decreased at the same time (Fig. 4d).

Table 2 WAIC for all tested models in experiment 2

We also compared the results of Experiment 1 with those of Experiment 2 using a Bayesian mixed factorial ANOVA, to determine whether the object benefit differed between objects containing a single feature dimension and objects containing features from different dimensions. For this comparison, we used the probability of having T2 in memory, because an object effect on the probability of recall was observed in both experiments. Experiment was treated as a between-subjects factor, object condition as a within-subjects factor, and probability of recall as the dependent variable. There was decisive evidence in favor of the alternative hypothesis for the main effects of both Experiment (BFinc-10 > 100) and Object (BFinc-10 > 100), and strong evidence for their interaction (BFinc-10 = 17.241). This interaction indicated that the object effect on the probability of having T2 in memory was larger for an object defined by features from different dimensions (Experiment 2) than for an object defined by features from a single dimension (Experiment 1).

Experiment 3

In the previous two experiments, the locations of the objects and the features changed on each trial, rotating randomly around the invisible circle. This variation in location might modulate how the stimuli were perceived and organized in VWM, since in this design the object is the only spatial reference point in the display. This may also draw more attention to the objects surrounding the stimuli than would be the case when other spatial references are available. In other words, the location-variable design might increase the likelihood of the object and its feature parts being perceived as a unified whole, thereby enhancing the object benefit. To assess the magnitude of this potential effect, in Experiment 3 we presented all three features at fixed locations, and only the object surrounding those features changed between trials. We expected participants to focus more on the locations of the features themselves, which might reduce the object effect, or even make it disappear completely.

Method

Participants

Nineteen students (15 females; mean age = 23.1 years) took part in this experiment for course credit or 10 euros.

Apparatus, stimuli, design, and procedure

The procedure for the experiment was the same as that of Experiment 1, except that the grating stimuli and the first target were shown at three fixed locations. At the beginning of the experiment, three equidistant locations were randomly selected from the 360 possible positions on the circumference of the invisible circle to serve as the center points of the gratings, as in the previous experiments; however, the gratings remained at these locations throughout the experiment, instead of being rotated around the invisible circle on each trial. To rule out visual processing advantages for a stimulus presented at one location over another, the first target was shown at one of the three locations for the first third of the trials, switched to a second fixed location for the next third, and moved to the remaining location for the last third. Since the targets were thus shown evenly across all three locations, any potential benefits related to specific locations should cancel out.

Results

There was some decline in the accuracy of the first target (79.8%) compared to the previous two experiments, while the accuracy of the second target (33.9%) remained similar to that of the first experiment. There was substantial evidence that the response accuracy of the first target did not differ between the two object conditions (BF10 = 0.246). The accuracy of the second response was again higher when the first and second targets were part of the same object; the Bayesian test revealed substantial evidence in favor of an effect of Object (BF10 = 7.173). Average accuracy of the second response was 35.5% when the second target was inside the same object as the first target, and 31.1% when it was presented in a separate object (Fig. 5a). We also tested for potential effects of switching the target position, and found no reliable switching costs (see the supplementary materials for details of this analysis).

Fig. 5
figure 5

T2 performance in Experiment 3. a T2 accuracy for both object conditions b Probability density plot of the T2 error distribution for all participants and object conditions. c Probability of having T2 in memory (PM), d Precision of the T2 memory representation (σ), based on the best model prediction

Analysis of the mixture model for the first response again showed that both recall probability and memory precision were constant across the object conditions (Table 3). The second responses were best fit by the Constant σ across Object model, in which the probability of recall varied across object conditions while memory precision was held constant. According to the Bayesian analysis, there was decisive evidence that the recall probability of the second target was higher when both targets had been presented in the same object (BF10 > 100).

Table 3 WAIC for All tested models in experiment 3

To compare the magnitude of the object effect on the probability of having T2 in memory between Experiment 1 and Experiment 3, we performed a Bayesian mixed factorial ANOVA. This analysis revealed decisive evidence for the main effect of Object (BFinc-10 > 100), whereas no evidence was found for a main effect of Experiment. There was anecdotal evidence in favor of the alternative hypothesis that Object and Experiment interacted (BFinc-10 = 1.957). This interaction suggests that presenting the features at fixed locations decreased the object effect on recall probability in Experiment 3, although the object benefit did not disappear completely.

Experiment 4A

So far, the results from all three experiments confirmed that being embedded in the same object enhances the chance of recalling features of equally spaced stimuli, whether those features came from a single dimension or from two dimensions (color and orientation), and both when their locations rotated across trials and when they were kept constant. However, the experiments did not indicate to what extent this object benefit arose from strategic choices aimed at using object-related information to facilitate feature processing. Experiment 4A was designed to address this question. We conducted an experiment using three different presentation orders: objects appeared either before or after the grating stimuli, or simultaneously with them, as in the previous experiments. If the object benefit is (partially) due to strategic choices, then participants should be able to use object information to guide processing even if it is temporally displaced, either as a kind of pre-cue (e.g., to direct attention) or as a post-cue (e.g., to structure information in VWM).

Method

Participants

Twenty-three students (12 females; mean age = 22.43 years) participated in the experiment. Participation was compensated with course credit or money (14 euros).

Apparatus, stimuli, design, and procedure

The setup of the experiment was the same as in Experiment 1, except for the following changes. In addition to the two object conditions (in/out) implemented in the previous experiments, three presentation orders were created (Fig. 6): the object shapes could appear before, after, or simultaneously with the gratings, with the latter condition replicating the design of the previous experiments. In both the Before and After trials, the objects that were presented separately from the grating stimuli were shown for 250 ms and were separated from the gratings by a 250-ms blank period. In the Before and Simultaneous trials, the grating stimuli were followed by a 750-ms delay, after which the first probe was presented. In the After trials, the grating stimuli were followed by a 250-ms interval, after which the objects were presented for 250 ms; another 250-ms delay was then given before the first probe, equalizing the period between the onset of the grating stimuli and probe presentation across all three conditions (500 + 750 ms in the Before and Simultaneous trials, and 500 + 250 + 250 + 250 ms in the After trials, i.e., 1,250 ms in each case).

Fig. 6
figure 6

The three trial types used for Experiment 4A and 4B. The top trial flow shows the Before condition, where objects were presented before the grating stimuli, the middle trial flow shows the Simultaneous condition, where object and grating stimuli were presented simultaneously as in previous experiments, and the bottom trial flow shows the After condition, where objects were presented after the grating stimuli

The total number of trials was increased to 540, equally divided among the three presentation order conditions (180 trials each). Within each order condition, trials were again equally divided between the two object conditions, resulting in 90 trials per combination of the two within-subject factors. All trial types were presented in random order. To prevent participants from becoming fatigued by the increased trial count, the experiment was completed in two sessions. Each session comprised 270 trials split into 9 blocks. Participants could complete both sessions on the same day or on two consecutive days. When both sessions were completed on the same day, they were separated by a compulsory break of at least one hour.

For the analysis of accuracy, we used Bayesian repeated-measures ANOVAs to test the main effects of Object and Order, and their interaction, on the first and second target. These analyses were performed in JASP with default settings. Inclusion Bayes factors (BFinc) based on matched models are reported for the ANOVA main effects and interactions. Inclusion Bayes factors measure how well the data support the inclusion of a factor by comparing models that contain a particular predictor with models that exclude it. Post hoc tests were corrected by fixing the prior probability to 0.5 (Westfall, Johnson, & Utts, 1997). For the model-based analysis, the data of all conditions and participants were fitted simultaneously using a factorial design with the factors Order (Before, Simultaneous, and After) and Object (In or Out), and we compared models containing the main effects of Object and Order and their interaction. Three parallel chains of 11,000 iterations (1,000 burn-in) were run for each model, and their posterior distributions were then combined.

Results

Overall average accuracy was 85.8% for the first response and 40.1% for the second response. The Bayesian repeated-measures ANOVA showed decisive evidence in favor of the alternative hypothesis of a main effect of Order (BFinc-10 > 100), and substantial evidence against a main effect of Object (BFinc-10 = 0.231) and against its interaction with Order (BFinc-10 = 0.124). Post hoc analysis showed decisive evidence that accuracy in the Before condition was higher than in the Simultaneous (BF10 > 100) and the After conditions (BF10 > 100). In the Before condition, accuracy averaged 88.7%, compared to 84.4% in the Simultaneous condition and 84.6% in the After condition.

For the second response, the statistical test revealed anecdotal evidence for a main effect of Object (BFinc-10 = 1.019), no evidence for a main effect of Order, and very strong evidence for their interaction (BFinc-10 = 66.844). Figure 7a plots the mean accuracy of the second response as a function of order and object condition. Post hoc comparisons revealed strong evidence for an object effect in the condition where the object and the grating stimuli were presented simultaneously (BF10 = 17.700), but no difference was found for either the Before (BF10 = 0.355) or the After condition (BF10 = 0.284).

Fig. 7
figure 7

T2 performance in Experiment 4A. a T2 accuracy for each object and order condition b Probability density plot of the T2 error distribution for all participants and conditions. c Probability of having T2 in memory (PM), d Precision of the T2 memory representation (σ), based on the best model prediction

All the model fits for both responses are listed in Table 4. For the first response, recall probability and memory precision varied only across presentation order, but not across object conditions. There was strong evidence for a main effect of Order on memory precision (BF10 = 11.45), and anecdotal evidence for an effect on recall probability (BF10 = 2.56). No evidence was found for an effect of the Object condition, or of its interaction with Order, on memory precision. Pairwise comparisons of the experimental conditions provided decisive evidence that presenting the objects before the memory stimuli increased memory precision compared to presenting them simultaneously with (BF10 > 100) or after these stimuli (BF10 > 100). There was very strong evidence that the recall probability of the first target was higher in the Before condition than in the After condition (BF10 = 41.6), and substantial evidence that it was higher than in the Simultaneous condition (BF10 = 6.8).

Table 4 WAIC for all tested models in experiment 4A

For the second response, the Constant σ across Order model, which contains effects of order and object condition on recall probability only, was selected as the best fitting model, as indexed by WAIC (Table 4). Figure 7c and d show the parameter plots of this model. In line with the model selection, only the main effects and interactions on the probability of recall (PM) were tested. The tests revealed decisive evidence for a main effect of Object (BF10 > 100) and strong evidence for its interaction with Order (BF10 = 11.52), while there was substantial evidence against a main effect of Order (BF10 = 0.11). Furthermore, pairwise comparisons showed decisive evidence for an effect of Object when the object and memory stimuli were presented simultaneously (BF10 > 100), indicating that recall probability was higher in the within-object condition than in the outside-object condition. No difference was found between the object conditions for the other two orders.

Experiment 4B

In Experiment 4A, an object benefit was found only for the simultaneous display condition, and not for the other two temporally separated display conditions. This might indicate that the participants were unable to make strategic use of the appearance of the object. However, in Experiment 4A the presentation order varied randomly from trial to trial. This might have made it too difficult for the participants to adapt their processing mode and to make effective use of the object ‘pre’- and ‘retro’-cues. To eliminate this potential difficulty, in Experiment 4B, presentation order was kept constant within trial blocks, which should maximize the opportunity to adapt strategically to the different presentation orders.

Method

Participants

Twenty-nine students (19 females; mean age = 21.7 years) took part in this experiment in exchange for course credit or 14 euros.

Apparatus, stimuli, design, and procedure

The procedure was identical to that of Experiment 4A, with the exception that presentation order was manipulated in a blocked design. Each presentation order was used for six consecutive blocks, and the order of these three six-block sets was counterbalanced across participants.

Results

The average first response accuracy was 83%. There was substantial evidence in favor of the null hypothesis for the main effect of Object (BFinc-10 = 0.208) and for the interaction of Order and Object (BFinc-10 = 0.145), and no evidence for a main effect of Order. The overall mean accuracy of the second response was 35%; T2 mean accuracy in all conditions is plotted in Fig. 8a. There was no evidence for a main effect of Order or of Object; however, there was decisive evidence for their interaction (BFinc-10 > 100). Post hoc tests revealed decisive evidence that, in the Simultaneous condition, average accuracy was higher for within-object than for outside-object trials (BF10 > 100).

Fig. 8
figure 8

T2 performance in Experiment 4B. a T2 accuracy for each object and order condition b Probability density plot of the T2 error distribution for all participants and conditions. c Probability of having T2 in memory (PM), d Precision of the T2 memory representation (σ), based on the best model prediction

Next, the comparison of mixture models indicated that the best fitting model for the first target was the Constant PM and σ across Object model (Table 5). As in the previous experiment, recall probability and memory precision varied only across presentation order, but not across object conditions. The statistical tests produced anecdotal evidence for a main effect of Order on recall probability (BF10 = 2.64) and on memory precision (BF10 = 1.4). Pairwise comparisons yielded decisive evidence that the recall probability of the first target increased when the object was presented before the memory stimuli, compared to when it was presented after them (BF10 > 100). There was also very strong evidence that the memory precision of the first target was higher in the Before condition than in the Simultaneous condition (BF10 = 50.1).

Table 5 WAIC for all tested models in experiment 4B

The best fitting model for the second target included effects of both Object and Order on both the probability of recall and memory precision (Full model). Both parameters are plotted in Fig. 8c and d. For the probability of recalling the target item from memory, there was decisive evidence for an effect of Object (BF10 > 100) and for an interaction between Object and Order (BF10 > 100); however, there was also strong evidence for a null effect of Order alone (BF10 = 0.05). Bayesian t-tests yielded decisive evidence that the probability of recall was higher when the second target was presented inside the same object as the first, provided that the memory stimuli and the objects were presented simultaneously (BF10 > 100). For memory precision, the statistical tests revealed substantial evidence for an effect of Order (BF10 = 3.05) and anecdotal evidence for an effect of Object (BF10 = 2.5), while no evidence was found for their interaction. Bayesian t-tests furthermore provided substantial evidence for an effect of Object when the object and memory stimuli were presented simultaneously (BF10 = 10).

Discussion

This study examined whether presenting visual information in an object-based manner improves memory maintenance in VWM. Together, the four experiments demonstrated that representations of two visual stimuli were indeed remembered more effectively when they were part of the same object. This benefit was obtained with simple contour-based objects, and regardless of whether the pair of memory features came from the same or from different dimensions. Overall memory performance and the object benefit were specifically enhanced for features from orthogonal dimensions; however, this came at the cost of lower memory precision. The object benefit furthermore still emerged when the relative importance of the objects themselves was reduced by presenting the features at fixed spatial locations. Finally, it was also confirmed that the object benefit arose automatically, or at least did not depend on strategic use of object information.

Object benefits in WM

Our findings were consistent with the studies of Xu (2002a, 2002b, 2006), who observed memory benefits for features from different parts of an object in a change detection paradigm. In the current study, we similarly tested participants’ VWM for features that were organized as multiple parts of an object, but instead of the same/different probes of the change detection paradigm, we used continuous reproduction of the memory features, which allows model-based parameter estimation and thereby provides further insight into the nature of memory representations in VWM. We found that features that were part of the same object had a higher chance of being maintained in memory (i.e., a lower guess rate), while the precision of these representations was not improved. In fact, memory precision decreased when the objects contained non-interfering features (i.e., features from different dimensions; Experiment 2). A possible explanation for this reversed effect on memory precision is that non-interfering features might be processed more in parallel (e.g., attentionally; Krummenacher et al., 2001; Müller et al., 1995; Wheeler & Treisman, 2002; Wolfe et al., 1990), which might facilitate their entry into memory, such that more information is retained overall. However, this might in turn reduce memory precision, as comparatively more information is being held in memory.

Furthermore, we found that object-based presentation did produce a benefit for same-dimensional features, as originally reported by Luck and Vogel (1997), unlike several subsequent change detection studies that failed to find object benefits for same-dimensional feature conjunctions (typically two-color combinations; Delvenne & Bruyer, 2004; Wheeler & Treisman, 2002; Xu, 2002b). It has been proposed that this failure to replicate the object benefit for same-dimensional features could be specific to the change detection task. Awh et al. (2007) used a change detection task in which they tested cross-category versus within-category changes between the sample and test array. It was assumed that sample-test similarity was higher for within-category changes than for cross-category changes, and that this would consequently reduce change detection performance. Indeed, a strong correlation was found between the reduction in memory capacity and sample-test similarity, suggesting that comparing the sample and test may be more difficult for within-category changes, which may cause more confusion in identifying the change.

A study by Luria and Vogel (2011) further tested this assumption using the Contralateral Delay Activity (CDA), which is a marker of the number of objects during WM maintenance (e.g., Akyürek et al., 2017; Balaban & Luria, 2015a, 2015b; Luria & Vogel, 2011, 2014; Peterson et al., 2015; Wilson et al., 2012; Woodman & Vogel, 2008). Indeed, Luria and Vogel (2011) found that a small cost was visible in CDA amplitude for a bicolor object, compared to a single-color object during the retention interval, even though no accuracy advantage was found for the former in the behavioral results. This outcome supported the hypothesis that two-color features could be maintained within a single, bound object in VWM.

Since we used a continuous reproduction paradigm rather than a change detection paradigm, our task did not require a comparison/decision process between the memory and test arrays. The object benefit that arises before the test phase of the task could therefore surface in behavioral performance, as we indeed observed. Another important aspect of our study that might have facilitated the object benefit for same-dimensional features is that the total number of items participants had to memorize was below the typical working memory capacity limit (presumably at least four items; Cowan, 2001; Irwin & Andrews, 1996; Luck & Vogel, 1997; Vogel & Machizawa, 2004). This might have limited overall interference amongst same-dimensional features, and consequently revealed memory advantages for objects.

Anecdotal evidence was found that memory precision also decreased in Experiment 4B. Precision decreased numerically in all other experiments as well (Table 1 and Table 2 in the Supplementary material show the mean parameter estimates for the best fitting model in all experiments), although the individual Bayesian analyses did not provide sufficient evidence for this effect. With that caveat, this potential broader effect might be explained as follows: We may be seeing two different types of recall, one being recall of the second feature from a discrete memory of the second target itself, which has relatively high precision, and the other being recall of the second feature from a memory of the object that includes the second target, which has somewhat lower precision. The increased probability of recalling the second feature in the in-object conditions then goes hand in hand with decreased precision of the second target response, as the relative frequency of the second type of recall increases. Future research into this effect and its background could be of benefit to the field.
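Purely as an illustration of this speculative account (no such model was fitted in the present study, and all symbols below are hypothetical), the two recall routes can be written as a three-component mixture:

\[ p(\hat{\theta} \mid \theta) = p_{\mathrm{item}}\,\phi_{\kappa_{\mathrm{item}}}(\hat{\theta} - \theta) + p_{\mathrm{obj}}\,\phi_{\kappa_{\mathrm{obj}}}(\hat{\theta} - \theta) + (1 - p_{\mathrm{item}} - p_{\mathrm{obj}})\,\frac{1}{2\pi}, \qquad \kappa_{\mathrm{obj}} < \kappa_{\mathrm{item}}, \]

where \(p_{\mathrm{item}}\) is the probability of recalling the second feature from a discrete memory of the second target, and \(p_{\mathrm{obj}}\) is the probability of recalling it from an object-level memory. If same-object presentation mainly raises \(p_{\mathrm{obj}}\), the overall guess rate \(1 - p_{\mathrm{item}} - p_{\mathrm{obj}}\) drops, while the single precision parameter of a standard two-component fit settles somewhere between \(\kappa_{\mathrm{item}}\) and \(\kappa_{\mathrm{obj}}\), and would therefore appear to decrease.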

This account can also explain the larger increase in memory probability for object features that were a combination of color and orientation in Experiment 2, a finding that was consistent with Treisman’s feature integration theory (Treisman & Gelade, 1980), which suggests that less interference should occur when different-dimensional features are maintained in memory. In line with this idea, memory advantages for objects containing a conjunction of different-dimensional features have been found by several studies (Delvenne & Bruyer, 2004; Olson & Jiang, 2002; Riggs et al., 2011; Wheeler & Treisman, 2002).

Other neurophysiological evidence supporting object-based representation in memory comes from functional magnetic resonance imaging (fMRI) data. It has been shown that brain activity in the parietal cortex correlates with object-based representation and the grouping of visual elements (Xu & Chun, 2007). These authors described two stages in visual object processing. The first stage, called object individuation, is characterized by attention-related processing: a fixed number of objects can be selected, regardless of object complexity. In this stage, neural activity in the inferior intraparietal sulcus (IPS) increased linearly with the number of items up to four, after which it reached a plateau. In the second stage, called object identification, the objects selected in the previous stage are encoded and stored in more detail in VWM. The brain response during this stage was strongest in the superior IPS.

The present results may relate to this two-stage model of object perception as follows. Since the total number of items (i.e., three) was likely below the maximum capacity of the first object processing stage, all items might have been processed in this stage, regardless of whether they were perceived as parts of an object or individually. However, being able to reproduce object-related (featural) information requires not only detecting and attending to the objects, but also successful storage of that information in VWM. Therefore, the increased probability of the second target feature being present in memory when it was part of the object might originate in the second stage of object processing, which is sensitive to object complexity.

It is also worth mentioning that some of our results appear to be more compatible with the continuous resource model than with the discrete slots model of WM. These are the two main models that have been introduced to explain the nature of WM capacity limits. The discrete slots model assumes that working memory storage is limited to a number of discrete slots, typically three or four (Irwin, 1992; Luck & Vogel, 1997; Vogel et al., 2001). In this model, once the number of items in WM reaches the limit of these slots, no further items can enter memory. Consequently, the discrete slots model predicts that the precision of memory representations remains constant when the number of presented memory items exceeds the capacity of the slots. The continuous resource model, on the other hand, assumes that there is no upper limit on the number of items that can be maintained in working memory, and that memory resources can be flexibly allocated to each item (van den Berg et al., 2012). In this regard, the discrete slots model predicts fixed precision no matter how complex an item is, whereas the continuous resource model allows precision to vary across items. In this study we found object effects on the probability of having the second target in memory, but we also found some evidence that precision differed between object conditions in Experiment 2 and Experiment 4A, which may suggest that there is indeed some variability in the precision of the items encoded in memory.
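In simplified form (abstracting away from the many specific variants of each model), the contrast can be summarized as follows, with \(N\) the number of memory items, \(K\) the number of slots, and \(\kappa\) the precision of a stored item:

\[ \text{discrete slots:}\quad P(\text{item in memory}) = \min\!\left(\tfrac{K}{N}, 1\right), \qquad \kappa \text{ fixed for any stored item}; \]
\[ \text{continuous resource:}\quad P(\text{item in memory}) = 1, \qquad \bar{J}(N) = \bar{J}_{1}\,N^{-\alpha}, \]

where \(\bar{J}(N)\) is the mean precision per item, which in the power-law formulation of van den Berg et al. (2012) declines continuously with set size, with \(\bar{J}_{1}\) and \(\alpha\) as free parameters. Precision differences between object conditions at a fixed set size, as observed here, fit more naturally with the latter type of account.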

Attentional effects

Although both the number of features and the number of objects were presumably well below the maximum capacity of VWM, recall accuracy clearly differed between the two targets regardless of object condition: Accuracy on the second target was always (much) lower than that on the first. This was expected, firstly because the first target feature was also the one tested first after the memory display. Secondly, in our experimental design, the first target feature was always flagged as such, because it appeared in red or within a colored circle, to ensure that this part of the object was always encoded. This made the first target feature salient and likely to draw the focus of attention. The first target should therefore have had perceptual priority at the encoding stage of VWM. Indeed, we hypothesized that such prioritization might facilitate the rest of the object as well and result in better memory for the second target when it was presented within the same object as the first target.

The study of Egly et al. (1994) suggested that when attention is drawn towards one part of an object, it can spread within the object’s boundaries, so that the rest of the attended object is selected automatically. Furthermore, it has been argued that perceiving individual parts as an integrated object depends on where exactly attention is focused, and on whether this includes the object structure (Driver & Baylis, 1998; Marr, 1982). Our finding is consistent with such an object-based attention account. It must be noted that in our study, participants were only required to memorize the individual orientation or color features, so the simple background shapes (i.e., the objects) that encompassed these memory stimuli were not task relevant. The object was also never predictive of the second feature to be tested. One might therefore expect participants to attend only to the task-relevant parts of the object rather than to the object as a whole. Moreover, because the task-relevant parts of the object were presented in fixed locations in Experiment 3, participants might even have attended more to the locations of the features themselves than to the object. Nevertheless, even under these conditions, which rendered the object itself completely task-irrelevant, the features in all of our experiments were still perceived as part of a bound object, as indicated by the presence of the object effect in each experiment. This finding suggests that the objects were processed at an early stage of visual processing, possibly reliant on automatic perceptual grouping (Driver et al., 2001; Duncan, 1984; Duncan & Humphreys, 1989; Kahneman & Treisman, 1984).

Conversely, a recent study by Chen et al. (2021) investigated perceptual grouping benefits for features that were either grouping-relevant, or not. While grouping-relevant features produced clear benefits, grouping-irrelevant features did not, unless both feature and grouping were task relevant. The authors concluded that features may be encoded independently in VWM, and integrated object representations are not automatically generated, but instead depend on the task demands. This would seem to be at odds with the object benefits that were presently observed, given that our objects were always task irrelevant. However, the highlighted first feature used in our study may actually have put the object as a whole into focus, as argued above, thereby ‘activating’ its benefits.

Finally, the current results seem inconsistent with the view that attentional prioritization of WM items can occur before or after the appearance of object features, even though the first target feature clearly benefitted from prioritization due to its unique color. Previous studies that showed pre- or retro-cue benefits on memory (Bays & Husain, 2008; Schmidt et al., 2002) used those cues to draw attention to particular memory items, indicating that they were more likely to be tested. The cued item in those studies was thus directly task-relevant information essential for recall. In Experiment 4 of our study, the features that needed to be memorized were always presented together in the same display, and the object shapes that made them part of the same or of a different object either preceded or followed the memory features. The part of the object that was presented before or after the memory features was therefore not directly task-relevant information that participants needed to retain in memory, and this object information did not necessarily need to be used. Any encoding of this shape, and any benefit it might bring, would thereby be purely strategic in nature. We found that participants were unable to use the object shape strategically to help structure VWM contents during either encoding or maintenance, depending on whether the object preceded or followed the features. The object effects we presently observed thus seemed to be driven by processing stages that precede the strategic level. However, this finding does not imply that memory benefits for objects can emerge only with simultaneous perception of visual information, or that they are limited to the perceptual/encoding stage in all cases. For instance, one previous study found that presenting objects based on various Gestalt principles (collinearity, closure, and similarity) across two sequential stimulus displays improved VWM performance (Gao et al., 2016). In the current study, the complete task-irrelevance of the object might have led to it not being selected attentionally, thereby precluding any positive effect.

To conclude, across four experiments we presented consistent evidence that object-based presentation of visual information helped our participants retain more information in VWM, even when the total number of items was well below the VWM capacity limit. Recall advantages were obtained when features from either the same or different dimensions were combined into a single object. The object benefit seemed to arise automatically at a relatively early stage of visual processing, as indicated by the persistence of the object effect even when participants' attention was directed more to the locations of the features than to the object surrounding the stimuli, and by the lack of evidence for strategic encoding or maintenance.