In our visually complex and temporally dynamic world, we are constantly presented with an overabundance of visual information. There are limitations to the amount of this information that can be processed, resulting in the need for selection of salient and/or goal-relevant information among uninformative background noise. To do this, visual selective attention is employed to enhance processing of relevant objects in our environment. It is not enough to just select important information, however, because relevant objects often change across time. They may change position, become occluded, or be replaced by new objects. For example, we can infer that a bird flying behind a cloud and reappearing at a higher altitude is most likely not a new bird but a continuation of the old one.

Depending on the situation, it may be most accurate to track an object across time and space as one continuing unit, or to split the representation into two separate objects. Both of these outcomes are achieved through the process of object updating: the gradual accommodation of new information into an existing object representation. When the new information is too inconsistent with the old representation, this may result in object individuation, such that the brain concludes that there were two objects present instead of one (e.g., a bird flies behind a cloud but reappears as two, so a new object representation must be created for the second bird). Together, visual attention, object updating, and object individuation can help maintain accurate and stable representations of objects across time, despite many changes within a scene.

How these processes contribute to the formation of stable object representations has been studied extensively using object-substitution masking (OSM). OSM is a form of visual masking in which the mask consists of four dots, one at each corner of the target item, forming an empty square around the target. When the mask offsets at a slightly later time than the briefly presented target item, it results in the perception of an empty mask alone, erasing the target representation (Enns & Di Lollo, 1997; Di Lollo, Enns, & Rensink, 2000; for a review, see Goodhew, 2017). It was originally proposed that OSM occurred as a result of disruptions to reentrant processes (Di Lollo et al., 2000; Dux, Visser, Goodhew, & Lipp, 2010; Weidner, Shah, & Fink, 2006). In this account, the low-level sensory representation of the target and mask together is sent to higher level areas of the brain to be further processed. Next, reentrant connections between higher and lower levels of the brain compare the existing representation with new incoming sensory information. Due to a mismatch between the lower level (mask alone) and higher level (target plus mask) representations, the old sensory information is replaced by the new information, resulting in the perception of an empty mask.

Recently, the substitution theory of OSM has been contested by an alternate mechanism: object updating. In this account, the target-plus-mask object is gradually updated with the representation of the trailing mask alone, such that the visual system infers that the mask is a continuation of the previous object. Although the substitution and updating accounts are quite similar, they predict different outcomes under conditions of temporal object discontinuity. For example, in support of the object updating theory, Lleras and Moore (2003) used apparent motion to induce the appearance of object continuity across time and location. In the apparent motion condition, masking was observed, whereas when apparent motion was not present, the masking effect was abolished. Similarly, if the mask can be individuated from the target (e.g., by difference in color, luminance, or independent movement), masking strength is reduced, and vice versa (Gellatly, Pilling, Carter, & Guest, 2010; Goodhew, Edwards, Boal, & Bell, 2015; Luiga & Bachmann, 2008; Moore & Lleras, 2005).

By contrast, the object substitution account does not make any predictions about the effect of object continuity or individuation on masking strength. Instead, substitution theories rely on the assumption that OSM only occurs when the target does not receive focused attention. This is supported by the finding that the duration of the trailing mask offset and number of distractor items interact to affect behavioral performance (Di Lollo et al., 2000): As the number of distractors increases, there is a decreased likelihood of focused attention on the target item, resulting in increased masking magnitudes. However, more recent studies have failed to find an interaction between attention and masking once accounting for ceiling performance at small set sizes (Argyropoulos, Gellatly, Pilling, & Carter, 2013; Filmer, Mattingley, & Dux, 2014; Goodhew & Edwards, 2016). Successful masking has even been found for a centrally presented target without any distractors present (Filmer, Mattingley, & Dux, 2015).

Questions surrounding the underlying mechanisms of OSM, such as object updating versus substitution and the influence of selective attention on masking strength, have been mostly studied using behavioral paradigms. In fact, all of the findings mentioned thus far have been garnered from behavioral measures of masking magnitude. However, a handful of electrophysiological studies examining event-related potentials (ERPs) during OSM have been conducted (e.g., Kotsoni, Csibra, Mareschal, & Johnson, 2007; Reiss & Hoffman, 2007). The majority of these studies examined ERPs related to selective attention and visual working memory (VWM) maintenance during OSM tasks.

To investigate the effect of attention on OSM, Woodman and Luck (2003) examined the N2pc, which is a lateralized parietal component that reflects the allocation of attention toward a target item (Eimer, 1996; Luck & Hillyard, 1994). They found that OSM did not affect this component, suggesting that attention is initially successfully allocated toward the target item but that the target representation is subsequently not consolidated due to the presence of the lingering mask. Similar results were found by examining the N2pc in subsequent OSM studies (e.g., Harris, Ku, & Woldorff, 2013; Prime, Pluchino, Eimer, Dell’Acqua, & Jolicœur, 2011), providing strong evidence that focused attention is not the bottleneck limiting performance in OSM tasks. However, none of the aforementioned studies have examined how set size (i.e., the number of distractors) influences the allocation of attention. This is because these studies were modeled on the assumption that large set sizes are required to obtain masking effects (i.e., 10–15 distractors). Therefore, it is currently unknown if and how set size interacts neurally with masking. This question is especially important given the previous assumption that large set sizes are required to observe masking and that increasing set size results in greater masking effects (Di Lollo et al., 2000).

Another ERP component that is often examined in OSM studies is the sustained posterior contralateral negativity (SPCN, also referred to as the contralateral delay activity, or CDA). This component is typically found during the delay period of VWM tasks and is classically thought to reflect the number of items, or overall amount of information, maintained in VWM (Luck & Vogel, 2013; Vogel & Machizawa, 2004; Vogel, McCollough, & Machizawa, 2005; but see Emrich, 2015). In previous OSM studies, the SPCN was used as an indicator of whether the target item reached awareness by being stored in memory (Harris et al., 2013; Prime et al., 2011). In these experiments, correct and incorrect trials were compared within the masked condition, and it was found that there was greater SPCN amplitude on correct as compared to incorrect trials (i.e., there is less information stored in VWM when masking is successful).

Because the SPCN is affected by VWM load, it should also indicate whether more distractor items are being stored in memory as set size increases. Behaviorally, there is evidence that when set size increases, there is an increase in the likelihood of misreporting a distractor item instead of the target in OSM (Harrison, Rajsic, & Wilson, 2016). This was determined using a mixture model analysis (see Bays, Catalao, & Husain, 2009, or the Method section for more information), which determines the proportion of nontarget errors that are made in any given condition. There are two possible explanations for the increase in observed nontarget errors: (1) selective attention for the target item is impaired when more items must be searched, which would be reflected by changes in N2pc amplitude across set sizes, or (2) more distractor information is incidentally stored in VWM when items are presented close together (e.g., Emrich & Ferber, 2012). In the second case, the amplitude of the SPCN would then reflect this increase in the number of stored items. Of course, it is likely that both of these processes co-occur, such that there is greater difficulty in attending to the target and an increased likelihood of storing distractors in memory as set size increases.

The SPCN may also reveal information about the underlying mechanisms of OSM (i.e., updating vs. substitution). For example, according to the object updating account, when items are successfully masked the target-plus-mask representation is gradually updated with a representation of the trailing mask alone. This updating should result in stable SPCN amplitude, as there are no rapid changes in the number of encoded representations. When masking is unsuccessful, the updating account suggests that object individuation occurred, resulting in two separate representations of the target plus mask and mask alone. Individuation should then result in a gradual increase in SPCN amplitude as the number of object representations stored in VWM changes from one to two. By contrast, if the consolidation of the target-plus-mask representation is suddenly interrupted and replaced by the mask alone (as suggested by the object substitution account), this should be reflected by a rapid decrease in the SPCN amplitude as the initial target representation is dropped from awareness and replaced with the mask alone. Although previous studies have examined the SPCN during OSM tasks (Harris et al., 2013; Prime et al., 2011), these studies have not investigated the SPCN as a potential marker differentiating updating, individuation, and substitution accounts of OSM.

The primary aim of the current study was to examine whether the effects of set size are independent and dissociable from those of masking as reflected by electrophysiological measures of attention (N2pc) and VWM (SPCN). Importantly, using ERPs allows for a temporally precise examination of cognitive processes related to changes in set size and masking. Therefore, we can examine whether changes to set size affect early selective attention processes, later VWM maintenance, or both. Moreover, if the behavioral interaction between set size and masking is due to changes in selective attention, VWM consolidation, or VWM maintenance, then this interaction should be present within one or both of these ERP components. Additionally, we conducted a post hoc examination of the early SPCN to determine how it was affected by successful and unsuccessful masking, and whether this would elucidate the potential underlying mechanism (i.e., updating or substitution) of OSM.

To preview the results, it was found that set size and masking had separate effects on selective attention (as indexed by the N2pc) and consolidation processes (early SPCN), respectively. Also, a significant effect of set size on the amplitude of the late SPCN suggested that more distractor information is stored in memory at larger set sizes. Finally, there was evidence that the contents of VWM were dropped, causing an imprecise response irrespective of whether the target was masked or unmasked. These results suggest that object individuation-through-updating likely occurred when a correct response was made during masked trials, demonstrating the importance of the object consolidation process in determining whether the target reaches visual awareness in OSM.

Method

Participants

Twenty-six undergraduate participants were recruited from Brock University using the university online Psychology subject pool. A total of 20 participants (M age = 20.45, SD age = 2.35; nine males) were included in the final analyses. All participants had no history of psychiatric illness, no head injury within the past 5 years, and were right-handed (see Procedures section). Two participants were excluded based on poor behavioral performance (for behavioral exclusion criteria, see Behavioral Participant Exclusion section). Three other participants were rejected based on EEG artifact detection with greater than 35% of their total trials rejected due to blinks and/or lateral eye movements during target stimulus presentation. One final participant was excluded due to technical problems during the EEG recording session. Participants were remunerated $15/hour or received bonus course credit.

Task design and procedure

Participants were fitted with a 64-electrode EEG cap from the BioSemi ActiveTwo system (BioSemi ActiveTwo System, Amsterdam, The Netherlands). As developed by Guzman-Martinez and colleagues (Guzman-Martinez, Leung, Franconeri, Grabowecky, & Suzuki, 2009), participants were first given a 10-minute rapid eye-fixation training task before the main study. This task provides immediate feedback on the stability of eye fixation, which is crucial to minimizing muscle movement artifacts in EEG data. Next, a 10-minute lateralized change detection task was administered, but the results from this task were not analyzed in the current study, so we do not expand upon its design here (see Luck & Vogel, 1997, for task design).

Following this, participants completed a lateralized OSM task, which took approximately one hour (see Fig. 1). Trials began with a fixation screen for 200 ms. Following this, an arrow cue pointing either left or right replaced the fixation dot for 200 ms. Participants were instructed to pay attention to the side of the screen that the arrow pointed to without moving their eyes. This lateralization of the task is necessary for obtaining the N2pc and SPCN components. Next, a fixation dot was presented for a variable interval of 200–500 ms.

Fig. 1

Example of a lateralized Set Size 2, Masked trial of the object-substitution masking task. Participants were cued to one side of the screen by an arrow and then told to find the target and remember its orientation on the cued side only. Responses were recorded using continuous report, such that the target orientation could be reported anywhere from 0–359 degrees

Following this, the memory sample appeared for a total of 17 ms (similar to previous OSM designs; e.g., Harris et al., 2013; Harrison et al., 2016). This display consisted of 2 or 4 Landolt Cs (1 × 1 visual degree) in random orientations. There was a minimum separation of 30 degrees between the orientations of any two Landolt Cs presented on the screen. These Cs appeared in any of four locations on a semicircle 2 visual degrees away from the fixation dot. Equivalent visual stimuli were presented on the noncued side of the screen to balance visual inputs. Participants were instructed to notice and remember the orientation of the Landolt C that was surrounded by four dots (1.5 × 1.5 visual degrees). On half of the trials the mask had a simultaneous offset with the target (0-ms delay) and on the other half there was a delayed offset (300 ms). Following this, there was either a 583-ms or 283-ms delay period, depending on the Mask Condition, for a total of 600 ms after the presentation of the target array to capture the SPCN. Participants were not informed that on half of the trials the target was masked.
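The timing arithmetic in this paragraph can be checked with a short sketch. The function and constant names below are illustrative and are not taken from the original experiment code; the only assumption is that the blank delay is chosen so that the retention interval always ends 600 ms after target-array onset.

```python
SAMPLE_MS = 17            # duration of the target array (from the text)
RETENTION_END_MS = 600    # end of the retention interval, relative to array onset


def blank_delay_ms(mask_delay_ms: int) -> int:
    """Blank interval after mask offset so that the probe appears at 600 ms.

    mask_delay_ms is how long the four-dot mask outlasts the target
    (0 ms on Unmasked trials, 300 ms on Masked trials).
    """
    return RETENTION_END_MS - SAMPLE_MS - mask_delay_ms


for label, mask_delay in (("Unmasked", 0), ("Masked", 300)):
    print(label, blank_delay_ms(mask_delay), "ms")
```

Under these assumptions the function returns the 583-ms and 283-ms delays reported for the Unmasked and Masked conditions, respectively.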

Participants were then required to make a free response wherein they rotated the orientation of a probe Landolt C, which was presented in the spot where the target was previously located. The gap of the Landolt C followed the mouse as it moved. Participants used the mouse to make an orientation response (0°–359°), and their response was recorded in degrees once the mouse was clicked. There were 120 trials per condition (2 Sides × 2 Set Sizes × 2 Mask Offsets), resulting in 960 trials in total.

EEG recording and preprocessing

Electrophysiological data were DC recorded at 512 Hz from 64 active Ag/AgCl electrodes placed at the standard 10–20 locations using an electrode cap (BioSemi ActiveTwo System, Amsterdam, The Netherlands). Horizontal and vertical eye movements were monitored with bipolar horizontal (HEOG) and vertical (VEOG) electrooculogram electrodes. The data were online referenced to the common mode sense (CMS) and the driven right leg (DRL) electrodes.

All EEG preprocessing was done in MATLAB R2014a with the EEGLAB (Delorme & Makeig, 2004) and ERPLAB (Lopez-Calderon & Luck, 2014) toolboxes. Data were rereferenced off-line to the average of the mastoids. A 40-Hz low-pass and 0.1-Hz high-pass Butterworth filter was also applied off-line. Trials with large eye movements and/or eye blinks were excluded from the analysis (greater than 32 microvolts HEOG artifacts and/or greater than 80 microvolt VEOG peak-to-peak threshold). An average of 14.8% of all trials were rejected, leaving an average of 818 trials per participant to be included in the final analyses. ERPs were time-locked to the onset of the memory array, with baseline correction using the 100 ms preceding stimulus onset. The data were epoched from −200 ms prestimulus to 800 ms poststimulus to create the averaged ERPs.
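The baseline correction and threshold-based rejection described above can be illustrated with a minimal Python sketch. This is not the EEGLAB/ERPLAB pipeline that was actually used; the function names are hypothetical, and applying a peak-to-peak criterion to the HEOG channel (rather than another artifact measure) is an assumption, since the text does not spell out how the 32-microvolt HEOG criterion was computed.

```python
from statistics import fmean


def baseline_correct(times_ms, volts, start_ms=-100.0, end_ms=0.0):
    """Subtract the mean voltage in the prestimulus window from every sample."""
    baseline = [v for t, v in zip(times_ms, volts) if start_ms <= t < end_ms]
    offset = fmean(baseline)
    return [v - offset for v in volts]


def exceeds_peak_to_peak(volts, threshold_uv):
    """True if the max-minus-min voltage in the epoch exceeds the threshold."""
    return (max(volts) - min(volts)) > threshold_uv


def keep_trial(veog, heog, veog_limit_uv=80.0, heog_limit_uv=32.0):
    """Keep a trial only if neither EOG channel exceeds its limit.

    Using peak-to-peak amplitude for HEOG is an assumption of this sketch.
    """
    return not (
        exceeds_peak_to_peak(veog, veog_limit_uv)
        or exceeds_peak_to_peak(heog, heog_limit_uv)
    )
```

For example, `baseline_correct([-100, -50, 0, 50], [1.0, 3.0, 5.0, 7.0])` subtracts the mean of the two prestimulus samples (2.0) from every value.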

Difference waves were calculated by subtracting ipsilateral from contralateral channels, collapsing across the side on which the stimuli were presented. The difference waves reflect activity related solely to the processing of the target because activity related to the sensory processing of the stimuli is subtracted out during the calculation (i.e., ipsilateral sensory responses). There were four difference waves reflecting activity in each experimental condition: Set Size 2/Unmasked, Set Size 2/Masked, Set Size 4/Unmasked, and Set Size 4/Masked. We also analyzed precise and imprecise trials (as determined by behavioral response error) within the Masked and Unmasked conditions.
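As a rough illustration of the contralateral-minus-ipsilateral subtraction, the sketch below works on plain Python lists of voltages for the PO7/PO8 pair. The function names are hypothetical, and the unweighted average across cued sides is an assumption made for simplicity (it presumes similar trial counts for left- and right-cued trials); the actual analysis was carried out in ERPLAB.

```python
from statistics import fmean


def difference_wave(po7, po8, cued_side):
    """Contralateral-minus-ipsilateral wave for one cued side.

    po7 and po8 are averaged voltages (microvolts), one value per time point.
    PO7 lies over the left hemisphere and PO8 over the right, so PO8 is
    contralateral to a left-side target and PO7 to a right-side target.
    """
    if cued_side == "left":
        contra, ipsi = po8, po7
    elif cued_side == "right":
        contra, ipsi = po7, po8
    else:
        raise ValueError("cued_side must be 'left' or 'right'")
    return [c - i for c, i in zip(contra, ipsi)]


def collapse_sides(left_wave, right_wave):
    """Unweighted average of the left-cued and right-cued difference waves."""
    return [fmean(pair) for pair in zip(left_wave, right_wave)]
```

Mean N2pc or SPCN amplitude would then be the average of the collapsed wave over the time window of interest.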

Behavioral data analysis

The behavioral data were analyzed with the three-component mixture model analysis using the MemToolBox 1.0 (Suchow, Brady, & Alvarez, 2013). Bays et al. (2009) based this model on a two-component model of behavior previously described by Zhang and Luck (2008). This earlier probabilistic model proposed two sources of error in a continuous report task: variability in remembering a target feature (e.g., color or orientation) and the probability of making a random guess. Bays et al. (2009) added another source of error to this model: the probability of reporting a distractor item instead of the target. Both target and distractor responses are drawn from a von Mises distribution with the same standard deviation (i.e., the width of the distribution).

Each type of response has its own probability density function (Bays et al., 2009): target responses follow a von Mises distribution centered on the target value with a given variability; nontarget responses follow von Mises distributions centered on each of the nontarget values; and guesses follow a fixed uniform distribution. The standard deviation (SD) of response error is also obtained from the von Mises distribution as an estimate of behavioral precision (note: precision = 1/SD). For each participant, estimates of each parameter within each experimental condition were obtained using maximum likelihood estimation (MLE; Bays et al., 2009; Suchow et al., 2013). MLE is a procedure used to find parameter estimates that best fit the model given the current data set (Dempster, Laird, & Rubin, 1977; Suchow et al., 2013).

In sum, the three-component mixture-model analysis provides an estimate of target rate (1 − guess rate − nontarget error rate), guess rate, nontarget error rate, and standard deviation for each experimental condition.
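To make the structure of the three-component model concrete, the following Python sketch writes out its probability density and the negative log-likelihood that an MLE routine would minimize. It is an illustration of the model form described by Bays et al. (2009), not the MemToolbox implementation: the function names are hypothetical, angles are assumed to be in radians, and the von Mises width is expressed through a concentration parameter (kappa) rather than an SD.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    q = (x / 2.0) ** 2
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def von_mises_pdf(x, mu, kappa):
    """Von Mises density on the circle, centered on mu."""
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def mixture_density(response, target, nontargets, guess_rate, nontarget_rate, kappa):
    """Density of one response under the target + nontarget + guess mixture."""
    if not nontargets and nontarget_rate > 0:
        raise ValueError("nontarget_rate must be 0 when there are no nontargets")
    target_rate = 1.0 - guess_rate - nontarget_rate
    density = target_rate * von_mises_pdf(response, target, kappa)
    density += guess_rate / (2.0 * math.pi)  # uniform guessing component
    if nontargets:
        swap = sum(von_mises_pdf(response, nt, kappa) for nt in nontargets)
        density += nontarget_rate * swap / len(nontargets)
    return density


def negative_log_likelihood(trials, guess_rate, nontarget_rate, kappa):
    """trials: iterable of (response, target, nontargets), all in radians."""
    return -sum(
        math.log(mixture_density(r, t, nts, guess_rate, nontarget_rate, kappa))
        for r, t, nts in trials
    )
```

Fitting would consist of searching for the guess rate, nontarget error rate, and concentration that minimize `negative_log_likelihood` for each participant and condition; the optimizer itself is left out of this sketch.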

Behavioral participant exclusion

Based on their behavioral performance, two participants were excluded from the behavioral and electrophysiological analyses. One participant had a nontarget error rate greater than 3 standard deviations above the mean in both the Set Size 2, Unmasked (M = 0.01, SD = 0.02), and Set Size 4, Unmasked (M = 0.03, SD = 0.03) conditions. The second participant had a standard deviation in the Set Size 4, Masked Condition (M = 27.09, SD = 13.34) that was 3 standard deviations greater than the mean.

Results

Behavioral results

Response error

An estimate of behavioral precision, response error (measured in degrees), was calculated as the difference between the participants’ response and the target orientation on every trial. For each condition, the circular standard deviation of the response error was calculated (variability of response error in degrees). A Set Size (2) by Mask Condition (2) repeated-measures ANOVA was conducted on response error by condition. There was a main effect of Set Size, F(1, 19) = 105.57, p < .001, partial η2 = .847, and Mask Condition, F(1, 19) = 133.63, p < .001, partial η2 = .876 (see Fig. 2), such that individuals had greater response error at Set Size 4 (M = 41.74, SD = 6.41) than at Set Size 2 (M = 32.51, SD = 9.04), and greater response error for Masked trials (M = 45.86, SD = 7.42) than for Unmasked trials (M = 28.40, SD = 5.63). The interaction between Set Size and Mask Condition was not significant, F(1, 19) = 1.96, p = .178, partial η2 = .093.

Fig. 2

Mean response error by condition. Within-subjects error bars represent the 95% CI

Mixture model

The data were then analyzed using the three-component mixture model, which uses response error to determine the proportion of trials on which a target, nontarget, or guess response was made. The width or standard deviation of the response error distribution in the mixture model reflects target and distractor precision in an experimental condition. A Set Size (2) by Mask Condition (2) repeated-measures ANOVA was conducted on all parameter estimates (guess rate, nontarget error rate, standard deviation, and target rate; see Fig. 3 for means). For all of the parameters there was a significant main effect of Set Size and Mask (see Table 1 for ANOVA values). There was a significant interaction between Set Size and Mask for all of the measures, with the exception of standard deviation.

Fig. 3

Means for each parameter estimate by condition. Error bars represent the within-subjects 95% CI

Table 1 ANOVA summary table for behavioral results by Set Size and Mask Condition

Behavioral results summary

Overall, masked target items and greater set sizes negatively impacted performance on all parameters derived from the mixture model as well as overall response error. This included decreasing the likelihood that the target item would reach awareness (i.e., guess and target rates) and reducing the precision of the target representation held in memory (i.e., response error and standard deviation). Masked targets and greater set sizes also increased the likelihood of making a nontarget response, perhaps due to a decrease in access awareness for the masked target item and the increased competition from nearby distractors (Emrich & Ferber, 2012). Additionally, the effects of Mask Condition and Set Size interacted with one another for the parameters measuring aspects of threshold responding (i.e., guess and target rate), but they did not significantly interact for measures of representational quality. These results suggest that threshold and precision responses are distinct measurements, perhaps reflecting separate underlying mechanisms (e.g., Salahub & Emrich, 2016). This is also a replication of the behavioral results found by Harrison et al. (2016), suggesting that processes occurring during OSM (such as object updating or substitution) degrade the quality of objects that reach awareness, and that this is a separate process from the effects of selective attention on object fidelity, as manipulated through set size.

ERP analysis and results

All ERP analyses were performed on channel pair PO7/PO8, as this is where the N2pc and SPCN were found to be maximal in the present study and is also consistent with the channel selection in previous experiments (Prime & Jolicoeur, 2010; Prime et al., 2011).

Upon examination of the grand average plot (see Fig. 4), it appeared that there were differential effects of the experimental conditions on the SPCN at two separate time points: one occurring during an early latency (350–500 ms; SPCNE) and one during a later latency (500–650 ms; SPCNL; see similar results in Luria & Vogel, 2011; Peterson, Gözenman, Arciniega, & Berryhill, 2015). To determine whether there were statistically separate effects of Set Size and Mask Condition at the different time points, a Set Size (2) × Mask Condition (2) × Time Point (2) repeated-measures ANOVA was conducted. There was a significant Set Size × Time Point interaction, F(1, 19) = 20.66, p < .001, partial η2 = .521, and a significant Mask × Time Point interaction, F(1, 19) = 7.00, p = .016, partial η2 = .269. Simple effects analyses showed that the effect of Set Size was only significant for the SPCNL, F(1, 19) = 5.97, p = .024, partial η2 = .239, and that the effect of Mask was only significant for the SPCNE, F(1, 19) = 9.51, p = .006, partial η2 = .333. Given the presence of the significant Condition × Time Point interactions, the early and late time windows were analyzed separately.

Fig. 4

(a) Grand average ERP waveforms measured at PO7/PO8 for each Set Size × Mask experimental condition. (b–d) Mean amplitude of N2pc, SPCNE, and SPCNL for each experimental condition. Within-subject error bars reflect ±1 standard error of the mean. (Color figure online)

N2pc

For each ERP component of interest we ran a Set Size (2) by Mask Condition (2) repeated-measures ANOVA at channel pair PO7/PO8. There was a significant main effect of Set Size on the N2pc, F(1, 19) = 5.32, p = .033, partial η2 = .219. The N2pc had greater amplitude at Set Size 4 (M = −.945, SE = .312) than at Set Size 2 (M = −.516, SE = .193). There was not a significant main effect of Mask Condition, F(1, 19) = 1.85, p = .190, partial η2 = .089, or a Set Size × Mask interaction, F(1, 19) = 0.0004, p = .982, partial η2 = 0.00002. Thus, given the role of the N2pc in attentional selection, these results suggest that the number of distractors, but not the presence of the mask, affected participants’ ability to accurately allocate attention toward the target.
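For readers who want to check the reported effect sizes, partial η2 can be recovered from an F ratio and its degrees of freedom using the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the N2pc statistics reported above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# N2pc effects reported above, each tested with F(1, 19)
set_size_effect = partial_eta_squared(5.32, 1, 19)   # reported as .219
mask_effect = partial_eta_squared(1.85, 1, 19)       # reported as .089
print(f"{set_size_effect:.4f} {mask_effect:.4f}")
```

Both values agree with the reported effect sizes to within rounding.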

SPCNE

There was a significant effect of Mask Condition on the amplitude of the SPCNE, F(1, 19) = 9.51, p = .006, partial η2 = .333. The amplitude of the SPCNE was greater when the target was masked (M = −1.16, SE = .300) than when the target was unmasked (M = −.789, SE = .297). There was not a significant effect of Set Size on the SPCNE, F(1, 19) = .012, p = .912, partial η2 = .001, or a Mask × Set Size interaction, F(1, 19) = .187, p =.670, partial η2 = .010.

Although the behavioral results suggest that the mask was effective in reducing reportability of the target, the larger SPCN amplitude for masked items is contrary to previous results indicating that the SPCN amplitude decreases on incorrect trials (i.e., successfully masked; Harris et al., 2013; Prime et al., 2011). However, these previous findings were from comparisons between trials in which the target was accurately reported and those in which the mask was successful (inaccurate responses). Thus, because the current analysis averages across performance accuracy in all conditions, it remains possible that Masked trials that were generally responded to accurately have greater amplitude than inaccurate Unmasked trials (consistent with previous findings). We further examine the distinction between accurate and inaccurate trials in the ERP Precision Analysis section, below.

SPCNL

There was a main effect of Set Size for the amplitude of the SPCNL, F(1, 19) = 5.97, p = .024, partial η2 = .239. The mean amplitude of the SPCNL was greater at Set Size 4 (M = −1.21, SE = .301) than at Set Size 2 (M = −.925, SE = .263). There was not a significant effect of Mask Condition, F(1, 19) = .002, p = .966, partial η2 = 0.00, or a Set Size × Mask interaction, F(1, 19) = .548, p = .468, partial η2 = .028. Thus, unlike the early SPCN time window, the SPCNL was modulated by the number of distractors present, suggesting that increasing set size results in more distractors being stored in VWM. This finding is further explored in the Discussion section.

ERP precision analysis

The preceding analyses demonstrate that immediately following the allocation of attention (as indexed by the N2pc), the early window of the SPCN increases in amplitude in Masked compared to Unmasked trials. Given that accuracy is higher in the Unmasked conditions, this finding is somewhat in contrast to previous studies that have observed larger SPCN amplitude in response to correct compared to incorrect trials (Harris et al., 2013; Prime et al., 2011). One solution may be that the greater SPCN amplitude on masked trials is due to object individuation-through-updating processes. That is, when masking is unsuccessful (i.e., a correct response is made despite the presence of the mask), the representations of the target plus mask and mask alone may be consolidated as separate representations. Because the SPCN amplitude is typically thought to reflect information load, this would result in larger SPCN amplitude for Masked trials as compared to Unmasked trials. This should only be true, however, if the target is unsuccessfully masked, resulting in individuation of the target from the mask and more accurate performance (Goodhew et al., 2015; Moore & Lleras, 2005). On trials in which the target is successfully masked, however, the target should either fail to be individuated from the mask (object updating) or should be substituted by the mask entirely (object substitution).

Consequently, to further examine whether the SPCNE is affected by behavioral performance accuracy, over and above the effect of the mask, we analyzed the ERPs within the Masked and Unmasked conditions, separating trials based on how precise participants were on a given trial. Namely, precise and imprecise trials were determined for each participant by performing a median split on response error (response orientation − target orientation in degrees). Imprecise trials likely reflect a combination of imprecise responses as well as guesses and nontarget errors, and can be referred to as overall “inaccurate” trials. As we were primarily interested in examining selective attention and performance accuracy in the Masked condition, we averaged across Set Size in the following analyses. A 2 (Mask Condition) × 2 (Precision) repeated-measures ANOVA was conducted for the N2pc (reflecting selective attention, 200–350 ms) and the SPCNE (350–500 ms) at channel pair PO7/PO8 (see Fig. 5; see also Footnote 1).
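A per-participant median split of this kind could look like the sketch below. Two details are assumptions rather than facts from the text: the split is taken on the absolute response error, and trials exactly at the median are assigned to the precise half. The function name is hypothetical.

```python
from statistics import median


def median_split(abs_errors_deg):
    """Split one participant's trials into precise and imprecise halves.

    abs_errors_deg holds the absolute response error (degrees) for each trial.
    Returns two lists of trial indices: (precise, imprecise). Trials exactly
    at the median go to the precise half (an assumption of this sketch).
    """
    cutoff = median(abs_errors_deg)
    precise = [i for i, err in enumerate(abs_errors_deg) if err <= cutoff]
    imprecise = [i for i, err in enumerate(abs_errors_deg) if err > cutoff]
    return precise, imprecise


precise_idx, imprecise_idx = median_split([5.0, 40.0, 12.0, 90.0])
print(precise_idx, imprecise_idx)
```

The returned indices can then be used to average the EEG epochs of the precise and imprecise trials separately within each Mask Condition.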

Fig. 5

(a) Grand average ERP waveforms measured at PO7/PO8, split by Precision and Mask Condition. (b–c) Mean amplitude of N2pc and SPCNE for each condition. Within-subject error bars reflect ±1 standard error of the mean. (Color figure online)

SPCNE

Both main effects of Precision, F(1, 19) = 7.53, p = .013, partial η2 = .284, and Mask Condition, F(1, 19) = 14.53, p = .001, partial η2 = .433, were significant. The SPCNE amplitude was larger for Precise trials (M = −1.06, SD = 1.51) as compared to Imprecise trials (M = −.594, SD = 1.20). Additionally, there was greater amplitude for Masked (M = −1.01, SD = 1.46) as compared to Unmasked (M = −.646, SD = 1.25) trials. The difference in amplitude across Mask Conditions is consistent with our previous SPCNE analysis, and suggests that more information is consolidated into memory when the mask is present. The interaction term did not reach significance, F(1, 19) = 2.46, p = .133, partial η2 = .115.

Overall, these results suggest that an inaccurate behavioral response is due to the target representation being dropped from memory, reflected by a smaller SPCNE magnitude in the Imprecise than the Precise condition. This pattern is present regardless of whether the item was Masked or Unmasked, suggesting that inaccurate responses stem from disruptions to object consolidation. It remains possible, however, that poor attentional selection occurring before consolidation is causing the target object not to be encoded in memory at all, resulting in a reduced SPCNE. To test this, the N2pc was also analyzed for both Imprecise and Precise trials in the Masked and Unmasked conditions.

N2pc

The mean amplitude of the N2pc did not significantly differ between Precise and Imprecise trials, F(1, 19) = 2.22, p = .153, partial η2 = .104, or between Mask Conditions, F(1, 19) = 2.83, p = .109, partial η2 = .130. The interaction between Precision and Mask was also not significant, F(1, 19) = .016, p = .902, partial η2 = .001. These results suggest that selective attention toward the target item does not predict subsequent behavioral accuracy. Therefore, it is not early attentional processes that affect the precision/accuracy of the target representation, but later processes (such as consolidation during the SPCNE) that result in the target being dropped from VWM and subsequently recalled with low precision (see Discussion section for further examination).
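For readers less familiar with the 2 × 2 repeated-measures design used here, each F(1, n − 1) is equivalent to the square of a one-sample t statistic computed on a per-participant contrast: the difference between marginal means for a main effect, and the difference of differences for the interaction. The sketch below illustrates this equivalence. The function names and the array layout are hypothetical, and this is not the software the authors used.

```python
import numpy as np

def contrast_f(contrast):
    """F(1, n-1) testing that the mean per-participant contrast is zero (t squared)."""
    contrast = np.asarray(contrast, dtype=float)
    n = contrast.size
    t = contrast.mean() / (contrast.std(ddof=1) / np.sqrt(n))
    return t ** 2, (1, n - 1)

def rm_anova_2x2(amp):
    """amp has shape (n_participants, 2, 2), indexed [participant, mask, precision]."""
    amp = np.asarray(amp, dtype=float)
    # Main effects: difference between the two marginal means per participant.
    mask_contrast = amp[:, 0, :].mean(axis=1) - amp[:, 1, :].mean(axis=1)
    precision_contrast = amp[:, :, 0].mean(axis=1) - amp[:, :, 1].mean(axis=1)
    # Interaction: difference of the precision effect across mask levels.
    interaction_contrast = (amp[:, 0, 0] - amp[:, 0, 1]) - (amp[:, 1, 0] - amp[:, 1, 1])
    return {
        "Mask": contrast_f(mask_contrast),
        "Precision": contrast_f(precision_contrast),
        "Mask x Precision": contrast_f(interaction_contrast),
    }
```

With 20 participants, each test has (1, 19) degrees of freedom, matching the values reported above.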

It is possible that Precision is confounded with set size, such that imprecise responses are driven by a greater number of Set Size 4 trials and precise responses by more Set Size 2 trials. If this were true, we would expect to see a significant main effect of Precision during the N2pc, as we found that the N2pc is greatly affected by changes to set size. As we do not find an effect of Precision during this time window, it can be concluded that the Precision results are not simply due to differences in the number of set size trials within each Precision condition.

Discussion

The aim of the current study was to determine whether there are independent effects of set size and masking on ERP measures of selective attention (N2pc) and object consolidation/maintenance in VWM (SPCN). It was found that set size and masking exerted dissociable effects on attention and awareness, respectively. Additionally, we conducted a post hoc analysis with behavioral accuracy to better understand the neural mechanisms of successful and unsuccessful OSM. This analysis revealed that inaccurate responses result from the target being dropped from memory, which is likely caused by disruptions to object consolidation.

Electrophysiological independence of set size and masking

Behaviorally, set size and masking were found to interact, such that masking strength is greater at larger set sizes. Although previous studies have analyzed the impact of mask offset on N2pc amplitude (Harris et al., 2013; Prime et al., 2011; Woodman & Luck, 2003), none have concurrently manipulated set size. Therefore, it was unknown whether the interaction between set size and mask offset was due to early selective attention processes, later object consolidation/maintenance processes, or both.

In the present study, we found that the N2pc was greater in amplitude at Set Size 4 than at Set Size 2. This finding is consistent with the demonstrated importance of selective attention for object individuation in cluttered scenes. When it is more difficult to find the target item (i.e., more distractors present), the N2pc has been found to increase in magnitude (Eimer, 1996; Luck, Girelli, McDermott, & Ford, 1997; Luck & Hillyard, 1994; Mazza, Turatto, & Caramazza, 2009). This may be due to a greater need for target enhancement when more distractors are present, especially when fine discriminations about a target feature (here, orientation) are required to complete the task successfully (Mazza et al., 2009).

Importantly, mask presence did not have an effect on the N2pc amplitude, which is consistent with the findings of previous OSM studies (e.g., Prime et al., 2011; Woodman & Luck, 2003). Because there was no effect of Mask Condition, it can be concluded that participants could successfully attend to the target item, whether or not it was subsequently masked. Thus, even though additional distractors demand extra attentional resources (as indexed by the effect of set size on the N2pc), any effect of the mask offset on performance must be due to processes following selective attention, at least at the set sizes tested here.

This finding is consistent with recent work in the OSM literature, which proposes that the often observed behavioral interaction between set size and mask offset is an artifact of ceiling performance at small set sizes (Argyropoulos et al., 2013; Filmer et al., 2014). Although we observed a behavioral interaction in the current study (because we did not constrain for ceiling performance), the ERP data suggest that set size influences spatial attention independently from masking. That is to say, OSM disrupts consolidation of the target item independently of, and at a later time point than, the effects of set size on selective attention (also see Goodhew & Edwards, 2016).

Although the mask did not affect N2pc amplitude, it did have an effect beginning around 350 ms poststimulus offset, consistent with the onset of the SPCN (McCollough, Machizawa, & Vogel, 2007; Prime et al., 2011; Vogel & Machizawa, 2004). During this early window of SPCN, we observed greater amplitude for Masked as compared to Unmasked trials and no effect of Set Size. Previous studies did not examine the averaged ERPs for Masked and Unmasked conditions without splitting the trials by accuracy; thus, this finding has not been previously identified.

Why might the SPCN amplitude have been greater during masked compared to unmasked trials? To accurately report the target orientation in an OSM task, one of two things must happen: (1) The lingering mask must be successfully ignored or discounted, so that it does not overwrite the representation of the target, or (2) the target-plus-mask must be individuated as a separate object from the mask alone, such that two representations are separately consolidated and maintained in VWM. The first possibility is more consistent with an object substitution account of OSM, wherein the mask substitutes the representation of the target in memory, whereas the second option supports an object individuation-through-updating framework, wherein individuation results in two object representations stored in VWM.

As is evident, these two alternatives make different predictions about how many objects are consolidated into memory and then maintained, which can be tracked by the amplitude of the SPCN. The first option predicts that only one representation will be stored in VWM: the target plus mask. The second option suggests that two representations will be kept in memory, both the target plus mask and the mask alone. Therefore, the observed increase in SPCNE amplitude in the Masked versus Unmasked conditions may indicate that more object representations are being consolidated into memory in the Masked conditions, supportive of an object individuation-through-updating mechanism.

This result is somewhat consistent with Prime et al. (2011), who found that correct trials in the masked condition had greater SPCN amplitude than correct trials in the unmasked condition. However, one key difference between our study and theirs is that we included a lateralization precue to ensure that the participants were shifting their attention to one side of the screen prior to the onset of the cue. Thus, while in their study no SPCN was observed in response to the incorrect masked trials, as well as to correct unmasked trials, it was unclear to what extent these effects were due to participants adopting a diffuse attentional scope. Our results provide further evidence that even in the presence of strong attentional shifts to the target side, correctly reporting a masked target likely requires the consolidation of both the target and mask stimuli.

After the period in which the target representation is consolidated into VWM (whether successfully or not), we observed a significant effect of Set Size and no effect of Mask during the time period of the SPCNL (500–650 ms), such that there was greater amplitude for Set Size 4 than for Set Size 2. As previously mentioned, across a variety of paradigms greater SPCN amplitude has been found to reflect increases in VWM load (e.g., Drew, Horowitz, Wolfe, & Vogel, 2012; Drew & Vogel, 2008; Emrich, Al-Aidroos, Pratt, & Ferber, 2009; Ikkai, McCollough, & Vogel, 2010). However, in the current study VWM load (i.e., the number of items to be remembered) was not manipulated. In fact, participants should be remembering only one item (the target) and ignoring the distractors. Therefore, the observed effect of Set Size for the SPCNL amplitude suggests that participants stored a greater number of irrelevant distractor items in memory when more distractors were presented in the search display.

This finding corresponds with Camp, Pilling, Argyropoulos, and Gellatly (2015), who suggested that the behavioral interaction between set size and masking strength is due to crowding by nearby distractors instead of by deficits in selective attention. This is because the effect of distractors on OSM magnitude was found to be dependent on how close the distractors were to the target item (Camp et al., 2015). This is also in accordance with the current behavioral results, as there were significantly more nontarget errors made at the higher set size than at the lower set size, consistent with increased interference by nearby distractors (also see Emrich & Ferber, 2012). Therefore, greater incidental storage of distractors at higher set sizes could be reflected in both the increase in the SPCNL amplitude, as well as by the increased number of nontarget errors.

Overall, the present electrophysiological data did not demonstrate a clear interaction between set size and masking during any one temporal window. However, there were separate effects of these manipulations at different time points: set size affected the N2pc, mask offset affected the SPCNE, and set size again affected the SPCNL. Thus, while the behavioral interaction between Mask and Set Size cannot be attributed to interactions of any one process, the results do allow us to speculate about the way in which this interaction emerges. Namely, although greater attention may have been required to select the target at high set sizes, this effect was independent from the subsequent effect of the mask. Instead, the increased difficulty selecting the target, along with the greater number of nearby distractors (Camp et al., 2015), likely affected the number of competing representations encoded into VWM, ultimately reducing the number of correctly recalled targets and increasing the number of misreported items. This may be more likely to occur when the target item has not been properly consolidated, resulting in an interaction between the effect of the mask and set size. Ultimately, however, the observation that the SPCNE is greater in masked compared to unmasked trials suggests that the object consolidation process is most important in determining whether an item will reach awareness, which is largely unrelated to the number of distractors present.

Underlying mechanisms of object-substitution masking

Although this hypothesis was post hoc, we decided to examine whether the SPCN could provide evidence for the object updating or substitution accounts of OSM. We analyzed the data split by behavioral precision (imprecise vs. precise responses) to compare within masked and unmasked trials separately. It is important to note that the imprecise trials likely reflect a combination of low-precision responses as well as guesses and nontarget errors, which can all be considered as inaccurate target responses. When split by precision, we observed a sharp drop in the SPCNE amplitude between 350 and 500 ms. Critically, we observed a greater drop in amplitude for inaccurate compared to accurate trials, irrespective of mask offset. Given that the SPCN is thought to reflect the amount of information represented in visual working memory, this decrease suggests more information was maintained in awareness on those trials in which subjects accurately reported the target.

Interestingly, the drop in SPCNE amplitude is also consistent with previous studies examining resetting in VWM. Balaban and Luria (2017) observed a sharp drop in SPCN amplitude around 200–300 ms after one object separated into two halves. Balaban and Luria referred to this process as resetting, such that the original representation was dropped from memory in the face of new object information. The concept of resetting may be similar to substitution during successful OSM. That is, the target representation is dropped from awareness and replaced with the mask alone (Di Lollo et al., 2000). By contrast, an updating account of OSM would not predict a sharp drop in SPCN amplitude, as the target representation is gradually updated to reflect the mask alone and is not altogether dropped from memory. We observed that in the masked condition there was only a drop in SPCN amplitude when an inaccurate response was made (i.e., when masking was successful), consistent with substitution accounts of OSM.

However, when a correct response was made, there was not a significant drop in SPCN amplitude. This suggests that when object individuation is achieved, it results in a separation of the target from the mask representation (as previously discussed), perhaps implicating an individuation-through-updating process during unsuccessful masking. Consistent with our findings, this pattern can also be seen in Harris et al. (2013). Although the early portion of the SPCN (350–500 ms) was not analyzed in that study, it appears as though there is a similar drop in amplitude for the miss versus hit trials (see Harris et al., 2013, Fig. 5, p. 1914). However, Harris and colleagues considered this time period as part of the N2pc; therefore, an earlier latency of the SPCN was not analyzed alone. Previous studies have identified the onset of the SPCN as early as 300 ms (e.g., Emrich et al., 2009; McCollough et al., 2007; Vogel & Machizawa, 2004). Evidence for dropping information from VWM is reflected by the SPCN as early as 200 ms after a disruption occurs (Balaban & Luria, 2017). Thus, although this result is post hoc, consistent findings in the literature (Harris et al., 2013) and a mechanism for VWM resetting (Balaban & Luria, 2017) suggest that future studies should explore an earlier time window of the SPCN as an index of interrupted consolidation resulting in object substitution.

We also observed a drop in the SPCN for inaccurate trials in the Unmasked condition. This suggests that even when unmasked, poor performance resulted from dropping the target item from memory. This is most likely caused by a failure to successfully consolidate the target representation. These results suggest that it is not substitution processes per se that are leading to poor performance in the Masked condition. Instead, it is likely processes related to incomplete or poor VWM consolidation that lead to the dropping of the target representation from awareness. What causes incomplete consolidation is unclear. However, given the absence of an effect of performance accuracy on the N2pc it is unlikely to be related to differences in spatial attention alone, although we cannot rule out the possibility of overall lapses in attention (van den Berg, Shin, Chou, George, & Ma, 2012).

It could be argued that the SPCN is only reflective of processes related to VWM maintenance, and therefore cannot provide any information about earlier visual processing/consolidation. However, visual processing of the target item likely continues into this time period, even in the presence of masks (Nieuwenstein & Wyble, 2014). Thus, although the SPCN appears after the offset of the target stimulus (in real time), visual processing and VWM consolidation continue into this window. There is also evidence that VWM consolidation and maintenance are separable processes during the SPCN. For example, Xie and Zhang (2017) found that the amplitude of the early SPCN is affected by stimulus familiarity independently of effects on the later portion of the SPCN. Moreover, there is ample evidence that the SPCN tracks both consolidation and maintenance-related processes in tasks that do not explicitly require working memory. Emrich et al. (2009) measured the SPCN in a visual search task and found that this component reflects the consolidation of a small number of visible items. Numerous studies also demonstrate that the SPCN is present during visual processing, including shape-from-motion tasks (Pun, Emrich, Wilson, Stergiopoulos, & Ferber, 2012), multiple-object tracking (Drew & Vogel, 2008), and even while monitoring static objects on a display (Tsubomi, Fukuda, Watanabe, & Vogel, 2013). Thus, the SPCN should be capable of tracking early consolidation processes in an OSM task, consistent with previous studies (e.g., Prime et al., 2011).

Overall, these results lend support to the idea that visual awareness can be affected independently from selective attention (Goodhew & Edwards, 2016), suggesting that processes following deployment of attention (i.e., object individuation through updating during VWM consolidation) determine whether information is stored in or dropped from memory.

Conclusion

The present study demonstrated that the often-observed behavioral interaction between set size and masking in OSM is not observed in neural measures of attention or VWM consolidation and maintenance. These results suggest that selective attention and visual awareness are temporally dissociable and independent during OSM. An early effect of set size on selective attention was followed by an effect of mask on memory consolidation. Finally, there was an effect of set size on VWM maintenance, which seemed to be driven by interference from distractor items held in VWM. A post hoc analysis provided evidence for object updating and individuation processes during OSM. These results suggest that object individuation is required for precise responding when the mask is present, and that disruptions to memory consolidation result in the target item being dropped from memory, regardless of whether it is masked or unmasked. In sum, our results provide neural evidence that selective attention is independent from visual awareness in an OSM task, and that it is predominantly object individuation-through-updating processes during memory consolidation that determine whether an object reaches consciousness.

Author notes

Correspondence concerning this article should be addressed to Christine M. Salahub, Department of Psychology, Brock University, St. Catharines, Ontario, L2S 3A1, Canada. Email: christine.salahub@brocku.ca