Introduction

Numerous tasks in everyday life require humans to attentionally select and temporarily store multiple relevant objects present in the visual environment. Enumeration tasks represent one context for studying how multiple target processing is accomplished. In enumeration, fast (approximately 100 ms/item) and accurate responses are typically found for up to three to four elements, a phenomenon known as “subitizing” (Jevons, 1871; Kaufman, Lord, Reese, & Volkmann, 1949; Mandler & Shebo, 1982; Trick & Pylyshyn, 1993).

To account for the capacity limitation in subitizing, theorists have proposed the existence of at least two distinct types of numerosity processing, based on which enumeration of varying object quantities can be accomplished with varying precision (Dehaene & Changeux, 1993; Feigenson, Dehaene, & Spelke, 2004; Piazza, 2010; Pylyshyn, 2001). Fast enumeration of larger sets of objects (n > 4) is assumed to be mediated by early perceptual processes that provide a coarse representation of the external world via apprehending, at a glance, the total number of spatially separable elements. Given its rather imprecise nature, this representation may provide the basis for approximate enumeration. Enumeration of smaller sets of objects (n < 4), by contrast, is assumed to be realized by an individuation mechanism that is capable of simultaneously singling out a limited number of individual items with high precision. Whereas early proposals argued that object individuation operates separately from attention (Trick & Pylyshyn, 1993), recent work suggests that simultaneous indexing of relevant items in a scene is tightly linked to attention (Cavanagh & Alvarez, 2005; Vetter, Butterworth, & Bahrami, 2008). Indeed, it is widely assumed that individuation is a key function of attention (Cavanagh, 2011). The operation of the capacity-limited, attention-based individuation mechanism yields a set of relatively stable representations of the individuated objects, which prevents them from being recounted, thus affording exact enumeration.

Recent event-related potential (ERP) studies have investigated the individuation mechanism underlying enumeration as well as the distinction between the two numerosity computations associated with approximate and exact enumeration (Ester, Drew, Klee, Vogel, & Awh, 2012; Hyde & Spelke, 2009; Mazza & Caramazza, 2011; Mazza, Pagano, & Caramazza, 2013; Pagano & Mazza, 2012). Mazza et al. (2013), for instance, used stimulus arrays that could contain a varying number of target objects in one visual hemifield, which appeared either in relative isolation (i.e., no-distractor condition) or simultaneously with distractor items (i.e., distractor condition). Importantly, to maintain sensory balance across hemifields (a prerequisite for interpreting lateralized ERPs), the contralateral hemifield always displayed the same number of (distractor) items. Using this design, Mazza and colleagues (2013) observed an effect of target numerosity on both the N1 and the PCN (Posterior-Contralateral-Negativity; also called N2pc; see Footnote 1) waves in the no-distractor condition, in which the total number of objects was associated with the number of target items. By contrast, target numerosity influenced the PCN, but not the N1, in the distractor condition, in which the overall object number was dissociated from the number of presented targets. This electrophysiological dissociation suggests that the neural processes underlying the PCN waveform, which is generally assumed to reflect the stage of attentional selection (Eimer, 1996; Luck & Hillyard, 1994; Mazza, Turatto, Umilta, & Eimer, 2007; Töllner, Müller, & Zehetleitner, 2012), provide the core representation critical for accomplishing exact enumeration of multiple (i.e., up to three/four) task-relevant items.

Whereas the study by Mazza et al. (2013) demonstrated that individuating multiple items can be separated from processing of the overall object quantity in the field, it is worth noting that the majority of ERP studies that have investigated this issue used stimulus designs in which the targets were always defined by the same feature value (e.g., the color red). Thus, it remains an open issue whether multiple object individuation mechanisms generalize to search situations in which multiple targets can be defined across different features that belong to the same (e.g., the colors blue and green) or to separable dimensions (e.g., the color blue and the shape triangle). Translated into everyday search scenarios, this is equivalent to the question of whether simultaneous attentional selection, for example, of two small green apples in the fruit market is comparable with attentionally selecting one small, green apple and one big, orange carrot.

In fact, several influential theories of visual attention would hypothesize differences in search between intra- and cross-dimensional targets, albeit with varying predictions. According to the Boolean Map Theory (BMT) (Huang, Treisman, & Pashler, 2007), it should be advantageous to select physically distinct targets simultaneously if they are defined across different, compared to within the same, dimensions. This prediction can be derived from the BMT’s explicit assumption that a single Boolean map (a representation that divides the visual scene into two distinct, i.e., selected vs. nonselected, regions and that mediates access to visual feature information; for further details, see Huang & Pashler, 2007) can represent either only one feature value per dimension (e.g., the color blue) or independent feature values that belong to different dimensions (e.g., the color blue and the shape square). Accordingly, searching for physically different targets defined across dimensions (e.g., red and square) should be mediated by a single, simultaneously accessible Boolean map, whereas searching for physically different targets that belong to the same dimension (e.g., red and blue) should be more costly (relative to cross-dimensional search) due to the requirement to serially create and access multiple Boolean maps.

The dimension-weighting account (DWA) (Müller, Heller, & Ziegler, 1995; Müller et al., 2010), by contrast, predicts a cost for cross- relative to intra-dimensionally defined targets. In particular, DWA proposes that attentional selection of singleton feature targets, and also the performance in subitization tasks, is based on an attention-guiding master (i.e., overall-saliency or priority) map, which integrates the outputs of preattentive feature contrast computations (Itti & Koch, 2001; Koch & Ullman, 1987) in a dimensionally weighted fashion. Critically, the DWA further assumes that the total amount of dimension-specific attentional weight is limited. As a result, searching for multiple items should be facilitated when all targets are defined within the same dimension, because there is no need to shift attentional weight across, or divide weight between, dimensions. Search for cross-dimensional targets, by contrast, should produce an RT cost, conceivably due to a time-consuming redistribution of dimensional weights. Accordingly, to give a correct enumeration response (e.g., two vs. three), responding on cross-dimensional trials would have to wait until at least two task-relevant dimensions (e.g., shape and color) were serially weighted to determine whether they produced one or more reliable target signals at the master map level (for further details, see Found & Müller, 1996).

Based on this background, the purpose of the present study was twofold: First, we aimed to establish the role of the featural/dimensional relation of spatially separable targets in the individuation of multiple objects in both distractor and no-distractor conditions. Second, by linking mental chronometry data to event-related brain potentials, we aimed to isolate functionally distinct processing stages at which putative feature- and/or dimension-based control mechanisms may influence target processing. We employed a paradigm similar to that recently tested by Mazza et al. (2013): multiple targets occurred randomly in isolation or together with distractor objects, with all targets randomly defined by the same feature value (e.g., all cyan), by different feature values within one dimension (e.g., the colors: cyan, purple, and yellow), or by different feature values in separate dimensions (e.g., the color: cyan, the shape: square, and the orientation: vertical).

At the electrophysiological level, we focused our analyses initially on three ERP waves. To measure the influence of the featural/dimensional target context on attention-based individuation/selection mechanisms that allow for exact enumeration (Ester et al., 2012; Mazza et al., 2013), we analyzed the PCN wave. Combining the DWA (Müller et al., 1995; Töllner, Gramann, Müller, Kiss, & Eimer, 2008) with our recent proposal that the PCN may represent attentional selection at the master map level (Töllner, Conci, & Müller, 2015), we hypothesized that the PCN should be attenuated for cross- relative to intra-dimensional targets. This prediction can be derived from the DWA’s explicit notion that the selection/individuation of multiple items on the master map should take overall longer and be temporally more variable for cross- relative to intra-dimensional targets (see above), which would translate into temporally smeared and, thus, markedly reduced (averaged) PCN amplitudes. According to BMT (Huang et al., 2007), by contrast, there should be no PCN difference between targets defined by identical feature values of the same dimension and targets defined by different features that belong to separate dimensions—as both conditions should be representable in a single Boolean map. Instead, there should be a cost for targets defined by different features of the same dimension, which may result in alterations of the PCN timing or amplitude. In addition, we controlled for putative target-context-induced processing differences arising before and after the stage of attentional selection via analyzing the N1 wave—indexing perceptual processing of the overall numerosity of the objects in the field (Mazza & Caramazza, 2015)—and the P3b wave—indexing the transfer of task-relevant information in working memory (Donchin & Coles, 1988; Polich, 2007; Verleger, Jaśkowski, & Wascher, 2005; Vogel, Luck, & Shapiro, 1998).
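
The reasoning behind the DWA-derived prediction, namely that greater trial-to-trial latency variability smears the averaged waveform and thereby lowers its peak, can be illustrated with a minimal simulation. The sketch below is purely illustrative: the Gaussian component shape, its 265-ms mean latency, 30-ms width, and −2 μV single-trial amplitude, as well as the two jitter values, are assumptions chosen for demonstration and are not fitted to the present data.

```python
import math
import random

def averaged_peak(jitter_sd_ms, n_trials=400, seed=1):
    """Peak (most negative value, in microvolts) of the trial-averaged wave.

    Every simulated trial contains the same Gaussian-shaped negativity;
    only its latency varies from trial to trial (all values hypothetical).
    """
    rng = random.Random(seed)
    times = range(0, 401, 2)                      # 0-400 ms in 2-ms steps
    total = [0.0] * len(times)
    for _ in range(n_trials):
        latency = 265.0 + rng.gauss(0.0, jitter_sd_ms)
        for k, t in enumerate(times):
            total[k] += -2.0 * math.exp(-((t - latency) ** 2) / (2 * 30.0 ** 2))
    average = [v / n_trials for v in total]
    return min(average)

low_jitter_peak = averaged_peak(5.0)    # little latency variability
high_jitter_peak = averaged_peak(40.0)  # strong latency variability
print(low_jitter_peak, high_jitter_peak)
```

Although the single-trial amplitude is identical in both runs, the averaged peak is smaller in magnitude under high jitter, which is the pattern the prediction relies on.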

Methods

Participants

Sixteen young, healthy volunteers (7 females; mean age: 24.7 years) recruited from the Ludwig-Maximilians-University took part in this study for monetary compensation. All volunteers had normal or corrected-to-normal vision, and none reported a history of neurological disorder(s). All volunteers provided written, informed consent before the start of the experiment. The data of three participants had to be discarded from the analyses due to excessive horizontal eye movement artifacts. The experimental protocol was approved by the ethics committee of the Department of Psychology, Ludwig-Maximilians-University.

Stimuli, task, and study design

All equiluminant, colored-shape stimuli used in the current study were presented against a black background and arranged around the circumferences of two imaginary circles centered on a white fixation cross (Fig. 1). The two imaginary circles were 4° and 7.5° of visual angle in radius and were made up of 8 (inner circle) and 12 (outer circle) equidistantly placed stimulus locations, respectively. Each stimulus outline contained a grating, composed of three black bars (0.4° × 2.4°) separated by two gaps (0.3° × 2.4°), that was oriented horizontally for all but the orientation target stimuli (see below). In the “distractor” condition, all 20 stimulus locations were occupied such that 2 or 3 locations in one visual hemifield contained target items, and the remaining 18 or 17 locations contained distractor items. In the “no-distractor” condition, the two or three targets were presented in relative isolation (i.e., without ipsilateral distractors) in one visual hemifield, whereas the opposite hemifield contained the same number of (distractor) objects. Note that this bilateral stimulation is required to maintain sensory balance across hemifields, a prerequisite for analyzing lateralized ERPs.
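
For readers who wish to reconstruct a comparable display geometry, the following sketch computes 8 and 12 equidistant locations on circles of 4° and 7.5° radius. The angular starting phase is not reported in the text; the half-step offset used here is an assumption, chosen only so that no location falls on the vertical midline and each hemifield receives 10 locations. The function name `ring_positions` is likewise hypothetical.

```python
import math

def ring_positions(n_items, radius_deg, phase_deg):
    """Return (x, y) coordinates, in degrees of visual angle, of n_items
    equidistant locations on a circle centered on fixation."""
    step = 360.0 / n_items
    positions = []
    for i in range(n_items):
        angle = math.radians(phase_deg + i * step)
        positions.append((radius_deg * math.cos(angle),
                          radius_deg * math.sin(angle)))
    return positions

# Half-step phase offsets are an assumption (not stated in the Methods).
inner = ring_positions(8, 4.0, phase_deg=22.5)
outer = ring_positions(12, 7.5, phase_deg=15.0)
locations = inner + outer

left = [p for p in locations if p[0] < 0]    # left visual hemifield
right = [p for p in locations if p[0] > 0]   # right visual hemifield
print(len(locations), len(left), len(right))
```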

Fig. 1

Search displays used in the present study. Displays contained two or three targets in one visual hemifield, which were randomly defined by the same feature (sF), different features within dimensions (dFsD), or different features across dimensions (dD). For sF and dFsD trials, the respective target-defining features were selected randomly from the color, shape, or orientation dimension. Participants performed a two-alternative-forced-choice (2-AFC) task, enumerating as fast and accurately as possible the number of targets (2 vs. 3) present on the screen. Set size (2 or 3), target context (sF or dFsD or dD), distractor context (present or absent), and target locations were randomized on a trial-by-trial basis. Note that the actual background color was black and the fixation cross was white

On a given trial, targets were randomly defined relative to homogeneous distractor items (i.e., blue, horizontally oriented circles; 1.2° of visual angle in radius; CIE: 0.143, 0.068, 4) in the color, shape, or orientation dimension. For color targets, we used horizontally oriented circles (1.2° of visual angle in radius) appearing in cyan (CIE: 0.208, 0.305, 4), purple (CIE: 0.294, 0.158, 4), or yellow (CIE: 0.422, 0.496, 4). For shape targets, we used blue (CIE: 0.143, 0.068, 4), horizontally oriented squares, diamonds, or triangles. For orientation targets, we used blue circles (1.2° of visual angle in radius; CIE: 0.143, 0.068, 4) that were oriented 45°, 90°, or 135° relative to the horizontal. There were three “target context” conditions defining how the two or three targets could relate to one another in a given trial: the same feature condition (sF; all targets were defined by the same feature; e.g., three cyan targets), the different feature condition (dFsD; all targets were defined by different features within the same dimension; e.g., one cyan, one purple, and one yellow target), and the different dimension condition (dD; all targets were defined by different features across dimensions; e.g., one cyan, one square, and one vertical target). The exact number of targets (i.e., set size), lateral target locations, target-defining features, (ipsilateral, intermingled) distractor items, and target contexts were randomized on a trial-by-trial basis. For all trials, it was ensured that at least one target was assigned to each of the inner and outer (imaginary) circles. For set size 3, we arranged search arrays such that the third target was always assigned to the outer circle, with at least one distractor placed between any two targets of the outer circle.

The experiment was conducted in a dimly lit, sound-attenuated, and electrically shielded experimental booth. Participants were seated comfortably at a viewing distance of approximately 57 cm from a 17-in computer screen (1024 × 768 pixel screen resolution, 85-Hz refresh rate). The experiment consisted of 24 blocks of 81 trials each (36 distractor trials, 36 no-distractor trials, 9 catch trials; see below), resulting in a total of 1,944 trials. A trial started with the presentation of a white central fixation cross for 0.5 s, followed by the search array for 0.2 s. Trials were terminated by the participant’s response or after a maximum duration of 1.5 s. In case of errors, or if no response was made within the maximum allowed time window of 1.5 s, a red minus symbol appeared centrally for 1 s. Subsequently, the intertrial interval lasted randomly for 0.95, 1.00, or 1.05 s. Participants were instructed to maintain central eye fixation throughout the trial blocks and to report as fast and accurately as possible the number of targets (two vs. three) present in the search array. Notably, in the no-distractor condition, it is theoretically possible to provide the correct target numerosity via counting the distractor number, or via relying on the total number of objects (and matching, for instance, lower vs. higher overall numerosity to set sizes two vs. three). To prevent such strategies, 20 % of all no-distractor trials were catch trials with mismatching target and distractor numerosities. Responses were given by pressing the left/right mouse button using the left/right thumb, respectively. Half of the participants started with using their left/right thumb to indicate the presence of two/three targets and vice versa for the other half. Stimulus-response (S-R) assignments were reversed after the first half of the experiment (i.e., after 12 blocks).
To become familiar with the respective S-R mappings, participants performed one practice block before the start of the experiment. After each block, participants received summary performance statistics (mean response time and accuracy).
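
The trial counts reported above can be checked with a few lines of arithmetic. In the sketch below, the variable names are illustrative; the catch-trial share is computed over regular no-distractor plus catch trials, and the per-condition count assumes that regular trials were distributed equally over the 12 design cells, neither of which is stated explicitly in the text.

```python
# Trial structure as reported in the Methods; names are illustrative.
BLOCKS = 24
DISTRACTOR_TRIALS = 36       # per block
NO_DISTRACTOR_TRIALS = 36    # per block
CATCH_TRIALS = 9             # per block

trials_per_block = DISTRACTOR_TRIALS + NO_DISTRACTOR_TRIALS + CATCH_TRIALS
total_trials = BLOCKS * trials_per_block

# Assumption: the 20 % refers to catch trials relative to all displays
# without ipsilateral distractors (regular no-distractor + catch trials).
catch_share = CATCH_TRIALS / (NO_DISTRACTOR_TRIALS + CATCH_TRIALS)

# 2 set sizes x 2 distractor contexts x 3 target contexts = 12 cells.
cells = 2 * 2 * 3
regular_trials = BLOCKS * (DISTRACTOR_TRIALS + NO_DISTRACTOR_TRIALS)
# Assumption: equal allocation of regular trials across cells.
trials_per_cell = regular_trials // cells

print(total_trials, catch_share, trials_per_cell)
```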

It is worth noting that the reduction of the number of possible set sizes from four in the Mazza et al. (2013) study to two in the current study was necessary for the following reasons: First, discarding set size “4” was motivated by the findings of Ester et al. (2012; see also Mazza et al., 2013), who showed that the PCN amplitude increases monotonically from set size 1 up to set size 3. Between set sizes 4 and 10, however, the PCN wave reaches a stable asymptote without any further amplification, in line with the subitizing effect (Trick & Pylyshyn, 1993). Second, given that the main purpose of the current study was to investigate the role of featural/dimensional relations between targets, there was no justification to keep set size 1. As a consequence, only set sizes 2 and 3 were used. Moreover, using two set sizes permits the use of a two-alternative-forced-choice (2-AFC) task, making the present study more comparable to most visual search studies, in which participants are typically required to make speeded 2-AFC responses, e.g., target localization or identification (Töllner, Conci, Rusch, & Müller, 2013). Most importantly, the reduction of set sizes provided the basis, in terms of trial numbers per condition, for running adequate ERL analyses for all conditions.

EEG recording and data analysis

We recorded the EEG continuously at 1 kHz from 64 Ag/AgCl active electrodes (actiCAP system, Brain Products). Sixty-three electrodes were mounted on an elastic cap (Falk Minow Service) with positions matching the international 10-10 system, and one electrode was placed at the inferior orbit below the left eye to record the electrooculogram and thereby improve the monitoring of blinks and eye movements. BrainAmp DC amplifiers (Brain Products) were used for amplifying all electrophysiological signals with a 0.1-250 Hz bandpass filter. FCz was used as online reference. The impedances of all electrodes were kept below 5 kΩ and checked every 4 blocks.

The raw data of all participants were first visually inspected to detect and manually remove nonstereotypic noise (e.g., excessive electromyographic bursts). This was followed by running an infomax independent component analysis, based on which components representing blinks and/or saccades were identified and subsequently removed before back-projection of the residual components. The continuous EEG was then low-pass filtered using a Butterworth infinite impulse response filter at 30 Hz (24 dB per octave) and re-referenced to averaged mastoids (TP9/10). In the next step, the EEG data were segmented according to the 12 (set size × distractor context × target context) experimental conditions. Segments ranged from 0.2 s before to 0.8 s after the stimulus, with the prestimulus interval used for baseline correction. Trials with incorrect responses (average: 4.5 % of all trials; worst participant: 7.7 %), anomalously slow responses (slower than 1.2 s; average: 4.7 % of all trials; worst participant: 13.9 %), or signals exceeding ± 30 μV in channels F9/10 (average: 3.8 % of all trials; worst participant: 11.0 %) were discarded. Artifacts, defined as signals exceeding ± 60 μV, signal changes larger than 50 μV between two sample points, and signal changes lower than 0.5 μV within 0.5 s intervals, were discarded on an individual-channel basis before ERP averaging. To isolate the PCN from overlapping, nonlateralized components, we subtracted the ERPs ipsilateral to the target side from contralateral ERPs.
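
The three channel-wise artifact criteria can be sketched as follows. This is a minimal illustration rather than the analysis software actually used: the function name `channel_is_artifact` is hypothetical, and the low-activity criterion is interpreted here as a peak-to-peak range below 0.5 μV within any sliding 0.5-s window, which is an assumption about how that criterion was implemented.

```python
import math

def channel_is_artifact(samples_uv, srate_hz=1000,
                        abs_limit_uv=60.0, step_limit_uv=50.0,
                        flat_limit_uv=0.5, flat_window_s=0.5):
    """Return True if one channel of one segment violates any criterion."""
    # Criterion 1: absolute amplitude exceeds +/- 60 microvolts.
    if any(abs(v) > abs_limit_uv for v in samples_uv):
        return True
    # Criterion 2: change between adjacent samples exceeds 50 microvolts.
    if any(abs(b - a) > step_limit_uv
           for a, b in zip(samples_uv, samples_uv[1:])):
        return True
    # Criterion 3 (assumed reading): peak-to-peak range below 0.5 microvolts
    # within any 0.5-s window, i.e., a flat or dead channel.
    win = int(round(flat_window_s * srate_hz))
    for start in range(0, len(samples_uv) - win + 1):
        window = samples_uv[start:start + win]
        if max(window) - min(window) < flat_limit_uv:
            return True
    return False

# Synthetic 1-s segment at 1 kHz: a clean 10-Hz oscillation of 20 microvolts.
clean = [20.0 * math.sin(2 * math.pi * 10 * t / 1000.0) for t in range(1000)]
spiky = list(clean)
spiky[300] = 80.0            # single out-of-range sample
print(channel_is_artifact(clean), channel_is_artifact(spiky))
```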

The latencies of the N1 and the PCN (difference) waves were defined individually as the maximum negatively directed deflection in the time windows 100-250 ms and, respectively, 150-350 ms poststimulus at the components’ most typical electrode sites PO7/8. The amplitudes of both waves were computed by averaging ten sample points before and after the components’ maximal deflection. Given that there is typically no distinct peak discernible for P3b waves, we calculated mean amplitudes for the time window 390-650 ms poststimulus, which was derived from visual inspection of the grand-average waveforms of all conditions at Pz. Differences in behavioral measures (reaction times, error rates), PCN latencies/amplitudes, and P3b amplitudes were assessed by performing separate three-way repeated-measures analyses of variance (ANOVAs) with the factors set size (2 vs. 3), distractor context (present vs. absent), and target context (same feature vs. different feature vs. different dimension). For (P1 and) N1 latencies/amplitudes analyses, the ANOVAs contained one additional factor, Electrode Side (PO7 vs. PO8), to control for hemisphere-specific differences in early sensory potentials. Where appropriate, significant differences were further verified by means of post-hoc comparisons.
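
The peak-based measures described above can be sketched in a few lines. The helper names `difference_wave` and `negative_peak` are hypothetical, the synthetic waveform is made up for illustration, and including the peak sample itself in the amplitude average (21 samples in total at 1 kHz) is an assumption, since the text specifies only the ten samples on either side.

```python
import math

def difference_wave(contra_uv, ipsi_uv):
    """Contralateral-minus-ipsilateral difference wave (e.g., at PO7/8)."""
    return [c - i for c, i in zip(contra_uv, ipsi_uv)]

def negative_peak(wave_uv, times_ms, window_ms, half_width=10):
    """Latency of the most negative sample within window_ms, and the mean
    amplitude over half_width samples on either side of that peak."""
    lo, hi = window_ms
    candidates = [k for k, t in enumerate(times_ms) if lo <= t <= hi]
    peak = min(candidates, key=lambda k: wave_uv[k])
    start = max(0, peak - half_width)
    stop = min(len(wave_uv), peak + half_width + 1)   # peak sample included
    amplitude = sum(wave_uv[start:stop]) / (stop - start)
    return times_ms[peak], amplitude

# Synthetic example: 1-kHz segment from -200 to 799 ms with a made-up
# contralateral negativity peaking at 265 ms; ipsilateral activity is zero.
times = list(range(-200, 800))
contra = [-2.0 * math.exp(-((t - 265) ** 2) / (2 * 30.0 ** 2)) for t in times]
ipsi = [0.0] * len(times)

pcn = difference_wave(contra, ipsi)
pcn_latency, pcn_amplitude = negative_peak(pcn, times, window_ms=(150, 350))
print(pcn_latency, pcn_amplitude)
```

The same `negative_peak` helper would apply to the N1 with a 100-250 ms window on the non-subtracted waveform.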

Results

Behavioral data

As shown in Fig. 2, the processing time required for accurate target enumeration was markedly influenced by target context [F(2,24) = 37.75, p < 0.001]; reactions were overall fastest for sF trials (657 ms [±91]), slowest for dD trials (756 ms [±127]), and of intermediate speed for dFsD trials (700 ms [±100]). In addition, we found both target context × distractor context [F(2,24) = 5.17, p < 0.014] and target context × set size [F(2,24) = 6.97, p < 0.004] interactions to be significant. In detail, for sF (but not dFsD and dD) targets, reactions were speeded for no-distractor relative to distractor trials (644 ms [±92] vs. 672 ms [±90]; p < 0.014; Fig. 2a). For dD (but not sF and dFsD) targets, RTs were delayed for higher relative to lower set sizes (766 ms [±124] vs. 745 ms [±130]; p < 0.047; Fig. 2b). The ANOVA on the error rates revealed all main effects to be significant (distractor context: [F(1,12) = 13.06, p < 0.004]; set size: [F(1,12) = 34.32, p < 0.001]; target context: [F(2,24) = 7.04, p < 0.004]), indicative of more error-prone responding on distractor relative to no-distractor trials (5.4 % vs. 3.6 %), with higher relative to lower set sizes (5.8 % vs. 3.2 %), and for dFsD (4.8 %) and dD (5.2 %) relative to sF targets (3.4 %). None of the interactions reached significance for the error rates (all p > 0.12).

Fig. 2

Behavioral processing times. (a) Reaction times as a function of target context (same Feature, different Feature-same Dimension, different Dimension) and set size (2, 3). (b) Reaction times as a function of target context (same Feature, different Feature-same Dimension, different Dimension) and distractor context (present, absent)

Early sensory ERPs

Figure 3 displays the early sensory components as a function of distractor context and set size at channels PO7 and PO8. For the N1 amplitudes, only the main effect of distractor context [F(1,12) = 18.00, p < 0.001] reached significance, with higher amplitudes for no-distractor (−7.99 μV) relative to distractor trials (−6.57 μV). For the N1 latencies, the ANOVA revealed the main effect of set size [F(1,12) = 19.06, p < 0.001] as well as the interaction of set size with distractor context [F(1,12) = 9.96, p < 0.008] to be significant. Replicating the results of Mazza et al. (2013), we found shorter N1 latencies for higher relative to lower set sizes on no-distractor trials (172 vs. 178 ms; p < 0.001), whereas the latencies were equivalent for the two set size conditions on distractor trials (both: 172 ms; p > 0.64).

Fig. 3

P1 and N1 results. Grand-average, target-synchronized event-related potential waves elicited by no-distractor (turquoise lines) and distractor displays (purple lines) in the 300-ms interval following stimulus onset at channels PO7 (left panel) and PO8 (right panel). Scalp distribution maps (central panel) are shown separately for P1 and N1 waves at the point in time at which the components reached their maximum—positive and, respectively, negative—deflections

However, as can be seen in Fig. 3, the differences reported in the N1 might simply be the consequence of a similar activation pattern evident already in the preceding P1. To control for this possibility, we additionally analyzed the P1 amplitudes and latencies. As before, we determined P1 amplitudes individually by averaging ten sample points before and after the maximum positively directed deflection (within the 80-140 ms time window poststimulus at channels PO7/8), with the point of the maximum deflection defining the component’s latency. These analyses confirmed the P1 pattern observable in Fig. 3: The P1 amplitudes were significantly enhanced [F(1,12) = 5.14, p < 0.043] for distractor (3.02 μV) compared with no-distractor trials (2.48 μV). In addition, the P1 latencies showed a main effect of set size [F(1,12) = 6.14, p < 0.029] as well as an interaction between distractor context and set size [F(1,12) = 17.64, p < 0.001]. Mirroring the chronological order in the N1, the P1 was triggered faster for higher relative to lower set sizes on no-distractor trials (109 vs. 113 ms; p < 0.001), whereas there was no set size modulation on distractor trials (both: 111 ms; p > 0.81).

PCN

The contralateral-minus-ipsilateral PCN difference waves are plotted in Fig. 4. PCN amplitudes were enhanced [F(2,24) = 14.52, p < 0.001] for targets defined by identical feature values relative to targets defined by distinct features both within (−2.36 vs. −1.91 μV; p < 0.002) and across dimensions (−2.36 vs. −1.82 μV; p < 0.001), with no difference between dFsD and dD trials (p > 0.36). Furthermore, the effects of set size [F(1,12) = 44.93, p < 0.001] and distractor context [F(1,12) = 14.22, p < 0.003] reached significance, indicating overall larger PCN amplitudes for higher relative to lower set sizes (−2.35 vs. −1.71 μV) and for distractor relative to no-distractor trials (−2.38 vs. −1.68 μV). PCN latencies were statistically influenced by set size [F(1,12) = 5.02, p < 0.045] and distractor context [F(1,12) = 5.43, p < 0.038], with PCN waves triggered faster for lower relative to higher set sizes (263 vs. 269 ms) and for no-distractor relative to distractor trials (260 vs. 272 ms). There were no significant interactions (all p-values > 0.18; see Footnote 2).

Fig. 4

PCN results. (a) Grand-average, target-synchronized PCN (contralateral-minus-ipsilateral difference) waves as a function of target context (sF vs. dFsD vs. dD) in the 400-ms interval following stimulus onset at channels PO7/8. (b) Main effect of target context on PCN amplitudes. (c) Topographical map of the PCN scalp distribution (computed by mirroring the PCN difference waves to obtain symmetrical values for both hemispheres, using spherical spline interpolation) at the point in time at which the grand-average difference between contralateral and ipsilateral activity was maximal

P3b

As illustrated in Fig. 5, all target displays elicited a clear P3b wave over parietal midline electrode sites. The ANOVA on the mean amplitudes revealed all main effects [set size: F(1,12) = 14.73, p < 0.002; distractor context: F(1,12) = 12.70, p < 0.004; target context: F(2,24) = 41.88, p < 0.001] as well as the three-way interaction [F(2,24) = 5.00, p < 0.015] to be significant. To further resolve the three-way interaction, we conducted follow-up ANOVAs with the factors target context and set size separately for distractor and no-distractor trials. These analyses revealed both main effects [target context: F(2,24) = 35.82, p < 0.001; set size: F(1,12) = 12.23, p < 0.004] and their interaction [F(2,24) = 4.15, p < 0.028] to be significant for no-distractor trials. For distractor trials, by contrast, only the main effects [target context: F(2,24) = 31.71, p < 0.001; set size: F(1,12) = 10.36, p < 0.007] but not their interaction [F(2,24) = 1.45, p > 0.26] reached significance. In detail, there was a strong influence of target context on the P3b in both distractor and no-distractor trials (Fig. 5b): amplitudes were largest for sF trials (7.55 μV), smallest for dD trials (4.87 μV), and of intermediate magnitude for dFsD trials (6.11 μV). Additionally, on distractor trials, P3b amplitudes were larger for higher relative to lower set sizes in all three target context conditions (all p < 0.023). For no-distractor trials, however, this set size modulation of the P3b was statistically evident for sF (p < 0.003) and dFsD (p < 0.023), but not for dD (p > 0.078), targets.

Fig. 5

P3b results. (a) Grand-average, target-synchronized event-related potential waves as a function of target context (sF vs. dFsD vs. dD) in the 800-ms interval following stimulus onset at channel Pz. (b) Main effect of target context on P3b amplitudes. (c) Topographical map of the P3b scalp distribution at the point in time at which the grand-average P3b reached its maximum deflection

Discussion

The present study was designed to illuminate the role of the featural/dimensional relationship between multiple targets for exact enumeration responses in visual search. We modified a paradigm recently devised by Mazza et al. (2013) by introducing three target context conditions: multiple targets on a given trial could be defined by (1) the same feature, (2) different features within the same dimension, or (3) different features across dimensions. Our behavioral analyses confirmed a strong influence of target context on response speed: reactions were fastest for sF targets, slowest for dD targets, and of intermediate speed for dFsD targets. This RT pattern replicates a finding from one of the groundbreaking studies on “dimensional weighting” (Found & Müller, 1996). In Experiment 3 of that study, participants were asked to enumerate the number of targets presented (3 vs. 4, displayed among homogeneous background items), with the critical manipulation that targets could belong either to the same or to two different dimensions. Mirroring the current results, Found and Müller (1996) found faster RTs when the targets were intra-dimensionally relative to cross-dimensionally defined. In their DWA model, Found and Müller suggested that, in order to decide upon the correct response in cross-dimensional multitarget (enumeration) searches, participants would have to sample information from (i.e., enumerate items in) all, rather than just one, target-defining dimensions, yielding prolonged processing times (see Footnote 3). However, given that this reasoning was based solely on RT measures, it remained an open question whether the cross-dimensional RT cost stems from increased demands to attentionally select multiple targets in visual space, increased demands to encode multiple targets in working memory, or a mixture of both.

Attentional selection of multiple targets depends on feature-based mechanisms

As revealed by our electrophysiological analyses, the effect of target context started to modulate target processing only after the completion of the early sensory processing stages. This conclusion can be drawn from the present P1 and N1 results, which were sensitive to sensory factors (that is, overall object numerosity) but not to inter-target relations. The first processing level at which multiple target processing was affected by the featural/dimensional relation of the targets was the stage of attentional selection (and/or individuation). Specifically, the PCN was more pronounced when the targets could be singled out by the same feature value (i.e., the sF condition) compared with conditions with targets defined by physically distinct features (i.e., the dFsD and dD conditions). Going beyond previous observations of PCN waves for multiple targets of identical, rather than different, colors (Mazza & Caramazza, 2012), our findings demonstrate that at this stage it does not matter whether physically distinct targets belong to the same or to different dimensions. The only difference that does matter at this processing level concerns whether the targets are physically identical or not.

Of note, this pattern of effects challenges the predictions of both the BMT (Huang et al., 2007) and the DWA (Müller et al., 1995, 2010). Based on the explicit assumption that a single Boolean map can be made up solely of one feature value per dimension, or independent feature values that belong to different dimensions, BMT would have predicted a difference between dFsD trials relative to both sF and dD trials, without any difference between the latter. On the other hand, assuming that attentional target selection is constrained by dimension-specific weight settings that influence the integration of feature-contrast signals at the attention-guiding master map, a strong interpretation of the DWA would have predicted both sF and dFsD trials to differ from dD trials. Accordingly, even though both models provide frameworks accounting for a variety of visual search findings, their explanatory reach appears to be limited to single-target searches.

At first glance, the present findings also seem to contradict the pattern typically reported for redundant target searches. In this type of task, search performance is contrasted between conditions in which the response is defined by a single response-relevant signal (e.g., the color red) or redundantly by two response-relevant signals (e.g., the color red and the orientation vertical). The well-established finding is that responses are faster to targets determined by two (or more), rather than just one, response-relevant target features. Importantly, there is an additional RT gain (violating Miller's, 1982, race model inequality) when dual (nearby) targets are defined across rather than within dimensions, which has been taken to indicate a preselective coactivation mechanism modulating target (overall-) saliency (Krummenacher, Müller, & Heller, 2002). The view of a perceptual origin of the redundant-signals effect is further substantiated by recent EEG studies (Krummenacher, Grubert, Töllner, & Müller, 2014; Töllner, Zehetleitner, Krummenacher, & Müller, 2011) that, at variance with the current study, found faster and enhanced PCN waves for targets defined across relative to within dimensions. However, as demonstrated by Krummenacher and colleagues (2002), coactivation between dimensions and the associated RT benefits were evident only when the two target-defining signals occurred at the very same or at spatially directly adjacent locations. In the present study design, by contrast, the two or three target signals were distributed randomly across eight spatially separated (i.e., nonidentical) locations. This variation may account for the difference, in terms of the behavioral and electrophysiological patterns, between redundant target searches and the current study.
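For reference, the race model inequality mentioned above can be written out as follows. The notation is ours and is introduced only for illustration: $S_1$ and $S_2$ denote the two single-signal conditions, $S_1 \wedge S_2$ the redundant-signals condition, and $t$ any response latency.

```latex
% Race model inequality (Miller, 1982); symbols are illustrative notation.
\[
  P(\mathrm{RT} \le t \mid S_1 \wedge S_2)
  \;\le\;
  P(\mathrm{RT} \le t \mid S_1) + P(\mathrm{RT} \le t \mid S_2)
  \qquad \text{for all } t .
\]
```

If the cumulative RT distribution for redundant targets exceeds this bound at some $t$, a separate-activation (race) account is ruled out, which is the basis for inferring coactivation.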

One alternative mechanism that, in contrast to BMT and DWA, may account for the present PCN pattern is perceptual grouping (Beck, 1966; Palmer, 1992; Treisman, 1982; Yantis, 1992). Following her seminal feature-integration theory (Treisman & Gelade, 1980), Treisman (1982) suggested that attention may be directed to groups of items, rather than to single items, when they can be segmented at a preattentive level. Applied to the present study, the enhanced PCN waves for sF trials may reflect facilitated deployment of attention to targets that can be readily grouped preattentively (on the basis of feature similarity), compared with less similar targets for which groupings (if formed at all) are more fragile. The idea of facilitated attentional selection for targets that can be grouped based on sharing the same feature description is consistent with recent proposals that the PCN may represent a neural measure of visual saliency (Conci, Töllner, Leszczynski, & Müller, 2011; Töllner, Conci, et al., 2015; Töllner, Zehetleitner, Gramann, & Müller, 2011). Specifically, comparing pop-out displays with varying target saliency, it has been shown that PCN waves gradually increase with decreasing target-distractor similarity (Töllner, Zehetleitner, Gramann, et al., 2011). Accordingly, the increased PCN waves observed for sF trials in the current study may indicate that the target side of the display was perceived (or encoded) as perceptually more salient overall when multiple targets could be preattentively grouped.
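The qualitative logic of this comparison can be made explicit with a minimal sketch. It is purely illustrative and not part of the reported analyses: each account is encoded as a partition of the three target contexts into sets assumed to yield equivalent PCN amplitudes, and the partition assigned to perceptual grouping is our own simplifying assumption (sF targets group readily, dFsD and dD targets do not). All names in the code are hypothetical.

```python
# Illustrative encoding (assumption): each account is a partition of the three
# target contexts into sets predicted to yield equivalent PCN amplitudes.
PREDICTED = {
    "BMT": [{"sF", "dD"}, {"dFsD"}],           # dFsD differs from sF and dD
    "DWA (strong)": [{"sF", "dFsD"}, {"dD"}],  # dD differs from sF and dFsD
    "grouping": [{"sF"}, {"dFsD", "dD"}],      # assumed: only sF targets group
}

# Observed PCN pattern: sF differs from dFsD and dD, which do not differ.
OBSERVED_PCN = [{"sF"}, {"dFsD", "dD"}]


def same_partition(a, b):
    """Return True if two partitions contain exactly the same sets."""
    return {frozenset(s) for s in a} == {frozenset(s) for s in b}


def consistent_accounts(observed, predicted=PREDICTED):
    """List the accounts whose predicted partition matches the observed one."""
    return [name for name, part in predicted.items()
            if same_partition(part, observed)]


print(consistent_accounts(OBSERVED_PCN))  # ['grouping']
```

The sketch captures only the pattern of equalities and differences between conditions; it says nothing about effect sizes or about the direction of the amplitude differences.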

Encoding multiple targets in working memory depends on dimension-based mechanisms

The first stage at which dimension-based processing dynamics started to alter information processing in the present study was the level of transferring target information into or from working memory (WM). This conclusion can be drawn from the activation pattern of the P3b wave, a neural measure traditionally associated with WM (Donchin & Coles, 1988; Polich, 2007; Wiegand et al., 2016), which was reduced not only for dFsD relative to sF trials but also, and even more markedly, when the targets belonged to different dimensions (dD trials). Following one notion according to which the P3b reflects processes related to WM updating (Vogel & Luck, 2002; Vogel et al., 1998), this amplitude reduction may indicate a gradual impairment (dD > dFsD > sF) of transferring multiple targets into WM with increasing featural/dimensional inter-target dissimilarity.

While this post-selective locus of dimension-based processing dynamics is not envisaged by BMT and DWA, it extends and complements a growing body of EEG studies (Gramann, Töllner, Krummenacher, Eimer, & Müller, 2007; Rangelov, Töllner, Müller, & Zehetleitner, 2013; Töllner, Eschmann, Rusch, & Müller, 2014) that likewise observed dimension-specific modulations arising at later, WM-related processing stages. One recent retro-cue study is of particular relevance in this context (Töllner et al., 2014): as in the current task, participants were presented with stimulus arrays consisting of two or three target singletons defined in the same or different dimensions. Instead of responding to them immediately, however, participants had to transfer the targets to WM to decide whether a subsequently presented retro-cue item matched one of the stored (WM) items. Because this study was designed to measure the access of WM representations by means of retro-cue-synchronized EEG lateralizations, the two or three targets were always bilaterally distributed. As a consequence, Töllner et al. (2014) could not analyze PCN waves in response to target displays, so it remained unknown whether attentional selection of multiple targets is modulated by dimensional context. Based on the retro-cue design, however, it was possible to isolate the process of extracting target information from WM, with the main finding being that memory search was more demanding (in terms of enhanced memory-related ERP waves and prolonged behavioral response times) when the targets were defined across dimensions.

On the basis of these results, we recently introduced the notion of “dimensional feature bundles” (DFB) (Töllner, Mink, & Müller, 2015), which extends the well-known, hierarchically structured feature bundle model (Brady, Konkle, & Alvarez, 2011) by adding an intermediate, dimension-based level of WM representations (that connects top-level object with lower-level feature representations). Accordingly, increased processing times for cross-dimensional (relative to intra-dimensional) target processing in the retro-cue task can be explained by the requirement to actively maintain and scan two (instead of just one) dimensionally organized feature bundles (for further details, see Töllner, Mink, et al., 2015). Whereas the DFB model was originally devised to explain RT differences in memory search, the same dimension-based processing dynamics may, in theory, also be at work in the current paradigm. Following this idea, the present, dimension-based P3b results may reflect variable processing demands associated with transferring multiple target information from WM to subsequent, response-related stages in enumeration (see Footnote 4).

Conclusions

Three conclusions can be drawn from the present data set. First, the times taken for exact target enumeration in multiple-target search depend substantially on the physical relation between the target items (sF < dFsD < dD), a factor that has been largely neglected in most previous studies on enumeration. Second, as revealed by our EEG analyses, this RT cost is not attributable to a single information-processing stage. Rather, the target context modulation originates from at least two functionally distinct stages: attentional selection and working memory. At the stage of attentional selection, signal processing varies depending on whether the to-be-selected items are physically identical (i.e., defined by the same feature value) or not, which is likely attributable to perceptual grouping mechanisms. At the subsequent WM-related stage, signal processing is additionally modulated by dimension-based inter-target relationships, presumably indicating increased processing demands associated with transferring cross-dimensional (relative to intra-dimensional) target information from WM to response-related stages. Third, this set of findings has important implications for the explanatory reach of at least two prominent visual attention models: BMT and DWA. While both models provide frameworks explaining a wide range of feature- and dimension-based phenomena in human vision, they cannot readily account for the present multiple-target search data and thus appear limited to single-target searches.