Introduction

Feature-based selection

Here, we seek to better understand the nature of the mental representations that attention is allocated to. Evidence suggests that attention can be allocated to features (such as colors), as well as regions and objects (e.g., Arman, Ciaramitaro, & Boynton, 2006; Bichot, Rossi, & Desimone, 2005; Chelazzi, Miller, Duncan, & Desimone, 1993; Sàenz, Buracas, & Boynton, 2002, 2003; White & Carrasco, 2011). Just as selecting multiple objects or regions reduces performance, relative to selecting a single object, so might selecting multiple features. We will investigate this here.

Any performance cost for processing multiple stimuli might arise at selection or at access. Selection and access are thought to be separate stages of processing required to report on stimuli in the presence of distractors (Huang & Pashler, 2007).

Selection refers to confining processing to relevant stimuli from a complex display that includes irrelevant stimuli. It has been theorized to precede access, which refers to processing the contents of the selected stimuli. Consider that a person waiting to pick up a friend in an airport arrival hall may search for the color blue because the friend said she would be wearing a blue dress. So, using feature-based attention, the waiting person directs attention to all the passengers with blue outfits. This is selection. Once the candidates are selected, their characteristics must be apprehended to identify which is the friend. Apprehending the candidates’ characteristics, such as their height, skin color, and identity, requires access.

Evidence for feature-based selection was found by Bichot et al. (2005) among others. Bichot et al. demonstrated that searching for a target with a particular color enhanced the activity of the population of V4 neurons that prefer that color throughout the visual field, suggesting that attention can be allocated to a particular color regardless of stimulus location.

If featural attention has limited capacity in the same way as other forms of attention, selecting multiple features should induce a cost. For example, selecting both red and green objects might be less effective than selecting an equal total number of green objects. Here, we will use the word attribute to refer to a feature dimension, such as color, and feature will refer to an individual value, such as red.

In an early test of the effectiveness of multiple-color attention, Wolfe et al. (1990) had participants search for a conjunction of two colors. That is, a patch comprising red and green parts might be designated as the target, with other patches containing red and blue, and others green and blue. Search performance was rather poor, with the effect of number of distractors (search slope) greater for this color–color conjunction than for color–orientation conjunctions. Wolfe et al. concluded that attention could not be guided to two features of the same dimension at once. However, this failure to guide attention to the unique location where both colors are present does not necessarily mean that simultaneous activation of both colors did not occur. Indeed, evidence from cuing paradigms suggests that two colors can be simultaneously activated.

Irons, Folk, and Remington (2012) demonstrated that when the observer was required to respond to a target that could be either green or red, an irrelevant green or red cue, presented shortly before the target, could capture attention. This suggests that the observer can select two colors simultaneously (Adamo, Pun, Pratt, & Ferber, 2008; Irons et al., 2012).

The reason why search performance was poor for the unique location containing both of the two colors (Wolfe et al., 1990) could have been the need to guide attention to the only location where both colors were present. For example, when the observer is searching for a target containing a red and green region, selecting red and green colors is not sufficient for this task, because distractors also contain red or green regions. However, in the cuing paradigm, when the observer is looking for a red or a green target, the red or green cue presented beforehand is the only red or green stimulus in the cue display, so confining attention to the joint occurrence of red and green is not needed. This joint occurrence requirement is sometimes referred to as a conjunction (Wolfe et al., 1990).

Our previous study (Lo, Howard, & Holcombe, 2012) found evidence that a particular sort of conjunction is troublesome for selection and did so outside of a visual search context. The conjunction requirement was for a color–location conjunction, the selection of an object of a particular color in a particular location, avoiding a distractor in the same location. The first experiment of Lo et al. (2012) presented two pairs of overlapping gratings, one pair in the left and one pair in the right hemifield, and asked participants to monitor the spatial period of one target grating from each side (here schematized in Fig. 1a). The targets were designated by their location and color. The targets were either the same color or different colors (e.g., the instruction might be to monitor the red grating to the left and the green grating to the right). In this task, selection was required because participants had to segregate the targets from their superimposed distractors, one of which was red and the other green. Use of distractors with the same colors as the targets (but in different locations) is what made the task require a color–location conjunction. The spatial periods of the target gratings changed constantly, smoothly, and continuously (rather than in discrete jumps). After this continued for a random interval, the gratings disappeared, and participants had to report the final spatial period of one of the targets. A multiple-feature cost was found: Errors were larger when the target gratings were different colors (Fig. 1a, middle panel) rather than the same color (Fig. 1a, top panel). This result recalls the difficulty of color–color conjunction search (Wolfe et al., 1990), but here the task involved just two locations.

Fig. 1

a Schematic illustration of stimuli used in Lo et al. (2012). In this figure, the target is denoted by portraying it in front of the distractor, but in the actual experiments, the two stimuli were exactly superimposed with no spatial offset. b Each horizontal strip represents the processing of the target in the corresponding condition in panel a. The gray areas indicate the periods where the observer is attempting to select the target, whereas the white areas indicate the periods where the observer is accessing the spatial period of the target. The dashed lines represent different stimulus durations. With long durations such as t2, the observer has successfully selected the target, so performance in the same-color target condition (top panel) and in the different-color target condition without color–location conjunction (bottom panel) is similar, leading to little multiple-feature cost. A significant multiple-feature cost can be observed only in the different-color target condition with color–location conjunction, as was shown in Lo et al. (2012). In the present study, a short duration such as t1 will be used to test whether a multiple-feature cost can be manifested in the different-color target condition without color–location conjunction (bottom panel). c Predicted selection rates as a function of stimulus duration. The first and second graphs from the top are based on the extended selection hypothesis and the reselection hypothesis, respectively, with color–location conjunction. The bottom graph is based on the prediction without color–location conjunction.

The impairment in performance in the multiple-color condition occurred only when a color–location conjunction was required of selection. When the two distractors were irrelevant colors (Fig. 1a, bottom panel), participants could set their attention simply to the two target colors and need not be concerned with the locations of those colors. In one condition, the targets were red and green, and the superimposed distractors were blue and yellow. Errors were not significantly different from when both targets were red or both green. In other words, we found no cost of simply splitting featural attention between two colors.

Only when the locations of the designated colors were critical (e.g., red target on the left and green target on the right, with superimposed green distractor on the left and red distractor on the right) was performance worse in the two-color condition. Observers likely failed to adequately conjoin color and location selection, allowing feature-based attention to spread from targets to distractors.

In addition to a larger error associated with selection of two color–location conjunctions, Lo et al. (2012) also showed that observers tended to report a spatial period value corresponding to the state of the grating at a time prior to the stimulus offset, rather than the value of the final frame. We referred to this as the perceptual lag. The lag was longer in the different-color with color–location conjunction condition (Fig. 1a, middle panel) than in the same-color condition (Fig. 1a, top panel), suggesting that observers’ access to the target tended to reflect an earlier time when a color–location conjunction was required.

Recall that our opening question was whether selecting two features yields worse performance than selecting one. That is, even when conjunctions with location are not required, is feature-based attention less effective for two features than for one? The study of Lo et al. (2012) yielded no sign of such a simple multiple-feature selection cost, since the multiple-feature cost was not found when the distractors were irrelevant colors (no color–location conjunction required). However, we here report evidence that a simple multiple-feature selection cost does occur even without a color–location conjunction requirement, but only when the time available to process the target features is brief.

The trial duration in the experiment not requiring a color–location conjunction (the second experiment of Lo et al., 2012) always exceeded 3 s, so any initial difficulty in selecting the targets in the different-color case may have been overcome by the time the trial terminated (as schematized in the bottom strip of Fig. 1b). Here, we tested short durations and observed a multiple-feature cost even when the distractors were irrelevant colors (no color–location conjunction required). We interpret this as meaning that it is more difficult to select two features, as compared with one, at the beginning of the trial but that, once the targets are selected, participants can access the spatial periods without any cost.

The extended selection hypothesis and the reselection hypothesis

A separate issue is the nature of the multiple-feature cost in the presence of distractors sharing the target colors (color–location conjunction condition). This was manifest even for long, 3-s trials, indicating that the selection difficulty was enduring (Lo et al., 2012). There are a few possibilities for the relative roles and dynamics of selection and access that yield this cost.

We will refer to an extended selection hypothesis and a reselection hypothesis. See Fig. 1b, where each horizontal strip indicates how the stimuli are processed over time in the corresponding condition in Fig. 1a. The gray areas indicate the intervals during which the observer is attempting to select the targets, and the white areas indicate the intervals during which the observer is accessing the spatial period. The extended selection hypothesis is that in the color–location conjunction condition, selection takes a long time (Fig. 1b, the second strip from the top), so long that it sometimes continues for the entire trial duration (>3 s). According to this hypothesis, even for such prolonged trials, selection in the multiple-feature condition will sometimes not be complete, yielding a multiple-feature performance cost.

The reselection hypothesis (which here we will find evidence for) suggests that initial selection in the color–location conjunction condition may not require a long time but that it happens more than once (Fig. 1b, the third strip from the top), since selection is sometimes disrupted and the targets must be selected again. An extended (e.g., >3 s) trial should then be viewed as multiple episodes of shorter duration, with each episode including a selection process and access not possible until selection succeeds. According to this hypothesis, in the color–location conjunction condition, even for prolonged trials, the trial still occasionally terminates when selection is not effective, resulting in the observed multiple-feature selection cost.

Decomposition of the selected and unselected components in the error distribution

According to our theory (and that of Huang & Pashler, 2007), observers can access the target contents only after they have selected the targets. Applying a mixture modeling technique to the distribution of errors allows us to assess observers’ success at selection in different conditions. We will estimate the selection rate in each condition, by which we mean the proportion of trials on which participants had selected the targets so that they could report a relevant value.

For very short trial durations, there may not be enough time for selection, so we expect that if trial duration is varied, selection rate will improve as a function of duration. We have suggested, however, that in the color–location conjunction condition, even for long durations, selection is not always successful when the trial terminates, so the selection rate might not come close to 100 %.

Our mixture modeling approach is similar to that pioneered by Zhang and Luck (2008) and widely applied to visual short-term memory. In the experiments reported here, the task of participants is to report the final orientation of one of the target gratings, so the error on each trial lies between −90° and 90°. Pooling these errors across trials yields a distribution of errors.

The error distribution is assumed to comprise two kinds of trials. The first kind consists of trials wherein the participant guessed. When attentional selection fails, participants are assumed to have no knowledge of the answer (here, the target’s orientation) and, therefore, must guess. On the other kind of trial, selection succeeds, and the target’s orientation is accessed and reported. For these trials, the error is assumed to have a Gaussian distribution. The precision of access to target orientation might vary, yielding a different σ for the Gaussian in different conditions.

The mixture modeling provides an estimate of the guessing rate, which is one minus the selection rate. Our two hypotheses make different predictions for the selection rate (Fig. 1c).
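
As a minimal sketch of the mixture described above, the following Python code treats each error as a guess drawn from a uniform distribution over −90° to 90° (with probability equal to the guessing rate) or otherwise as a draw from a zero-mean Gaussian with standard deviation σ. The function names, the simulated data, the coarse grid search, and the use of an untruncated Gaussian are illustrative assumptions; this is not the fitting procedure used in the present study.

```python
import math
import random

def mixture_density(error_deg, guess_rate, sigma_deg):
    """Density of one error under a uniform-plus-Gaussian mixture."""
    uniform = 1.0 / 180.0  # guesses spread evenly over -90..90 deg
    gauss = math.exp(-0.5 * (error_deg / sigma_deg) ** 2) / (
        sigma_deg * math.sqrt(2.0 * math.pi))
    return guess_rate * uniform + (1.0 - guess_rate) * gauss

def neg_log_likelihood(errors_deg, guess_rate, sigma_deg):
    # The tiny floor avoids log(0) when guess_rate is 0 and the Gaussian underflows.
    return -sum(math.log(max(mixture_density(e, guess_rate, sigma_deg), 1e-300))
                for e in errors_deg)

def fit_mixture(errors_deg):
    """Coarse grid search (assumption: a real analysis would use an optimizer)."""
    best = None
    for g_step in range(50):              # guess rates 0.00, 0.02, ..., 0.98
        guess_rate = g_step / 50
        for sigma in range(2, 61, 2):     # sigma 2, 4, ..., 60 deg
            nll = neg_log_likelihood(errors_deg, guess_rate, float(sigma))
            if best is None or nll < best[0]:
                best = (nll, guess_rate, float(sigma))
    _, guess_rate, sigma = best
    return {"guess_rate": guess_rate,
            "selection_rate": 1.0 - guess_rate,
            "sigma_deg": sigma}

def simulate_errors(n_trials, guess_rate, sigma_deg, seed=1):
    """Hypothetical data generator used only to exercise the fit."""
    rng = random.Random(seed)
    return [rng.uniform(-90, 90) if rng.random() < guess_rate
            else rng.gauss(0, sigma_deg)
            for _ in range(n_trials)]

errors = simulate_errors(600, guess_rate=0.3, sigma_deg=15)
fit = fit_mixture(errors)
```

In this sketch, `fit["selection_rate"]` is simply one minus the estimated guessing rate, which mirrors the definition given in the text.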

According to the extended selection hypothesis, the selection in the color–location conjunction condition takes several hundred milliseconds or more, so it takes that long for selection rate to reach an asymptote. But that asymptote should be the same as that in the conditions that do not require color–location conjunction (Fig. 1c, upper panel).

According to the reselection hypothesis, on the other hand, reselection in the color–location conjunction condition is intermittently required, leading to a relatively low final selection rate (a low asymptote). That is, because reselections occur even for long trials, the trial will still occasionally terminate when the observer is attempting to select the targets. The selection rate in the color–location conjunction condition will thus never reach the high value predicted for the conditions where color attention need not be confined to the targets’ locations (Fig. 1c, lower panel).

Overview of experiments in the present study

In an unpublished study, we found that when all stimuli were arrayed at the four corners of an imaginary square, the best condition for eliciting a multiple-feature cost was when the targets were located horizontally at the upper two or lower two corners. If they were arranged vertically or diagonally, the multiple-feature cost on lag decreased. This remains an intriguing issue, but since we are interested in the multiple-feature cost rather than its relationship with spatial arrangement, we used only the horizontal arrangement here.

Here, to manipulate the duration of the stimulus that was available for selection and eventual access, instead of changing the overall trial duration, we used rotations and jumps on each trial. By rotations, we mean continuous changes in orientation, and by jumps, we mean abrupt changes in orientation. During rotations, the participant can use the orientation continuity to maintain selective attention on the target (Blaser, Pylyshyn, & Holcombe, 2000), but after a jump, we suggest that the participant will need to reselect the target, using color (the selection attribute; see Footnote 1).

As a result of the disruption of selection caused by each jump, the duration following the final jump is that most relevant to what the participant reports. If the selection process takes longer than the interval after the final jump, then selection fails.

Experiments

Two experiments were conducted. Participants were presented with four pairs of gratings, one pair in each quadrant. Two of the gratings were designated as targets and were the same color or different colors. The task was to monitor the orientations of the two targets. The orientation changes were gradual continuous shifts (rotations) punctuated by sudden changes to a random value (jumps) at random times during the trial. The duration of the last rotation of a trial was the duration of interest, since we suggest that it corresponded to the interval available for selection, during which the target had an orientation related to its final value.

To probe for a multiple-feature cost, the main factor of both experiments was the relationship between the target colors, which could be isochromatic (e.g., both red or both green) or heterochromatic (one target red and one green). In the first experiment, the distractor color set was the same as the target color set, necessitating color–location conjunction selection, and in the second experiment, they were different. Figure 2 shows the stimuli of an exemplar trial in the heterochromatic-target condition of Experiment 1.

Fig. 2

Trial structure of Experiment 1. The white text on the graphs was not shown in the actual experiments. The orientations of all the gratings were constantly changing, and the participant had to report the final orientation of the postcued grating. Experiment 2 used an identical stimulus layout, except that the distractors did not share target colors. Animations of the stimuli can be viewed at http://www.psych.usyd.edu.au/staff/shihyu/

In Experiment 1, we analyzed the errors and lags partially as a replication study probing for the multiple-feature color–location-conjunction cost of Lo et al. (2012). We also estimated the selection rates to test whether the reselection hypothesis or the extended selection hypothesis would better explain the multiple-feature cost (see Fig. 1c for predictions).

Experiment 2 probed for a multiple-feature cost when no conjunction with location was required. If in these conditions, selection is more time-consuming for two colors but is completed after a short period, a multiple-feature cost should be observed only with short durations since the last jump. In those short-duration conditions, the selection in the isochromatic-target condition should usually be complete, but that in the heterochromatic-target condition often will not be. Because we had no a priori prediction for the amount of time required for a selection, a number of durations following the last jump were tested.

Method

Participants

Ten participants (3 female), including the 2 authors of this study, participated in Experiment 1; 8 participants (2 female), including 1 author of this study (S.Y.L.), participated in Experiment 2. All had normal or corrected-to-normal vision. Both experiments were approved by the Human Research Ethics Committee of the University of Sydney.

Stimuli and procedure

Four pairs of gratings were presented at the four corners of an imaginary square, as shown in Fig. 2b and c. The target gratings were at either the upper two quadrants or the lower two quadrants.

The main factor of both experiments was relative target color, which could be isochromatic (e.g., both red or both green) or heterochromatic (e.g., one target red and one green). In Experiment 1, the distractor color set was the same as the target color set, which necessitated color–location conjunction; in Experiment 2, the two sets were different.

In Experiment 1, all the stimuli were either red or green. So if the two target colors were red and green, their superimposed distractor colors would be green and red, respectively. That way, selection of the targets was contingent on both color and location. In Experiment 2, the target colors and distractor colors were distinct; if the two target colors were red and green, the two superimposed distractors were blue/yellow, yellow/blue, blue/blue, or yellow/yellow. While in this example the targets were different colors, on other trials, they were the same color (both blue, both red, both yellow, or both green).

In both experiments, the two targets were always arrayed horizontally and presented in either the upper or the lower field. In Experiment 2, the remaining two quadrants comprised all four colors (red, green, blue, yellow) permuted randomly. For the distractors that were superimposed on the targets, whether or not they were the same color as each other was counterbalanced with whether or not the two targets were the same color as each other. For example, in the isochromatic-target condition, where the two targets were the same color, the superimposed distractors were the same color as each other on half of the trials and different colors on the other half.

The screen resolution was 1,024 × 768 pixels, and the frame rate was 85 Hz. The experiment program was written in Python and used the Vision Egg library (Straw, 2008). In order to confirm that the two targets were presented in the intended retinal locations in the two hemifields, participants’ eyes were monitored to ensure that they fixated. The participants’ right eyes were monitored by an eyetracker (Eyelink 1000) with a sampling rate of 1,000 Hz. The participants sat at a distance of 60 cm from the monitor with their head on a chinrest. The experiment began with the eyetracker’s standard calibration and validation, involving participants following a white dot with their eyes.

Each trial of the experiment began with a central white fixation point with a radius of 0.22° of visual angle (dva) and two precues presented on the screen. The two precues were discs, 0.5 dva in radius and presented at an eccentricity of 9.7 dva, colored either green or red to inform the participants of the colors of the targets they had to monitor subsequently (Fig. 2a).

The experimenter initiated the trial after confirming that the participant was fixating on the fixation point. Four pairs of superimposed square-wave gratings subtending 4.2 dva each were presented 941 ms later, one pair in each quadrant. The precues stayed with the gratings for the initial 1,176 ms of the trial and disappeared afterward. The four pairs of gratings were positioned at the four corners (eccentricity 4.8 dva) of an imaginary square (Fig. 2b and c) centered on fixation. In Experiment 1, each pair of gratings comprised one green (CIE x, y = .29, .59; luminance, 30.7 cd/m²) grating and one red (CIE x, y = .61, .34; luminance, 30.5 cd/m²) grating. In Experiment 2, the target gratings could be red, green, yellow (CIE x, y = .43, .50; luminance, 97.3 cd/m²), or blue (CIE x, y = .15, .07; luminance, 18.1 cd/m²), and the target color set had no colors in common with the color set of the superimposed distractors.

All gratings changed their orientations on every frame. Most of the trials included both abrupt changes to a random orientation value (jumps) and smooth changes in orientation (rotations). The whole stimulus train was assigned a random duration between 1,776 and 2,941 ms, with each smooth rotation interval assigned a random duration between 118 and 1,176 ms. The final rotation duration was set to 118 (Experiment 1 only), 294, 588 (Experiment 1 only), or 1,176 ms. In addition to these four conditions that included jumps, there was a pure-rotation condition where the orientation changed smoothly throughout the whole trial. At the end of the presentation, the postcue appeared, which was identical to one of the precues. The task was to report the final orientation of the postcued grating by adjusting two oriented lines (Fig. 2d) on the screen to match the target orientation the observer had last seen. Feedback was given after the participant made the response by displaying the orientations the targets had in the final frame.
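
The final rotation durations listed above are close to whole numbers of video frames at the 85-Hz frame rate, and the short sketch below makes that arithmetic explicit. The helper name `ms_to_frames` is hypothetical, and rounding to the nearest frame is an assumption about how the reported millisecond values relate to frame counts.

```python
FRAME_RATE_HZ = 85  # frame rate reported for the display

def ms_to_frames(duration_ms):
    """Nearest whole number of frames for a duration in milliseconds."""
    return round(duration_ms * FRAME_RATE_HZ / 1000)

# Final rotation durations (ms) used across the two experiments
final_rotation_ms = [118, 294, 588, 1176]
frames = {ms: ms_to_frames(ms) for ms in final_rotation_ms}
# frames == {118: 10, 294: 25, 588: 50, 1176: 100}
```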

At the time of each jump, all the gratings changed orientation to random and independent values. During the intervals between jumps (rotations), their orientations changed smoothly with trajectories that were independent of each other. The algorithm that created these trajectories was as follows. For each grating, the starting orientation was set to a random value. The starting angular velocity was set to a value between −0.043 and 0.043 deg/ms, but the value of 0 was avoided by excluding absolute values smaller than 0.0085 deg/ms. The starting angular acceleration was set to −2.17 × 10⁻⁴ deg/ms² or 2.17 × 10⁻⁴ deg/ms². Every 235 ms, the angular acceleration was reset randomly to either 7.23 × 10⁻⁵ or −7.23 × 10⁻⁵ deg/ms². When the angular velocity was below 0.017 deg/ms, the absolute angular acceleration values would be increased from 7.23 × 10⁻⁵ to 2.17 × 10⁻⁴ deg/ms². The maximum angular velocity value was 0.26 deg/ms. If the velocity of a grating exceeded this maximum value, the sign of the acceleration was reversed.
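
The trajectory algorithm above can be sketched in Python roughly as follows. This is an illustration under stated assumptions rather than the original experiment code: the function and constant names are hypothetical, the update is applied once per 85-Hz frame, the low-velocity rule is applied to the absolute velocity at each 235-ms reset, and "reversing the sign of the acceleration" is implemented as making the acceleration oppose the velocity.

```python
import random

FRAME_MS = 1000 / 85                    # duration of one frame at 85 Hz
RESET_FRAMES = 20                       # about 235 ms at 85 Hz
ACC_LOW, ACC_HIGH = 7.23e-5, 2.17e-4    # deg/ms^2
V_START_MIN, V_START_MAX = 0.0085, 0.043  # deg/ms
V_SLOW, V_MAX = 0.017, 0.26             # deg/ms

def rotation_trajectory(n_frames, rng=random):
    """Per-frame orientations (deg, modulo 180) for one smooth rotation."""
    orientation = rng.uniform(0, 180)
    # Starting speed avoids values near zero; direction is random.
    velocity = rng.choice([-1, 1]) * rng.uniform(V_START_MIN, V_START_MAX)
    accel = rng.choice([-1, 1]) * ACC_HIGH
    path = []
    for frame in range(n_frames):
        if frame > 0 and frame % RESET_FRAMES == 0:
            # Assumption: a slow grating gets the larger acceleration magnitude.
            magnitude = ACC_HIGH if abs(velocity) < V_SLOW else ACC_LOW
            accel = rng.choice([-1, 1]) * magnitude
        if abs(velocity) > V_MAX:
            # Assumption: above the maximum speed, acceleration opposes velocity.
            accel = -abs(accel) if velocity > 0 else abs(accel)
        path.append(orientation % 180)
        velocity += accel * FRAME_MS
        orientation += velocity * FRAME_MS
    return path

# Example: a 100-frame (about 1,176 ms) rotation with a fixed seed
example_path = rotation_trajectory(100, random.Random(0))
```

A jump could be modeled by starting a fresh trajectory, although the text does not state whether velocity and acceleration were also reinitialized at a jump.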

Fully crossed with relative target color (isochromatic targets or heterochromatic targets) was the factor of final rotation duration (Experiment 1, 118, 294, 588, or 1,176 ms or pure-rotation condition; Experiment 2, 294 or 1,176 ms or the pure-rotation condition).

Sixty-four trials of each condition were presented, in random order and mixed with the other conditions. The whole experiment for each participant required four sessions in Experiment 1 and two sessions in Experiment 2, with each session lasting approximately 1 h.

Results

Eye movement data

Participants were instructed to fixate throughout the trials, and two criteria were used to exclude trials where participants seemed to break fixation. The first criterion for exclusion was whether the eyetracker reported that an eye fixation location was more than 1.3° to the right or left of the midline. Second, at times, the eyetracker failed to register the eye location. If any missing period exceeded 300 ms, the trial was excluded. In Experiment 1, we computed the proportion of trials excluded on the basis of these two criteria for the 10 participants. The proportion for one participant was 44 %, and because this exceeded the third quartile of the group data by more than 1.5 times the interquartile range, the participant’s data were excluded from further analysis. The mean proportion excluded for the remaining 9 participants was 14 %. In Experiment 2, the data from the 8 participants were all retained, and the average exclusion rate was 19 %.
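
The two trial-exclusion criteria and the participant-level outlier rule can be sketched as below. The data layout (a time-ordered list of gaze samples, with `None` marking samples the tracker failed to register), the function names, and the quartile method used by `statistics.quantiles` are assumptions made for illustration; the original analysis worked from the eyetracker’s reported fixation locations and may have computed quartiles differently.

```python
import statistics

def trial_excluded(samples, max_offset_deg=1.3, max_gap_ms=300):
    """samples: time-ordered (time_ms, horizontal_gaze_deg or None) pairs,
    with gaze measured relative to the vertical midline."""
    gap_start = None
    for time_ms, x_deg in samples:
        if x_deg is None:
            if gap_start is None:
                gap_start = time_ms        # a missing period begins
            if time_ms - gap_start > max_gap_ms:
                return True                # criterion 2: tracker loss too long
        else:
            gap_start = None
            if abs(x_deg) > max_offset_deg:
                return True                # criterion 1: gaze too far left/right
    return False

def upper_outlier_cutoff(values):
    """Third quartile plus 1.5 times the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + 1.5 * (q3 - q1)
```

A participant whose proportion of excluded trials exceeds `upper_outlier_cutoff` of the group’s proportions would be flagged under this rule.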

Error curves and permutation tests

To measure the perceptual lag as we did in the study of Lo et al. (2012), an error curve was plotted for each participant in each condition. The error curve compares the orientation observers report with the target’s orientation at different times prior to stimulus offset (see Fig. 3a for a cartoon illustration and Fig. 3b for some real data). The task was to report the orientation of the target at the time of offset, but if access to the target’s orientation is a time-consuming process or requires switching between the two targets, then when the target offsets, participants may be able to report only an orientation from an earlier time.

Fig. 3

a Error curve schematic. The illustration below the horizontal axis represents a target grating changing its orientation. The error curve plots the difference between the orientation reported by the participant and the target orientation at different times before stimulus offset. Although the participant is required to report the final orientation, the reported orientation was typically closer to the orientation at an earlier time (here, 200 ms before offset). b Error curves of participant A.O.H. in the pure-rotation conditions of Experiment 1. The error is defined as the error with respect to the final frame (time before the stimulus offset = 0), which is greater in the heterochromatic-target condition. The lag is indicated by the time of the minimum error of the curve. Here, it is 35 ms before the stimulus offset in the isochromatic-target condition and 71 ms in the heterochromatic-target condition. These results suggest that processing of the stimuli was more time-consuming or intermittent in the heterochromatic-target condition. Both the error and minimum error values were also larger in the heterochromatic-target condition, suggesting that selection, access, or both were poorer in this condition.

The horizontal axis of an error curve (e.g., that in Fig. 3b) represents each frame from the final frame back to the frame 1,765 ms earlier. The vertical axis plots the absolute value of the difference between the stimulus orientation for each frame and the orientation reported as the final state by the participant. The error with respect to the final frame will be referred to as the error, whereas the lowest value of the error curve is the minimum error. The interval between the time of the minimum error and that of the stimulus offset is the lag. Outliers, defined by the error being outside 1.5 times the interquartile range from the first or third quartile for that condition, were excluded.
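
The definitions of error, minimum error, and lag can be illustrated with the following sketch. It assumes that each trial is stored as the target’s per-frame orientation history (last entry = final frame) together with the reported orientation, that orientation differences are taken modulo 180°, and that the curve is the mean absolute difference across the trials of a condition. These choices and all names are illustrative assumptions, and outlier exclusion is omitted.

```python
def orientation_diff(a_deg, b_deg):
    """Absolute difference between two orientations, in 0..90 deg."""
    d = abs(a_deg - b_deg) % 180
    return min(d, 180 - d)

def error_curve(trials, frame_ms=1000 / 85):
    """trials: list of (orientations_per_frame, reported_deg).
    Returns [(time_before_offset_ms, mean_abs_error_deg), ...],
    starting at the final frame and stepping back in time."""
    n_back = min(len(orients) for orients, _ in trials)
    curve = []
    for back in range(n_back):
        errs = [orientation_diff(orients[-1 - back], report)
                for orients, report in trials]
        curve.append((back * frame_ms, sum(errs) / len(errs)))
    return curve

def summarize(curve):
    error = curve[0][1]                      # error relative to the final frame
    lag_ms, minimum_error = min(curve, key=lambda point: point[1])
    return {"error": error, "minimum_error": minimum_error, "lag_ms": lag_ms}

# Toy example: targets rotate 1 deg per frame; reports match 3 frames before offset
toy_trials = [([10 + i for i in range(20)], 26),
              ([100 + i for i in range(20)], 116)]
toy_summary = summarize(error_curve(toy_trials, frame_ms=10))
```

In the toy example, the minimum of the curve falls three frames before offset, so the lag is three frame durations and the error with respect to the final frame is 3°.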

If the participants guessed on a large proportion of trials, the lag value calculated by our procedure would be meaningless (it would be a random value created by random noise). The mixture modeling reported later provides some assurance that the participants were not responding entirely randomly, but as a further precaution, we also conducted permutation tests (Lo et al., 2012). The results indicated that the lags were not caused by random noise (see Appendices 1 and 2 for the procedure and results of the permutation tests in Experiments 1 and 2).

Tests of errors and lags

We analyzed the minimum errors using a 2 (relative color) × 5 (final rotation duration) repeated measures ANOVA in Experiment 1 (Fig. 4, upper panel), and a 2 (relative color) × 3 (final rotation duration) repeated measures ANOVA in Experiment 2 (Fig. 5, upper panel). As in the study of Lo et al. (2012), all the mean minimum errors, errors, lags, and their square roots were first subjected to Levene’s test (Levene, 1960) for homogeneity of variance among conditions. If the square root transformation yielded more homogeneous variance, it was used for the ANOVA. Untransformed and transformed data are used to refer to the original data and their square roots, respectively. Mauchly’s test was used to evaluate sphericity, and none of the data in the present study departed significantly from sphericity.

Fig. 4

Minimum errors (upper panel) and lags (lower panel) in Experiment 1 (color–location conjunction required in the heterochromatic-target condition). Error bars show one standard error across participants

Fig. 5

Minimum errors (upper panel) and lags (lower panel) in Experiment 2 (color–location conjunction not required). Error bars show one standard error across participants.

Experiment 1 (color–location conjunction required in the heterochromatic-target condition)

The statistics reported in this section show that target color dissimilarity yielded significantly larger minimum error, F(1, 8) = 14.67, p = .005, for all final rotation durations except, notably, the shortest duration (118 ms). These results replicate the multiple-feature cost in long trials observed previously for color–location conjunction conditions (Lo et al., 2012; see Fig. 4, upper panel).

The ANOVA on minimum errors revealed a significant interaction between final rotation duration and whether the target colors were different, F(4, 32) = 6.34, p < .001. Simple main effect analyses showed that the cost of target colors being different was significant when the final rotation duration was 294, 588, or 1,176 ms or in the pure-rotation condition, with the error values in the isochromatic-target condition versus the heterochromatic-target condition being 19° versus 31°, F(1, 8) = 23.51, p = .001; 20° versus 27°, F(1, 8) = 13.21, p = .007; 18° versus 26°, F(1, 8) = 11.06, p = .01; and 18° versus 27°, F(1, 8) = 7.97, p = .02, respectively. However, only a nonsignificant cost of less than 1° (27.9° vs. 28.2°) was observed when the duration was 118 ms, F(1, 8) = 0.02, p = .89. This analysis was based on the untransformed values (Levene’s test on the untransformed minimum errors, W = 1.15, p = .34; transformed minimum errors, W = 1.68, p = .11). Turning from minimum error to error itself (the error with respect to the final frame), the corresponding ANOVA yielded the same pattern of significance. The mean difference between the minimum error and error was less than 1°.

Target color dissimilarity not only inflated the errors, but also yielded longer lags (Fig. 4, lower panel)—123 ms for heterochromatic targets and 43 ms for isochromatic targets, F(1, 8) = 7.66, p = .02 (based on the transformed data, Levene’s test on the transformed lags, W = 2.02, p = .05; untransformed lags, W = 3.14, p = .003). No main effect of final rotation duration, F(4, 32) = 1.70, p = .18, or interaction with target color dissimilarity, F(4, 32) = 0.57, p = .69, was observed.

Experiment 2 (color–location conjunction not required to select the targets)

Experiment 2 probed whether the cost of target color dissimilarity would persist when color–location conjunction was not required to select the targets (because the distractors had different colors than the targets). A significant cost of target color dissimilarity was still observed; it yielded larger minimum error (see Footnote 2), F(1, 7) = 13.63, p = .008 (see Fig. 5, upper panel).

However, this cost varied with duration [interaction of target color dissimilarity with final rotation duration, F(2, 14) = 6.60, p = .01]. Specifically, there was no significant simple main effect of target-color dissimilarity in the pure-rotation condition [18° vs. 19°, F(1, 7) = 0.06, p = .82]. In contrast, for the shorter critical durations, the cost (simple main effect) was significant; at 294 ms, the mean minimum error was 18° for the isochromatic-target condition versus 26° for the heterochromatic-target condition, F(1, 7) = 13.99, p = .007, and for the 1,176-ms condition, the corresponding figures were 17° and 22°, F(1, 7) = 7.83, p = .03.

This effect of final rotation duration may be explained by the increase in time available for selection, which was longest in the pure-rotation condition, allowing participants to overcome an initial difficulty of multiple-feature selection. The aforementioned analysis was based on untransformed data (Levene’s test on untransformed minimum errors, W = .33, p = .89; transformed minimum errors, W = .92, p = .48). Analysis of the error, as opposed to the minimum error, yielded the same pattern of significance as that described above. The mean difference between the minimum error and error was less than 1°.

The lag was not significantly affected by any factor. The lag values (Fig. 5, lower panel) of the isochromatic-target versus heterochromatic-target condition were 25 versus 22 ms, 51 versus 59 ms, and 29 versus 78 ms, when the final rotation duration was 294 ms or 1,176 ms and in the pure-rotation condition, respectively. There were no statistically significant effects on lag of final rotation duration, F(2, 14) = 0.78, p = .48, or relative color, F(1, 7) = 0.75, p = .41, and no interaction, F(2, 14) = 0.93, p = .42, based on the transformed values (Levene’s test on the transformed lags, W = 2.58, p = .04; untransformed lags, W = 2.87, p = .03).

Intertrial priming

Theeuwes and colleagues have suggested that intertrial priming can explain many aspects of featural attention (Awh, Belopolsky, & Theeuwes, 2012; Belopolsky, Schreij, & Theeuwes, 2010; Theeuwes & van der Burg, 2011). In a typical attention capture design, Folk, Remington, and Johnston (1992) showed that when the observer was looking for a red target, presenting a red cue caused attention to shift to the cue’s location. As Theeuwes and his colleagues pointed out, this might be due not to volitional attention to red but, rather, to priming of red from previous trials. Because Folk et al. used a blocked design, on trials where participants were instructed to attend to red, the target on the previous trial was also red (except for the first trial of each block).

To assess whether intertrial priming, rather than volitional attention to the target color, might explain our results, we performed an additional analysis. In Experiment 1, for isochromatic-target trials, the (nonsignificant) trend ran contrary to the priming hypothesis: The error was slightly higher when the target color was the same as that on the previous trial than when it was different [21.4° vs. 20.8°, t(8) = 0.77, p = .46]. For heterochromatic-target trials, whether the target colors were the same as or different from those on the previous trial did not affect the error significantly [28.0° vs. 28.2°, t(8) = 0.15, p = .89]. Experiment 2 also yielded little to no evidence for an intertrial priming effect. For isochromatic-target trials, the error was slightly higher when the target color was the same as that on the previous trial than when it was different [19.1° vs. 17.8°, t(7) = 1.67, p = .14]; for heterochromatic-target trials, the corresponding figures were 22.8° versus 23.2°, t(7) = 0.32, p = .76.

Mixture model fitting

To tease apart the extended selection hypothesis and the reselection hypothesis, we assessed how selection rate varies with final rotation duration in the color–location conjunction condition (the heterochromatic-target condition in Experiment 1) (Fig. 1c). Mixture modeling was used to estimate the selection rate in each condition.

The distribution of the errors (E) was assumed to be

$$ E \sim P_{\mathrm{s}}\,V\left(\mu, \sigma\right) + \left(1 - P_{\mathrm{s}}\right) U\left(-90^{\circ}, 90^{\circ}\right) \tag{1} $$

The first term of the equation reflects the trials where the target was successfully selected and its orientation was accessed, where Ps is the probability of successful selection. This access to target orientation is subject to some noise in registering the orientation, which we assume is distributed normally. Because orientation error is a circular variable, we use the von Mises distribution V, a circular analogue of the normal distribution. The second term of the equation captures guessing. If the target is not selected, which occurs with probability 1 − Ps, the observer can only guess its orientation. The guessing rate (1 − Ps) is therefore multiplied by the guessing distribution U, a uniform distribution from −90° to 90°.

We fit this mixture model to the minimum errors and the errors of each cell using maximum likelihood, as implemented in the “mle” function of the MATLAB Statistics Toolbox. By cell, we mean one particular condition of an individual participant. The multiple trials within each cell provide the distribution of errors that the model is fit to.

Fitting the mixture model to the minimum error, as opposed to the error, was problematic. Rather than the data providing a distribution of minimum errors for each cell, there was only one minimum error in each cell, because the minimum error was obtained from the error curve, which is derived by averaging across multiple trials within one cell. Fitting the model required a distribution of minimum errors for each cell, and this was achieved by assuming that each trial has the same lag. For example, if a given cell has 50 trials and the lag is 118 ms, the minimum error of each trial within this cell is the difference between the reported value and the stimulus value 118 ms before the stimulus offset. How this assumption might affect the estimated parameters will be discussed in the next section.
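
The same-lag assumption can be sketched in Python as follows. The function names are hypothetical, the lag is expressed in frames rather than milliseconds for simplicity, and the sign convention (reported value minus stimulus value) is an assumption.

```python
def wrap_orientation(diff_deg):
    """Wrap an orientation difference into [-90, 90) degrees.
    Assumption: grating orientation repeats every 180 degrees."""
    return (diff_deg + 90.0) % 180.0 - 90.0

def signed_errors_at_lag(trials, lag_frames):
    """Signed error of every trial in a cell, measured against the
    stimulus orientation lag_frames before offset (same lag for all).
    trials: list of (stimulus_orientations, reported_orientation)."""
    return [wrap_orientation(report - stim[-1 - lag_frames])
            for stim, report in trials]

# Toy cell with two trials and a cell-level lag of 2 frames.
toy_stimulus = [float(i) for i in range(10)]   # final frame shows 9 degrees
toy_cell = [(toy_stimulus, 5.0), (toy_stimulus, 8.0)]
print(signed_errors_at_lag(toy_cell, lag_frames=2))
```

Each trial thus contributes one signed error, which gives the per-cell distribution that the mixture model requires.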

Unlike in previous analyses, outliers were not excluded, because they were presumed to be part of the guessing rate. In addition, the errors’ signs (positive or negative) were preserved. In a preliminary analysis that allowed all three parameters (μ, σ, Ps) to vary, the fitting algorithm did not converge, so we fixed μ as the circular mean of the errors and estimated the values of Ps and σ within each cell. Figure 6 shows example model fits to minimum errors of Experiment 1. White bars show the obtained frequencies, and black bars the fitted model’s frequencies. All the parameters were estimated separately for each participant, but the graphs in Fig. 6 use the mean parameters averaged across participants.
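
As an illustration of this fitting procedure, the following Python sketch fits the two free parameters of Eq. 1 by maximum likelihood with μ fixed at the circular mean. It is not the MATLAB "mle" routine used in the study: it substitutes a simple grid search, parameterizes the von Mises spread by a concentration κ on the doubled angle rather than by σ (the conversion between the two is not given here), and all function names, grids, and simulated data are assumptions.

```python
import math
import random

def bessel_i0(kappa):
    """Modified Bessel function of order 0, from its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (kappa / 2.0) ** 2 / (k * k)
        total += term
    return total

def circular_mean_deg(errors):
    """Circular mean of 180-degree-periodic errors via angle doubling."""
    s = sum(math.sin(math.radians(2 * e)) for e in errors)
    c = sum(math.cos(math.radians(2 * e)) for e in errors)
    return math.degrees(math.atan2(s, c)) / 2.0

def fit_mixture(errors, ps_grid, kappa_grid):
    """Grid-search maximum likelihood for Ps and kappa, mu held fixed."""
    mu = circular_mean_deg(errors)
    cosines = [math.cos(math.radians(2 * (e - mu))) for e in errors]
    best = (-math.inf, None, None)
    for kappa in kappa_grid:
        norm = 180.0 * bessel_i0(kappa)        # density per degree
        dens = [math.exp(kappa * c) / norm for c in cosines]
        for ps in ps_grid:
            ll = sum(math.log(ps * d + (1.0 - ps) / 180.0) for d in dens)
            if ll > best[0]:
                best = (ll, ps, kappa)
    return {"mu": mu, "Ps": best[1], "kappa": best[2], "loglik": best[0]}

def simulate_errors(n, ps, kappa, rng):
    """Draw errors from the mixture: von Mises if selected, else a guess."""
    out = []
    for _ in range(n):
        if rng.random() < ps:
            phi = rng.vonmisesvariate(0.0, kappa)   # radians in [0, 2*pi)
            if phi > math.pi:
                phi -= 2.0 * math.pi
            out.append(math.degrees(phi) / 2.0)
        else:
            out.append(rng.uniform(-90.0, 90.0))
    return out

ps_grid = [i / 50 for i in range(1, 50)]
kappa_grid = [0.5 * 1.3 ** j for j in range(20)]
toy_errors = simulate_errors(800, ps=0.7, kappa=5.0, rng=random.Random(1))
print(fit_mixture(toy_errors, ps_grid, kappa_grid))
```

With simulated data, the recovered Ps should lie close to the generating value. A real cell has far fewer trials, so its estimates would be noisier.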

Fig. 6

Two examples of the mixture model fit to the minimum errors. The white bars and the black bars, respectively, denote the observed frequencies and expected frequencies of the minimum errors in (a) the isochromatic-target pure-rotation condition and (b) the heterochromatic-target 118-ms-duration condition in Experiment 1. The model fitting was executed separately for each participant, but the graphs here show the parameters averaged across participants. A positive value indicates that the reported value is counterclockwise to the stimulus value. The estimated parameters are written in the inset text. Error bars show one standard error across participants.

The Ps and σ values for Experiments 1 and 2 are plotted in Figs. 7 and 8 and are described separately below, along with associated ANOVAs.

Fig. 7

Estimated (a) Ps values and (b) σ values in Experiment 1 (for which color–location conjunction is required in the heterochromatic-target condition), based on minimum error. Error bars show one standard error across participants.

Fig. 8

Estimated (a) Ps values and (b) σ values in Experiment 2 (color–location conjunction not required), based on minimum error. Error bars show one standard error across participants.

Experiment 1 (color–location conjunction required in the heterochromatic-target condition)

The Ps values (Fig. 7, upper panel) in the isochromatic-target condition appear to asymptote at around 74 % for final durations longer than 294 ms. For the heterochromatic-target condition, Ps was approximately 60 % for all final rotation durations, with little increase over time. This might be explained by the reselection hypothesis, which posits that the distractors repeatedly interfere with selection throughout.

Consistent with the previous paragraph’s description of the results pattern, the interaction between relative target color and final rotation duration was significant, F(4, 32) = 3.42, p = .02. Simple main effect analyses showed that the selection rates did not change significantly with duration in the heterochromatic-target condition, F(4, 32) = 0.83, p = .52. In the isochromatic-target condition, in contrast, final rotation duration had a significant effect, F(4, 32) = 6.36, p < .001. Multiple comparisons indicate that for isochromatic targets, the selection rate in the 118-ms condition was smaller than in the other four conditions (ts > 3, ps < .05), whereas there was no significant difference across these four durations, F(3, 24) = 0.07, p = .98, confirming that the selection rate had asymptoted. The asymptotic rate was higher in the isochromatic-target condition than in the heterochromatic-target condition: Averaged across all durations except the shortest one (118 ms), the mean selection rates were 74 % and 60 %, respectively, F(1, 8) = 9.23, p = .02. This analysis was based on the untransformed data (Levene’s test on the untransformed Ps values, W = .66, p = .74; transformed Ps values, W = .84, p = .58).

The dispersion (σ) was 3.6° larger in the heterochromatic-target condition than in the isochromatic-target condition, F(1, 8) = 23.09, p = .001 (Fig. 7, lower panel). This did not interact with final rotation duration, F(4, 32) = 0.54, p = .71, and neither did final rotation duration have a significant main effect, F(4, 32) = 0.61, p = .66. The higher dispersion of orientations in the heterochromatic-target condition could be caused by a coarser or noisier representation of the target orientations. The existence of distractors possessing target colors might increase processing noise, leading to coarser representations for the targets. This analysis was based on the untransformed data (Levene’s test on the untransformed σ values, W = .46, p = .90; transformed σ values, W = .47, p = .89).

The Ps and σ values derived from the errors showed the same pattern of significance as those from the minimum errors, with one exception: The interaction between relative color and final rotation duration on the Ps values estimated from the errors was not significant, F(4, 32) = 2.17, p = .09. Nevertheless, there was still a trend for the multiple-feature cost on Ps to be smaller in the 118-ms condition than in the other conditions. The main effect of relative target color was significant, F(1, 8) = 12.18, p = .008, with the Ps value in the isochromatic-target condition being higher than that in the heterochromatic-target condition. Thus, the pattern of results observed for errors was consistent with that for minimum errors.

Experiment 2 (color–location conjunction not required to select the targets)

In Fig. 8, upper panel, a trend is evident whereby Ps in the isochromatic-target condition reached asymptote sooner than that in the heterochromatic-target condition, suggesting that multiple-feature selection requires more time than single-feature selection. The statistical details follow.

Although there was no significant interaction between relative color and final rotation duration, F(2, 14) = 2.19, p = .15, separate ANOVAs (with Bonferroni correction, α = .05/2) for the isochromatic-target condition and the heterochromatic-target condition showed a significant effect of final rotation duration in the latter condition, F(2, 14) = 10.99, p = .001, but not the former, F(2, 14) = 1.45, p = .27. This suggested that the selection rate in the isochromatic-target condition had asymptoted at 294 ms, whereas the selection rate in the heterochromatic-target condition continued to increase for longer durations. This analysis was based on the untransformed values (Levene’s test on the untransformed Ps values, W = .21, p = .96; transformed Ps values, W = .22, p = .95).

The σ values (Fig. 8, lower panel) did not reveal any statistical significance of relative target color, F(1, 7) = 0.09, p = .77, final rotation duration, F(2, 14) = 0.38, p = .69, or the interaction of the two, F(2, 14) = 0.10, p = .90, based on the transformed values (Levene’s test on the transformed σ values, W = .21, p = .956; untransformed σ values, W = .21, p = .955). There was not even a trend, with the mean σ values of 17.5° and 17.8° in the isochromatic-target and the heterochromatic-target conditions, respectively.

We also carried out ANOVAs on the σ values and the Ps values estimated from the errors. The same pattern of significance was found as for the minimum errors.

Possible concerns about an assumption of the mixture model fitting

The model fitting assumes that each trial has the same lag within one condition. This may not be true, and the likely consequence would be an overestimation of guessing rate (1 − Ps) or orientation dispersion (σ). For example, if a given cell yields a lag value of 118 ms, the observer might, on a certain trial of this cell, have accessed the orientation 100 ms, rather than 118 ms, before the stimulus offset. This mistaken assumption about the value of the lag causes the calculated error to be larger than it should be, leading to an inflated estimate of (1 − Ps) or σ. The more lag variance there is, the more overestimation will result, which means that there should be a correlation between lag variance and the extent to which the parameters are overestimated.

Although the technique used in the present study does not allow us to estimate the lag variance, the correlation between lag variance and (1 − Ps) or σ can be estimated by the correlation between lag mean and (1 − Ps) or σ, given that there should be a correlation between lag mean and lag variance. The rationale is as follows. Assume that each trial in one condition has the same stimulus update rate, for example, every 10 ms. The trial ends at an unpredictable time, so it may be the time that the observer has just updated the target value, leading to a lag value of 0. Alternatively, the observer may have just started updating, so he or she can only report the representation of the previous update cycle, which is 10 ms ago. The consequence of an update rate of 10 ms is a mean lag value of 5 ms and a range of 10 ms. As the update rate gets slower, the mean and the range of the lag values would get higher, and so would the variance. In other words, there should be a correlation between lag mean and lag variance.
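
The rationale can be checked with a short simulation. The sketch below assumes that the trial offset falls at a uniformly random moment within an update cycle, which is our reading of the argument and not a fitted model. Under that assumption, an update period T gives a mean lag of T/2 and a lag variance of T²/12, so both grow together. The function name and the periods are hypothetical.

```python
import random
import statistics

def simulate_lags(update_period_ms, n_trials, rng):
    """Lag on each trial = time since the last update when the trial ends.
    Assumption: offset time is uniform within the update cycle."""
    return [rng.uniform(0.0, update_period_ms) for _ in range(n_trials)]

rng = random.Random(0)
for period in (10.0, 50.0, 100.0):
    lags = simulate_lags(period, 20000, rng)
    # Theory under the uniform assumption: mean = T/2, variance = T**2/12.
    print(period, statistics.mean(lags), statistics.pvariance(lags))
```

Because the simulated mean and variance both increase with the update period, the lag mean can stand in for the unobservable lag variance in the correlations reported below.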

If the estimated (1 − Ps) or σ contains a high proportion of overestimation, there should be a correlation between lag variance and (1 − Ps) or σ, which also predicts a high correlation between lag mean and (1 − Ps) or σ. In other words, if there is little correlation between lag mean and (1 − Ps) or σ, the estimated (1 − Ps) or σ should contain only a small proportion of overestimation.

In Experiment 1, the correlation between guessing rate (1 − Ps) (untransformed) and lag mean (transformed) was .02, t(8) = 0.13, p = .90, and the correlation between σ (untransformed) and lag mean (transformed) was .19, t(8) = 1.42, p = .19. This suggests that the estimated guessing rate and orientation dispersion may have been only slightly overestimated as a result of the assumption that each trial has the same lag.

In Experiment 2, the correlation of lag mean (transformed) and guessing rate (1 − Ps) (untransformed) was −.19, t(7) = 1.47, p = .18, and the correlation of lag mean (transformed) and σ (transformed) was .004, t(7) = 0.03, p = .98. This argues against much overestimation of (1 − Ps) from assuming that each trial has the same lag.

Discussion

Summary of Experiments 1 and 2

The study of Lo et al. (2012) demonstrated that the multiple-feature cost was significant only when color attention had to be contingent on location to yield successful selection. The multiple-feature cost therefore appeared to reflect a spread of feature-based selection to distractors sharing colors with the targets, rather than being a consequence simply of selecting multiple colors. Here, we first replicated this effect (Experiment 1) with different stimuli. We also discovered that a multiple-feature cost can occur even when distractors have irrelevant colors (so that color attention need not be contingent on location), if the relevant stimulus exposure duration is sufficiently short (Experiment 2).

A limited capacity for featural attention is supported by the finding in Experiment 2 that even when the distractor colors are irrelevant, selecting multiple colors requires more time than selecting a single color (specifically, that selection asymptoted sooner in the isochromatic-target condition). It is unclear whether this is because, in the heterochromatic-target condition, selection occurs in parallel but takes longer or because color selection occurs one-by-one.

When color attention needed to be contingent on location (heterochromatic-target condition of Experiment 1), the multiple-feature selection cost persisted even for long exposure durations, suggesting that the distractors perpetually interfered with selection. The pattern of how the selection rate varied with stimulus duration is consistent with the reselection hypothesis (Fig. 1c).

In the case of the isochromatic-target condition in Experiment 1, the selection rate (Ps) was 52 % for the 118-ms condition but, by 294 ms, rose to 75 % and appeared not to increase further for longer durations. This suggests that 294 ms is sufficient time for selection of two stimuli with the same color. The failure to reach 100 % could be due to a combination of factors such as participants blinking, lapses of attention, and the intrinsic difficulty of segregating the two overlapping gratings.

The repeated selection–access theory

To explain the full set of results, we propose that initial selection of different-color targets requires more time than selecting a single color, yielding the multiple-feature cost for short durations. We further propose that distractors of the same color as the targets disrupt target selection, necessitating that selection be repeated. We will refer to this as the repeated selection–access theory. Figure 9 schematizes the repeated selection–access process.

Fig. 9

A schematic of the repeated selection–access theory for the conditions of Experiments 1 (upper two horizontal strips) and 2 (lower two horizontal strips). Each horizontal strip represents postcued target processing following the final jump of a trial. Gray areas indicate the selection stage, and the white areas the access stage. The upper horizontal strip of each panel depicts the isochromatic-target condition, for which selection requires less time, so when the trial ends at 294 ms after the final jump, the observer can report a recent orientation. The lower horizontal strip of the upper panel depicts the intermittent reselections required when color attention must be contingent on location. If the trial ends at the selection stage, the observer needs to guess; if the trial ends at the access stage, the observer can report an orientation. The bottom strip indicates the heterochromatic-target condition in Experiment 2. The conjunction of color and location is not required, so no reselection is required. Although no reselection is required, the initial selection takes a longer time in the heterochromatic-target condition than in the isochromatic-target condition. This can be inferred from the time needed for the selection rate (Ps) in the heterochromatic-target condition to reach that in the isochromatic-target condition.

On this theory, information processing involves a selection stage before access to stimulus characteristics can occur (Huang & Pashler, 2007). We suggest that when color attention needed to be contingent on location, the observer needed to intermittently reselect the targets.

The difficulty of confining feature-based selection to a particular location was demonstrated by Lo et al. (2012), as well as Irons and Remington (2013). In Irons and Remington’s study, observers were required to identify a letter with a particular color–location conjunction (e.g., a green letter on the right side). A distractor sharing the target color but presented at a location other than the target’s (e.g., a green letter on the left side) impaired target identification performance. This suggests that the selection of a particular color cannot be confined to a particular location, so a distractor at a nontarget location will be mistakenly selected or, at least, disrupt the maintenance of selection of the target. To rectify this, selection must be attempted again. In our experiment, as a result, trials sometimes end at a time when reselection is not complete, so the observer cannot access the orientation and must, instead, guess. That may explain the larger errors (according to the model fit, caused by reduced selection rate) persisting when color–location conjunction is required.

Why are lags longer in the color–location conjunction condition?

In addition to triggering reselections, interference from distractors having the target colors yielded a longer perceptual lag. In Experiment 1, the distractors might have been mistakenly selected because they possessed target colors. This mistaken selection might have increased the number of accessed items. Previous work varying the number of targets has found that the lag increases as the number of accessed orientations increases (Howard & Holcombe, 2008). This could explain why the perceptual lag in the heterochromatic-target condition was significantly larger than that in the isochromatic-target condition when distractors possessed target colors. In this scenario, the lag increase arose in the access stage, not the selection stage.

An alternative explanation is that the lag increase was a consequence of the intermittent selection hypothesized by our repeated selection–access theory. In this scenario, when the trial ended at a time when target selection was disrupted, the participant reported a value from an old selection episode. This is unlikely. If the observer can retrieve a past orientation value of the target while the selection is disrupted, the observer should also be able to report a past orientation before the jump if the trial ends soon after a jump, because when a jump occurs, selection is disrupted due to a lack of continuity of target orientation. If the observer can always report a value before the final jump, then the performance on minimum error and lag should be affected very little by the duration after the final jump. For example, if the participant tends to report a value 300 ms before the offset and is able to retrieve a value from memory when the selection is disrupted, then no matter how short the final rotation duration is (say, 118 ms), the lag should also be 300 ms and the minimum error should also be constant. The higher minimum error in the 118-ms condition of Experiment 1 (Fig. 4, upper panel) does not support this account.

An assumption of this repeated selection–access theory is that the observer selects all the possible targets first and then accesses the contents of those selected items. One might argue that the observer might select one target first, access its content, and then select and access another target. The multiple-feature cost might then arise because it is more difficult to switch from one target to another in the multiple-feature condition. If this is the case, on half of the trials, the observer will be processing one target, but the other target will be postcued, resulting in a selection rate of 50 %. This is not supported by the high selection rates of the present study, unless one further argues that when the postcued target is not the one currently being processed, the observer simply reports a past orientation value from memory, and that a past value is usually so close to the present value that the selection rate decreases only slightly. However, this is very unlikely, for the same reason as that described in the previous paragraph: If the observer can report a past value of the postcued target while still processing the nonpostcued target, the selection rate should be fairly constant across different final rotation durations.

The results from the present study suggest only that multiple-feature selection takes a longer time than single-feature selection, but how the selection is implemented within each condition is still an open question. The longer selection time in the multiple-feature condition could result from a serial switch between the selections of the targets or an overall delay of selection on both targets without a serial switch between them. (It should be noted that the serial switch here indicates observers attempting to select one color and then another color, rather than observers selecting one target, accessing the target contents, and selecting another target, as described in the previous paragraph.) Another issue is that even within the multiple-feature conditions, selections may be implemented differently with and without color–location conjunction. In the case of multiple-feature monitoring with color–location conjunction (Experiment 1), the selection is repeated, and the performance outcome is a mixture of many selections, so it is hard to compare one single selection with and without color–location conjunction. More work is required to shed light on how the two kinds of selections are implemented.

Conclusions and future direction

A repeated selection–access theory explains why a persistent multiple-feature selection cost was observed only when color–location conjunction was required (Lo et al., 2012, and the present experiments). If color–location conjunction is required in the multiple-feature condition (distractors share the targets’ colors), the distractors are sometimes mistakenly selected. This mistaken selection has two consequences. First, reselection is required. Second, these mistakenly selected items increase the number of items to access. As the number of accessed items increases, the representation of each item corresponds to a moment further in the past.

However, when color–location conjunction is not required (distractors and targets are from different sets of colors), selection of each color need not be restricted to a particular location, little or no mistaken selection occurs, and no reselection is required. In that situation, the multiple-feature cost occurs only for short durations, when the initial selection in the isochromatic-target condition is complete but that in the heterochromatic-target condition is not.

The present study demonstrated that the selection of two colors requires more time than the selection of one color, supporting the notion that the resource for feature selection is limited. In the literature of resource theory, it has been shown that the resources for object tracking are independent between the left and right hemifields (Alvarez & Cavanagh, 2005; Hudson, Howe, & Little, 2012), whereas the present study used gratings presented in separate hemifields but still found that they competed for the same resource pool. This suggests that the restriction on feature selection arises at later stages than the parietal regions thought to be involved in object tracking (Culham et al., 1998; Jovicich et al., 2001).