The visual system deals with the world’s impractically large sum of information by deploying selective attention to process the important bits. Generally speaking, attention is allocated by selecting locations like a spotlight (Posner, 1980), as the classic metaphor goes, or by selecting objects (Kanwisher & Driver, 1992). The interaction of these modes has been the subject of longstanding debate (Kahneman & Henik, 1981), and while it is established that both modes of selection guide attention under different circumstances, the principles determining the interplay between location-based and object-based selection remain unclear. In this article, we report three experiments demonstrating that object-based effects (OBEs) are contingent on feature-based attentional control settings. That is, object-based selection depends on the objects possessing features that match the observer’s top-down attentional set.

Object-based selection is typically demonstrated by showing that attending to part of an object facilitates attending to the whole. Two classic examples of this are that it is easier to process two features of the same object than to process two features of separate but overlapping objects (Baylis & Driver, 1993; Duncan, 1984), and that response selection is harder when the target is perceptually grouped with flanking distractors (Kramer & Jacobson, 1991). Perhaps the most famous demonstration of object-based attention is Egly, Driver, and Rafal’s (1994) finding that targets are detected faster if they appear on the same object as a cue, compared to when targets appear on a different object. This is true even when the same-object and different-object target locations are equidistant from the cue, indicating that attention—summoned to the cued end of the rectangle—spreads preferentially throughout the object.

Because the objects in Egly et al.’s (1994) paradigm are task-irrelevant, the spontaneous generation of an OBE implies that same-object selection occurs automatically (Chen & Cave, 2008; Yeari & Goldsmith, 2010). The basic idea is that objects are parsed preattentively; when attention is deployed to a location containing an object, it will spread according to object boundaries (de-Wit, Cole, Kentridge, & Milner, 2011). This automatic, within-object attentional spreading is central to the idea that object-based selection is a default mode. Consistent with this idea, OBEs have been demonstrated under various conditions that are assumed to be automatic (e.g., Kimchi, Yeshurun, & Cohen-Savransky, 2007; Norman, Heywood, & Kentridge, 2013), indicating mandatory object-based selection.

In contrast, others have proposed that the visual system uses object-based selection flexibly, according to the needs of the observer (Shomstein, 2012). A core tenet of this view is that OBEs should emerge when there is considerable uncertainty in the environment, but confirming this prediction has proven contentious. For example, OBEs are not observed when the target location is known with certainty (Drummond & Shomstein, 2010; Shomstein & Yantis, 2002), but they reemerge when object distinctions are emphasized (Chen & Cave, 2006) or when perceptual objecthood is accentuated (Richard, Lee, & Vecera, 2008). OBEs are flexibly foregone when cue reliability is high (Shomstein & Yantis, 2004; Yeari & Goldsmith, 2010) or when a target location is incentivized with reward (Shomstein & Johnson, 2013), supporting the idea that observers adapt to location-based selection when there is strong incentive. In this article, we selectively elicited OBEs under conditions of equal uncertainty, incentive, object structure, and perceptual stimulation. Object-based selection was observed only when objects incidentally matched the top-down filters participants had adopted for target processing, demonstrating that object-based selection is contingent on goal-driven attentional control settings (ACSs).

The logic of our method is adapted from the contingent attentional capture literature. Early studies on attentional orienting demonstrated that abrupt visual onsets capture attention in a mandatory way (Jonides & Yantis, 1988). To demonstrate that top-down constraints could filter out this ostensibly mandatory capture, Folk and colleagues tested whether onsets would capture attention when the target was not an onset (Folk, Remington, & Johnston, 1992). They found that onsets captured attention when observers were looking for an onset target, but not when observers searched for a color-defined target. Likewise, color cues captured attention for the color target, but not the onset target; in other words, ACSs filter irrelevant cues. In this article, we used ACSs to filter irrelevant objects and demonstrate that OBEs reflect a non-mandatory, non-default mode for attentional selection.

Experiment 1

In our first experiment, we adapted Egly et al.’s (1994) two-rectangle paradigm so that the rectangles were presented in the same or different color from the target. Participants updated their ACSs to the target color on a trial-by-trial basis. The question was whether the same-object advantage would emerge regardless of the object–target color relationship.

Method

Participants

A total of 25 students (16 female, nine male) participated in exchange for course credit. All of the students gave informed consent according to the University of Toronto’s institutional review board (IRB). Twenty-five participants was deemed an ample sample given the similar or smaller samples used in many replications of this paradigm in the literature. All of the participants had normal or corrected-to-normal vision, and all were naïve to the purpose of the study and its hypotheses.

Apparatus and materials

The stimuli were presented on a Dell computer with a CRT monitor using MATLAB software with the Psychophysics Toolbox. Viewing distance was controlled with a chin rest. All of the stimuli were presented on a dark gray background. The stimuli were light gray dots subtending 1.0°, and two parallel rectangles subtending 14.9° on the long edge and 5.1° on the short edge. These rectangles could be presented horizontally or vertically, but they were always parallel. The rectangles were drawn with a composite, two-layer line; each of these lines was 0.3° thick. The outer layer was presented in black, and the inner layer was presented in either red (RGB: 255, 0, 0) or green (RGB: 0, 176, 80), depending on the trial. The target array consisted of three circles, colored red or green, each subtending 2.0°. These circles randomly contained either a “T” or an “L,” printed in white, in size 36 Arial font.

Procedure

Trials began with the central fixation dot presented in light gray. After 500 ms, the fixation changed color to red or green for 1,000 ms, indicating the target color at the end of trial and establishing the ACS. The fixation returned to light gray for 500 ms. The rectangles were presented for 500 ms and were displayed horizontally or vertically, depending on the trial. The inner layer of both rectangles could be red or green. The cue was a transient color change of one end of the outer, black layer of one of the rectangles, lasting 100 ms. The color changed from black to either red or green, and the cued end of the rectangle then returned to black for 100 ms before target onset. The three circles of the target array appeared at the cued location and the two adjacent locations (the invalid–within and invalid–between locations). One of the circles was presented in the target color (the same color as the fixation at the beginning of the trial), and the other circles were presented in the distractor color. Participants responded by pressing the “T” or “L” buttons on a computer keyboard, indicating the target identity. The stimuli remained onscreen until response. See Fig. 1 for an illustration of the trial sequence.

Fig. 1

Time course of a trial in Experiment 1. At the beginning of the trial, the fixation indicates the color of the target, establishing the color-based ACS. The objects can be presented in the same color as the target (match) or in a different color (mismatch). The task is to identify the target letter in the same color as the fixation. The target was always presented with two non-target-color distractors. The top row shows a trial with a valid cue and rectangles that match the target ACS. The bottom row shows a trial with a valid cue and rectangles that mismatch the target ACS

The experiment had a 3 (Cue Validity: valid, invalid–within, or invalid–between) × 2 (ACS: object–target match or object–target mismatch) repeated measures design. The cue was informative: There was a 75 % chance that the target would appear at the cued location, and a 12.5 % chance that it would appear at either of the other possible locations. The fixation color always matched the target color, to indicate the appropriate ACS for the participant. Importantly, the cue also always matched the target color, so that it would capture attention (Folk et al., 1992). The object–target ACS match was balanced across levels of cue validity, such that the object matched or mismatched the target color on equal numbers of trials. The object orientation (vertical or horizontal) was balanced across trials, as were the target and object colors (red or green). Participants completed 12 practice trials and 480 experimental trials.
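The design described above can be sketched as a trial list. This is a minimal illustration, not the authors' MATLAB code: the function name `build_trials`, the dictionary keys, and the choice to balance orientation and target color over the whole session (rather than within each validity level) are assumptions made for the example.

```python
import itertools
import random
from collections import Counter

N_TRIALS = 480
# Cue validity proportions as reported: 75 % valid, 12.5 % at each invalid location.
VALIDITY_PROPORTIONS = {"valid": 0.75, "invalid-within": 0.125, "invalid-between": 0.125}
OTHER_COLOR = {"red": "green", "green": "red"}


def build_trials(n_trials=N_TRIALS, seed=0):
    """Return a shuffled list of trial dictionaries (hypothetical structure)."""
    rng = random.Random(seed)

    # ACS match is balanced within each level of cue validity.
    design = []
    for validity, proportion in VALIDITY_PROPORTIONS.items():
        n_level = int(n_trials * proportion)
        for acs in ("match", "mismatch"):
            design.extend([(validity, acs)] * (n_level // 2))

    # Assumption: orientation and target color are balanced across the session.
    nuisance = list(itertools.product(("horizontal", "vertical"), ("red", "green")))
    nuisance = nuisance * (len(design) // len(nuisance))
    rng.shuffle(nuisance)

    trials = []
    for (validity, acs), (orientation, target_color) in zip(design, nuisance):
        object_color = target_color if acs == "match" else OTHER_COLOR[target_color]
        trials.append({
            "validity": validity,
            "acs": acs,
            "orientation": orientation,
            "target_color": target_color,
            "cue_color": target_color,  # the cue always matched the target color
            "object_color": object_color,
        })
    rng.shuffle(trials)
    return trials


trials = build_trials()
cell_counts = Counter((t["validity"], t["acs"]) for t in trials)
```

Under these assumptions, each invalid cell of the 3 × 2 design contains 30 trials before any exclusions, compared with 180 trials in each valid cell.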

Results

Trials faster than 150 ms were discarded as anticipations, and trials slower than three SDs from the participant’s mean for every condition were discarded as outliers (2.2 %). Incorrect responses were also discarded (6.4 %) for the response time (RT) analyses. The mean RTs were submitted to a 3 (Cue Validity: valid, invalid–within, or invalid–between) × 2 (ACS: object–target match or object–target mismatch) repeated measures analysis of variance (ANOVA). We observed a significant main effect of cue validity, replicating the conventional effect of peripheral cues: F(2, 48) = 111.37, p < .001, ηp² = .82. A main effect of ACS also emerged, F(1, 24) = 6.18, p = .020, ηp² = .20, in which participants were slower to respond in the match condition. The main effect of ACS match was likely due to more information being processed (i.e., the task-irrelevant objects) when the objects matched the ACS. The interaction between ACS and cue validity did not reach significance: F(2, 48) = 0.62, p = .540.
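The trimming rule can be illustrated with a short sketch. Several details are assumptions, because the text does not spell them out: the function name `trim_rts` is hypothetical, the input is taken to be one participant's correct-trial RTs grouped by condition, and the mean and SD are computed after anticipations have been removed.

```python
from statistics import mean, stdev

ANTICIPATION_MS = 150  # trials faster than this are treated as anticipations
SD_CUTOFF = 3          # trials slower than mean + 3 SD are treated as outliers


def trim_rts(rts_by_condition):
    """Trim one participant's RTs (ms), condition by condition.

    rts_by_condition: dict mapping a condition label to a list of RTs.
    Assumption: mean and SD are computed after removing anticipations.
    """
    trimmed = {}
    for condition, rts in rts_by_condition.items():
        kept = [rt for rt in rts if rt >= ANTICIPATION_MS]
        if len(kept) > 1:
            cutoff = mean(kept) + SD_CUTOFF * stdev(kept)
            kept = [rt for rt in kept if rt <= cutoff]
        trimmed[condition] = kept
    return trimmed


# Made-up RTs for illustration only; these are not data from the experiment.
example = {"valid-match": [500] * 10 + [520] * 10 + [3000, 100]}
cleaned = trim_rts(example)
```

Note that a 3-SD rule can only flag a trial when a cell holds enough observations; with very few trials per cell, no single value can lie three SDs above the cell mean.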

The critical test of the OBE was whether the cost to orienting to an invalidly cued location would be greater in the different-object than in the same-object conditions. To test this, we calculated RT costs by subtracting the valid RT from those in the invalid–same-object and invalid–different-object conditions for each participant, and submitted these means to a 2 (Cue Validity: invalid–same object, or invalid–different object) × 2 (ACS: object–target match or object–target mismatch) repeated measures ANOVA. We also planned separate comparisons of the same–different object cost in the match and mismatch conditions with paired-samples t tests. The ANOVA revealed no significant effects: Fs < 2.59, ps > .121. Consistent with our predictions, there was a significantly greater cost to orienting between objects versus within objects when the object feature matched the target ACS: t(24) = 2.43, p = .023, d = 0.64 (see Fig. 2), but we found no difference when the object and target ACS did not match, t(24) = 0.35, p = .732.
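The cost calculation and the planned comparison can be sketched as follows. The function names and the example values are hypothetical, and the sketch computes only the paired t statistic and its degrees of freedom; it does not attempt to reproduce the effect-size measure reported in the text, whose exact formula is not given.

```python
from math import sqrt
from statistics import mean, stdev


def rt_costs(valid, invalid_within, invalid_between):
    """Per-participant RT costs: invalid RT minus valid RT (ms)."""
    within_cost = [w - v for w, v in zip(invalid_within, valid)]
    between_cost = [b - v for b, v in zip(invalid_between, valid)]
    return within_cost, between_cost


def paired_t(x, y):
    """Paired-samples t statistic for x minus y, with its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up condition means for four hypothetical participants (ms).
valid = [500, 510, 495, 505]
invalid_within = [530, 545, 520, 540]
invalid_between = [545, 555, 540, 548]

within_cost, between_cost = rt_costs(valid, invalid_within, invalid_between)
t_value, df = paired_t(between_cost, within_cost)
```

Because the same valid RT is subtracted from both invalid conditions, the paired difference between the two costs equals the difference between the raw invalid RTs; expressing the comparison as costs changes how the result is displayed, not the outcome of the t test.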

Fig. 2

Mean response time cost (RTInvalid – RTValid) to identify targets presented in the same object as or a different object from the cue, for the match and mismatch conditions. When the objects are presented in a color matching the target ACS, the conventional same-object advantage is observed. When the objects are presented in the nonmatching color, there is no object-based influence on orienting. Error bars represent one SEM, within subjects

Discussion

Classic OBEs were observed only when participants adopted an ACS that was congruent with the task-irrelevant objects’ color. Note that the objects were present and salient in the mismatch condition (in fact, during the object preview they were the only stimuli besides the fixation), but they remained unused for attentional selection. Because object segmentation occurs preattentively (de-Wit et al., 2011; Qiu, Sugihara, & von der Heydt, 2007), the objects in the mismatch condition should have been available to guide attention, yet they did not. We conclude that OBEs are not mandatory, and that a participant’s goals mediate the influence of objects on the distribution of attention. This provides an explanation for why object-based selection in the two-rectangle paradigm is inconsistent across observers (Pilz, Roggeveen, Creighton, Bennett, & Sekuler, 2012); perhaps only some observers demonstrate OBEs because they depend on the subjective adoption of top-down settings.

The results also have implications for the flexibility of top-down ACSs. Specifically, there is some debate regarding whether feature-based ACSs can be established on a trial-by-trial basis (Belopolsky, Schreij, & Theeuwes, 2010; Lien, Ruthruff, & Johnston, 2010). The present results support this idea.

Experiment 2

For our second demonstration of contingent object-based selection, we wanted a paradigm not involving orienting to peripheral cues. These rapid luminance onsets are visually complicated events that automatically recruit multiple processes (Luck & Thomas, 1999). Moreover, the cues in Experiment 1 were possibly differentially salient in the match and mismatch conditions because they abutted onto rectangles of the same or a different color. Although this is unlikely to have driven the ACS effect, it would be advantageous to use a paradigm without cues, avoiding their perceptual baggage altogether (West, Pratt, & Peterson, 2013).

Although not typically cited as an exemplar of object-based attention (Kanwisher & Driver, 1992; Scholl, 2001), Castiello and Umiltà’s (1990) use of different-sized objects to modify the size of the attentional focus is a clear example of objects modulating the distribution of attention. Castiello and Umiltà presented objects of different sizes with a five-element, radial target array (one central element and four eccentric elements); when the objects were small, the center element appeared within the object and the eccentric elements appeared outside; when the objects were large, all elements appeared within. The results showed a processing advantage for the central element only when the objects were small, indicating that the size of the attentional focus adjusted to match the size of the objects; small objects excluded selection of the stimuli outside the box. Although not couched in the parlance of the literature (perhaps because it was contemporary with and not subsequent to its most influential findings), this result is a clear example of object-based attention.

In our second experiment, we adapted this paradigm within our ACS framework so that the objects could match or mismatch the target color. The target array was always presented with a target-color element and a non-target-color element. Consequently, all trials exhibited equal perceptual structures: two objects (both red or green) and a target array with one red, one green, and three gray elements. The question was whether the within-object advantage for small objects would emerge when the object color did not match the target color.

Method

Participants

A new sample of 25 students (17 female, eight male) participated in exchange for course credit. All of the students gave informed consent according to the University of Toronto’s IRB. They all had normal or corrected-to-normal vision, and all were naïve to the purpose of the study and its hypotheses.

Apparatus and materials

The setup was the same as in Experiment 1. All stimuli were presented on a dark gray background and consisted of a small, gray fixation point subtending 1° and two peripheral circles, 3.6° or 9.6° in diameter, centered 6.0° to the left and right of fixation. The circles were empty, with a border width of 0.4°, and could be colored red or green. The target array consisted of five letters, which were each randomly designated to be “H” or “E.” The target array would appear on the left or the right side of fixation. The central letter was 6.0° to the left or right of fixation, so that it would appear within the circle on that side. The other letters were displaced 3.0° in either direction along the vertical and horizontal axes, such that the eccentric letters would appear outside the small circle, but that all letters would appear inside the large circle. The letters of the target array were printed in size 40 Arial font. On all trials, one red letter, one green letter, and three gray letters were presented.
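The stimulus geometry can be checked with a few lines of arithmetic. This sketch treats each letter as a point at its center and ignores the 0.4° border width, both of which are simplifying assumptions; the names `LETTER_OFFSETS` and `letters_inside` are hypothetical.

```python
from math import hypot

CIRCLE_DIAMETER_DEG = {"small": 3.6, "large": 9.6}
OFFSET_DEG = 3.0  # displacement of the eccentric letters from the central letter

# Letter positions relative to the circle center, in degrees of visual angle.
LETTER_OFFSETS = {
    "central": (0.0, 0.0),
    "left": (-OFFSET_DEG, 0.0),
    "right": (OFFSET_DEG, 0.0),
    "above": (0.0, OFFSET_DEG),
    "below": (0.0, -OFFSET_DEG),
}


def letters_inside(size):
    """Map each letter position to True if its center falls inside the circle."""
    radius = CIRCLE_DIAMETER_DEG[size] / 2
    return {name: hypot(dx, dy) < radius for name, (dx, dy) in LETTER_OFFSETS.items()}


small = letters_inside("small")  # radius 1.8 deg: only the central letter is inside
large = letters_inside("large")  # radius 4.8 deg: all five letters are inside
```

The 3.0° displacement exceeds the small circle's 1.8° radius and falls short of the large circle's 4.8° radius, which is what places the eccentric letters outside the small objects but inside the large ones.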

Procedure

Trials began with the central fixation dot presented in light gray. After 500 ms, the fixation changed color to red or green for 1,500 ms, indicating the target color at the end of trial and establishing the ACS. The fixation returned to light gray for 500–1,000 ms, whereupon two circles would appear to the left and right of fixation for 500 ms. The target array then appeared until response. Participants were instructed to respond to the identity, “H” or “E,” of the letter in the same color as the fixation at the beginning of the trial. See Fig. 3 for an illustration of the trial sequence.

Fig. 3

Time course of a trial in Experiment 2. At the beginning of the trial, the fixation indicates the color of the target, establishing the color-based ACS. The objects can be presented in the same color as the target (match) or in a different color (mismatch). The top row shows a trial with small objects that match the target ACS. In this case, the target is situated outside the circle. The bottom row shows a trial with large objects that mismatch the target ACS. In this case, the target is inside the circle

The experiment had a 2 (Object Size: small or large) × 2 (Target Location: central or eccentric) × 2 (ACS: object–target match or object–target mismatch) repeated measures design. The fixation color always matched the target color, to indicate the appropriate ACS. The target array was presented equally often on the left and right sides of the display. The target was presented equally often at all five possible locations of the array. Participants completed 12 practice trials and 320 experimental trials.
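The trial counts implied by this design can be worked out with a short sketch. It assumes that object size, ACS match, array side, and the five target locations were fully crossed, which the text implies but does not state outright; the location labels and function names are hypothetical.

```python
import itertools
from collections import Counter

OBJECT_SIZES = ("small", "large")
ACS_CONDITIONS = ("match", "mismatch")
ARRAY_SIDES = ("left", "right")
TARGET_LOCATIONS = ("central", "above", "below", "inner", "outer")
N_TRIALS = 320


def build_design(n_trials=N_TRIALS):
    """Return the full trial list under the assumed full crossing."""
    cells = list(itertools.product(
        OBJECT_SIZES, ACS_CONDITIONS, ARRAY_SIDES, TARGET_LOCATIONS))
    reps, remainder = divmod(n_trials, len(cells))
    if remainder:
        raise ValueError("n_trials must be a multiple of the number of cells")
    return cells * reps


def analysis_cell_counts(design):
    """Collapse side and location into the 2 x 2 x 2 cells used in the ANOVA."""
    counts = Counter()
    for size, acs, _side, location in design:
        position = "central" if location == "central" else "eccentric"
        counts[(size, acs, position)] += 1
    return counts


counts = analysis_cell_counts(build_design())
```

Under this assumed crossing, each central cell of the analysis rests on a quarter as many trials as the corresponding eccentric cell, because four of the five target locations are eccentric.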

Results

Trials faster than 150 ms were discarded as anticipations, and trials slower than three SDs from the participant’s mean for every condition were discarded as outliers (2.9 %). Incorrect responses were also discarded (9.7 %) from the RT analyses. The mean RTs were submitted to a 2 (Object Size: small or large) × 2 (Target Position: central or eccentric) × 2 (ACS: object–target match or object–target mismatch) repeated measures ANOVA. We observed a significant main effect of ACS, as participants were slower to respond when the object color matched the target ACS: F(1, 24) = 39.82, p < .001, ηp² = .62. Critically, there was a three-way interaction between all of the factors, as predicted: F(1, 24) = 9.55, p = .005, ηp² = .28. No other sources of variance were reliable: all Fs < 3.05, all ps > .093.
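For readers who want to relate the reported F ratios to the partial eta squared values, the standard identity is partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two effects reported above; the function name is hypothetical, and the authors' statistics software presumably produced these values directly.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of ACS and the three-way interaction, as reported in the text.
acs_effect = partial_eta_squared(39.82, 1, 24)
three_way = partial_eta_squared(9.55, 1, 24)
```

Rounded to two decimals, both values agree with those given in the text, so the identity offers a quick consistency check on any F ratio and its stated degrees of freedom.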

Castiello and Umiltà’s (1990) original effect was observed in a two-way interaction between object size and target position. We predicted that we would observe the same effect when the object matched the target ACS, and would observe no two-way interaction when the object was presented in the nonmatching color. So, to probe the observed three-way interaction further, we conducted separate 2 (Object Size: small or large) × 2 (Target Position: central or eccentric) repeated measures ANOVAs on the mean RTs in the ACS match and mismatch conditions. When the object color did not match the target ACS, we found no significant sources of variance: all Fs < 1.33, all ps > .260.

When the object color matched the target ACS, we observed a significant interaction between object size and target position, replicating Castiello and Umiltà’s (1990) original effect: F(1, 24) = 7.42, p = .012, ηp² = .24 (see Fig. 4). Further support for the replication comes from a planned comparison of the mean RTs for central versus eccentric targets presented with small objects, t(24) = 2.04, p = .026 one-tailed, d = 0.59, confirming that identification of the target letters was slower outside than inside the small objects. In addition to the significant interaction, a marginal effect of object size emerged, F(1, 24) = 3.57, p = .071, ηp² = .13, and no effect of target position, F(1, 24) = 0.54, p = .468.

Fig. 4

Mean response times (RTs) to identify targets presented as central or eccentric elements of a search array with small or large objects. The object color could match or mismatch the target ACS. When the objects were presented in a color matching the target ACS, we found a within-object processing advantage. When the objects were presented in the nonmatching color, there was no object-based influence on target processing. Error bars represent one SEM, within subjects

A further test of the idea that OBEs should emerge only when the object color matches the ACS would be to compare the RTs to eccentric targets presented with small objects in the match and mismatch conditions; the objects should restrict the spread of attention to within the object in the match condition, and they should not affect the allocation of attention when they mismatch. Confirming this prediction, RTs were significantly slower to eccentric targets presented with small objects in the match versus the mismatch condition: t(24) = 7.99, p < .001, d = 2.26.

Discussion

OBEs emerged only when participants’ ACSs compelled them to attend to the objects, confirming the conclusion that object-based selection is contingent on top-down control. In Experiment 1, the cues abutted onto the rectangles, such that the mismatch condition presented a two-color contrast, whereas the match condition did not. Although it is unlikely that the contrast caused the effect—because the cue was effective in both ACS conditions, as evidenced by the large location-based cueing effect—in Experiment 2 we did away with this contrast. Consequently, the match and mismatch trials presented equal perceptual stimulation.

Experiment 3

In Experiments 1 and 2, we established an ACS for the target color by briefly changing the fixation color prior to object onset. Consequently, the fixation may have acted as an intratrial feature prime for the objects in the match conditions (Awh, Belopolsky, & Theeuwes, 2012). In other words, seeing a red fixation could facilitate the processing of subsequent red objects and produce the observed object-based effects in the match conditions. To test this possibility, we replaced the color fixation instruction with a word.

Method

Participants

A new sample of 25 students (19 female, six male) participated in exchange for course credit. All of the students gave informed consent according to the University of Toronto’s IRB. All had normal or corrected-to-normal vision, and all were naïve to the purpose of the study and its hypotheses.

Procedure

Experiment 3 was identical to Experiment 2, with two exceptions. First, the fixation display formerly used to indicate the target color (thereby establishing an ACS for red or green) was replaced by a display containing the word “RED” or “GREEN,” centered, and printed in gray size 40 Arial lettering; second, the large-object condition was removed. For our purposes, the OBE involved in Castiello and Umiltà’s (1990) paradigm was ascertained by comparing the RTs to targets appearing at central versus eccentric locations in the small-object condition. By eliminating the large-object condition, we doubled the number of trials in the small-object condition, increasing the power of our critical comparison. The total number of trials remained 320.

Results

Trials faster than 150 ms were discarded as anticipations, and trials slower than three SDs from the participant’s mean for every condition were discarded as outliers (2.2 %). Incorrect responses were also discarded (6.2 %) from the RT analyses. One participant was removed prior to the analysis because of a mean RT greater than three SDs from the group mean; no other participant was greater than two SDs from the mean. The mean RTs were submitted to a 2 (Target Position: central or eccentric) × 2 (ACS: object–target match or object–target mismatch) repeated measures ANOVA. We found a significant main effect of ACS, in which participants were slower to respond when the object color matched the target ACS: F(1, 23) = 4.30, p = .05, ηp² = .16. There was no main effect of target position: F(1, 23) = 0.44, n.s. Critically, we observed an interaction between ACS and target position: F(1, 23) = 8.19, p = .009, ηp² = .26 (see Fig. 5). The critical test of OBEs in this paradigm was whether RTs would be faster to targets appearing at the central location than to targets at the eccentric locations. Confirming our prediction, a paired-samples t test comparing the mean RTs for centrally and eccentrically presented targets in the match condition showed a significant within-object advantage: t(23) = 2.14, p = .044, d = 0.67. Surprisingly, we also observed a reversal of this effect in the mismatch condition: t(23) = 2.26, p = .033, d = 0.69.

Fig. 5

Mean response times (RTs) to identify targets presented as central or eccentric elements of a search array with small objects. The object color could match or mismatch the target ACS. When the objects were presented in a color matching the target ACS, we observed a within-object processing advantage. Error bars represent one SEM, within subjects

Discussion

These results showed a within-object processing advantage only when the object feature matched the target ACS, replicating Experiments 1 and 2. Because the ACS instruction was presented as a gray word, it could not have primed object processing at a feature level, falsifying an intratrial feature-priming account of Experiments 1 and 2.

General discussion

In three experiments, we showed that object-based selection is contingent on the top-down ACS. We modified two classic tasks in which objects are known to affect the distribution of attention, such that the objects were presented in a color matching or mismatching the ACS. OBEs emerged only when the object color matched the ACS, indicating that object-based attention requires selection of the objects in question.

It is important to note that our results do not suggest that an object–target feature match is a necessary condition for OBEs to emerge. Indeed, that claim cannot be true, given the range of nonmatching object and target stimuli that have been used in existing demonstrations of object-based attentional orienting (e.g., de-Wit et al., 2011). However, existing demonstrations with the two-rectangle paradigm have never employed feature-based ACSs with nonmatching elements. Without an active feature-based ACS to filter nonmatching objects, all objects and targets should be processed and therefore be available for object-based processing. In contrast, in the present study we used feature-based ACSs to gate object processing. In other words, an object feature match is not required for OBEs, but rather, OBEs proceed at the behest of the observer’s top-down, feature-based control settings.

This conclusion speaks to the contentious dichotomy of flexible versus mandatory object-based selection. In order to demonstrate flexible object-based selection, researchers usually modify the task, the stimuli, or the outcome across conditions to incentivize object-based selection (e.g., Shomstein & Johnson, 2013). The results from such experiments have led to the important notion that object-based selection is flexible, but under engineered circumstances. Predictably, other researchers have generated different circumstances under which OBEs return (e.g., Chen & Cave, 2006). In the present experiments, the critical conditions—whether or not the ACS matched the object feature—were presented with equal uncertainty, structure, incentive, and perceptual stimulation. In other words, nothing in the physical circumstances biased location-based over object-based selection, or vice versa.

Like object or scene parsing, feature-based modulation of visual processing has been shown to occur at very early, supposedly preattentive, levels (Liu, Larsson, & Carrasco, 2007; Saenz, Buracas, & Boynton, 2002). It is not surprising, then, that a feature-based ACS should prevent the selection and processing of objects, and subsequent OBEs, or should otherwise modulate the orienting of attention (Folk et al., 1992). It is surprising, though, that a simple, top-down setting could so completely disrupt object-based attention, given the frequent demonstrations of OBEs under conditions that are assumed or implied to be powerfully automatic (e.g., de-Wit et al., 2011; Kimchi et al., 2007), and even below conscious awareness of the objects in question (Norman et al., 2013).

The present findings argue strongly against the notion that object-based selection is mandatory; otherwise, OBEs would have emerged regardless of the color-based ACS. Attentional capture despite color-based settings has been demonstrated for other phenomena (Al-Aidroos, Guo, & Pratt, 2010), so it is reasonable to expect that object-based selection could overcome the ACS in mismatch conditions. The present study, however, clearly shows that OBEs are contingent on top-down settings. If object-based selection is a default mode under circumstances of location-based uncertainty (Yeari & Goldsmith, 2010), OBEs should have emerged even in the mismatching condition, since there was no incentive to ignore the task-irrelevant objects, and our location-based uncertainty was equivalent to that in the original tasks. If object-based selection is a default mode, as others have suggested, it is default only in a very weak sense of the word. This position raises the question: If object-based selection is not a default mode of selection, why do OBEs emerge under conditions similar to our mismatch condition, as in the original experiments (Castiello & Umiltà, 1990; Egly et al., 1994)? We propose that, without any other instruction or motivation, observers participating in these object-based selection experiments spontaneously adopt top-down settings for the only static visual stimulus provided—the objects—eliciting OBEs in a manner that seems default. The question of default modes is reminiscent of Bacon and Egeth’s (1994) investigation into why attentional capture can appear either stimulus-driven (Theeuwes, 1992) or goal-driven (Folk et al., 1992) under different circumstances. They proposed a stimulus-general singleton detection mode that caused any perceptual singletons to capture attention. In their words, observers defaulted to a setting that prioritized anything perceptually interesting, “because it was easier and because they could” (Bacon & Egeth, 1994, p. 493). 
We propose that the seemingly default adoption of object-based selection follows the same principles.