Object-based attention: A tutorial review
- Cite this article as:
- Chen, Z. Atten Percept Psychophys (2012) 74: 784. doi:10.3758/s13414-012-0322-z
This tutorial provides a selective review of research on object-based deployment of attention. It focuses primarily on behavioral studies with human observers. The tutorial is divided into five sections. It starts with an introduction to object-based attention and a description of the three commonly used experimental paradigms in object-based attention research. These are followed by a review of a variety of manifestations of object effects and the factors that influence object segmentation. The final two sections are devoted to two key issues in object-based research: the mechanisms that give rise to the object effects and the role of space in object-based selection.
Visual perception is necessarily selective. A natural scene typically contains a vast amount of information. However, because of the limited processing capacity of the visual system at any given time, we cannot process everything simultaneously. Given this limitation, it is perhaps not surprising that the factors that influence visual attention and the mechanisms that underlie the unit of selection are among the most studied topics in modern psychology.
Until the early 1980s, it was generally believed that visual attention operated within a spatial reference frame. This view is perhaps best illustrated by the various metaphors that have been used to describe attention, with the most widely accepted ones being spotlight (B. A. Eriksen & Eriksen, 1974; Hoffman & Nelson, 1981; Posner, 1980; Posner, Snyder, & Davidson, 1980), zoom-lens (C. W. Eriksen & St. James, 1986; LaBerge, 1983), and gradients (Downing & Pinker, 1985). Although these models of attention differed regarding their conceptions of the flexibility of attentional selection and the spread of attentional resources within a selected region of space, they all emphasized the spatial properties of attention. Attention was believed to select on the basis of space, and all stimuli within the selected region were thought to receive some degree of processing regardless of observers’ behavioral goals.
Although there is little doubt that space plays an extremely important role in visual selection (for reviews, see Cave, in press; Cave & Bichot, 1999; Lamy & Tsal, 2001), by the early 1980s, it had become clear that space was not the only reference frame within which attention operated. Because objects often overlap in space in natural scenes and we seem to have little difficulty attending to a specific feature or object among irrelevant distractors, it makes intuitive sense that the unit of attentional selection may also be based on features and objects, in addition to space.
This tutorial focuses on object-based selection. Although object is a commonly used word in everyday communication, the question of what an object is in visual perception turns out to be rather difficult to answer (Adelson & Bergen, 1991; Duncan, 1984; Logan, 1996; Scholl, 2001). This is because what constitutes an object depends not only on the physical properties of a stimulus or a group of stimuli (Baylis & Driver, 1992; Kimchi, Yeshurun, & Cohen-Savranzky, 2007; Kramer & Jacobson, 1991; Kramer & Watson, 1996), but also on how we parse an image in accordance with our behavioral goals (Marr, 1982). For the purpose of this tutorial, I will follow previous researchers (e.g., Goldsmith, 1998; Kimchi et al., 2007) and define a perceptual object as the elements in the visual scene organized by one or more Gestalt grouping principles and/or uniform connectedness. Due to space constraints, I will focus my review of object-based attention primarily on behavioral research, with a very selective review of physiological, neuroimaging, and clinical studies when necessary. The tutorial starts with a description of object-based attentional selection and the three commonly used experimental paradigms in object-based attention research. They are followed by a review of the different manifestations of object effects, the factors that influence object-based deployment of attention, and the mechanisms that give rise to the object effects. In the final part of the tutorial, I review the literature on the role of space in object-based selection. For interested readers, an extensive bibliography can be found at the end of this article.
Object-based attentional selection and three commonly used experimental paradigms
Even in the heyday of the space-based view of attention, various researchers noted the effect of objects on selective attention (e.g., Francolini & Egeth, 1980; Kahneman & Chajczyk, 1983; Kahneman & Henik, 1981; Kahneman, Treisman, & Burkell, 1983; Neisser & Becklen, 1975). Neisser and Becklen reported that people who were required to perform an attention-demanding task concerning one of two superimposed visual scenes could become remarkably unaware of superthreshold events happening in the unattended scene. Kahneman and Henik (1981) also found that interference from a task-irrelevant feature of a stimulus was much larger when that feature belonged to an attended object, relative to an unattended object, despite the fact that the locations of these objects were unpredictable. Furthermore, when the task was to report as many items in a display as possible, participants tended to jointly report or jointly miss items that were in the same perceptual group. These results led to the proposal that objects affect the distribution of attention and that attending to one aspect of an object facilitates the processing of other aspects of the same object regardless of task relevancy (Kahneman & Chajczyk, 1983; Kahneman & Henik, 1981).
In the 3 decades since Duncan’s (1984) study, there has been an explosion of research on object-based selection (for reviews, see Driver & Baylis, 1998; Kanwisher & Driver, 1992; Scholl, 2001). One study, which was conducted by Egly, Driver, and Rafal (1994), is of particular significance, for it introduced a paradigm that allowed the investigation of both space- and object-based deployment of attention within the same experiment. This paradigm has since become the most widely used paradigm in object-based attention research. In Egly, Driver, and Rafal, observers saw two rectangles presented side by side (see Fig. 1b). A spatial cue then appeared at one of the four ends of the rectangles, followed by a target at one of three locations: the cued location on 75 % of the trials (the valid condition), the uncued end of the cued rectangle on 12.5 % of the trials (the invalid same-object condition), and the uncued equidistant end of the other rectangle on the rest of the trials (the invalid different-object condition). The task was to detect the target as quickly as possible. Observers were faster to respond to the target at the cued location than at either of the uncued locations, indicating space-based attentional facilitation. Furthermore, they were also faster in the invalid same-object condition than in the invalid different-object condition. Since the spatial separation between the cue and the subsequent target was held constant in the latter two conditions, the differential reaction times (RTs) observed in these conditions suggest that attention spreads more quickly to other locations within the same object than between different objects (for alternative interpretations, see Lamy & Egeth, 2002; Shomstein & Yantis, 2002), indicating object-based deployment of attention. Using variants of Egly, Driver, and Rafal’s two-rectangle paradigm, many researchers have replicated these findings. 
Regardless of whether the task required stimulus detection or identification, the shift of attention was faster within an object than between objects (e.g., Chen, 1998; Lavie & Driver, 1996; Macquistan, 1997; Moore, Yantis, & Vaughan, 1998; Pratt & Sekuler, 2001).
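To make the logic of the two-rectangle paradigm concrete, the following is a minimal Python sketch, not code from Egly, Driver, and Rafal (1994). It builds a trial list with the proportions described above (75 % valid, 12.5 % invalid same-object, 12.5 % invalid different-object) and computes two contrasts from mean RTs supplied by the user. The function names, the condition labels, and the choice to measure the space-based effect against the invalid same-object condition are all illustrative assumptions.

```python
import random
from collections import Counter

# Trial proportions as described in the text for the two-rectangle paradigm.
CONDITION_PROPORTIONS = {
    "valid": 0.75,
    "invalid_same_object": 0.125,
    "invalid_different_object": 0.125,
}


def build_trial_list(n_trials, seed=None):
    """Return a shuffled list of condition labels with the stated proportions.

    n_trials must be a multiple of 8 so that each condition gets a whole
    number of trials (hypothetical helper, not from the original study).
    """
    if n_trials % 8 != 0:
        raise ValueError("n_trials must be a multiple of 8")
    trials = []
    for condition, proportion in CONDITION_PROPORTIONS.items():
        trials.extend([condition] * int(n_trials * proportion))
    random.Random(seed).shuffle(trials)
    return trials


def condition_counts(trials):
    """Count how many trials fall in each condition."""
    return Counter(trials)


def cueing_effects(mean_rt_ms):
    """Compute two RT contrasts (ms) from mean RTs keyed by condition.

    space_effect: invalid same-object minus valid (assumed contrast; the
        cue and target share an object, so only location differs).
    object_effect: invalid different-object minus invalid same-object
        (cue-target distance is equal in these two conditions).
    Positive values mean slower responses in the first-named condition.
    """
    return {
        "space_effect": mean_rt_ms["invalid_same_object"] - mean_rt_ms["valid"],
        "object_effect": (
            mean_rt_ms["invalid_different_object"]
            - mean_rt_ms["invalid_same_object"]
        ),
    }
```

Under this sketch, a positive `object_effect` corresponds to the same-object advantage reported in the text, because the two invalid conditions differ only in whether the target falls on the cued object.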
A third paradigm commonly used in object-based attention research is the flanker interference paradigm (B. A. Eriksen & Eriksen, 1974) with object manipulation. In this paradigm, a target is shown at a central location flanked by distractors that indicate either the same response as or a different response from that of the target (see Fig. 1c). On some trials (the same-object condition), the target and distractors belong to the same object or perceptual group. On the rest of the trials (the different-object condition), they belong to different objects or perceptual groups. Regardless of whether objects are defined on the basis of contours (e.g., Chen & Cave, 2006; Richard, Lee, & Vecera, 2008; but see Shomstein & Yantis, 2002), Gestalt principles of color (e.g., Baylis & Driver, 1992; Harms & Bundesen, 1983; Kramer & Jacobson, 1991), common motion (e.g., Driver & Baylis, 1989; but see Berry & Klein, 1993; Kramer, Tham, & Yeh, 1991), connectedness (e.g., Kramer & Jacobson, 1991), or good continuation (e.g., Baylis & Driver, 1992), the general finding is that interference from distractors is greater in the same object/perceptual-grouping condition than in the different object/perceptual-grouping condition. In addition, when focal attention is prevented, observers are more likely to wrongly combine features from different objects when these objects are from the same perceptual group than when they are from different perceptual groups (e.g., Baylis, Driver, & McLeod, 1992; Prinzmetal & Keysar, 1989). Their ability to track independently moving targets in multiple-object tracking tasks (Pylyshyn & Storm, 1988) is also impaired when the targets are merged to form objects such as lines, rubber bands, or Necker cubes (e.g., Scholl, Pylyshyn, & Feldman, 2001). These results confirm that items that belong together are selected together.
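The flanker comparison described above can be expressed as a difference of differences. The Python sketch below is illustrative only: the terms "compatible" and "incompatible" (for distractors indicating the same or a different response from the target) and both function names are assumptions introduced here, and the inputs are mean RTs that the user must supply.

```python
def interference_ms(rt_incompatible_ms, rt_compatible_ms):
    """Flanker interference: how much slower responses are when the
    distractors indicate a different response from the target."""
    return rt_incompatible_ms - rt_compatible_ms


def object_modulation_ms(same_object, different_object):
    """Difference in flanker interference between grouping conditions.

    Each argument is a (rt_incompatible_ms, rt_compatible_ms) pair of mean
    RTs. A positive result means interference is larger when target and
    distractors belong to the same object or perceptual group.
    """
    return interference_ms(*same_object) - interference_ms(*different_object)
```

A positive value from `object_modulation_ms` would match the general finding stated in the text, namely greater distractor interference in the same-object condition than in the different-object condition.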
Other manifestations of object effects
In addition to the findings described above and the paradigms that produced them, object-based attention has also been manifested in a variety of other ways via a number of other methods. Kahneman and Treisman (1984; Kahneman, Treisman, & Gibbs, 1992) were among the first to explore object-based attention. Kahneman et al. (1992) used an object preview paradigm to investigate the relationship between object continuity and the efficiency of visual information processing. A typical trial consisted of a preview display with two or more letters, each in an individual frame, and a target display with a single letter in one of the frames. The task was to report the identity of the target letter. RTs to the target were reliably shorter when the target was a previewed letter that appeared in the same frame (absolute or relative), as compared with a previewed letter that appeared in a different frame. These results provide evidence for an object-specific preview advantage, which occurs when two objects in close spatiotemporal proximity are seen as different states of the same object relative to different objects.
Several studies have explored the effect of object-based attention on saccadic eye movements. It was found that observers were more likely to make within-object, relative to between-object, eye movements when saccades were required for target identification (e.g., McCarley, Kramer, & Peterson, 2002; Theeuwes & Mathot, 2010), that the dwell time preceding the saccades was shorter when the switch of attention was within rather than between objects (e.g., McCarley et al., 2002), and that in memory recall tasks, participants’ eyes were more likely to fixate on a location when that location was linked, rather than not linked, to an animated creature that presented the relevant information (e.g., Hoover & Richardson, 2008).
Object-based attention also enhances manipulations in working memory. In Bao, Li, and Zhang (2007), participants were required to perform two tasks concurrently: to continuously monitor and update a target’s location on the basis of incoming information and to count the number of times a second stimulus occurred. One group of observers (the separate group) was simply told to perform the two tasks, while the other group (the binding group) was encouraged to integrate the location and object occurrence information into a single object by imagining that the target was a digit 0, which moved to a different location in accordance with incoming location information and which increased its value by 1 every time a second stimulus, whose frequency required monitoring, appeared. The results showed that RTs were longer in the separate group than in the binding group. Moreover, the cost of shifting attention between the location and object occurrence tasks was larger for the separate group than for the binding group, suggesting that binding information to a single object facilitates information manipulations in working memory. Related results were reported by Kahneman and Henik (1977, 1981), who manipulated the perceptual groupings of the stimuli that the participants had to recall and found a higher recall rate when the stimuli were displayed in the same perceptual group rather than in different perceptual groups. Ohyama and Watanabe (2010) observed object-based attentional benefits in memory recognition tasks. Their participants had better recognition memory for letters whose onset coincided with, rather than mismatched, a sudden change that occurred to an object upon which the letters were shown. These results suggest the existence of an object-based attentional mechanism that underlies both scene perception and information retrieval. 
Attention to one part of an object appears not only to facilitate the speed of information manipulation pertaining to the attended object, but also to enhance the strength of encoding, resulting in better retrieval of the encoded information.
Object-based attention also influences the efficiency of visual search. In general, search efficiency increases with increasing similarity among the distractors and decreasing similarity between the target and the distractors. This perceptual grouping effect has been found with a variety of features, including color, shape, proximity, good continuation, connectedness, and even perceived surface in 3-D space (e.g., Banks & Prinzmetal, 1976; Donnelly, Humphreys, & Riddoch, 1991; Duncan & Humphreys, 1989, 1992; Z. J. He & Nakayama, 1995; Humphreys, Quinlan, & Riddoch, 1989; Treisman, 1982; Wolfe & Bennett, 1997). These results are presumably caused by the fact that, whereas the homogeneity of distractors promotes perceptual grouping, which in turn facilitates their rejection as a perceptual unit, the homogeneity between the target and distractors impairs segmentation, making it harder to distinguish the target from the distractors (Duncan & Humphreys, 1989, 1992). Thus, a line segment was easy to detect when it appeared in isolation but was difficult to detect when it was embedded in a configuration (e.g., Rensink & Enns, 1995). Similarly, visual statistical learning—that is, acquiring information about the frequency of stimulus pairing over successive trials—was easier when an attended stimulus was connected with the other (unattended) member of the pair than when the two stimuli were separated (e.g., Baker, Olson, & Behrmann, 2004). Searching for two features was also more efficient when the target features belonged to a single object or perceptual group rather than to two different objects or perceptual groups (e.g., Goldsmith, 1998; Kahneman & Henik, 1981; Wolfe & Bennett, 1997). Finally, all else being equal, when a target and a probe differed in orientation, search was more efficient when the target was shown in its canonical orientation rather than in other orientations (e.g., Newell, Brown, & Findlay, 2004). 
These results indicate that object-based attention contributes to both scene perception and information retrieval in long-term memory.
Interestingly, object-based attention has also been found to influence some phenomena that are typically associated with low-level visual processing. Spivey and Spirn (2000) found that observers who viewed two colored gratings that overlapped in space but differed in orientation could selectively adapt to one of the gratings via attention, resulting in a tilt aftereffect in the direction opposite to the attended grating. Using a different paradigm, Mitchell, Stoner, and Reynolds (2004) demonstrated the effect of attention on dominance in binocular rivalry. They showed observers two patterns of dots that rotated in opposite directions. The patterns were projected to both eyes. After attention was cued to one pattern, the image of the cued pattern was removed from one eye while the image of the uncued pattern was removed from the other eye. Since the two eyes were now viewing different images, binocular rivalry occurred. Interestingly, although the dominant pattern shifted between the two eyes, as one would expect during binocular rivalry, it was more likely to be the cued pattern, rather than the uncued pattern. In subsequent experiments, Chong and Blake (2006) showed that in order to counteract the attentional effect of a cued grating on initial dominance in binocular rivalry, the contrast of the grating had to be reduced by an amount in the neighborhood of 0.3 log-units. Taken together, these findings are consistent with the notion that attention can enhance the early representation of the selected item or its region (Desimone & Duncan, 1995), a topic that I will discuss in more detail later.
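For readers less familiar with log units, the size of the contrast reduction reported by Chong and Blake (2006) can be converted to a linear factor. The short Python sketch below assumes that "log-units" means base-10 logarithms, which is the usual convention in vision science but is not stated explicitly in the text; the variable names are illustrative.

```python
# Reduction reported in the text, assumed here to be in base-10 log units.
LOG_UNIT_REDUCTION = 0.3

# Fraction of the original contrast that remains after the reduction.
remaining_fraction = 10 ** (-LOG_UNIT_REDUCTION)

print(f"{remaining_fraction:.2f}")  # prints 0.50
```

Under that assumption, a reduction of about 0.3 log units corresponds to roughly halving the contrast of the cued grating.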
Although the majority of the literature on object-based attention demonstrates object-based facilitation, object-based inhibition has also been explored. In a typical experiment that uses the inhibition of return (IOR) paradigm (Posner & Cohen, 1984), a peripheral location is cued, followed by a central fixation and then a target at either the cued location or a new location. Target detection is facilitated at the cued location when the cue-to-target stimulus onset asynchrony is short (e.g., within 300 ms). However, when it is long (e.g., beyond 300 ms), responses to the target are slower at the cued location relative to an uncued location, demonstrating location-based IOR. It has been proposed that the function of IOR is to prevent repeated sampling of locations that have already been searched (Klein, 1988).
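The sign reversal that defines IOR can be illustrated with a small Python sketch. It is a hypothetical helper, not an analysis from any of the cited studies: it takes mean RTs for cued and uncued locations at each cue-to-target stimulus onset asynchrony (SOA) and labels the cueing effect as facilitation or inhibition of return. All names are assumptions, and no RT values are supplied here because they must come from the user's own data.

```python
def cueing_effect_ms(rt_cued_ms, rt_uncued_ms):
    """Uncued minus cued RT: positive when the cued location is faster."""
    return rt_uncued_ms - rt_cued_ms


def label_effect(effect_ms):
    """Name the direction of a cueing effect."""
    if effect_ms > 0:
        return "facilitation"
    if effect_ms < 0:
        return "inhibition of return"
    return "no effect"


def summarize_by_soa(mean_rts_by_soa):
    """Label the cueing effect at each SOA.

    mean_rts_by_soa maps an SOA in ms to a (rt_cued_ms, rt_uncued_ms) pair.
    Returns a dict from SOA to a label, ordered by increasing SOA.
    """
    return {
        soa: label_effect(cueing_effect_ms(rt_cued, rt_uncued))
        for soa, (rt_cued, rt_uncued) in sorted(mean_rts_by_soa.items())
    }
```

With data that follow the pattern described in the text, short SOAs would be labeled "facilitation" and long SOAs "inhibition of return".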
Using dynamic displays with moving objects, a number of studies found object-based IOR (e.g., Chou & Yeh, 2008; Gibson & Egeth, 1994; Jordan & Tipper, 1998, 1999; List & Robertson, 2007; Tipper, Driver, & Weaver, 1991; Tipper, Weaver, Jerreat, & Burak, 1994). Tipper et al. (1991) cued attention to a moving object and found that IOR moved with the object to a new location, rather than remaining at the original environmental location. Gibson and Egeth (1994) showed their participants a computer-generated brick that rotated in 3-D and found both location- and object-based IOR. Relative to a control condition in which a cue and a subsequent target appeared at different locations on two different surfaces of the rotating brick, their participants were slower when the cue and target were on different surfaces but at the same environmental location (showing location-based IOR) and when the cue and target appeared on the same surface but at different environmental locations (showing object-based IOR). Similar results were reported in experiments using static displays (e.g., Chou & Yeh, 2008; Jordan & Tipper, 1999; List & Robertson, 2007).
In addition to object-based IOR, object-based inhibition has been demonstrated in the negative priming paradigm. Negative priming refers to the longer RTs to a target on a probe trial (trial n + 1) when that target was a distractor rather than a neutral stimulus on a prime trial (trial n) (Tipper, 1985). In Tipper, Brehaut, and Driver (1990), participants saw stimulus displays that induced the perception of a target and distractor moving through occluding columns (i.e., the movement itself was never in view), with the target emerging a moment later at either the projected location of the distractor or a different location. Negative priming was found when the target on the probe trial emerged at the projected location of the distractor, even though this location was not the environmental location where the distractor was last seen. In other words, inhibition of the distractor did not simply stay at its original location. Instead, it moved with the inhibited object to its new location, despite the fact that the actual movement of the distractor was never seen. Interestingly, negative priming can be eliminated and even become positive priming when the target and distractor are perceptually grouped in the prime display (e.g., Fuentes, Humphreys, Agis, Carmona, & Catena, 1998). Taken together, these results are consistent with the notion of object-based inhibition, suggesting that both facilitation and inhibition can spread across an object’s surface and move with an attended object to its new location.
Object-based attention is not restricted to neurologically intact people. Patients with brain damage have also shown evidence of using an object-based reference frame in visual processing. Driver and Halligan (1991) showed pairs of vertically aligned nonsense shapes to their patient, P.P., who suffered from severe left neglect due to damage in her right temporo-parietal region. The task was to determine whether the pair of shapes, which were centrally presented, were the same or different. Since neglect is primarily a space-based attentional deficit (Bisiach & Luzzatti, 1978), it was not surprising that P.P. performed the task poorly when the shapes differed on the left. Interestingly, when the shapes were tilted 45° to the right, she continued to show poor performance when the shapes differed on their left side even though the differences were now in her intact right side of space. Similar results were reported by a number of other researchers (e.g., Behrmann & Moscovitch, 1994; Caramazza & Hillis, 1990; Driver, Baylis, & Rafal, 1992; Marshall & Halligan, 1994; Young, Hellawell, & Welch, 1992). In all these studies, patients with neglect in their left visual field were less impaired in performance when the critical information was on the right side of the objects, even when the right side of the objects was in their impaired left side of space (cf. Farah, Brunn, Wong, Wallace, & Carpenter, 1990).
A similar pattern of performance can be found in patients with visual extinction, which is a less severe form of neglect confined to a contralesional stimulus when it is presented concurrently with an ipsilesional stimulus. It has been shown that patients with visual extinction can reduce their deficits when the contralesional stimulus is perceptually grouped with the ipsilesional stimulus (e.g., Mattingley, Davis, & Driver, 1997; Ward, Goodrich, & Driver, 1994). Grouping also improves the perceptual impairments of patients with Balint’s syndrome, who typically see only one object at a time. Humphreys and Riddoch (1993) tested two Balint’s patients, whose performance in perceiving multiple objects improved remarkably when different-colored objects were connected by black lines. Other object properties also appear to influence the extent of deficits in brain-damaged patients. Humphreys and colleagues (Humphreys & Riddoch, 2003; Humphreys, Romani, Olson, Riddoch, & Duncan, 1994) found that their patients, who had parietal lobe damage, showed differential degrees of extinction as a function of object type. For example, when pairs of stimuli were shown simultaneously, extinction was more likely to occur with an open geometric shape rather than a closed geometric shape. Remarkably, these patients were often unable to locate the stimulus they had just successfully identified. As Humphreys and his colleagues noted (Humphreys & Riddoch, 2003; Humphreys et al., 1994), these results suggest that when spatial selection was impaired, the grouping strength between the components of an object could influence the probability of an object being selected, with the object-based selection system favoring the object having the stronger grouping. 
Moreover, the finding that damage in the parietal lobe could impair the explicit representation of space while leaving the implicit coding of location intact suggests that multiple forms of spatial representation exist in the brain, and not all of them can be accessed explicitly.
Factors that influence object-based selection
Most studies modeled after Egly, Driver, and Rafal (1994) have used exogenous (peripheral) instead of endogenous (central) cues to direct attention to a specific location in an object. In general, object effects are more readily demonstrated with exogenous than with endogenous cues. Macquistan (1997) used Egly, Driver, and Rafal’s two-rectangle paradigm but showed one group of participants an exogenous cue and another group an endogenous cue before the onset of the target. Object effects were found with exogenous but not endogenous cues. Similar results were reported by Dagenbach and colleagues (Arrington, Dagenbach, McCartan, & Carr, 2000, November; Dagenbach, Goolsby, Neely, & Dudziak, 1997; Neely & Dagenbach, 1996). These findings led some researchers to question whether endogenous control of object-based attention was possible (e.g., Lauwereyns, 1998; Macquistan, 1997).
For object-based attention to be deployed, a robust object-based representation must be established. Thus, variables that affect the quality of object-based representations also influence the degree to which object-based attention is utilized. One such variable is stimulus presentation duration. Object effects are less reliably elicited with short, relative to long, display durations (e.g., Avrahami, 1999; Chen & Cave, 2008; Law & Abrams, 2002). In Chen and Cave (2008), participants demonstrated object effects when they had 1,005 ms to view a stimulus display before the appearance of a precue. No object effects were found when the viewing time was decreased to 120 ms. Similar effects of display duration were reported by Avrahami (1999), who manipulated the cue-to-target stimulus onset asynchrony (420 vs. 210 ms), and by Law and Abrams (2002), who varied the target display duration across experiments (186 vs. 129 ms). In both cases, object effects were more evident with the long, rather than the short, display duration. However, object effects have also been found with display durations as brief as 50 ms (e.g., Duncan, 1984). Given the diverse durations that have elicited object effects, it seems that the exact stimulus presentation duration may not really matter. Instead, what matters is the quality of object-based representation that a specific duration allows the participants to establish, which can be influenced by a variety of factors, including task demand, stimulus characteristics, and response mode. Consistent with this idea is the finding by Ariga, Yokosawa, and Ogawa (2007), who used a modified version of Egly, Driver, and Rafal’s (1994) two-rectangle paradigm and found no evidence of object-based attention when their participants were not consciously aware of the presented objects (but see Mitroff & Scholl, 2005, for evidence of forming and updating object representations when changes were made to unseen stimuli during motion-induced blindness).
Another factor that contributes to the quality of object-based representation is the “goodness” of an object. All else being equal, a “good” object is one that has surface uniformity and closed boundaries. Thus, object effects are more reliable when objects show uniform connectedness—for example, when objects have the same color and luminance, as compared with various colors or luminance (e.g., Hecht & Vecera, 2007; Kramer & Watson, 1996; Matsukura & Vecera, 2006; Watson & Kramer, 1999), when they have closed rather than open boundaries (e.g., Marino & Scholl, 2005), and when targets appear on the same straight line within an object, rather than on different segments of an object separated by angles (e.g., Crundall, Cole, & Galpin, 2007).
Object effects are also more robust when the perceptual load is low rather than high (e.g., Ho & Atchley, 2009), when the observers are young rather than old (e.g., McCrae & Abrams, 2001), when the motor responses required are grasping rather than pointing (e.g., Fischer & Hoellen, 2004; Linnell, Humphreys, McIntyre, Laitinen, & Wing, 2005; but see Bekkering & Pratt, 2004, for object-based effect with pointing), and when the left rather than the right hemisphere receives object-related information (e.g., Egly, Driver, & Rafal, 1994; Egly, Rafal, Driver, & Starreveld, 1994).
As with display duration, factors that promote the “goodness” of an object are conducive to the deployment of object-based attention, but they are not a necessary condition. Object effects have been obtained in objects without closed boundaries (e.g., Avrahami, 1999; Crundall et al., 2007; Kramer & Jacobson, 1991) or uniform surfaces (e.g., Hecht & Vecera, 2007). Moreover, they have been found in objects created through illusory contours (e.g., Moore et al., 1998) and amodal completion (e.g., Behrmann et al., 1998; Matsukura & Vecera, 2006; Moore et al., 1998; Pratt & Sekuler, 2001; but see also Saiki, 2000, for an alternative interpretation of Behrmann et al., 1998). These results suggest that the formation of an object representation, regardless of the manner through which such a representation is established, is a critical factor in the deployment of object-based attention.
Mechanisms that give rise to object effects
There are three main interpretations regarding the mechanisms that give rise to object effects: sensory enhancement, attentional prioritization, and attentional shifting. The sensory enhancement interpretation emphasizes the spread of attention that respects object boundaries and attributes object effects to the improved sensory representation of the selected object (e.g., Avrahami, 1999; Chen & Cave, 2006, 2008; X. He, Fan, Zhou, & Chen, 2004; Martínez, Teder-Sälejärvi, & Hillyard, 2007; Richard et al., 2008; Roelfsema & Houtkamp, 2011; Roelfsema, Lamme, & Spekreijse, 1998; Valdes-Sosa, Bobes, Rodriguez, & Pinilla, 1998; Vecera & Farah, 1994; Weber, Kramer, & Miller, 1997). The attentional prioritization account (as originally presented) stresses the biasing of attentional scanning order in visual search, which, by default, starts from the locations within an already attended object (e.g., Shomstein & Yantis, 2002, 2004). Finally, the attentional shifting account emphasizes the relatively higher cost of attentional shifts between objects, relative to within an object (e.g., Brown & Denney, 2007; Lamy & Egeth, 2002), and attributes this between-object cost to the additional disengagement operations when attention needs to be disengaged from an object to a location outside that object (Brown & Denney, 2007).
When the object-based attentional effect was first reported, it was explained in terms of selecting either the internal representation of the region of space occupied by an attended object (e.g., Kim & Cave, 1995, 2001; Kramer et al., 1997) or the internal representation of a location-independent object (e.g., Vecera & Farah, 1994). Vecera and Farah referred to these two types of selection as grouped-array and spatially invariant representations, respectively. In both cases, it is assumed that the spread of attention respects object boundaries and that attention improves the quality of the perceptual representation of the selected item. The attentional enhancement is likely to be the result of biased competition (Desimone & Duncan, 1995) among neural representations of multiple objects, causing the representation of the attended object to become more effective in its competition with the representations of the other, unattended objects. The selection of the attended object, in turn, leads to faster and/or more accurate processing of the features or items within the object, as compared with those in the nonselected objects.
In addition to neurons in V1, the target enhancement effect has also been reported with motion-sensitive neurons in the middle temporal area (MT) of monkeys. Wannig et al. (2007) cued monkeys to attend to one of two transparent random-dot surfaces and found that the motion of the attended surface activated the neurons in MT more strongly than the motion of the unattended surface, even though the two surfaces occupied the same spatial region. These results provide a direct link between attention to an object or surface and increased neural activation of the representations of the selected object or surface in early sensory areas.
Changes in neuronal responses have also been observed in experiments using event-related brain potentials (ERPs). Valdes-Sosa et al. (1998) showed their observers stimulus displays consisting of two sets of superimposed dots that differed either in both color and the direction of motion, thus creating the perception of two transparent surfaces in rigid rotation (the two-object condition), or in color but not in the direction of motion, thus creating the perception of one object either at rest or rotating in the same direction (the one-object condition). The participants judged the direction of motion of a subset of the dots (defined by color) that simultaneously underwent brief linear displacements (i.e., nonrotational motion). Their motion-onset ERPs were recorded while the target dots changed locations. Motion-onset posterior P1 and N1 components were found to be associated with both the attended and the unattended sets of dots in the one-object condition, but with only the attended set of dots in the two-object condition. In the latter case, a strong suppression of P1 and N1 was observed with the unattended object. These findings are consistent with the notion that object effects are the result of changes in the neural representations of the selected object. They also suggest that both the enhancement of the attended object and the suppression of the unattended object may play a role in the observed object effects. Results in support of the sensory enhancement account can also be found in a number of other ERP experiments, including X. He et al. (2004, 2008), Martínez et al. (2007; Martínez et al., 2006), and Weber et al. (1997). 
Despite the differences in their methodology (e.g., using exogenous or endogenous cues or a postdisplay probe to measure the distribution of spatial attention), a common finding is that object-based attention is associated with an enhanced N1 component over the occipito-temporal areas (but see Weber et al., 1997, for a larger N1 amplitude in the different-object condition than in the same-object condition).
Experiments using functional magnetic resonance imaging (fMRI) have provided converging evidence in support of the sensory enhancement account (e.g., Arrington, Carr, Mayer, & Rao, 2000; Martínez et al., 2006; Müller & Kleinschmidt, 2003; O’Craven, Downing, & Kanwisher, 1999). O’Craven et al. showed their participants semitransparent images of a face and a house that were spatially superimposed. On each trial, either the face or the house would move while the other remained stationary. The participants attended to the face, the house, or the motion in different conditions. The results showed that attention to one attribute (e.g., the face) led to an enhanced blood oxygenation level dependent (BOLD) signal change not only in the brain area associated with the processing of that attribute (i.e., the fusiform face area, which is involved in the processing of faces), but also in the brain area associated with the processing of the task-irrelevant attribute (i.e., the MT/MST area for motion) that belonged to the attended object rather than the unattended object. The finding that neural activation pertaining to a task-irrelevant attribute differs as a function of whether that attribute was part of an attended or an unattended object supports the notion that attention leads to enhanced neural representations of all the attributes that belong to the selected object regardless of task relevancy. Arrington, Carr, et al. (2000) further showed that attending to a region of space bounded by an object evoked stronger brain activity, as compared with attending to an empty space not bounded by any object. This result indicates that object-based spatial selection requires additional mental resources over and beyond location-based spatial selection. It should be noted, however, that the results above do not entail that the degree of enhanced activation is equivalent in all the regions of the selected object.
In fact, Müller and Kleinschmidt (2003), whose study I will describe in more detail in the next section, found a larger increase in BOLD signal activation at the cued location than at uncued locations in early visual areas (V1–V4). A similar finding was reported by Martínez et al. (2006) in an ERP experiment where they observed a smaller N1 amplitude associated with object-based attention than with space-based attention.
Shomstein and Yantis (2002) noted that many experiments that demonstrated object effects required observers to shift attention from one location to another within a trial (e.g., Chen, 1998; Egly, Driver, & Rafal, 1994; Moore et al., 1998). If the default scanning in visual search is to start from locations within an already attended object, this would result in the uncued locations of the attended object being searched before any locations of the unattended object, and this, in turn, would lead to reduced RTs and/or increased accuracy when the target appears in the same object, relative to a different object. In other words, object effects can be the result of attentional prioritization in visual search, rather than the result of attentional spread that respects object boundaries.
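The logic of the prioritization account can be made concrete with a toy serial-search model. The sketch below is illustrative only and is not taken from Shomstein and Yantis (2002): the function names, the display layout, and the timing parameters (`base_ms`, `step_ms`) are hypothetical assumptions chosen to show how a scanning order that favors the attended object yields a same-object advantage without any change in sensory quality.

```python
def scan_order(locations, cued_location):
    """Return location names in the order a prioritized serial search visits them.

    locations: list of (name, object_id) pairs describing the display.
    Assumed priority: cued location first, then other locations on the
    cued object, then locations on any other object.
    """
    cued_object = dict(locations)[cued_location]

    def priority(loc):
        name, obj = loc
        if name == cued_location:
            return 0
        if obj == cued_object:
            return 1
        return 2

    # sorted() is stable, so locations with equal priority keep display order.
    return [name for name, _ in sorted(locations, key=priority)]


def predicted_rt(target, order, base_ms=350, step_ms=40):
    """Hypothetical RT: a base time plus a fixed cost per location visited
    before the target. Both parameter values are arbitrary."""
    return base_ms + step_ms * order.index(target)


# Two-rectangle display in the style of Egly et al. (1994): objects A and B.
display = [("A_top", "A"), ("A_bottom", "A"), ("B_top", "B"), ("B_bottom", "B")]
order = scan_order(display, cued_location="A_top")

same_object_rt = predicted_rt("A_bottom", order)
different_object_rt = predicted_rt("B_top", order)
object_effect = different_object_rt - same_object_rt
```

In this sketch the object effect arises purely from the position of the target in the scanning sequence, which is the core claim of the prioritization account as described above.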
A key prediction of the attentional prioritization account is that object-based attention would not be deployed when the location of a target is known in advance, since the positional certainty of the target would eliminate the need for search and would result in the target location being allocated the highest attentional priority. Although this prediction was confirmed in Shomstein and Yantis (2002), other studies have shown object effects when the location of the target was known in advance (e.g., Chen & Cave, 2006, 2008; Harms & Bundesen, 1983; Kim & Cave, 2001; Kramer & Jacobson, 1991; Richard et al., 2008). Chen and Cave (2006) used an experimental paradigm similar to that used in Shomstein and Yantis (2002, Experiments 1–4), where the target always appeared at the center of a cross-like configuration. While no object effect was found when participants saw the full cross-like configuration on every trial, it was observed when that configuration appeared on only some of the trials, with the rest of the trials consisting of displays that showed only one or two of the three rectangles. These results are inconsistent with the attentional prioritization account. Chen and Cave (2006) suggested that mixing the partial displays with the full displays prompted the participants to perceive the stimulus pattern as separate objects, rather than as a single configuration (e.g., a cross). Since subjective organization of a stimulus pattern is known to affect the deployment of object-based attention (e.g., Albrecht et al., 2008; Chen, 1998; Li & Logan, 2008), these results suggest that the key factor in the lack of an object effect in Shomstein and Yantis (2002) may be the perceived structure of the stimulus configuration, rather than the lack of need for visual search.
Object effects with positional certainty have also been found in Harms and Bundesen (1983), Kim and Cave (2001), and Kramer and Jacobson (1991). These studies all showed that grouping influenced the allocation of attention despite the fact that the target appeared at a known location on every trial. In addition, the observers in Chen and Cave (2008) responded faster to letters located at the two ends of a single object, relative to two ends of different objects, even though in both cases the onset of the targets was preceded by an endogenous central cue of 100% validity. Richard et al. (2008) used a flanker interference paradigm with a centrally located target and found object effects when the target was a part of an object (i.e., belonged to the object), but not when it was a letter sitting on top of a rectangle. On the basis of their results, Richard et al. proposed that the key factor in obtaining object-based attention under the condition of positional certainty was the perception of the task-relevant feature as an integral part of an object shape, rather than as something perceptually segregated from the object shape. It should be pointed out, however, that this interpretation did not explain why object effects were observed in other studies where the task-relevant feature was clearly not an integral part of an object shape (e.g., Chen & Cave, 2006, 2008). Regardless of what induced the object effects found in Richard et al., the finding of an object effect even when the location of the target was known in advance suggests that object effects are not just a by-product of the order in which different regions of a scene are visited during visual search.
To date, the strongest physiological evidence supporting the within-group spread of attention has come from several recent studies by Roelfsema and colleagues (e.g., Roelfsema & Houtkamp, 2011; Wannig, Stanisor, & Roelfsema, 2011). In one experiment by Wannig et al. (2011), monkeys were shown displays that consisted of two target bars and two distractor bars. The task was to fixate on a fixation point, to wait for a dot to appear at one of the target bars, and upon the offset of the dot, to make a saccade to the indicated target bar. The researchers simultaneously recorded the responses of V1 neurons from two sites: site 1 for those neurons whose receptive field was on one of the target bars, and site 2 for those neurons whose receptive field was on one of the distractor bars. The results show that the appearance of the dot in the receptive field of site 1 triggered not only an increase in activity in those neurons whose receptive field was in site 1, but also an increase in those neurons whose receptive field was in site 2. Furthermore, the activity of the neurons was stronger when site 2 was on a distractor bar collinear to the target bar (i.e., in the same perceptual group), rather than when the two bars were not aligned collinearly (i.e., in different perceptual groups). Similar results were observed for perceptual groupings based on color or common fate. Since site 2 was on a distractor bar, these results provided direct evidence that attention could spread to task-irrelevant stimuli outside the focus of attention and that the attentional enhancement was greater when these stimuli were bound to the attended stimulus through one or more Gestalt grouping principles.
More recently, Drummond and Shomstein (2010) suggested that in addition to search order, attentional prioritization can also be the result of a parallel search process where information at different locations of a configuration is extracted at different rates according to attentional priority and that attentional prioritization can affect the quality of the sensory representation of an attended object. In this revised model, there is little difference between the attentional prioritization account and the sensory enhancement account.
As was mentioned earlier, object effects have also been explained in terms of the relative cost of shifting attention within an object versus between objects (Brown & Denney, 2007; Lamy & Egeth, 2002). Lamy and Egeth used a variant of Egly, Driver, and Rafal’s (1994) two-rectangle paradigm and asked their participants to perform tasks that either required or did not require shifts of attention. Object effects were found in the former but not in the latter. For example, the participants demonstrated an object effect when the task was to detect the presence of a target, and its onset was preceded by a precue indicating the most likely location of the target. In contrast, there was no evidence of an object effect when the task was to judge the size of two simultaneously presented targets whose onsets were not preceded by a precue. Lamy and Egeth interpreted these results in the context of required attentional shifts within a trial (cf. Drummond & Shomstein, 2010). Whereas the precue in the detection task encouraged the participants to switch attention from the cued location to the target location, the simultaneous onset of a pair of targets with no precue in the size judgment task induced the participants to adopt a diffuse attentional window without the need to switch attention. Since shifting attention between objects is more difficult and, therefore, has a higher cost than shifting attention within an object, object effects are typically found in cuing paradigms where the location of the target is uncertain and attentional shifts are required within a trial.
Taken together, the available evidence suggests that there is substantial flexibility in how attention is distributed within an object and how fully a stimulus configuration is segregated into objects. Many factors, including those reviewed in the previous section (e.g., attentional focus, “goodness” of an object, etc.) and the probability of the target’s appearing within a cued versus an uncued object, all influence the deployment of object-based attention. Attention spreads within an object, yet the spreading of attention is not necessarily automatic. Furthermore, although object segregation may often be triggered spontaneously, it is not an automatic process. Object segregation and/or object-based allocation of attention as a result of object segregation can all be subject to strategic control (Yeari & Goldsmith, 2010).
The role of space in object-based selection
Although many studies have shown that location plays a special role in selective attention (e.g., Cave & Pashler, 1995; Chen, 2009; Kim & Cave, 1995; Tsal & Lavie, 1993; for reviews, see also Cave, in press; Lamy & Tsal, 2001), the role of space in object-based selection is not straightforward. Whereas some studies have reported results consistent with object-based attention selecting a location-independent representation, where attention selects the features of an attended object, such as its shape, color, orientation, and texture, without selecting its spatial location (e.g., Awh, Dhaliwal, Christensen, & Matsukura, 2001; Matsukura & Vecera, 2011; O’Craven et al., 1999; Vecera & Farah, 1994), other studies have found that object-based attention selects from a location-mediated representation, where attention selects the regions of space occupied by the attended object (e.g., Arrington, Carr, et al., 2000; Kim & Cave, 1995; Kramer et al., 1997; Martínez et al., 2007; Martínez et al., 2006; Müller & Kleinschmidt, 2003; Valdes-Sosa et al., 1998; Vecera & Farah, 1994; Weber et al., 1997). As was mentioned earlier, Vecera and Farah (see also Vecera, 1994) referred to these two types of selection as spatially invariant and grouped-array selection, respectively.
The first study to distinguish between these two types of selection was conducted by Vecera and Farah (1994), who used a variant of Duncan’s (1984) bar-on-box paradigm. Participants saw displays that consisted of a bar and a box that were either superimposed at fixation (the superimposed condition) or positioned in separate spatial locations on the left or right of fixation (the separated condition). The task was to report two features that belonged to the same object or to different objects. Vecera and Farah reasoned that selection from a location-invariant representation would result in an object effect of comparable magnitude from both the superimposed and separated conditions. In contrast, selection from a location-mediated representation would lead to a larger object effect in the separated condition than in the superimposed condition. Implicit in this reasoning was the assumption that the cost of switching attention between objects would increase with their spatial separation (see Kramer et al., 1997, for arguments against this assumption; but see also Vecera, 1997, for counterarguments). The results showed that the object effects were comparable in the superimposed and separated conditions. However, in a subsequent experiment where the task was stimulus detection instead of feature identification, a larger object effect was observed in the separated condition than in the superimposed condition. On the basis of these results, Vecera and Farah concluded that object-based attention could select from both location-independent and location-mediated representations and that the level of selection in a specific task depended on the nature of the representations required by the task.
Kramer et al. (1997) later challenged these conclusions. In two experiments, they measured observers’ object-based deployment of attention and their distribution of spatial attention within the same paradigm. Observers saw a bar and a box that were either superimposed or separated in space. To hold visual acuity constant across the two conditions, the bar and the box in the superimposed condition were displayed on the left or right side of fixation, with a filler on the other side. In addition to reporting object features that were part of the same or different objects, the observers, on a small number of trials, were also required to detect the presence of a small probe when it appeared immediately after the offset of the object display. These postdisplay probe trials were included to measure observers’ distribution of spatial attention (Kim & Cave, 1995). A larger object effect was found in the feature identification task in the separated condition, as compared with the superimposed condition. Moreover, RT to the probe was shorter when it appeared at the location of the object that possessed both of the target features, rather than at the location of the object that possessed neither of the target features. These results suggest that the location of the attended object was selected even when the task was feature identification. Importantly, a similar probe RT result was observed in a subsequent experiment, where Kramer et al. (1997) placed the objects in the superimposed condition at fovea, as in Vecera and Farah’s (1994) original study, and replicated the latter’s results of comparable object effects in the superimposed and separated object conditions. Taken together, these results support the notion that object-based attention is accompanied by the selection of the internal representation of an object’s location.
Several other studies have reported findings consistent with a location-mediated selection of object-based attention. Using a paradigm that involved moving objects, Lamy and Tsal (2000) found attentional effects both at the old location of a precue (i.e., the cued location of an object before it started to move) and at the new location that followed the moving object. Similarly, O’Grady and Müller (2000) reported increased target detectability at all the locations along the contour of a cued object, relative to an uncued object. Müller and Kleinschmidt (2003) measured their participants’ BOLD signals during a gap discrimination task in an fMRI study. The participants saw displays that consisted of wrench-like objects. A central cue, which was several seconds long, indicated the most likely location of the target. As in a typical experiment on object-based attention, the target could appear at the cued location (the valid condition) or at an uncued location either on the cued object (the invalid same-object condition) or on the other object (the invalid different-object condition). Both space and object effects were found in RTs. Moreover, participants showed an increase in BOLD signal activation in response to the cue in early visual cortical areas (V1–V4) at the retinotopic representations of not only the cued location relative to the uncued locations, but also the uncued location of the same object relative to that of a different object. These results were in line with the findings of Roelfsema and colleagues (Roelfsema et al., 1998; Wannig et al., 2011), who showed object-based modulations of neuronal responses in V1. The fact that object-based attention modulated neural activation in the early visual areas provides evidence that attending to an object entails the selection of that object’s location.
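For readers unfamiliar with how space and object effects are derived from the three cue conditions described above, the following minimal sketch computes them from mean RTs. It assumes one common convention (space effect: invalid same-object minus valid; object effect: invalid different-object minus invalid same-object). The function name and the RT values are hypothetical and are not data from Müller and Kleinschmidt (2003) or any other study.

```python
from statistics import mean


def cueing_effects(trials):
    """Compute space and object effects (in ms) from correct-trial RTs.

    trials: iterable of (condition, rt_ms) pairs, where condition is one of
    "valid", "invalid_same", or "invalid_different".
    """
    rts_by_condition = {}
    for condition, rt in trials:
        rts_by_condition.setdefault(condition, []).append(rt)
    means = {cond: mean(rts) for cond, rts in rts_by_condition.items()}
    return {
        # Cost of an invalid cue when the target stays on the cued object.
        "space_effect": means["invalid_same"] - means["valid"],
        # Additional cost when the target is on the other object.
        "object_effect": means["invalid_different"] - means["invalid_same"],
    }


# Made-up RTs for illustration only.
example_trials = [
    ("valid", 400), ("valid", 420),
    ("invalid_same", 450), ("invalid_same", 470),
    ("invalid_different", 480), ("invalid_different", 500),
]
effects = cueing_effects(example_trials)
```

A positive object effect under this convention indicates faster responses to targets on the cued object than to targets on the uncued object; in actual experiments the two invalid locations are typically equated for distance from the cue so that the difference cannot be attributed to spatial separation.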
A similar conclusion was reached by Weber et al. (1997) in an ERP study. Observers saw two partially overlapping objects on either the left or right side of fixation. The task was to judge whether a prespecified color/shape conjunction was present in the display. The task-relevant features, if present, were on either a single object (the same-object condition) or two different objects (the different-object condition). On some trials, a task-irrelevant small probe would appear after the offset of the target display. These probe trials did not require overt responses. However, the participants’ ERPs in response to the onset of the probe were measured. The results most relevant here were the findings from the probe trials. When the probe appeared at the location previously occupied by objects that contained the target features, a larger P1 was found in the same-object than in the different-object condition. Since P1 is known to indicate the distribution of spatial attention (e.g., Hillyard et al., 1996; Luck, Heinze, Mangun, & Hillyard, 1990), this result, together with the results from other ERP studies (e.g., Martínez et al., 2007; Martínez et al., 2006) and fMRI and single-cell recording studies (e.g., Müller & Kleinschmidt, 2003; Roelfsema et al., 1998), provides physiological evidence supporting the location-mediated selection of object-based attention.
Matsukura and Vecera (2011) recently proposed that a spatially invariant representation could occur under conditions when objects were clearly segregated. They showed participants displays that consisted of a bar superimposed on a box. Object effects were found when attention could be directed to a specific object or objects in advance (e.g., when participants knew one or both of the to-be-reported features before the onset of the object display), but not when the knowledge of the to-be-reported features was withheld until after the offset of the object display. However, when the bar and the box were shown in different colors at separate spatial locations, object effects were observed in the absence of advance knowledge of the to-be-reported features when the objects were in view. Furthermore, the magnitude of the object effect was not influenced by the extent of spatial separation between the objects (2.48° vs. 5.24°). On the basis of these results, Matsukura and Vecera concluded that object-based attention could select from space-invariant representations so long as the objects in question could be easily individuated. However, caution should be taken in interpreting these results, for there is evidence that spatial attention does not necessarily shift in an analog fashion (e.g., C. W. Eriksen & Murphy, 1987; Yantis, 1988). Perhaps the role of space in object-based selection is best illustrated in a recent study by Hollingworth, Maxcey-Richard, and Vecera (2012), who found an interaction between space- and object-based attention within the same experimental paradigm. Consistent with the notion that there are linkages between lower-level spatial representations and higher-level spatially invariant representations at multiple levels of selection (Di Lollo, Enns, & Rensink, 2000; Hochstein & Ahissar, 2002; Roelfsema & Houtkamp, 2011; van der Velde & de Kamps, 2001), Hollingworth et al. showed that whereas spatial attention forms a gradient across an attended object, the spread of this gradient is constrained by the boundaries of the object.
Since Duncan’s (1984) seminal study, many advances have been made regarding the mechanisms that underlie the selection of visual attention. It is now generally accepted that attention selects the internal representation of both space and object, that space- and object-based attention interact, and that they are often evoked within the same visual scene. Object-based attention is frequently but not mandatorily deployed, and there are many factors that influence object segmentation. When object-based attention is deployed, it typically acts via the selection of an object’s location, resulting in enhanced quality of the sensory representation of the selected object and more efficient processing of the features that belong to that object. It is important to recognize that although this tutorial emphasizes object-based selection, attention can also select features and surfaces in addition to space. Our visual system uses different types of attention to give us a unified view of the world.
Due to space constraints, the literature on feature-based attention is not included in this review. Feature-based attention refers to the enhanced sensitivity to a feature value (e.g., a specific orientation, color, or motion direction) similar to an attended feature value regardless of whether the former is at the attended location or belongs to the attended object (see Maunsell & Treue, 2006, for a review). For example, Treue and Martínez-Trujillo (1999; Martínez-Trujillo & Treue, 2004) showed that attending to a specific motion direction at one location enhanced the gain of MT neurons selective to the attended direction even though the receptive fields of the affected neurons were in the opposite visual hemifield. In addition to motion, feature-based attention has been found in several other feature dimensions, including spatial frequency, orientation, and color, and the attentional effects have been demonstrated in both physiological and psychophysical studies (e.g., Arman, Ciaramitaro, & Boynton, 2006; Liu, Larsson, & Carrasco, 2007; Roelfsema, Khayat, & Spekreijse, 2003; Rossi & Paradiso, 1995; Sàenz, Buračas, & Boynton, 2002, 2003; Shulman & Wilson, 1987; White & Carrasco, 2011). Although feature-based attentional effects can contribute to object-based effects and vice versa under certain experimental conditions, these two types of effects can be dissociated (see Wannig et al., 2011). Whereas feature-based attentional effects are not limited to a perceptual object or group, object-based attentional effects are confined to the attended object or perceptual group.
I thank Kyle Cave, Morris Goldsmith, Pieter Roelfsema, and Jeremy Wolfe for their helpful comments on an earlier version of the manuscript.