Category-specific learned attentional bias to object parts
Humans can selectively attend to information in visual scenes. Learning from previous experiences plays a role in how visual attention is subsequently deployed. For example, visual search times are faster in areas that are statistically more likely to contain a target (Jiang and Swallow in Cognition, 126(3), 378–390, 2013). Here, we examined whether similar attentional biases can be created for different locations on complex objects as a function of their category, based on a history of these locations containing a target. Subjects performed a visual search task in the context of novel objects called Greebles. The target appeared in one half (e.g., top) of the Greebles 89 % of the time and in the other half (e.g., bottom) 11 % of the time. We found a reaction time advantage when the target was located in a “target-rich” region, even after target location probabilities were equated. This indicates that attentional biases can be associated not only with regions of space but also with specific object features, or at least with locations in an object-based frame of reference.
Keywords: Learned attention; Probability cuing; Attention in complex objects
The world contains a great deal of visual information that must be selectively filtered for further processing. Theories of attention have often presented a dichotomy between top-down goals (Folk et al., 1992) and bottom-up perceptual salience (Theeuwes, 1991, 1994). However, the deployment of attention can also be affected by previous experience and response histories (Awh, Belopolsky, & Theeuwes, 2012), such that we may learn to attend to bottom-up information that consistently facilitates top-down goals. For instance, implicit learning of regularities in the structure of scenes guides spatial attention during visual search (Chun & Jiang, 1998). This ability to abstract regularities from the environment, or statistical learning, can influence how attention is deployed (Saffran, Aslin, & Newport, 1996; Fiser & Aslin, 2001; Zhao, Al-Aidroos, & Turk-Browne, 2013).
One form of statistical learning is probabilistic cuing, wherein attention is implicitly drawn to areas of the visual field that have a higher probability of containing behaviorally relevant information. Geng and Behrmann (2005) used probability cuing in a task in which a target object could appear in one of four locations. The target was in one of the locations 75 % of the time but in one of the other locations a total of 25 % of the time. Subjects were faster and more accurate to detect targets in the high probability area compared to low probability areas. In addition, interference from distractors was reduced in the high probability location. Other studies using probabilistic cuing have demonstrated that these attentional biases persist for several days and remain for several hundred trials after the probabilities are equalized (Jiang, Swallow, Rosenbaum, & Herzig, 2013b). In these studies, the spatial bias was acquired rapidly in a short training session, indicating that probabilistic cuing is a powerful way to direct spatial attention to frequently selected locations.
One question is whether these spatial attentional biases are framed relative to the viewer or the external environment. Viewer-centered frames are low in computational demands but are relatively unstable because they are sensitive to changes in movement and viewpoint (Marr & Nishihara, 1978). In contrast, environment-centered frames are more stable across movement but are more computationally expensive. Jiang, Swallow, and Rosenbaum (2013a) found that after acquiring a bias to attend to one quadrant of space, subjects who were reseated so that they saw the screen from another viewpoint switched their bias to a previously sparse quadrant, demonstrating a viewer-centered frame of reference. This is consistent with other work showing that contextual cuing is also viewer-centered (K. P. Chua & Chun, 2003).
Since attentional biases acquired during probabilistic cuing are long lasting and resistant to statistical changes, a spatial bias acquired in one task could generalize to another task. However, recent results suggest that such transfer may not occur: A bias to attend to a region in space induced by probabilistic cuing did not transfer to a foraging task (Jiang, Swallow, Won, Cistera, & Rosenbaum, 2015). It is possible that spatial biases do not transfer because space must be shared for all manner of tasks (e.g., attending to the bottom right is relevant in typing, cooking, and opening doors). Therefore, generalized spatial biases may be counterproductive.
Although spatial aspects of different tasks may be uncorrelated, different tasks that use the same objects could depend on a similar set of features. Attention may be drawn to certain object features (e.g., the eyes of a face) to discriminate them from other objects in that category but also to get information about eye gaze or emotional expression. To the extent that a category is associated with several tasks for which the same spatial biases are helpful, or at least not incompatible, a category-specific but task-general attentional bias could develop.
One open question is whether learned attentional biases can occur within objects. When categorizing complex objects, information and features that are diagnostic can be prioritized. For example, when learning to categorize different types of fish that varied in the shape of the tail or mouth, features useful for categorization are selectively attended (Sigala & Logothetis, 2002), resulting in a “stretching” of the relevant dimension in a multidimensional category space that increases perceptual discrimination along that dimension. A recent study using similar stimuli cued attention near different parts of the fish and found a reaction time advantage when the cue was spatially closer to features crucial for identification (Baruch, Kimchi, & Goldsmith, 2014), indicating that spatial attention can be drawn toward diagnostic object features. Likewise, in previous work, we found that attentional biases could develop to specific parts of faces (K. W. Chua, Richler, & Gauthier, 2014) and novel objects (K. W. Chua, Richler, & Gauthier, 2015) due to their history of being useful for individuation. In these studies, subjects were trained to individuate faces or Greebles wherein one half contained most of the information diagnostic for identification. When later asked to selectively attend to just part of those objects, subjects could not ignore parts that were previously diagnostic.
Here, we ask if spatial attention can be learned in a category-specific manner (e.g., learning to attend to the top of an object) without requiring object individuation. In the fish experiments mentioned previously (Baruch et al., 2014), attention was drawn to features crucial for object recognition (see Rehder & Hoffman, 2005a, b). In the Greeble experiments, a history of finding information relevant to individuation in an object part made it harder to ignore (K. W. Chua et al., 2015). Here, we ask if learned attentional biases to object parts can occur when the object is not relevant for the task, whether these biases generalize to other objects of the same category, and whether they persist once probabilities are equated, as in viewer-centered probability cuing.
To investigate these questions, we used probability cuing with two Greeble categories. Subjects had to detect a valid “T” among distractors and were asked to indicate what direction the head of the T was pointing. Critically, the target appeared in the top half of one Greeble category 89 % of the time and the bottom half of the other Greeble category 89 % of the time. In the second half of the experiment, we equated target location probabilities for all object halves and examined whether target detection remained faster in object regions with a history of high target probability.
Twenty-one subjects participated in Experiment 1 (8 male, 13 female, mean age = 20.1 years). Sample size was determined based on a power analysis using the effect size from a previous probabilistic cuing study (Cohen’s d = 1.6; Jiang et al., 2013a), aiming for power greater than 0.90 with alpha = 0.05 (two-tailed). Subjects received class credit. The study was approved by the Vanderbilt University IRB.
On each trial, subjects saw a single Greeble. A valid sideways “T” and a slightly offset “T” were superimposed on the top and bottom halves of the Greeble after 0.5 seconds. This 0.5-second latency period gave subjects time to scan the features of the Greeble before the target appeared. Both “T” shapes were displayed in a darker gray than the Greeble. The task was to press the left or right arrow key to indicate the direction the head of the valid “T” was pointing. A beeping noise was played for incorrect answers. There were 1,152 trials with four blocks of 288 trials.
Critically, the valid “T” appeared in one half of one Greeble category 89 % of the time (e.g., the top of Glips) and in the other half of the other Greeble category (e.g., the bottom of Ploks) 89 % of the time (part assignment counterbalanced). For the first half of the experiment (576 trials; blocks 1 and 2), subjects saw Greebles with this probability asymmetry. In the second half of the experiment (blocks 3 and 4), the target probabilities were equated to assess if the attentional bias would persist. Target-rich locations were defined as the areas in each Greeble where the target was most likely to appear (89 %), and sparse locations were defined as the areas where the target appeared less often (11 %). Importantly, richness is defined through a combination of category membership and object-specific location (e.g., the tops of Ploks and the bottom half of Glips, wherever they appear on screen), so it is unlikely that there was a bias based on screen position.
The Greeble could appear in one of nine locations on a 3 × 3 grid that spanned 1,200 × 1,200 pixels in the center of the screen. Positions were randomized on each trial to minimize any attentional bias due to screen position. Note that there is an overall screen-based bias because targets were on average higher (or lower) on the screen for one category. However, the target distributions for the two categories overlapped greatly, and no location had greater probability when category was not taken into consideration.
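The trial structure described above can be sketched in code. This is only an illustration under stated assumptions, not the authors' experiment script: it assumes that each block contains equal numbers of Glip and Plok trials, that the reported 89 % corresponds to an exact 8:1 ratio (128 rich and 16 sparse trials per category in a 288-trial block), and that grid position and target direction are drawn independently at random. The function name `make_block` and the record fields are hypothetical.

```python
import random
from collections import Counter

CATEGORIES = ("Glip", "Plok")
# Nine screen positions on a 3 x 3 grid, as (row, column) pairs.
GRID_POSITIONS = [(row, col) for row in range(3) for col in range(3)]


def make_block(rich_half, n_trials=288, p_rich=8 / 9, rng=None):
    """Build one shuffled block of search trials.

    rich_half maps each category to its target-rich half,
    e.g. {"Glip": "top", "Plok": "bottom"}.
    Assumption: trials are split evenly between the two categories.
    """
    rng = rng or random.Random()
    per_category = n_trials // len(CATEGORIES)
    n_rich = round(per_category * p_rich)
    trials = []
    for category in CATEGORIES:
        rich = rich_half[category]
        sparse = "bottom" if rich == "top" else "top"
        halves = [rich] * n_rich + [sparse] * (per_category - n_rich)
        for half in halves:
            trials.append({
                "category": category,
                "target_half": half,
                "location_type": "rich" if half == rich else "sparse",
                # Position is independent of category and target half.
                "grid_position": rng.choice(GRID_POSITIONS),
                "target_direction": rng.choice(("left", "right")),
            })
    rng.shuffle(trials)
    return trials


mapping = {"Glip": "top", "Plok": "bottom"}
asymmetric_block = make_block(mapping, rng=random.Random(0))
equated_block = make_block(mapping, p_rich=0.5, rng=random.Random(1))
counts = Counter((t["category"], t["location_type"]) for t in asymmetric_block)
```

Calling `make_block` with `p_rich=0.5` gives the equated-probability blocks, in which "rich" and "sparse" refer only to the history of each object half.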
Subjects were as accurate when the target was in the rich half (94.4 %) as when it was in the sparse half (93.3 %), p = .33, ηp² = 0.04. Our analyses therefore focus on mean correct response times. Trials with RTs faster than 200 ms or slower than 2,000 ms were excluded (0.006 % of trials).
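The response-time measure can be illustrated with a short sketch. It assumes a hypothetical per-trial record with the fields `rt_ms`, `correct`, and `location_type`; the trimming bounds are the ones reported above, and the function returns the sparse minus rich difference in mean correct RT for one subject. The data in the example are invented purely to show the computation.

```python
from statistics import mean

RT_MIN_MS = 200    # trials faster than this are excluded
RT_MAX_MS = 2000   # trials slower than this are excluded


def rt_advantage(trials):
    """Return mean sparse RT minus mean rich RT (in ms) for one subject.

    Only correct trials within the RT bounds are kept. The record layout
    (keys "rt_ms", "correct", "location_type") is hypothetical.
    """
    kept = [
        t for t in trials
        if t["correct"] and RT_MIN_MS <= t["rt_ms"] <= RT_MAX_MS
    ]
    rich = [t["rt_ms"] for t in kept if t["location_type"] == "rich"]
    sparse = [t["rt_ms"] for t in kept if t["location_type"] == "sparse"]
    return mean(sparse) - mean(rich)


# Invented example data, not results from the study.
example_trials = [
    {"rt_ms": 600, "correct": True, "location_type": "rich"},
    {"rt_ms": 620, "correct": True, "location_type": "rich"},
    {"rt_ms": 150, "correct": True, "location_type": "rich"},     # too fast
    {"rt_ms": 650, "correct": True, "location_type": "sparse"},
    {"rt_ms": 670, "correct": True, "location_type": "sparse"},
    {"rt_ms": 900, "correct": False, "location_type": "sparse"},  # error trial
    {"rt_ms": 2500, "correct": True, "location_type": "sparse"},  # too slow
]
advantage = rt_advantage(example_trials)
```

A positive value indicates faster responses when the target was in the target-rich half.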
Experiment 1 Discussion
Probabilistic cuing is a powerful means of directing attention to areas of space (Jiang et al., 2013a). Most studies to date have focused on these spatial biases in an environment-based frame of reference. Here, attention was drawn to target-rich parts of complex objects. This bias started in the second block and persisted into the third block, providing evidence that the attentional bias lasted several hundred trials after the probabilities were equated. However, by the fourth block, there was no evidence of any attentional bias. These results are similar to Jiang et al. (2013b), who found that a bias to attend to rich quadrants of space lasted for a few hundred trials before being extinguished.
Because we used 20 Greebles from each category, it seems reasonable to assume that the effect was associated with the categories and not specific objects. To provide a more direct test of this interpretation, we conducted Experiment 2, which differed from Experiment 1 in three ways. First, we used different sets of objects during the first two blocks where probabilities were asymmetric and the last two blocks where probabilities were equated. If the advantage for the target-rich half persists even after exemplars are changed, we will have evidence that the learned attentional bias is associated with features that define the two Greeble categories. Second, there was sufficient variability among subjects in Experiment 1 that we wondered if this was due to variability in subjects noticing that there were two discrete categories of objects. Could we maximize learning by ensuring that subjects knew there were two Greeble categories? Would this produce biases that persist until Block 4? To this end, we included a short categorization task before the visual search task. Finally, to encourage learning to begin as early as possible, we encouraged accuracy in the visual search task using an aversive timeout following incorrect answers.
Twenty-three subjects participated in Experiment 2 (10 male, 13 female, mean age = 18.9 years) for class credit. The sample size was based on the same power analysis as Experiment 1. The study was approved by the Vanderbilt University IRB.
For each trial, subjects saw either a Plok or Glip. They were instructed to press “p” if the Greeble was a Plok or “g” if it was a Glip. Subjects learned through corrective feedback, and all subjects completed 60 trials.
Probabilistic Cuing Task
The procedure was the same as in Experiment 1 except that 20 new, unique Ploks and Glips were used as stimuli starting in Block 3. Additionally, when subjects made an incorrect response, they saw a feedback screen for 1,500 ms that encouraged them to answer as accurately and quickly as possible.
Subjects were more accurate when the target was in the rich half (95.7 %) as opposed to the sparse half (94.8 %), F(1, 22) = 13.68, p = .001, ηp² = 0.13. We again focused our analyses on mean correct reaction times. Trials with RTs faster than 200 ms or slower than 2,000 ms were excluded (0.009 % of trials).
Experiment 2 Discussion
In Experiment 2, we replicated the basic finding in Experiment 1, with an attentional bias developing in the second block of the asymmetric probability phase. This bias lasted until the third block and was extinguished in the fourth. New exemplars were introduced in Block 3, when the probabilities were equalized. The attentional bias persisted even in these conditions, suggesting that the bias was not limited to the exemplars encountered during the first half of the experiment, but it transferred to new exemplars based on category membership.
Subjects categorized the two categories of Greebles, but the explicit categorization task before the probability cuing seemed to have little effect on the overall pattern of results. Previous studies of probability cuing have made the distinction between whether the learning that takes place during these paradigms is implicit or explicit (Jiang, Won, & Swallow, 2014), so it is possible that subjects had detected the probability asymmetry and were using a more explicit, top-down strategy when performing the visual search. To test this possibility, we conducted Experiment 3. Subjects performed two blocks of the probability cuing task with the probability asymmetry (89 %/11 %). Afterwards, they performed a posttest assessment wherein a Greeble was shown on screen and subjects were to predict whether a target would appear on the top or bottom. Performance above chance in this task would indicate some explicit knowledge of the probability manipulation. Additionally, we were interested in whether performance on this posttest might be predictive of the magnitude of the attentional bias to target rich parts.
Forty-five subjects participated in Experiment 3 (15 male, 30 female, mean age = 18.6 years) for class credit. This sample size was chosen to allow detection of correlations between posttest accuracy and the attentional bias of 0.4 or above, with 80 % power at an alpha of .05. The study was approved by the Vanderbilt University IRB.
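As a rough check on this kind of sample-size reasoning, the number of subjects needed to detect a correlation can be approximated with the Fisher z transformation. The sketch below is not necessarily the procedure the authors used: it assumes a two-tailed test (the text gives only the alpha level), and the function name `n_for_correlation` is hypothetical. Exact power routines can give slightly different values.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, power=0.80, alpha=0.05):
    """Approximate N needed to detect a correlation of size r.

    Uses the Fisher z approximation and assumes a two-tailed test.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    fisher_z = math.atanh(r)  # Fisher z transform of the target correlation
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


n_needed = n_for_correlation(0.4)
```

Under these assumptions the approximation gives a required N in the mid-40s for r = 0.4, which is in the same range as the sample tested here.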
Probabilistic Cuing Task
The probability cuing task was identical to the first half of Experiment 1. The task consisted of 576 trials with the probability asymmetry in two blocks of 288 trials. As before, the valid “T” appeared in one half of one Greeble category 89 % of the time (e.g., the top of Glips) and in the other half of the other Greeble category (e.g., the bottom of Ploks) 89 % of the time (part assignment counterbalanced).
Following probability cuing, subjects performed a short task to assess explicit knowledge of the category-specific probability asymmetry. On each trial, they saw a Greeble appear on screen in one of the nine positions used for the probability cuing task. Subjects were instructed to indicate whether they thought the target was more likely to appear in the top or bottom half. Accuracy was assessed by whether subjects chose the Greeble half that was target rich during the probability cuing phase. Greebles appeared in each of the nine possible areas twice for both categories, resulting in a total of 36 trials.
Probability Cuing Results
Subjects were more accurate when the target was in the rich half (95.7 %) as opposed to the sparse half (94.8 %), F(1, 22) = 13.68, p = .001, ηp² = 0.13. We again focused our analyses on mean correct reaction times. Trials with RTs faster than 200 ms or slower than 2,000 ms were excluded (0.003 % of trials).
Mean accuracy in the posttest was 52.7 % (ranging from 36 % to 94 %), which was not significantly different from chance (p = .072). As a group, the subjects were unable to predict where the target would appear at above-chance levels, although it is clear from the range that some individual subjects could. Only five subjects performed above chance according to a one-tailed sign test (greater than 66 %). The mean reaction-time advantage (sparse–rich) for the subjects with above-chance performance was 33.4 ms in Block 1 and 61.2 ms in Block 2. With these subjects removed, the mean reaction-time advantage was 21.2 ms in Block 1 and 18.6 ms in Block 2. Critically, even among the subjects who did not have knowledge of the manipulation, the reaction-time bias was significant in Block 1, F(1, 39) = 17.1, p = .0002, ηp² = 0.30, and Block 2, F(1, 39) = 7.4, p = .001, ηp² = 0.16, suggesting that the bias was not driven by those with explicit knowledge of the manipulation.
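The above-chance criterion for individual subjects can be reproduced with an exact binomial calculation. The sketch below assumes the 36 posttest trials and a chance level of 0.5, and finds the smallest number of correct responses whose one-tailed probability under chance falls below .05; the function names are hypothetical.

```python
from math import comb


def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n_trials=36, alpha=0.05):
    """Smallest number correct whose one-tailed p-value is below alpha."""
    for k in range(n_trials + 1):
        if upper_tail(k, n_trials) < alpha:
            return k
    return None


threshold = min_correct_above_chance()
threshold_percent = 100 * threshold / 36
```

With 36 trials the criterion is 24 correct responses (about 66.7 %), which matches the "greater than 66 %" cutoff reported above.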
Additionally, the mean reaction time advantage in Blocks 1 and 2 was similar (22.3 ms in Block 1, 21.7 ms in Block 2). However, there was no correlation between the reaction time advantage in Block 1 and Block 2 (r = -.15, p = .33), suggesting little relationship between the reaction time advantages within individual subjects over the course of the experiment.
Experiment 3 Discussion
With Experiment 3, we were interested in the possibility that explicit knowledge could have an impact on statistical learning effects. Mean performance was no different from chance, although it was clear that some subjects had explicit knowledge of the probabilities of target location. Five out of 45 subjects performed significantly above chance after 576 trials of the 89 %/11 % probability regimen, and these subjects showed reaction time advantages for rich areas that were greater than average, suggesting that explicit knowledge may contribute to the effect. By the second block, there was a positive relationship between posttest performance and the reaction time advantage for cued locations.
However, there is also evidence that most subjects had little explicit knowledge and nonetheless showed cuing effects. Therefore, despite very large cuing probabilities (higher than those in prior spatial cuing studies; e.g., Jiang et al., 2013a), we also find evidence that the present effects are to some extent implicit.
Previous studies demonstrated spatial attentional biases elicited by probability asymmetries in visual search (Jiang et al., 2014). Here, we demonstrate for the first time an attentional bias to regions of complex objects that frequently contained targets in the past. These biases extended beyond the trained exemplars and therefore had to depend on features that define category membership (such as the body shape, part orientation, and/or texture). Importantly, this bias was category specific and was not associated with a specific location on the screen because the rich locations were defined by a combination of category membership and object-specific location (e.g., the top of Glips and the bottom of Ploks, wherever they were shown on the screen).
Accordingly, this work expands upon the frames of reference on which spatial biases can operate. Previous studies elicited a bias to certain regions in a viewer-centered frame of reference, whereas in the current study, attention was drawn to specific features in an object-based frame of reference. We should note, however, that we would not expect this to be a viewpoint-independent frame of reference that would rotate with the object, because learning with complex objects is viewpoint specific (e.g., Ashworth, Vuong, Rossion, & Tarr, 2008; Rossion & Curran, 2010). Therefore, the present effects could represent a mixture of an object-based frame of reference and viewpoint-specific object representations.
The kind of implicitly learned attention that inspired this work, probability cuing (Geng & Behrmann, 2002), can be relatively long lasting, but recent work demonstrated that the bias is actually viewer centric (Jiang & Swallow, 2013), which led the authors to suggest that probability cuing may be too egocentric to play a valuable role in everyday tasks. More recent work by the same group revealed that probability cuing is also task specific: It does not transfer between a visual search and foraging task (Jiang, Swallow, Won, Cistera, & Rosenbaum, 2015).
Given that there is not one set of spatial biases in an egocentric frame of reference that applies to all tasks, it is reasonable that attentional mechanisms influenced by probability cuing would adapt to the statistics of the world. Jiang et al. (2013a) suggested that the effects are akin to the “how” visual stream in Milner and Goodale’s (1995) two-stream model. In contrast, the extension of probability cuing to an object-centered frame of reference that is category specific evokes properties of the “what” visual stream. While we only found relatively short-lasting effects once target location probabilities were equated, category-specific biases could be more stable when target probabilities are never completely equated. While we can certainly perform different tasks with the same object, many object categories are associated with one predominant task (e.g., we handle tools, we select fruits for edibility, we identify letters).
Another learned attention phenomenon, context-specific control, refers to learned mappings between a stimulus and attentional filters that can be retrieved by environmental cues (Jacoby, Lindsay, & Hessels, 2003). These filters form when specific items in a selective attention task are mostly congruent or incongruent (Bugg & Crump, 2012). In previous studies of context-specific control, certain locations (above or below fixation) were associated with a high proportion of congruency in a Stroop task (Crump, Gong, & Milliken, 2006). Stroop effects were highest for probes appearing in high congruency locations. Critically, the location of the cue was not task relevant but acted as an environmental cue that retrieved certain attentional settings. In the current study, the Greebles themselves served as the context cues that indicated where the target was most likely to appear, and attentional biases were formed to the target-rich Greeble parts. Thus, we extend previous studies of context-specific control by demonstrating that parts of complex objects can act as contextual cues that retrieve attentional settings.
The learned attentional settings that result from context-specific control can transfer to new objects of a trained category (Bugg, Jacoby, & Chanani, 2011), including categories of complex objects like faces (Cañadas, Rodríguez-Bailón, Milliken, & Lupiáñez, 2012). The present work extends context-specific control to locations within (or to the parts of) an object. Our results therefore share features with the effects obtained in both probability cuing and context-specific control and suggest that attentional mechanisms operate at a level of generalization and specificity that may account for some category-specific effects observed in the object and face recognition literatures (K. W. Chua et al., 2014, 2015).
This work was supported by the NSF (Grant SBE-0542013) and NIH (P30-EY008126 and R01 EY013441-06A2). We thank Dr. Chu Chang Chua for continued guidance.
- Baruch, O., Kimchi, R., & Goldsmith, M. (2014). Attention to distinguishing features in object recognition. Visual Cognition, 22(9–10), 1184–1215.
- Chua, K. W., Richler, J. J., & Gauthier, I. (2015). Holistic processing from attention to diagnostic parts. Journal of Experimental Psychology: General, 144(4), 723–729.
- Crump, M. J., Gong, Z., & Milliken, B. (2006). The context-specific proportion congruent Stroop effect: Location as a contextual cue. Psychonomic Bulletin & Review, 13(2), 316–321.
- Fiser, J., & Aslin, R. N. (2001). Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological Science, 12(6), 499–504.
- Jiang, Y. V., & Swallow, K. M. (2013). Spatial reference frame of incidentally learned attention. Cognition, 126(3), 378–390.
- Jiang, Y. V., Swallow, K. M., & Rosenbaum, G. M. (2013a). Guidance of spatial attention by incidental learning and endogenous cuing. Journal of Experimental Psychology: Human Perception and Performance, 39(1), 285–297.
- Jiang, Y. V., Swallow, K. M., Rosenbaum, G. M., & Herzig, C. (2013b). Rapid acquisition but slow extinction of an attentional bias in space. Journal of Experimental Psychology: Human Perception and Performance, 39(1), 87–99.
- Jiang, Y. V., Won, B. Y., & Swallow, K. M. (2014). First saccadic eye movement reveals persistent attentional guidance by implicit learning. Journal of Experimental Psychology: Human Perception and Performance, 40(3), 1161–1173.
- Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford, England: Oxford University Press.
- Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928.
- Sigala, N., & Logothetis, N. K. (2002). Visual categorization shapes feature selectivity in the primate temporal cortex. Nature, 415(6869), 318–320.
- Zhao, J., Al-Aidroos, N., & Turk-Browne, N. B. (2013). Attention is spontaneously biased toward regularities. Psychological Science, 24(5), 667–677.