Forty years after feature integration theory: An introduction to the special issue in honor of the contributions of Anne Treisman
Anne Treisman’s seminal paper on Feature Integration Theory (FIT) appeared 40 years ago (Treisman & Gelade, 1980). When she died in 2018, we wanted to honor her memory with a special issue of Attention, Perception, and Psychophysics and FIT seemed like a good organizing theme. At one level, that seems like an obvious choice. With over 13,000 citations in Google Scholar, the 1980 paper is her most cited work (though there are at least a dozen more with over 1000 citations). On the other hand, if you asked almost any of the researchers currently working on topics like visual search or texture segmentation about FIT, they would probably tell you that the model was wrong. To cite a personal example, I entered the discussion in 1989 with a paper entitled “Guided Search: An alternative to the Feature Integration model for visual search” (Wolfe, Cave, & Franzel, 1989). So, why are we rendering homage to this model, 40 years later? In the 1970s, the statistician George Box coined the aphorism “All models are wrong but some are useful.” FIT has proven to be more than useful. It has shaped the discussion about visual search and much of the discussion about attention more generally for decades. You could not work in this area and ignore the idea of a two-stage system with a “preattentive” front end leading to a bottleneck into later attentive processes. You had to react to the idea that a few visual attributes like color and size could be processed in parallel in that front end. You had to take some position on the idea that attention was necessary if you were going to ‘bind’ those features into a representation – an ‘object file’ – that could be recognized as a salad, a chair, or whatever it might be. Some of us adopted Treisman’s ideas and modified them to accommodate newer data (that is certainly the story of Guided Search). Others have actively rejected various components like a serial process of binding one object after another. However, all of us engaged with Treisman’s ideas.
Given the importance of FIT, it seemed likely that we would get a reasonable response if we announced a special issue on the topic. We expected 30-40 submissions, yielding a special issue of about 20 papers. We received 80 submissions and the resulting crop of papers will be spread over two issues of the journal. The vagaries of the peer review process mean that some of these papers were accepted a year ago, while others were being revised. In this electronic era, papers appeared online as they became available. We are happy now to be able to gather them together in one place (well, in two successive places; placement into the first or second issue was determined by the date of final acceptance). These papers advance our science and pay tribute to the impact of Anne Treisman.
So, forty years after FIT appeared, what work did it inspire for this first of two special issues of Attention, Perception, and Psychophysics? We start with an excellent piece of historical frame-setting. One might expect a review in which FIT is the seed from which a tree of research has grown (e.g. Quinlan, 2003). However, in Kristjansson and Egeth’s (DOI: https://doi.org/10.3758/s13414-019-01803-7) paper, FIT is the fruit of earlier work. That fruit would become the seed of so much work over the last decades, but Kristjansson and Egeth do us an important service in showing how FIT emerged from other ideas. Hochstein (DOI: https://doi.org/10.3758/s13414-019-01797-2) provides a second brief review to get us started, focusing in some detail on Treisman’s work on gist and ensemble processing, topics she took up later in her career (see Chong, DOI: https://doi.org/10.3758/s13414-019-01827-z, in this issue). Moreover, he discusses her little-known early work on binocular rivalry.
After those reviews as preamble, I have tried to organize the issue roughly around the structure of FIT. FIT proposed a preattentive stage, followed by an attentive stage. Thus, the next papers deal with preattentive processing: what aspects of an object can be appreciated before attention is directed to that object? Much of that work has been focused on preattentive features. These can be defined in many ways. Perhaps the simplest is to imagine a target item among a homogeneous array of other items (a red item among green, a T among Ls). If that target “pops out”, that is, if it is found quickly regardless of the number of distractor items, then the feature that differentiates the target from the distractors is a good candidate for preattentive feature status. Here red would pop out while the T would not. Color is a preattentive feature (Wolfe & Horowitz, 2017). In the current issue, Schill et al. (DOI: https://doi.org/10.3758/s13414-019-01834-0) discuss whether axis-of-rotation can function as a basic feature in visual search. Does an item rolling toward or away from you around a horizontal axis pop out from items spinning around a vertical axis? Thornton & Zdravković (DOI: https://doi.org/10.3758/s13414-019-01750-3) ask similar questions about the illusion of motion that Kitaoka has made famous (visit http://www.ritsumei.ac.jp/~akitaoka/index-e.html). Motion is an uncontroversial preattentive feature (Dick, Ullman, & Sagi, 1987), but what if that motion isn’t real?
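The pop-out criterion described above is commonly operationalized as the slope of the function relating reaction time (RT) to set size. The following Python sketch is only an illustration under stated assumptions, not a method taken from any paper in this issue: the RT values are invented, and the 10 ms/item cutoff is an assumed rule of thumb rather than a principled boundary between efficient and inefficient search.

```python
def search_slope(set_sizes, mean_rts_ms):
    """Least-squares slope of mean RT (ms) against set size, in ms per item."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts_ms) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(set_sizes, mean_rts_ms))
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    return sxy / sxx


# Hypothetical mean RTs (ms), invented for illustration only.
SET_SIZES = [4, 8, 16, 32]
RED_AMONG_GREEN_RT = [450, 452, 455, 451]   # nearly flat across set size
T_AMONG_L_RT = [520, 640, 880, 1360]        # grows with every added item

# Assumed cutoff (ms/item); real studies do not agree on a single value.
POP_OUT_CUTOFF = 10.0


def looks_like_pop_out(set_sizes, mean_rts_ms, cutoff=POP_OUT_CUTOFF):
    """True if RT is roughly independent of the number of items."""
    return search_slope(set_sizes, mean_rts_ms) < cutoff


if __name__ == "__main__":
    for label, rts in [("red among green", RED_AMONG_GREEN_RT),
                       ("T among Ls", T_AMONG_L_RT)]:
        slope = search_slope(SET_SIZES, rts)
        verdict = looks_like_pop_out(SET_SIZES, rts)
        print(f"{label}: {slope:.2f} ms/item, pop-out candidate: {verdict}")
```

With these made-up numbers, the color search yields a slope near zero and the T-among-Ls search yields a steep one, which mirrors the contrast drawn in the text.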
Treisman’s work on preattentive features led her naturally into the question of what could be seen in a single glance. What was the “gist” of the image that could be seen before attention began to do its work of “feature integration”? What did you know about the average size or orientation of groups of items? Today this topic of “ensemble” perception (Whitney & Yamanashi Leib, 2018) is a small industry. As discussed by Hochstein (DOI: https://doi.org/10.3758/s13414-019-01797-2), Treisman and her student Sang Chul Chong were early pioneers (Chong & Treisman, 2003) and in this issue, Chong continues this work with his “Distributed attention model of perceptual averaging”.
Returning to the topic of basic features, the Schill paper on axis-of-rotation employs a search asymmetry paradigm. Treisman made extensive use of search asymmetry: the observation that, for some pairs of stimuli, search for X among Y was markedly more efficient than search for Y among X. Going back to the Dick et al. (1987) paper on motion search, finding a moving target among stationary distractors is much easier than finding a stationary target among moving distractors. Treisman thought this was useful for identifying basic features. She made the argument that searching for the presence of a feature (e.g. motion among stationary distractors) was easier than searching for its absence (stationary among moving distractors) (Dick et al., 1987). This was the topic of a whole special issue nearly 20 years ago (Wolfe, 2001). Zhang and Onyper (DOI: https://doi.org/10.3758/s13414-019-01818-0) return to this topic in the present issue. Specifically, they are looking at a different account of asymmetry. If it takes longer to disengage from Y distractors than from X, search for X among Y will be slower than Y among X even if no basic feature search is involved. In their paper, Zhang and Onyper argue that faster search for novel targets among familiar distractors cannot be explained wholly by faster rejection of familiar distractors.
FIT stressed the idea that a target defined by a unique, preattentive feature could be found effortlessly. About a decade later, the “Guided Search” model (Wolfe et al., 1989) broadened the role of those features. Attention could be biased to all the red items, for instance, even if the target was not uniquely red (Egeth, Virzi, & Garbart, 1984). Subsequent research has shown that this guidance need not be based exclusively on basic feature information. Scene content guides attention along with the effects of recent history (e.g. priming) and the learned “value” of features. History effects are represented in this issue by the work of Hollingworth and Bahle (DOI: https://doi.org/10.3758/s13414-019-01759-8) while value is the focus of Daniel and Raymond (DOI: https://doi.org/10.3758/s13414-019-01744-1). Daniel and Raymond are not looking at the role of value in the guidance of attention in visual search. Instead, they are asking how the attentional effects of value alter our perception of groups of items (ensemble perception), which, as noted above, was a topic of interest to Treisman later in her career. Daniel and Raymond report that your impression of the overall size of items in an ensemble can be altered by giving different values to subsets of different size.
There are other examples of the idiosyncratic nature of feature processing, two of them in this issue. Orientation is a generally accepted preattentive feature, so it should guide attention in a straightforward manner. If half the items are of the wrong orientation, observers should be able to guide attention away from those items and toward those of the correct orientation. Hulleman, Lund, & Skarratt (DOI: https://doi.org/10.3758/s13414-019-01787-4) have been studying a situation where that expectation is violated. In their hands, adding orientation information actually makes performance worse. Adding color information, on the other hand, behaves differently (Hulleman, personal communication). They argue that their orientation data are hard to explain if attention is directed to “items”. They prefer to think about attention to functional visual fields (FVF) around the point of fixation. For each fixation, the FVF could be defined as the region within which a target might be found. The topic of what sort of processing goes on within the FVF is one that is taken up a bit later in this issue.
Another example of color and orientation behaving differently as features comes from Hannus, Bekkering, and Cornelissen (DOI: https://doi.org/10.3758/s13414-019-01841-1). They were performing searches for conjunctions of color and orientation. Thus, the target of search might be a red vertical line while the distractors were red horizontal and green vertical lines. A color feature map would show half red, half green while the orientation map showed half vertical and half horizontal items. Hannus et al., however, used less dramatic color and orientation differences. They preceded a conjunction search for a colored, oriented bar with a preview of either the color or the orientation, but not both. Knowing the location of items of the correct color or the correct orientation should help search. In abstract terms, the color and orientation situations are equivalent. In the messy world of real preattentive features, preview by color worked differently from (and better than) preview by orientation. These studies scratch the surface of a large data space. There is a whole family of preattentive features (Wolfe, 2018) and the details of how they operate in search and in other attentional tasks have not been worked out in most cases.
FIT is a two-stage model with a preattentive stage followed by an attentive stage. Treisman proposed a bottleneck between these stages because there were tasks that could not be accomplished in parallel, over multiple items at the same time. High on the list of such tasks was the “binding” of features to objects. Preattentive processes might register the presence of colors, sizes, orientations, and so forth; but knowing how those features were bound together into a recognizable object requires attention. Conjunction tasks, of the sort just discussed, would require attention to bind the color with the orientation. In this issue, the topic of binding is represented by Harris et al. (DOI: https://doi.org/10.3758/s13414-019-01677-9) in a paper on misbinding errors in brief displays as well as Wu, Dowd, and Golomb (DOI: https://doi.org/10.3758/s13414-019-01739-y) looking at similar errors after an eye movement. Binding errors were important to Treisman because they seemed to show that features were processed separately and needed to be put together with the help of selective attention (Burwick, 2014; Treisman, 1996; Treisman & Schmidt, 1982). In Treisman’s formulation, binding creates “object files” in working memory. How do you know if your object file at one moment is the same as at another? Spatiotemporal continuity has been thought of as the vital glue, maintaining the representation. Moore, Stephens, and Hein (DOI: https://doi.org/10.3758/s13414-019-01763-y) propose that the role of feature information has not been given as much credit as it deserves.
The classic FIT evidence for the role of binding in visual search was the search for conjunction targets defined as conjunctions of two basic features. Only attentional selection of one item at one location would allow the color and orientation (or other feature) information to be linked through a master map, allowing the observer to determine if this bound item was a target or distractor. Treisman often used a rather broad definition of conjunctions. Thus, a search for the letter T among Ls could be described as search for one conjunction of vertical and horizontal line segments among distractors consisting of a different conjunction of those two features. Subsequently, it has been more useful to see these “spatial configuration” searches (Wolfe, 1998) as a different class of search task. The distinction between these types of task has to do with whether the search can be “guided”. In Treisman’s original formulation, all of the conjunction tasks required random, serial attention from one item to the next, in order to bind and recognize the item. Subsequently, multiple labs showed that search for two-feature conjunctions could be more efficient than FIT predicted (e.g. Alkhateeb, Morland, Ruddock, & Savage, 1990; McLeod, Driver, & Crisp, 1988; Sagi, 1988). As we have discussed, my contribution to this discussion was to argue that basic feature information could be used to “guide” attention to likely candidate targets (Wolfe et al., 1989). Thus, if you guided attention to red items and, at the same time, to vertical items, you were very likely to find that you had found a red vertical item. Indeed, only an assumption of noise in this system could explain why searches for targets defined by conjunctions of salient features were not as efficient as search for feature singletons (Wolfe, 1994). Searches like the search for a T among Ls were different in that feature guidance would not help. Both target and distractors in those searches were composed of the same features and the spatial relationship between those features does not seem to be useful for guidance.
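The logic of guidance in the preceding paragraph can be sketched in a few lines of Python. This is a toy illustration and not the published Guided Search model: the unit weights, the Gaussian noise, the dictionary encoding of items, and every name below (`GUIDING_FEATURES`, `priority`, `inspection_order`) are assumptions made for the example. Each item receives one unit of top-down activation for every guiding feature it shares with the target, noise is added, and attention visits items in order of decreasing activation.

```python
import random

# Assumed set of features that can guide attention in this toy example.
GUIDING_FEATURES = ("color", "orientation")


def priority(item, target, noise_sd, rng):
    """One unit of activation per guiding feature shared with the target,
    plus zero-mean Gaussian noise (an assumed noise model)."""
    signal = sum(1.0 for feature in GUIDING_FEATURES
                 if item[feature] == target[feature])
    return signal + rng.gauss(0.0, noise_sd)


def inspection_order(items, target, noise_sd=0.0, seed=0):
    """Indices of items in the order attention would visit them."""
    rng = random.Random(seed)
    scored = [(priority(item, target, noise_sd, rng), index)
              for index, item in enumerate(items)]
    # Stable sort: ties keep their original display order.
    return [index for _, index in sorted(scored, key=lambda pair: -pair[0])]


RED_VERTICAL = {"color": "red", "orientation": "vertical"}
CONJUNCTION_DISPLAY = [
    {"color": "red", "orientation": "horizontal"},
    {"color": "green", "orientation": "vertical"},
    {"color": "red", "orientation": "horizontal"},
    RED_VERTICAL,                                   # the target, index 3
    {"color": "green", "orientation": "vertical"},
]

# T among Ls: every item has the same guiding features; only the spatial
# arrangement ("shape", which is not a guiding feature here) differs.
T_ITEM = {"color": "black", "orientation": "vertical+horizontal", "shape": "T"}
L_ITEM = {"color": "black", "orientation": "vertical+horizontal", "shape": "L"}
T_AMONG_L_DISPLAY = [L_ITEM, L_ITEM, T_ITEM, L_ITEM]

if __name__ == "__main__":
    print(inspection_order(CONJUNCTION_DISPLAY, RED_VERTICAL))
    print(inspection_order(T_AMONG_L_DISPLAY, T_ITEM))
    print(inspection_order(CONJUNCTION_DISPLAY, RED_VERTICAL, noise_sd=1.0))
```

Without noise, the red vertical target always has the highest activation and is visited first, whereas in the T-among-Ls display all activations tie and the target has no advantage. Raising `noise_sd` lets distractors sometimes outrank the conjunction target, which is the role that noise plays in the argument above.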
The addition of guidance allowed the Guided Search model to preserve the parallel/serial architecture of Treisman’s FIT without requiring that each search task be described as either serial or parallel. That, however, does not end the discussion about the nature of the processing architecture. In this issue, for example, Blunden et al. (DOI: https://doi.org/10.3758/s13414-019-01775-8) are asking what happens when an item has more than one instance of a feature type. After all, unlike red vertical lines in a simple search display, most objects in the real world (cars, sandwiches, etc.) contain multiple orientations, colors, and other features. Blunden et al. show evidence for coactive processing of different instances of the same type of feature (albeit in stimuli that are much simpler than cars or sandwiches).
A different attack on the basic parallel/serial architecture of FIT comes from researchers who argue that the only serial component of search is oculomotor. Obviously, fixations occur in series and some sort of search occurs within the FVF surrounding each fixation. Researchers like Hulleman and Olivers (2017) argue that processing within the FVF is essentially parallel, with all items processed in a single step. In contrast, I would argue that covert attention is deployed in series to items within the FVF. Empirically, it is lamentably difficult to distinguish between these positions (as Townsend has been telling us for many years; Townsend, 1971; Townsend, 2016). In the present issue, Liesefeld et al. (DOI: https://doi.org/10.3758/s13414-019-01819-z) offer a “both-and” solution, instead of the usual “either-or” formulation of the debate. They see room for both serial and parallel processing in their “theoretical attempt to revive the serial/parallel-search dichotomy”.
Stefanie Becker (DOI: https://doi.org/10.3758/s13414-019-01807-3) describes a different “both-and” solution to a debate in the literature. Above, I described guidance to “red” and “vertical” as if the human search engine were set to look for those specific features: “red” and “vertical”. For a number of years, Becker (Becker, Harris, York, & Choi, 2017) and others (Yu & Geng, 2019) have produced evidence that the guiding template may not be precisely pointed at specific guiding features. Becker has shown that it can be better to think of attention as guided to the “redder” item, rather than to the specifically red item. Similarly, Yu and Geng (2019) describe situations where it is better to tune a guiding template to a value a bit away from the actual target feature in an effort to more effectively distinguish between targets and distractors. As often happens, these results led to a debate between proponents of relational guidance and of specific feature guidance. As also often happens, the truth of the matter seems to be that both types of guidance are available and our search engine is able to configure itself, within limits, to do what works. This, in any case, is the thesis of Becker, Atalla, and Folk (2020) in the present issue.
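The difference between the two kinds of guidance can be made concrete with a toy Python sketch. It is not an implementation of any model from the papers cited above. It assumes that each item's color can be summarized as a single "redness" value between 0 and 1, and the function names and display values are hypothetical. A feature-specific template selects the item closest to the exact target value, while a relational template selects whichever item is reddest.

```python
def pick_feature_specific(redness, target_value):
    """Index of the item whose redness is closest to the exact target value."""
    return min(range(len(redness)),
               key=lambda i: abs(redness[i] - target_value))


def pick_relational(redness):
    """Index of the item that is redder than all the others."""
    return max(range(len(redness)), key=lambda i: redness[i])


# Hypothetical displays on an assumed 0 (yellow) to 1 (red) scale.
TARGET_REDNESS = 0.50                              # an orange target
PLAIN_DISPLAY = [0.30, 0.30, 0.50, 0.30]           # target at index 2
DISPLAY_WITH_RED = [0.30, 0.30, 0.50, 0.30, 0.90]  # extra red item at index 4

if __name__ == "__main__":
    for name, display in [("plain", PLAIN_DISPLAY),
                          ("with red item", DISPLAY_WITH_RED)]:
        print(name,
              "feature-specific:", pick_feature_specific(display, TARGET_REDNESS),
              "relational:", pick_relational(display))
```

In the plain display both templates select the target, so the two accounts cannot be told apart. They diverge only when an item redder than the target is added: the relational template is drawn to that item, while the feature-specific template still selects the target.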
The work by Becker, Geng, and others on specific and relative guidance slides easily into the topic of what we remember about what we are looking for. After all, if you are asked to look for a red vertical line, your cat, or anything else, you need to have some representation of that target of search, stored in memory. Actually, there are two representations or “templates” that are relevant. These have not been treated as distinct in the search literature, but it seems clear that there are two. One of these is what we can call the “guiding template”. The other is the “target template”. The guiding template holds the properties of the target that can be used to guide attention. This template probably resides in working memory. The ability of working memory representations to influence search has been the subject of much recent research (e.g. Dowd, Pearson, & Egner, 2017; Foerster & Schneider, 2018; Hollingworth & Beck, 2016; Kristjánsson, Thornton, & Kristjánsson, 2018; Oberauer, 2019). Thus, if you are searching for your cat, the guiding template might bias attention to objects of your cat’s color, size, and approximate shape.
If the guiding template in working memory directs your attention to an object of the right color, size, and shape, how do you know that it is your cat? Now you must match the object in the visual field with a “target template” whose precision allows you to decide that this is a big, orange cat but it is not your big orange cat. How do we know that this is a second type of template? Guiding templates in working memory are subject to the tight limits on working memory capacity (Suchow, Fougnie, Brady, & Alvarez, 2014). On the other hand, it is possible to search for any instance of 100 different objects at the same time (Wolfe, 2012). Those 100 target templates, held in memory, cannot possibly be resident in working memory as we understand it. Both types of template may be involved in the Rajsic and Woodman (DOI: https://doi.org/10.3758/s13414-019-01721-8) paper in this issue. They are asking their observers to do a search and to precisely identify a color. It should be noted that they are not framing their work in the context of two types of template and may be a bit surprised by this take on their work.
By the time we are talking about the role of working memory in search, I could be introducing a special issue focused not on Treisman’s contributions but on the contributions of Alan Baddeley and colleagues, who brought the concept of working memory to life in cognitive psychology. We are fortunate, therefore, to have a review article by Hitch, Allen, and Baddeley (DOI: 10.3758/s13414-019-01837-x) that ties FIT to Baddeley and Hitch’s work modeling the working memory system. Those interactions of working memory and attention go well beyond simply holding a guiding template. There are, for example, interactions between memory for the identities of objects and the locations of objects that are addressed in the papers by Toh, Sisk, & Jiang (DOI: https://doi.org/10.3758/s13414-019-01738-z) and Donovan, Zhou, and Carrasco (DOI: https://doi.org/10.3758/s13414-019-01815-3).
The tree of FIT has many branches. A number of these are represented in this issue. Madden et al. (DOI: https://doi.org/10.3758/s13414-019-01823-3) are interested in the effect of aging and show how RT and fMRI methods can be combined to address these questions. Marsh et al. (DOI: https://doi.org/10.3758/s13414-019-01800-w) turn to the effects on visual attention of auditory distraction. Spence and Frings (DOI: https://doi.org/10.3758/s13414-019-01813-5) also want us to think beyond the merely visual. They observe that Treisman didn’t extend FIT to other sensory modalities or their interaction with vision. They find this surprising, given that Treisman’s original work on selective attention was carried out in the auditory domain (Treisman, 1969; Treisman, 1960) before she turned to visual attention in the 1970s. Spence and Frings argue that it would not be simple to make FIT work in other senses or in a multisensory framework. Finally, while these papers extend Treisman’s reach to other senses, Miron and Kalanthroff (DOI: https://doi.org/10.3758/s13414-019-01833-1) apply her ideas further afield, to the problem of depression.
The interested reader should read fast because we will be back next month, with the second half of this issue. Clearly, forty years has not exhausted the possibilities of Feature Integration Theory. We may have other models of attention and search that we believe better capture the data. Nevertheless, we cannot and do not ignore the debt we owe to Anne Treisman’s foundational ideas.
- Egeth, H. E., Virzi, R. A., & Garbart, H. (1984). Searching for conjunctively defined targets. J. Exp. Psychol: Human Perception and Performance, 10, 32-39.
- Hulleman, J., & Olivers, C. N. L. (2017). The impending demise of the item in visual search. Behav Brain Sci, 1-20. doi: 10.1017/S0140525X15002794, e132
- Townsend, J. T. (2016). A Note on Drawing Conclusions in the Study of Visual Search and the Use of Slopes in Particular. A reply to Kristjansson and Wolfe. i-Perception, ms.
- Whitney, D., & Yamanashi Leib, A. (2018). Ensemble Perception. Annu Rev Psychol, 69, 105-129. doi: https://doi.org/10.1146/annurev-psych-010416-044232
- Wolfe, J. M. (2012). Saved by a log: How do humans perform hybrid visual and memory search? Psychol Sci, 23(7), 698-703. doi: https://doi.org/10.1177/0956797612443968
- Wolfe, J. M. (2018). Visual Search. In J. Wixted (Ed.), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience (Vol. II. Sensation, Perception & Attention: John Serences (UCSD)): Wiley.
- Wolfe, J. M., Chun, M. M., & Friedman-Hill, S. R. (1995). Making use of texton gradients: Visual search and perceptual grouping exploit the same parallel processes in different ways. In T. Papathomas, C. Chubb, A. Gorea & E. Kowler (Eds.), Early vision and beyond (pp. 189-198). Cambridge, MA: MIT Press.
- Wolfe, J. M., & Horowitz, T. S. (2017). Five factors that guide attention in visual search. [Review Article]. Nature Human Behaviour, 1, 0058. doi: https://doi.org/10.1038/s41562-017-0058