In everyday life, animals spend much of their time looking for something. They need food, nest-building material, or mates and compete with conspecifics for all of these pivotal resources. Unfortunately, the location of these resources is mostly uncertain, and they are typically embedded in an environment full of irrelevant stimuli (distractors) that compete for attention and hinder identification of the actual search item. Many animals actively impede their detection as prey or predator by camouflage, blending into their surroundings. An individual therefore requires efficient strategies for searching the visual field that facilitate quick detection of significant objects. The difficulty of visual search depends on factors like scene complexity, the number of distractors, and the similarity between search objects, distractors, and background. This is a computationally demanding task. Since neuronal resources are limited, visual systems cannot process all incoming information in parallel and therefore have to filter out relevant stimuli while suppressing responses to distractors or noise. Attentional mechanisms in particular mediate this selection of cues and guide gaze during search.
Although visual search is pivotal for all visual animals, the major models of search strategies come from human psychology and are intimately related to theories of attention (Eimer 2014; Knudsen 2007; Wolfe 2003). Because of these interrelations between attention and visual search, search paradigms are typically used to investigate attentional processes. Recent comparative studies, however, indicate an astonishing similarity of fundamental mechanisms across different vertebrate classes and even insects, despite profound differences in the organization of their neuronal systems (Nityananda 2016).
Visual Search in the Laboratory
Experimental search tasks are typically less complex than search in natural surroundings, but the difficulty of the task can be manipulated by changing the number of items, the complexity of the target, or the similarity between target and distractors or background. Search performance is measured as error rate (erroneously indicating the presence (false alarm) or absence (miss) of the target) or as reaction time (RT, the time required to indicate the presence or absence of the target). The relation between RT and the number of presented items in the display (set size) is an important measure indicating the underlying search strategy. If RT rises linearly with increasing set size, the subject presumably scans the items one by one until the target is found, indicating a serial search strategy. If, on the other hand, RT is independent of set size, the subject may process all items at once, a pattern that points to a parallel search strategy.
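The diagnostic RT/set-size relation described above can be made concrete with a small simulation. The following sketch is purely illustrative (the base RT and per-item inspection time are hypothetical constants, not values from the literature): a serial searcher inspects items one by one and, on target-present trials, finds the target on average halfway through the display, so RT grows linearly with set size; a parallel searcher evaluates all items at once, so the fitted slope is near zero.

```python
# Illustrative sketch of the RT x set-size diagnostic for serial vs.
# parallel search. All timing constants are hypothetical.

def least_squares_slope(xs, ys):
    """Slope (ms per item) of the best-fit line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

set_sizes = [4, 8, 16, 32]

# Serial search: items are inspected one by one; on target-present trials
# the target is found, on average, after (n + 1) / 2 inspections.
base_rt, per_item = 400.0, 50.0          # ms; hypothetical constants
serial_rt = [base_rt + per_item * (n + 1) / 2 for n in set_sizes]

# Parallel search: all items are evaluated at once, so RT stays flat.
parallel_rt = [base_rt for _ in set_sizes]

print(least_squares_slope(set_sizes, serial_rt))    # steep slope: 25.0 ms/item
print(least_squares_slope(set_sizes, parallel_rt))  # slope: 0.0 ms/item
```

The fitted serial slope (half the per-item time, because the target is found midway on average) versus the flat parallel slope mirrors the empirical signature used to classify search strategies.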
Steps and Mechanisms of Visual Search
Visual search proceeds through different steps, each of which confronts the neuronal system with different problems. Because of the richness of the natural environment, it is impossible to process all visual input at once. A major problem, especially during natural-scene search, is deciding how to scan the environment and hence how to allocate attention to select the relevant information that helps to find the search object. Bottom-up processes identify salient, conspicuous features, objects, or places, which directly attract attention. Top-down processes guide attention according to expectations about the appearance and localization of the search item and therefore take into account context information that predicts where the search object might be. Since usually not all details of a scene can be analyzed in parallel, the visual system switches between serial and parallel search mechanisms, which differ in speed and accuracy and which are differentially efficient depending on the complexity of the search items, distractors, and scene.
Search starts by deciding what to look for. This implies that the individual has some expectation of what the search object looks like. A search object is typically defined by a combination of features like shape, size, color, or texture. These features are used as templates guiding attention through the visual scene. Attentional templates are represented in working memory, which temporarily stores a limited amount of selected information for later processing. Incoming signals are continuously compared with these templates to select specific signals over others in the environment. Precise (expert) knowledge of the target characteristics, acquired through extensive experience with the search objects, can profoundly improve detection performance (e.g., ground-feeding birds like pigeons or chicks are highly efficient in detecting grains among pebbles; Fig. 1c). When the exact appearance of a search object is not known (e.g., when different edible grains or seeds might be present on the ground), the individual can use characteristic visual properties that are invariant across diverse members of an object class (categorization).
At the beginning, parallel search provides a global impression of the scene, potentially detecting critical features that allocate attention to possible target objects while filtering out distractors and background noise. As a primary filter, several species display differential resolution of visual processing. The retinae of primates and many birds, for instance, possess areas of enhanced visual acuity (fovea), so that detailed vision is confined to a small range of the visual field while the rest is only diffusely perceived. This in turn requires attentional mechanisms that guide gaze to positions in the visual scene that might contain the target and that should be analyzed more extensively. Eye movements are controlled by a mesencephalic brainstem network that is highly conserved in vertebrates, with the optic tectum (superior colliculus in mammals) as a central structure that closely interacts with specific forebrain areas in orienting gaze (Knudsen 2011).
Bottom-up and top-down processes converge to establish a salience map, which encodes conspicuous features while preserving the topographic organization of the scene. This map is used by the gaze-controlling networks to guide eye movements. Salience describes how distinct one element of a scene is from the surrounding elements. Sometimes, a characteristic feature of the search object “pops out” because it is very different from its surroundings (e.g., a ripe, red berry among green leaves). It automatically attracts attention, and the object can be directly identified by parallel search. Many plants that depend on animal pollination or seed dispersal harness this ability by vividly coloring their fruits and blossoms. Pop-out effects can be observed in different vertebrate species and even in insects like bumblebees (Nityananda 2016), indicating that parallel analysis of features in a visual scene is a universal search mechanism across the animal kingdom. Salience can also be caused by the high biological relevance of a stimulus (e.g., a predator hidden in the underwood). Such stimuli are detected faster because of phylogenetically developed attentional priority mechanisms. Humans and other primates, for instance, detect a deviant snake picture in a complex array more quickly than neutral stimuli like flowers (Shibasaki and Kawai 2009). Moreover, experience-dependent “search images” act as primed representations that facilitate the detection of vitally relevant objects. Foraging birds, for instance, overselect grain types that are relatively abundant. Tits, which capture different insects in flight, are more likely to notice the prey type they have repeatedly encountered before. In a similar way, honey bees typically restrict their search to a subset of the available flower species when exploiting nectar and pollen sources in a meadow (flower constancy).
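The core idea of a bottom-up salience map, one scalar value per location encoding how much that location deviates from its surround, can be sketched in a few lines. This is a deliberately minimal toy (a single feature channel and a one-cell surround), not any specific published salience model; the grid values and the "berry among leaves" framing are illustrative assumptions.

```python
# Toy bottom-up salience map: each cell holds one feature value
# (e.g., "redness"), and a cell's salience is its center-surround
# contrast, i.e., how much it deviates from its immediate neighbors.
# A single deviant item ("red berry among green leaves") pops out
# as the global maximum of the map.

def salience_map(grid):
    rows, cols = len(grid), len(grid[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [grid[rr][cc]
                         for rr in range(max(0, r - 1), min(rows, r + 2))
                         for cc in range(max(0, c - 1), min(cols, c + 2))
                         if (rr, cc) != (r, c)]
            surround = sum(neighbors) / len(neighbors)
            sal[r][c] = abs(grid[r][c] - surround)   # center-surround contrast
    return sal

# 0.1 = green leaf, 0.9 = ripe red berry at position (1, 2)
scene = [[0.1, 0.1, 0.1, 0.1],
         [0.1, 0.1, 0.9, 0.1],
         [0.1, 0.1, 0.1, 0.1]]

sal = salience_map(scene)
peak = max((sal[r][c], (r, c)) for r in range(3) for c in range(4))
print(peak[1])  # → (1, 2): gaze would be directed to the deviant item
```

Because salience is computed locally and independently for every cell, this kind of map can in principle be built in a single parallel pass over the scene, which is what makes pop-out detection fast regardless of the number of distractors.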
Understanding the structure of a scene also supports search, since a natural environment exhibits regularities that provide information about the likely localization of search objects (Le-Hoa Võ and Wolfe 2015). Contextual cues direct attention via top-down processes to those parts of a visual environment that have the highest probability of containing the target (ripe fruits are found in tree tops, not in the underwood). This knowledge increases with experience, since individuals build up a memory for the locations of detected targets in a familiar environment.
Parallel search identifies candidate objects or locations in space, which are selected for more detailed analysis during serial search. In this case, focal attention is allocated from object to object until the target is identified. Serial search is always necessary when a search item does not pop out or when the complexity of a search item requires the combination of different features. Since combining different features is computationally demanding, single objects can only be analyzed and recognized successively. The underlying neuronal processes represent key aspects of models of visual attention like Treisman's Feature Integration Theory (Wolfe 2003). Detailed analysis and combination of features finally enable object recognition, which is a demanding computational process in its own right and a prerequisite for initiating an adequate behavioral response.
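The contrast between single-feature search and conjunction search that motivates Feature Integration Theory can be illustrated with a minimal example. The sketch below is a schematic of the logic only, not a model of the theory itself; the items and feature names are hypothetical. When the target differs from all distractors in one feature, a single parallel filter isolates it; when the target shares each individual feature with some distractors (here: red AND vertical), items must be inspected one by one to bind the features together.

```python
# Schematic contrast between parallel feature search and serial
# conjunction search (illustrative; items are hypothetical).

items = [
    {"color": "red",   "orient": "horizontal"},
    {"color": "green", "orient": "vertical"},
    {"color": "green", "orient": "horizontal"},
    {"color": "red",   "orient": "vertical"},   # target: red AND vertical
]

# Parallel feature search: one pass over a single feature map narrows
# the field, but cannot single out the target, which shares "red" with
# a distractor.
red_items = [i for i in items if i["color"] == "red"]   # 2 candidates remain

# Serial conjunction search: focal attention visits items one by one,
# binding color and orientation, until both features match.
inspections = 0
target = None
for item in items:
    inspections += 1
    if item["color"] == "red" and item["orient"] == "vertical":
        target = item
        break

print(inspections)  # → 4: every item had to be inspected in this display
```

The inspection counter is what produces the linear RT/set-size slope of serial search: on average, more items mean more bindings before the conjunction is found.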
Serial and parallel search might be controlled by processes in the left or right hemisphere. In several species, visual attention, and hence search, is not symmetric across the visual field. Chicks and pigeons, for instance, preferentially peck into the left visual half-field when they explore an area uniformly spread with grains. Since input from this half-field is processed by the right hemisphere, the leftward bias indicates right-hemispheric control of visuospatial attention (Diekamp et al. 2005). Since grains are not obscured under these test conditions, parallel search enables quick collection of the food items. The underlying global processing is a typical specialization of the right hemisphere in vertebrates. When, however, grains have to be discriminated from similar-looking pebbles (Fig. 1c), the left hemisphere is more efficient in mediating the detailed analysis of visual stimuli that requires focused attention and serial search (Rogers 2017).