Modelling Visual Search with the Selective Attention for Identification Model (VS-SAIM): A Novel Explanation for Visual Search Asymmetries
- Cite this article as:
- Heinke, D. & Backhaus, A. Cogn Comput (2011) 3: 185. doi:10.1007/s12559-010-9076-x
In earlier work, we developed the Selective Attention for Identification Model (SAIM). SAIM models the human ability to perform translation-invariant object identification in multiple object scenes. SAIM suggests that central to this ability is an interaction between parallel competitive processes in a selection stage and an object identification stage. In this paper, we applied the model to visual search experiments involving simple lines and letters. We presented successful simulation results for asymmetric and symmetric searches and for the influence of background line orientations. Search asymmetry refers to changes in search performance when the roles of target item and non-target item (distractor) are swapped. In line with other models of visual search, the results suggest that a large part of the empirical evidence can be explained by competitive processes in the brain, which are modulated by the similarity between target and distractor. The simulations also suggest that another important factor is the feature properties of distractors. Finally, the simulations indicate that search asymmetries can be the outcome of interactions between top-down (knowledge about search items) and bottom-up (features of search items) processing. This interaction in VS-SAIM is dominated by a novel mechanism, the knowledge-based on-centre-off-surround receptive field. This receptive field is reminiscent of classical receptive fields, but its exact shape is modulated by both top-down and bottom-up processes. The paper discusses supporting evidence for the existence of this novel concept.
Keywords: Visual attention, Visual search, Computational modelling, Search asymmetry
The visual search task is a commonly used experimental procedure to study human processing of multiple object scenes. In a standard visual search task, participants are asked to determine whether a pre-defined target item among non-targets (distractors) is present or absent. During the course of the experiments, the number of distractors (display size) is varied. Typically, the time it takes participants to make this decision (reaction time) is measured as a function of the display size (search function). The slope of the search function is interpreted as an indicator of the search efficiency for particular target-distractor pairings. For instance, search for a diagonal line among vertical lines is highly efficient, with a slope close to 0 ms/item, whereas search for a ’T’ among ’L’s is inefficient, with a slope of around 25 ms/item. Over 40 years or so, visual search tasks have produced a plethora of experimental evidence (see [31, 41] for reviews). There have been numerous attempts to develop qualitative theories of visual search, most prominently the Feature Integration Theory (FIT) by Treisman et al. or the Attentional Engagement Theory (AET). This article presents a connectionist model of visual search. This model is an extension of the Selective Attention for Identification Model (SAIM; [16, 19, 20]) adapted to simulate visual search and is therefore termed VS-SAIM.
SAIM was developed in a connectionist framework and aims to explain human behaviour in terms of the underlying neurophysiological processes in the brain. However, SAIM avoids the full complexity of neurophysiological processes, e.g. the dynamics of different neurotransmitters, and employs rate-coded neuron models. On the other hand, this simplification is balanced with SAIM’s objective to unify a broad range of behavioural data in one model (see ; for extensive discussions on the relationship between models of the neural substrate and modelling behavioural data). SAIM’s starting point is the human ability to identify objects in multiple object scenes. SAIM suggests that central to this ability is an interaction between parallel competitive processes in a selection stage and an object identification stage. Based on this assumption, SAIM was able to simulate a broad range of experimental evidence usually associated with normal operation of attention and with dysfunctional attention. The simulations of normal attention covered two-object costs on selection, global precedence, spatial cueing both within and between objects, and inhibition of return. The effects of disordered attention included view-centred and object-centred visual neglect. In Heinke et al., SAIM was successfully applied to simulate a few visual search experiments. These studies showed that the search functions in visual search can be an emergent property of the competitive processes in the brain. The slopes of the search functions were influenced by the similarity between distractors and target. However, when we attempted to simulate a broader range of visual search experiments, it became clear that this initial version of VS-SAIM was not able to mimic these additional data. Consequently, we modified some operations within VS-SAIM. In particular, we replaced the original similarity measure, the scalar product, with the Euclidean distance.
The present article reports on a first set of results of this extension.
For the first set of results we chose experimental evidence that, on the face of it, is particularly challenging to VS-SAIM’s similarity-based approach: the search asymmetry (see ; for a review). In search asymmetries, search slopes differ when the roles of target item and distractor item are swapped. For instance, it is easier to find a tilted line among vertical lines than vice versa, and a diagonal line among vertical lines than the reverse. Other examples are: orange item (easier) versus red item, and moving item (easier) versus static item [11, 34]. For a similarity-based approach these data are particularly challenging, as the target-distractor similarity simply does not change when target and distractor are swapped around. A theoretical account needs to introduce an additional factor to explain these findings.
On a wider note, there is no satisfactory theoretical account for the occurrence of search asymmetry at present. Initially, Treisman and Gormican suggested that search asymmetries are indicative of the existence of feature maps, assuming that detecting the presence of a feature is easier than detecting its absence. However, subsequent evidence has not supported their theory. For instance, their assumption does not fit with the findings on diagonal line versus vertical line, as there are well-known feature maps for diagonal lines in the brain. Moreover, recent evidence showed that search for an “inverted elephant” among upright elephants is more efficient than the other way around, pointing towards the involvement of object knowledge in search asymmetries. The current paper aims to develop a first coherent account of search asymmetries. It focuses on the search asymmetries with line orientations.
The remainder of the paper is organized as follows. After introducing VS-SAIM in detail, we discuss how VS-SAIM relates to other important models and theories of visual search. Then we illustrate how the search process in VS-SAIM plays out in detail (Study 1). Study 2 demonstrates that VS-SAIM mimics the experimental findings of asymmetries of line orientation for both diagonal versus vertical line and tilted versus vertical line. We also present detailed explanations for this success. The explanation also suggests that VS-SAIM’s search efficiency depends not only on target-distractor similarity but also on the orientations of the distractors. Study 3 confirms this point by simulating findings by Foster and Westland. To complete the picture, Study 4 shows that VS-SAIM can also simulate a visual search task with symmetric results. The general discussion considers the theoretical implications and presents supporting evidence for VS-SAIM’s explanation of search asymmetries.
It is important to note that, as in the previous versions of SAIM, VS-SAIM was designed with the help of the principle of minimization of an energy function. This idea was first introduced into connectionism by Hopfield and Tank and implements a soft-constraint satisfaction. The design principle proceeds in the following steps. First, the problem is formulated as a constraint satisfaction problem which defines the constraints a solution has to fulfil. These solutions are translated into activation patterns in a connectionist network. Then an energy function is designed in which these activation patterns are minimal energy values. Finally, to find these energy minima starting from a pre-defined activation pattern, a gradient descent procedure is applied to the energy function. The gradient descent procedure results in nonlinear differential equations which, in turn, define a biologically plausible network topology, including the weights between connections. The advantage of this approach is that the energy minima define stable states, or attractor states, for the nonlinear differential equations. This property makes this approach appealing for the design of connectionist models. However, while designing the model in such a way, we found that some of the terms in the equations did not lead to a successful object selection and identification. Subsequently, we relaxed the minimization approach. The details of this relaxation are discussed in the "Appendix". Nevertheless, the topology of the model is still directly motivated by the energy minimization approach.
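The design steps above can be illustrated with a minimal sketch. It is not VS-SAIM's actual network, whose equations are given in the "Appendix". The winner-take-all energy function, the sigmoid parameters and the names `wta_energy` and `run_wta` are assumptions made for illustration only. The constraint is "exactly one unit is active, preferably the one with the largest input"; the leaky units follow the negative gradient of the energy, in the style of Hopfield and Tank.

```python
import math

def wta_energy(y, inputs, a=2.0):
    """Assumed energy: penalise sum(y) != 1, reward activity on strong inputs."""
    return 0.5 * a * (sum(y) - 1.0) ** 2 - sum(yi * ii for yi, ii in zip(y, inputs))

def sigmoid(x, slope=10.0, shift=0.5):
    """Rate-coded output of a unit; slope and shift are placeholder values."""
    return 1.0 / (1.0 + math.exp(-slope * (x - shift)))

def run_wta(inputs, a=2.0, dt=0.05, steps=2000):
    """Euler integration of dx_i/dt = -x_i - dE/dy_i for the energy above."""
    x = [0.0] * len(inputs)          # internal states, identical starting point
    y = [sigmoid(v) for v in x]
    for _ in range(steps):
        total = sum(y)
        # dE/dy_i = a * (sum(y) - 1) - inputs[i]
        x = [xi + dt * (-xi - a * (total - 1.0) + ii) for xi, ii in zip(x, inputs)]
        y = [sigmoid(v) for v in x]
    return y

outputs = run_wta([0.5, 0.6, 0.9])
winner = max(range(len(outputs)), key=outputs.__getitem__)
```

In this toy network the unit receiving the largest input ends up with the highest activation, because all units share the same inhibitory term and differ only in their input.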
Early Visual Processing Stage (EVPS)
VS-SAIM’s early visual processing stage consists of Gabor filters tuned to four orientations: 0°, 90°, 45° and 135°. Gabor filters have been widely used to model receptive fields of orientation-selective simple cells in the primary visual cortex V1 (e.g. ). Details about the implementation of the filters and the parameters can be found in the "Appendix".
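A filter bank of this kind can be sketched as follows. The kernel size, wavelength and Gaussian width are placeholder values, not the parameters from the "Appendix", and the convention that 0° denotes the filter preferring vertical lines is likewise an assumption. The function names are hypothetical.

```python
import math

def gabor_kernel(theta_deg, size=7, wavelength=4.0, sigma=2.0):
    """Even-symmetric Gabor kernel; theta is the direction of the carrier modulation."""
    theta = math.radians(theta_deg)
    half = size // 2
    kernel = []
    for row in range(-half, half + 1):
        line = []
        for col in range(-half, half + 1):
            # Coordinate along the direction in which the carrier oscillates
            u = col * math.cos(theta) + row * math.sin(theta)
            envelope = math.exp(-(row * row + col * col) / (2.0 * sigma ** 2))
            line.append(envelope * math.cos(2.0 * math.pi * u / wavelength))
        kernel.append(line)
    return kernel

def filter_response(image, kernel, r, c):
    """Half-wave rectified response of one filter centred on pixel (r, c)."""
    half = len(kernel) // 2
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(image) and 0 <= cc < len(image[0]):
                total += kernel[dr + half][dc + half] * image[rr][cc]
    return max(total, 0.0)

ORIENTATIONS = (0, 90, 45, 135)
bank = {theta: gabor_kernel(theta) for theta in ORIENTATIONS}

# Test image: a vertical line in the middle column of a 9 x 9 display
image = [[1.0 if c == 4 else 0.0 for c in range(9)] for _ in range(9)]
responses = {theta: filter_response(image, bank[theta], 4, 4) for theta in ORIENTATIONS}
```

With this convention the 0° filter responds most strongly to the vertical line, while the 90° filter is silenced by the rectification.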
It is important to note that the content network can implement an arbitrary mapping which depends on the activation pattern in the selection network. For instance, if the unit in the centre of each layer in the selection network had a high activation and all other units in the selection network were set to zero, the content of the centre of the input image would be represented in all FOA pixels. Hence, translation-invariant mapping is a special case that is achieved if two constraints on the activation pattern in the selection network are fulfilled. First, only one unit in each layer should be activated. With this restriction only the content of one image location is routed into the FOA, because the multiplication allows only one location to be passed into the FOA. Second, only units across the selection network that map neighbouring locations in the input image onto neighbouring locations in the FOA are allowed to be active. This constraint ensures that the FOA forms a veridical representation of the selected object in the input image and is implemented through a “diagonal” activation pattern in the selection network. The necessity of “diagonality” arises from the following rationale: if one unit in one layer is activated, the layer that controls the adjacent FOA pixel has to activate the unit adjacent to the first unit. In this way, two locations adjacent in the input image are mapped into adjacent pixels in the FOA. The connections in the selection network implement the corresponding constraint satisfaction process.
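The routing described above can be sketched in a few lines, assuming a single feature map and binary selection units. The function names `route_to_foa` and `diagonal_selection` are hypothetical. Each FOA pixel owns one layer of the selection network and sums the image pixels gated by that layer; a "diagonal" pattern satisfies both constraints.

```python
def route_to_foa(image, selection):
    """Content network: each FOA pixel sums the image pixels gated by its selection layer."""
    rows, cols = len(image), len(image[0])
    foa_size = len(selection)
    foa = [[0.0] * foa_size for _ in range(foa_size)]
    for i in range(foa_size):
        for j in range(foa_size):
            layer = selection[i][j]   # one selection layer per FOA pixel
            foa[i][j] = sum(layer[k][l] * image[k][l]
                            for k in range(rows) for l in range(cols))
    return foa

def diagonal_selection(rows, cols, foa_size, top, left):
    """One active unit per layer; neighbouring FOA pixels read neighbouring image pixels."""
    selection = [[[[0.0] * cols for _ in range(rows)]
                  for _ in range(foa_size)]
                 for _ in range(foa_size)]
    for i in range(foa_size):
        for j in range(foa_size):
            selection[i][j][top + i][left + j] = 1.0
    return selection

image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 2.0, 0.0],
         [0.0, 3.0, 4.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
foa = route_to_foa(image, diagonal_selection(4, 4, foa_size=2, top=1, left=1))
```

Here the 2 x 2 FOA receives a veridical copy of the image patch whose top-left corner is at row 1, column 1.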
The knowledge network implements the object identification in VS-SAIM. A unit in the knowledge network represents an object by being associated with a template of this object. The template is a copy of the object, as it would appear in the FOA. In order to determine which object is represented in the FOA, the template units compare their template with the FOA activation in a matching process. The similarity measure in this template matching is based on the Euclidean distance commonly used in connectionist networks. In order to determine which of the template units represents the best matching template, the units interact in a competitive process similar to the one implemented in the selection network (see "Appendix" for mathematical details). The output activation of the template units represents the output of VS-SAIM. A high output activation indicates that VS-SAIM has successfully identified the content of the FOA.
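As a rough illustration of the matching step, the sketch below compares an FOA pattern with two stored templates using the Euclidean distance. The 3 x 3 pixel templates are invented for the example, and the competitive process between template units is replaced by simply picking the smallest distance, which is a simplification rather than the model's mechanism.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equally sized 2D patterns."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def best_template(foa, templates):
    """Return the name of the closest template and all distances."""
    distances = {name: euclidean_distance(foa, t) for name, t in templates.items()}
    return min(distances, key=distances.get), distances

# Hypothetical templates, as the items would appear in the FOA
TEMPLATES = {
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

# A noisy 'L' in the FOA
foa = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.0], [0.8, 1.0, 0.9]]
name, distances = best_template(foa, TEMPLATES)
```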
In VS-SAIM the knowledge network introduces not only an identification stage as an output stage, but also adds a general knowledge-based constraint on VS-SAIM’s behaviour. In order to fully integrate this additional constraint, the knowledge network also influences the behaviour of the selection network via the matching network. This top-down pathway is a direct outcome of the energy minimization procedure employed in VS-SAIM (see "Appendix" for details). In general, this knowledge biases VS-SAIM’s behaviour towards selecting locations in the input image that best match the templates. Moreover, if the initial activation in the knowledge network is biased towards one template unit, VS-SAIM’s overall behaviour is biased towards selecting the item associated with this template. In this paper, we use this property to implement the fact that the visual search experiment requires the search for a set target. Hence, we will bias VS-SAIM towards the selection of the target item. If the target is not present, VS-SAIM is expected to overcome the initial bias and select a distractor item.
In the second stage, the matching template is compared with the feature maps from the EVPS, and the result of this comparison feeds into the selection network. Again, as in the knowledge network, the matching is based on the Euclidean distance. The usage of this distance is a direct outcome of the energy function minimization approach. It reflects the necessity that the matching in the bottom-up pathway needs to be consistent with the matching in the top-down pathway to ensure an overall consistent behaviour. Note that the matching network also mirrors the translation-invariant mapping of the bottom-up pathway by implementing the comparison between matching template and feature maps in a location-by-location fashion. Figure 5 illustrates this implementation for ’L’ and ’T’ as templates and an ’L’ and ’T’ in the input image. Figure 5 also shows the result of the matching process. Since the outcome plays an important role in this paper, we introduced a special term, the matching surface. Bright pixels stand for highly matching locations and dark pixels represent no matching. The matching surface forms the input to the selection network, where the competitive processes activate units at locations with high matching values.
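The location-by-location comparison can be sketched as a sliding Euclidean distance. This is a simplified illustration under stated assumptions: a single feature map, a fixed matching template (here a pure ’L’ rather than a weighted mix of templates), and the negative distance as the matching value, so that 0 marks a perfect match. The function name and pixel patterns are hypothetical.

```python
import math

def matching_surface(feature_map, template):
    """Negative Euclidean distance between the template and every map location."""
    rows, cols = len(feature_map), len(feature_map[0])
    t_rows, t_cols = len(template), len(template[0])
    surface = []
    for r in range(rows - t_rows + 1):
        line = []
        for c in range(cols - t_cols + 1):
            squared = sum((feature_map[r + i][c + j] - template[i][j]) ** 2
                          for i in range(t_rows) for j in range(t_cols))
            line.append(-math.sqrt(squared))   # higher value = better match
        surface.append(line)
    return surface

L_TEMPLATE = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]

# Display with an 'L' in columns 0-2 and a 'T' in columns 5-7
display = [[1, 0, 0, 0, 0, 1, 1, 1],
           [1, 0, 0, 0, 0, 0, 1, 0],
           [1, 1, 1, 0, 0, 0, 1, 0]]

surface = matching_surface(display, L_TEMPLATE)
best_column = max(range(len(surface[0])), key=lambda c: surface[0][c])
```

The location of the ’L’ yields the highest matching value, whereas the location of the ’T’ matches only partially.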
This section presented details on how VS-SAIM achieves translation-invariant object identification in a multiple object display. Crucial for achieving this objective are three mechanisms: competitive interactions for selection and identification of items; similarity-based matching in the bottom-up and top-down pathway to direct the selection process and identify the selected item; and an interaction between top-down and bottom-up pathways to ensure consistency between both levels. To implement the search for a target in visual search, the initial activation in the knowledge network is biased towards one template unit, biasing VS-SAIM’s overall behaviour towards selecting the target.
It should be noted that VS-SAIM is part of an ongoing project. Some of the mechanisms presented here have already been validated against experimental evidence other than data from visual search. For instance, the layered structure in the selection network turned out to be crucial for simulating attentional disorders, such as extinction and object-based neglect. The excitatory connections in the selection network were useful in simulating proximity-based grouping. A first step towards the integration of similarity-based grouping has also been presented. Also, SAIM proved robust enough to process natural images. Compared to the version published in 2003, the main extensions here are a different similarity measure (Euclidean distance instead of scalar product) and the introduction of an early visual processing stage.
VS-SAIM falls into a class of models that conceptualize visual attention as mapping details of an input image into a new representation. The most prominent representative of this class is the Selective Tuning (ST) model by Tsotsos et al. Similar to VS-SAIM, the ST model uses competitive processes controlled by bottom-up and top-down pathways to guide the mapping process. Interestingly, in a recent extension of the ST model, Tsotsos et al. stressed the importance of considering interactions between recognition and attention when modelling visual attention. This type of integrative approach is also taken by VS-SAIM and its earlier version, SAIM.
However, for the remainder of this discussion and in keeping with the theme of this paper, we will focus on the most prominent theories and models of visual search in experimental psychology. Similar to VS-SAIM, all these models and theories postulate that an interaction between top-down and bottom-up influences plays a role in human performance in visual search. Moreover, all models suggest that at some stage there is a “featureless” encoding of the search display. For instance, in the Guided-search model this representation is called “saliency map” or “master map”. In MORSEL the input to the attentional module represents the contents at locations in the search display in a “featureless” way. In Deco and Zihl’s biased-competition model of visual attention, a location map receives inputs from all feature maps in a retinotopic fashion. In VS-SAIM the selection network and its input, the matching surface, are “featureless” maps. However, the Guided-search model and MORSEL suggest that this “featureless” map is static and is no longer modified during the search process. In contrast, Deco and Zihl’s model and VS-SAIM postulate that the “featureless” map is dynamic and changes during the selection process. In particular, in VS-SAIM the dynamic “featureless” map, the matching surface, is an integral part of interactions between the selection process and the identification process. Intuitively, this seems to be a more biologically plausible approach to modelling processes during visual search tasks. Finally, VS-SAIM also shares with the seminal Attentional Engagement Theory (AET) the assumption that similarity-based matching plays a crucial role.
Another point to note is that VS-SAIM implements visual search in a completely parallel manner. This contrasts with earlier versions of SAIM [16, 19] and also with most other models of visual search. Most prominently, the Guided-search model postulates an entirely serial search process. Even the models with a competitive approach assume that there is some sort of serial rechecking mechanism (see the Search via Recursive Rejection (SERR)-model, ; for an example). However, our implementation of VS-SAIM does not imply that visual search is performed entirely in parallel. Instead, the work on VS-SAIM focuses on the contributions of competitive processes to visual search which we, nevertheless, consider to play a crucial part in visual search. By the same token, the visual search mechanisms proposed in this paper are assumed still to play an important role even when a serial mechanism is added to VS-SAIM in future versions.
Study 1: Basic Behaviour
Figure 6 shows the simulation result with the target being present. The simulation was terminated after the knowledge network produced a clear-cut winner (see time plot of the knowledge network). At this point in time, activations in VS-SAIM were dominated by the target item. FOA and matching template show a stronger representation of ’L’ than of ’T’. The time plot of the selection network (top left) shows only the time course of the activations in the centre layer at the central locations of the items in the search display. The time plot illustrates that the target item (Item 5) in the visual field won the competition. This successful selection of the target item began in the knowledge network, where the initial activation of the two template units is biased towards the target item. This bias drove the matching network from an unbiased matching template (both templates are equally weighted) towards a matching template that is biased towards the target template. This, in turn, led to better matching values at the target location in the matching surface. Therefore, the selection network began selecting the target item, which resulted in a stronger representation of the target item in the FOA. This improvement reinforced the initial bias in the knowledge network, eventually making the target template unit the winner unit.
When the target is absent (Fig. 7), the initial bias towards the target unit is overcome and the distractor template eventually wins the competition. As in the target-present trial, the identity of this winner item was eventually reflected in all parts of the model. However, VS-SAIM reaches this state later than in the present trial. Hence, the initial bias in the knowledge network contributes to the delay of reaction times compared to the present trials. Moreover, in the absent condition the matching surface does not produce a clear winner early on, as it does in the present condition. Instead, the noise added in the EVPS generates a small difference between distractor items which, eventually, allows the selection network to randomly choose an item.
It is interesting to note that the delay in VS-SAIM’s reaction time in the target-absent condition compared to the target-present condition mimics typical experimental findings in visual search tasks. However, these simulation results go beyond the focus of the present paper. The strategy with which participants treat absent trials represents an entirely different issue (see ; for a rare example of modelling absent trials). Further simulations will need to explore whether this treatment of absent trials constitutes a valid approach.
Study 2: Search Asymmetry
This paper focuses on two asymmetries found in oriented line searches. First, search for a tilted line among vertical lines is more efficient than search for a vertical line among tilted lines. Second, search for a diagonal line among vertical lines is more efficient than search for a vertical line among diagonal lines.
Displays were generated with set-sizes of 2, 3, 4, 5, 6, 7 and 8 items. Each condition was run 5 times, amounting to 70 trials in total. Only templates for the items present in a particular experiment were included in the knowledge network. At the beginning of each simulation run, the template unit of the target was biased to a higher activation (0.506) than the distractor unit (0.494).
The reaction time (RT) of the model is the simulation time it takes for one template unit to pass a set threshold of 0.7. Passing the threshold was interpreted as the model having recognized an item.
VS-SAIM’s reaction times were analysed with an ANOVA as well as a linear regression to obtain search slope and intercept. In the search function plots, the search slope is depicted next to the average reaction time for the highest set-size.
Figure 8a, b show the RT functions for both orientation differences, 30° and 45°. For each orientation difference, a separate two-way ANOVA with set-size and target-type as independent variables was carried out. The ANOVA for the 30°-difference revealed significant main effects of set-size (F(6, 69) = 2631.6, p < 0.001) and target-type (F(1, 69) = 49939.0, p < 0.001). The interaction between the two factors was also significant (F(6, 69) = 812.45, p < 0.001). Figure 8a shows that overall reaction times increased with increasing set-size and that search for a vertical target was slower compared to search for the tilted target. The significant interaction resulted from a higher search efficiency when the tilted line was the target compared to when the vertical line was the target. This finding is also confirmed by the different slopes shown in Fig. 8a.
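The two measures described above, the threshold-based reaction time and the regression-based search slope, can be sketched as follows. The activation trace and the reaction times in the example are invented numbers for illustration, not simulation results, and the function names are hypothetical.

```python
def reaction_time(trace, threshold=0.7, dt=1.0):
    """Simulation time at which any template unit first passes the threshold."""
    for step, activations in enumerate(trace):
        if max(activations) > threshold:
            return step * dt
    return None   # no unit passed the threshold within the trace

def search_function(set_sizes, rts):
    """Least-squares slope and intercept of reaction time against set-size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(rts) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, rts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up trace of [target unit, distractor unit] activations per time step
trace = [[0.506, 0.494], [0.60, 0.40], [0.72, 0.28]]
rt = reaction_time(trace)

# Made-up reaction times that grow by 25 time units per item
set_sizes = [2, 3, 4, 5, 6, 7, 8]
slope, intercept = search_function(set_sizes, [10 + 25 * n for n in set_sizes])
```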
The results for the 45°-orientation difference were similar. The main effects of set-size (F(6, 279) = 90355.0, p < 0.001) and target-type (F(1, 69) = 16700.0, p < 0.001) were both significant. Also, the interaction between the two factors was significant (F(6, 69) = 4755.4, p < 0.001). Figure 8b shows that overall reaction times increased with increasing set-size and that search for a vertical target was slower compared to search for the diagonal target. The significant interaction resulted from a higher search efficiency when the diagonal line was the target compared to when the vertical line was the target. This finding is also confirmed by the different slopes shown in Fig. 8b.
The simulation results show that VS-SAIM is able to qualitatively reproduce the central result of asymmetric visual search tasks, that of an altered search efficiency when target and distractor roles are swapped. A vertical line target among tilted lines is searched for less efficiently than a tilted line among vertical lines. There are three interesting aspects of these results. First, the results demonstrate that the competition processes can produce set-size effects. Second, the set-size effect is modulated by target-distractor similarity. Third, target-distractor similarity is not the only factor influencing search efficiency, as otherwise search asymmetry would not have been possible. The first two results were expected and are briefly discussed here. The third finding needs more explanation and will be discussed for most of the remainder of this discussion.
As discussed earlier, the fact that competition processes can produce set-size effects has been shown by our earlier work and by others, such as a biased-competition model of visual search and MORSEL. A good way of conceptualizing the reason for this behaviour is that the speed of convergence of the competitive process in the selection network by and large determines VS-SAIM’s reaction times. Moreover, the speed of convergence is proportional to the contrast between activations in the matching surface. The contrast is the difference between the highest input activation (target position) and all other input activations (distractor locations and background). For instance, the contrast would be highest if there was only one item in the display. The contrast diminishes the more items are present in the search display, leading to the set-size effect. Furthermore, the search slope depends on the target-distractor similarity, because the more similar target and distractor are, the more the contrast diminishes with each additional item.
It is also important to note that the amplitude of the mismatch is influenced by the absolute activation in the feature maps, as opposed to the relative activation resulting from the matching between matching template and item. This is illustrated in Fig. 10. For simplicity this effect is depicted for the intensity feature map. However, it should be noted that each feature map leads to the same effect. In Fig. 10b the input item is brighter than in Fig. 10a. Hence, when matching template and input item partially overlap, the mismatch is larger when the input item is brighter than when the input item is dimmer, since the match is mainly performed against the background in the matching template. In VS-SAIM, this matching is implemented with the Euclidean distance. Returning to the simulation results, it is important to note that this Euclidean distance for the vertical line is larger than for the diagonal and tilted line. This results from the fact that the diagonal feature map is less weighted than the feature map for vertical orientations. In turn, this difference leads to a smaller mismatch for the diagonal line compared to the vertical line. Moreover, this leads to a smaller decrease in contrast in the matching surface for the vertical line as distractor compared to the diagonal line as distractor. Therefore, the properties of the mismatch surrounding a search item explain the search asymmetry found in the simulations.
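The dependence of the mismatch on absolute activation can be made concrete with a toy calculation. This is an illustrative sketch only: the 5 x 5 patterns, the activation values 0.5 and 1.0, and the one-pixel displacement are assumptions, and the function names are hypothetical. The item is displaced so that it falls on the background of the matching template.

```python
import math

def mismatch(template, patch):
    """Euclidean distance between the matching template and an image patch."""
    return math.sqrt(sum((t - p) ** 2
                         for row_t, row_p in zip(template, patch)
                         for t, p in zip(row_t, row_p)))

def line_patch(column, activation, size=5):
    """Vertical line with the given feature-map activation on a zero background."""
    return [[activation if c == column else 0.0 for c in range(size)]
            for _ in range(size)]

template = line_patch(2, 1.0)                        # template line in the centre
weak = mismatch(template, line_patch(3, 0.5))        # displaced item, low activation
strong = mismatch(template, line_patch(3, 1.0))      # displaced item, high activation
```

The displaced item with the higher activation produces the larger mismatch, because its pixels are compared against the zero background of the template.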
Study 3: Background Orientation
In this simulation, the distractor item was a line rotated counter-clockwise from the vertical by an angle between 0° and 180°, in steps of 30°. The target was a line rotated by either 30° or 45° counter-clockwise relative to the background (distractor) orientation. All distractors in a display had the same orientation. The rotated lines were created with the Matlab routine imrotate using bilinear interpolation. Display size was five.
Results and Discussion
Study 4: Symmetric Search
So far the simulations concentrated on mimicking asymmetric search patterns. Indeed, the simulations seem to imply that the asymmetric search pattern is the standard finding and there should be no symmetric search pattern. However, there is some empirical evidence for symmetric search as well. For instance, Egeth and Dagenbach showed that in a search with ’L’ and ’T’ items, the swap of target and distractor has no significant effect on the participants’ search performance. This simulation tests whether VS-SAIM can also simulate these symmetric experimental results.
Results and Discussion
Figure 13 shows the search function produced by the model. A three-way ANOVA revealed a significant main effect of set-size (F(6, 69) = 455.9, p < 0.001) and no significant main effect in target-type (F(1, 69) = 1.07, p = 0.31) reflecting the symmetric search behaviour. The interaction between set-size and target-type was not significant (F(6, 69) = 0.84, p = 0.55). The results show that there is no modulation of search efficiency by swapping the target and distractor roles of the items. The reason for the successful simulation of the symmetric search pattern is that both, L and T, are mainly made up of vertical and horizontal strokes. Only at cross points and end points the diagonal feature map shows some responses (see Fig. 2 for an illustration). Therefore, the mismatch area does not change much when L and T are swaped because the Euclidian distance from the background for both items does not differ. In other words, VS-SAIM suggests that when the item are predominately made of similarly weighted features, e.g. vertical and horizontal strokes, the search results should be symmetrical.
The visual search extension of the Selective Attention for Identification Model (VS-SAIM) is a model of translation-invariant object recognition in multiple object scenes. In a first step, an early visual processing stage generates feature maps of vertical, horizontal and diagonal orientations. Translation invariance is then achieved by mapping the content of the feature maps through to an attention window, the focus of attention (FOA). Object recognition is implemented by a similarity-based (Euclidean distance) matching between stored templates for objects and the activation in the FOA. VS-SAIM deals with the issue of multiple objects through a mix of competitive and co-operative processes, which are controlled by bottom-up and top-down influences. In the present paper, we simulated important findings from visual search experiments. Study 2 utilized search displays consisting of oriented lines (vertical, diagonal and tilted lines). Each of these lines was either target or distractor in the simulations. The simulations demonstrated that VS-SAIM was able to mimic the typical increase in reaction times with increasing numbers of items (search slope). This result originates from the competitive processes in the selection network. As discussed in the introduction, this explanation has been put forward by several biologically plausible models, e.g. MORSEL , a biased-competition model of visual search  and our own work (e.g. ). Compared to these earlier works, the main progress is that, despite complex interactions between several competitive layers, VS-SAIM still produces a linear increase in reaction times. Hence, VS-SAIM suggests that, even though several competitive processes must interact in the brain, a linear search function can still emerge from these interactions. Furthermore, the slope of the search function is proportional to the similarity between target and distractor in terms of orientation. For instance, search for the diagonal line among vertical lines is more efficient than search for a tilted line among vertical lines. This is not unexpected, as similarity-based matching plays a large role in VS-SAIM’s behaviour. This outcome also fits one of the central hypotheses put forward by the Attentional Engagement Theory .
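The similarity-based matching between stored templates and the content of the FOA can be sketched as a nearest-template lookup under Euclidean distance. The sketch below is only an illustration: the flattened 3x3 templates, the noisy FOA content and the function names are our assumptions, and VS-SAIM's competitive dynamics are not modelled.

```python
import math

# Hypothetical stored templates as flattened 3x3 patterns (assumed, not from the model).
TEMPLATES = {
    "vertical":   [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "horizontal": [0, 0, 0, 1, 1, 1, 0, 0, 0],
    "diagonal":   [1, 0, 0, 0, 1, 0, 0, 0, 1],
}


def distance(a, b):
    """Euclidean distance between two flattened patterns of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(foa, templates):
    """Return the name of the closest template and all template distances."""
    distances = {name: distance(foa, t) for name, t in templates.items()}
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical FOA content: a slightly degraded vertical line.
noisy_vertical = [0.0, 0.9, 0.1, 0.0, 1.0, 0.0, 0.1, 0.8, 0.0]
best, distances = identify(noisy_vertical, TEMPLATES)
print(best)  # -> vertical
```

A smaller distance corresponds to a better match, so a degraded item is still assigned to the template it resembles most.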
The following two studies tested two implications of Study 2. First, Study 3 showed that VS-SAIM can simulate not only the influence of distractor orientation on visual search performance in general, but also the specific modulation found by Foster and Westland . This success is mainly due to how the Euclidean distance between distractor items and background changes with item orientation. Second, Study 2 seems to imply that search asymmetry is the standard finding. However, there is also evidence for symmetric search patterns . With its successful simulation, VS-SAIM suggests that symmetry occurs when search items are formed by similarly weighted features, such as ’L’ and ’T’. Future research needs to follow up this prediction. For now, it is important to note that the simulations presented in this paper underline the validity of VS-SAIM. Moreover, the results go beyond a simple “proof of existence” and make a specific prediction about what is crucial for these simulation results: the knowledge-based on-centre-off-surround receptive field. Because of its novelty and, to some extent, its counterintuitive nature, this concept is discussed in the remainder of this paper.
In general, VS-SAIM suggests that search asymmetries can be the outcome of knowledge-based influences. Interestingly, this is consistent with behavioural evidence that can be interpreted as reflecting knowledge-based influences, e.g. mirrored letters versus normal letters  or “inverted elephants” versus normal elephants . On the other hand, search asymmetries are often seen as diagnostic for the existence of feature maps (e.g. ). This apparent contradiction is resolved in VS-SAIM by the fact that the top-down influence is modulated by the featural properties of the search items. Moreover, the simulations presented here suggest that this modulation, based on Euclidean distance, provides a good approximation for searches among lines.
But to what extent are the spatial properties of this top-down modulation, the knowledge-based on-centre-off-surround receptive field, plausible? To begin with, it is intuitive to suggest that top-down influence affects search not only exactly at the locations of items but also in the vicinity of items. If this top-down influence consists of some kind of matching process, as assumed in VS-SAIM, this matching should not drop off rapidly, as the system has to be robust against noise, distortion, etc. The matching could either tail off to the background level, or go below the background level and then increase again, as is the case in our simulation results. The latter option has the advantage that it improves the contrast against the background and makes the item more detectable for subsequent processing stages. Moreover, and importantly, apart from these theoretical considerations, there is also empirical support for the on-centre-off-surround shape of the matching surface: the well-known response characteristic of receptive fields in the early visual system (e.g. [1, 6, 23, 35]) and recent findings on behavioural performance in the region surrounding the focus of attention (e.g. [2, 4, 8, 29, 32]). The classical findings on on-centre-off-surround receptive fields are usually interpreted as a purely feature-based process located in the retina or the LGN. VS-SAIM generalizes this type of spatial response to a knowledge-based on-centre-off-surround receptive field. The location of such receptive fields in the brain is unclear. It could be that the receptive fields in the early visual system are indeed influenced by knowledge. This has not been tested, but there are indications that responses in early visual processing are influenced by top-down modulation (e.g. see  for evidence on the effect of spatial attention in V1 in an animal study). An obvious alternative could be regions in which fMRI studies have indicated object processing, e.g. the lateral occipital cortex (e.g. ). It is also worth noting that such a generalization from a model of low-level processes to higher-level processes is not uncommon. For instance, models based on principal component analysis (PCA) have been applied to model the formation of low-level receptive fields (e.g. ) and human face recognition (see ; for a recent example). A similar transfer of a mechanism from low-level processes to high-level processes is suggested by VS-SAIM for the on-centre-off-surround receptive field.
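How a matching process can yield an on-centre-off-surround profile may be illustrated with a toy example. The sketch assumes a hypothetical 3x3 template of a vertical line, a small binary image containing one such line, and a matching score defined as the negative squared Euclidean distance between the template and each image patch. It does not reproduce VS-SAIM's matching network, and in this toy version the match drops off abruptly rather than gradually.

```python
# Hypothetical 3x3 template of a vertical line (assumed for illustration).
TEMPLATE = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]


def make_image(height, width, centre_col):
    """Blank image with one 3-pixel vertical line centred on the middle row."""
    image = [[0] * width for _ in range(height)]
    mid = height // 2
    for row in (mid - 1, mid, mid + 1):
        image[row][centre_col] = 1
    return image


def match_at(image, row, col, template):
    """Negative squared Euclidean distance between template and the 3x3 patch at (row, col)."""
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            diff = image[row + dr][col + dc] - template[dr + 1][dc + 1]
            total += diff * diff
    return -total


image = make_image(height=5, width=11, centre_col=5)
# Horizontal profile of the matching score through the centre of the item.
profile = [match_at(image, 2, col, TEMPLATE) for col in range(1, 10)]
print(profile)  # -> [-3, -3, -3, -6, 0, -6, -3, -3, -3]
```

The score peaks at the item location (0), falls below the background level (-6 versus -3) immediately next to the item, and then returns to the background level. The dip arises because a slightly misplaced item mismatches the template twice: once where the template expects a stroke and finds none, and once where the item lies on pixels the template expects to be empty.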
The second piece of supporting evidence for VS-SAIM’s on-centre-off-surround receptive field comes from behavioural experiments on visual attention. In these experiments, the location of the focus of attention is manipulated by target locations in visual search [2, 4, 29], a spatial cue  or identification of letters at a pre-defined location . The spatial profile of the focus of attention is determined by measuring the success of detecting a simple probe stimulus [2, 4], comparing the identity of the probe letter (same colour) with the target letter  and identifying the probe stimulus [29, 32]. The experiments show that the probe performance exhibits an on-centre-off-surround profile similar to that of VS-SAIM. Interestingly, even some details of the response characteristics are consistent with VS-SAIM’s profile. First, the profile is influenced by the saliency of the target, whereby the inhibitory zone is deeper when the target is more salient . Second, Boehler et al.  showed that the exact shape of the profile depends on the task performed, i.e. simple target detection vs. detecting a feature on the target. This finding can be interpreted as showing that the profile is affected by top-down processes, as in VS-SAIM. However, future research needs to test whether this attentional profile is affected only by the task setting or whether properties of distractors also influence the profile, e.g. by applying a probe task to asymmetric and symmetric search tasks. Furthermore, these experimental findings are normally conceptualized as profiling the focus of attention. Hence, in VS-SAIM this can be construed as the activation profile in the selection network. On the other hand, these experiments can also be interpreted as tapping into the control mechanism of attention (see  for a similar point). This interpretation is consistent with VS-SAIM’s prediction that the on-centre-off-surround profile is produced by the matching network. Future experiments need to tease these two hypotheses apart.
Finally, the simulations with VS-SAIM suggest that search is strongly influenced by bottom-up properties of the distractors, as highlighted especially by Study 3. In other words, VS-SAIM’s simulations suggest that, apart from the target-distractor similarity, the properties of distractors play an important role in the efficiency of visual search. This point is interesting because it contrasts with most classical theories of visual search, where the focus is on the properties of the target rather than the distractors. In some sense, VS-SAIM’s suggestion seems intuitively plausible, as there are simply more distractors than targets present in the search display, and the distractors consequently exert a stronger influence on human behaviour. Future experiments need to explore this novel suggestion.
Indeed, simulations not included in this paper suggest that the concept of a “dynamic saliency map” can improve our understanding of visual search tasks.
In fact, the dynamics of the activations in the matching surface also play a role, but are not crucial for the simulation results in this paper.
This work was supported by grants from the Engineering and Physical Sciences Research Council (EPSRC, UK) to the authors. The authors would like to thank Glyn Humphreys and Gustavo Deco for invaluable discussions.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.