Introduction

In most cases perception by contemplation accompanied by prior knowledge occupies a shorter time than that achieved by simple contemplation [...] things which exist in the soul and are memorized have no need to be recognized (Ibn al-Haytham [Alhazen], 1039/1989, II.4, 20)

As we orient towards the things that we need to interact with in our visual environment, there are many sources of distraction that can interfere with our current goals. Often, we are engaged in continuous tasks that require us to repeatedly attend to the same, or similar, items. One example could be picking blueberries. While the color of the berries tends to be constant, factors such as changes in their orientation relative to a light source, our viewing position, whether they are hidden behind leaves, or whether we block the sun and cast a shadow, vastly change the actual luminance and color values that are reflected to our eyes. Interpretation of the visual input that takes these influences into account then shapes our perception of these qualities. Figure 1 shows how the range of color and luminance values for a patch of berries can vary widely, yet our perception of the hue and brightness of the berries is largely constant despite large changes in the physical stimulation. How does the visual system maintain this constancy? And how should the mechanisms that guide our attention be tuned for maximum effectiveness during our interactions with the world? In the case of berry picking, a mechanism tuned to precise color and luminance values would not provide particularly effective guidance, but a mechanism tuned to a certain range of values might fare better. This sets the stage for a probabilistic view of visual selection, since some features will be more probable than others. Considerable evidence shows that the visual system is strongly biased towards selecting what it has seen before. When we search for a particular target item, stimuli sharing features with previously selected stimuli are selected and processed more quickly than others (known as attentional priming; Kristjánsson & Ásgeirsson, 2019). The central claim here is that such attentional priming is probabilistic.

Fig. 1
figure 1

Blueberries and crowberries in a berry patch in Finnsstaðir in Northeast Iceland. The range of shades of blue on the left shows the actual luminance and color of nine randomly chosen berries from the image. Despite this large range of hues and lightness, berry picking is quite efficient

Attention is guided by templates

Selective visual attention has traditionally been thought to keep us focused on the task at hand at any given moment (Desimone & Duncan, 1995; Neisser, 1967; see Driver, 2001, and Kristjánsson & Egeth, 2020, for historical reviews). A common assumption is that visual attention fulfils this role through guidance templates stored in visual working memory that are tuned towards our goals at a given moment (Bundesen, 1990; Desimone & Duncan, 1995; Kristjánsson & Kristjánsson, 2018; Woodman et al., 2013). This concept is captured in the roughly 1,000-year-old quote from Alhazen at the start of this review.

There is a lot of evidence that supports this notion. Neurophysiological findings (Maunsell & Treue, 2006; Moran & Desimone, 1985; Reynolds & Heeger, 2009; Treue & Trujillo, 1999) suggest that attention involves the gain modulation of sensory neurons tuned to the features of interest. Chelazzi et al. (1993) reported that neuronal activity relevant to a cued feature is maintained in the period between the presentation of a cue and a target. Bichot et al. (2005) observed that, as monkeys freely scanned multi-element arrays, neuronal responses were biased towards target properties within receptive fields covering the locations of as-yet-undiscovered targets. This suggests that receptivity to task-relevant features was enhanced and that templates guide attention by biasing activity in favor of neurons responsive to target features. Eimer et al. (2011) demonstrated the benefits of generating templates, reporting that feature-specific top-down guidance speeded target selection, prevented irrelevant information from attracting attention, and rapidly resolved attentional competition. Carlisle et al. (2011) measured active maintenance in visual working memory (using the contralateral delay activity (CDA) in EEG signals), showing how information accrues in visual working memory as the same search was repeated. When their participants searched for two target objects, CDA amplitude was roughly twice as large as when observers only needed to find the same single target throughout a trial block. There is also good evidence that attentional guidance is modulated by the amount of information in working memory (Hollingworth & Beck, 2016a, 2016b; Kristjánsson & Kristjánsson, 2018). All this supports the idea that templates stored in working memory guide attention.

Salient visual distractors can be suppressed during repeated visual search and the strength of attentional capture by task-irrelevant items can depend on the distractor statistics (Chelazzi et al., 2019; Sayim et al., 2010; Töllner et al., 2015; Won & Geng, 2018). These findings show how information about distractor frequency is encoded, possibly forming so-called templates for rejection that can be broadly tuned to items that differ from the items that we search for (Geng et al., 2019).

But how are these guidance templates formed, and how are they maintained as we perform our daily tasks? One mechanism – priming of attention shifts – may serve a key role in this respect. Such priming enables quick reorientation to items that we are tracking at any given moment, even if we briefly attend to something else (Maljkovic & Nakayama, 1994; see Kristjánsson & Ásgeirsson, 2019, and Ramgir & Lamy, 2021, for recent reviews). Priming often seems to have a dominant influence on how attention operates in the visual environment and can be volitionally controlled only to a limited extent (Cochrane & Pratt, 2020; Pascucci et al., 2012; Shurygina et al., 2019). Additionally, priming can release visual stimuli from crowding (Kristjánsson et al., 2013), decrease the effectiveness of masks (Kristjánsson, 2016), and selection history can even modulate figure/ground assignment (Kristjánsson, 2009). Priming can furthermore serve to build and reinforce attentional templates through repetition (Ásgeirsson et al., 2015; Carlisle et al., 2011), and can thereby facilitate attentional selection (Brascamp et al., 2011a, b; Chetverikov & Kristjánsson, 2015).

Attentional priming is increasingly thought to strongly influence attentional orienting and plays a central role in many current conceptions of attention (Awh et al., 2012; Kristjánsson, 2006; Luck et al., 2021), including the latest version of the widely known Guided Search model (Wolfe, 2021). This highlights the need for a thorough understanding of priming effects, but this role of priming in shaping attentional orienting has not yet been explicated clearly in the literature (Kristjánsson & Ásgeirsson, 2019). A central concept here is that priming may serve to maintain and enhance guidance templates. This is, in fact, the message of the quote from Alhazen, since he argues that prior knowledge speeds perception.

I have two main goals here: Firstly, I discuss recent developments regarding priming of shifts of visual attention, specifically recent evidence that has led to the proposal that priming can be thought of as learning of the underlying probability density function of the target or distractor sets in a given continuous task. While the discussion focuses on contexts involving attentional selection of targets among distractors, these effects are by no means confined to such paradigms.

Secondly, I discuss evidence from both computational and theoretical neuroscience and neurophysiology that shows how the concept of probabilistic representations is neurophysiologically feasible and computationally tractable and may also be necessary to understand the brain since uncertainty is a fundamental property of perception (Averbeck et al., 2006; Sahani & Dayan, 2003). By extension, this also applies to the mechanisms of priming of attention shifts.

Target priming, distractor priming, and role reversals

While it has been clear for a number of years that what we have recently attended to or selected tends, overall, to capture attention, such history effects (or priming) cannot be considered as one single entity. Priming is multifaceted: it is seen in various paradigms, and its influences can take many forms (Kristjánsson & Ásgeirsson, 2019). A clear example is evidence from neuroimaging that shows how priming is associated with many diverse influences at various stages of the perceptual hierarchy (Becker et al., 2014; Brinkhuis et al., 2020; Geng et al., 2006; Kristjánsson et al., 2007; Rorden et al., 2011), and that priming operates at varied time scales (Brascamp et al., 2011b; Martini, 2010). Findings from single-cell neurophysiology and electrophysiology tell a similar story (Bichot & Schall, 2002; Eimer et al., 2010; Westerberg, Maier, & Schall, 2020a; see also review in Westerberg & Schall, 2021). Priming occurs both for simple feature tasks (Goolsby & Suzuki, 2001; Maljkovic & Nakayama, 1994) and for more difficult search tasks such as conjunction search (Kristjánsson et al., 2002). And modulating the difficulty of otherwise similar tasks can change the nature of the priming (Ásgeirsson & Kristjánsson, 2011).

Another key consideration for the current purposes is that these priming effects are as strong, or can seemingly be even stronger, for items that are ignored or rejected (such as visual distractors). Kristjánsson and Driver (2008) assessed the independent effects that could be attributed to target and distractor repetition. They found that repeating distractors, independently of where they appeared, strongly facilitated search performance. Notably, this occurred not only for non-target trials followed by another non-target trial (where the repeated nontargets would be associated with the same response), but also for non-target trials followed by target trials, where the required response changed. This showed that information from preceding trials about which items to ignore was picked up during the search. Similarly, Lamy et al. (2008a, b) concluded that priming of visual search involves both target facilitation and distractor inhibition, with facilitation reflected in speeded response times when visual searches repeat and inhibition when observers switch between search types (see converging evidence in Saevarsson et al., 2008; Wang et al., 2005). These effects were particularly strong when role reversals of targets and distractors occurred.

Encoding of probability information of distractor sets

But the evidence reviewed above does not tell us anything about how the visual system deals with changes in shading and hue that can be very dramatic, depending, for example, on whether the sun is shining or dark clouds start covering the sky, or whether an object is under bright light or in a dim corner. The berries can have a range of hue and luminance values, yet we are unperturbed in our berry picking (Fig. 1). So, this boils down to an issue of probability: On average, some berry colors in Fig. 1 are more probable than others. How does attention deal with this, and can priming be described in a probabilistic way? Notably, such probabilistic models have been proposed with success for visual working memory, where tasks (such as change detection) that are thought to rely on visual working memory are treated as a Bayesian inference process (Brady & Tenenbaum, 2013).

So-called probability cueing of location speeds shifts of visual attention (Geng & Behrmann, 2002; Miller, 1988). Geng and Behrmann (2005) showed, for example, that search was more efficient for targets that appeared in high-probability positions over a preceding series of trials than when they appeared in random or low-probability positions. Spatial probabilities can, in other words, create a powerful attentional bias.

But until recently, less has been known about whether feature probability also influences attention in the same way. Kristjánsson and Driver (2008) revealed that role reversals from target to distractor (or vice versa) across trials caused a particularly dramatic slowing of search times. Chetverikov et al. (2016) used these role-reversal effects to demonstrate how probabilistic feature information is used for history-based guidance of attention, finding that feature probability predicts the size of the role-reversal effects and that this size can, in turn, reveal observers’ representations of the probability density functions behind the search distractors.

In this way, Chetverikov et al. (2016) were able to probe how observers’ representations of distractor sets build up over time. Chetverikov et al. reasoned that observers’ search times following role reversals would reveal their representations of the probability density functions of the visual search items (see Fig. 2 and Chetverikov et al., 2019, for a tutorial on this feature distribution learning (FDL) method). The observers in Chetverikov et al. (2016) performed a series of searches for an oddly oriented target where the distribution of distractor features was repeated for streaks of adjacent trials. Following these learning trials, where observers could encode the repeated target and distractor features, role reversals were introduced where the target now came from within the previous distractor feature distribution. This resulted in increased search times, but, most interestingly, the response times (top graph in Fig. 2b) followed the actual shape of the probability density function of the distractor distribution (bottom graph in Fig. 2b). In other words, if the target came from the mean of a Gaussian distribution of distractor orientations, search was slowest (the role-reversal effect the largest), but faster if the target came from its tails. Conversely, if the target came from a preceding uniform distribution, search times did not change much as long as the target came from within the distribution (e.g., from its tails vs. its mean).

Fig. 2
figure 2

The feature distribution learning (FDL) method. Observers perform a few search trials in a row, where the target comes from a certain distribution. Panel A shows examples of Gaussian (left) and uniform (right) distributions. On the left, the target in this particular case is in row 3, column 4, and on the right, the target is in row 5, column 3. On test trials the target and distractor distributions reverse, causing role reversals between target and distractors. Using this method, Chetverikov et al. (2016, 2017a-c) were able to measure the slowing of search when the target came from a preceding distractor distribution. As Panel B shows (data from Chetverikov et al., 2017a), the profile of the slowing effects (the top graph) matched the probability density function of the distractors on the preceding learning trials (the bottom graph)
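
The logic that the FDL method probes can be sketched computationally. The snippet below is a minimal illustration rather than a model fitted to the published data: the baseline response time and the slowing gain are arbitrary assumptions, chosen only to show how role-reversal costs would trace out the learned distractor distribution.

```python
# A minimal sketch of the logic probed by the FDL method (Chetverikov et al.,
# 2016, 2019). Baseline RT and the slowing gain are illustrative assumptions,
# not values estimated from the published data.
import numpy as np

def learned_pdf(x, distribution, mean=0.0, spread=15.0):
    """Probability density the observer is assumed to have learned for the
    distractor feature (e.g., orientation) over the preceding learning trials."""
    if distribution == "gaussian":
        return np.exp(-0.5 * ((x - mean) / spread) ** 2) / (spread * np.sqrt(2 * np.pi))
    if distribution == "uniform":  # uniform on [mean - spread, mean + spread]
        return np.where(np.abs(x - mean) <= spread, 1.0 / (2 * spread), 0.0)
    raise ValueError(distribution)

def predicted_rt(target, distribution, baseline=0.6, gain=20.0):
    """Hypothetical role-reversal cost: search is slower the more probable the
    new target value was under the learned distractor distribution."""
    return baseline + gain * learned_pdf(target, distribution)

# Probe targets at increasing distance (in deg) from the old distractor mean.
probes = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 30.0])
for dist in ("gaussian", "uniform"):
    print(dist, np.round(predicted_rt(probes, dist), 3))
# The Gaussian prediction falls off smoothly with distance from the mean;
# the uniform prediction stays flat inside the distribution and then drops,
# the qualitative pattern in Fig. 2b.
```

Comparing such predicted curves for Gaussian and uniform distributions against observed role-reversal costs is, in essence, how the method distinguishes learning of the full distribution from learning of summary statistics alone.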

The most remarkable result in Chetverikov et al. (2016) was therefore that the shape of the distributions of search distractors on preceding learning trials was reflected in the response time curves. This suggested that observers encode not only the summary statistics of the feature distributions (Alvarez, 2011; Haberman & Whitney, 2011), but the actual distributions themselves. The structure of observers’ representations of the distractor sets on the search trials was therefore revealed by probing the role-reversal effects at different locations in orientation space, manipulating the similarity between the current target and the preceding distractors. In the current context, it is only a small step to assume that the mechanism of the priming (of distractor sets in this case) is the represented probability, and that what is primed is a certain probability of a given feature.

Previous evidence had suggested that probabilistic information critically affects performance on visual attention tasks. Michael et al. (2014) observed what they termed variability priming. Their observers had to categorize a “target” array of visual stimuli by average color or shape. A 100-ms duration (“prime”) array that the observers were supposed to ignore preceded the array to be judged. Judgments were faster when target variance (low or high) matched the variance of the prime display. The judgments were, in other words, primed by the preceding variance. Corbett and Melcher (2014) studied visual search for a Gabor patch tilted away from vertical among horizontal ones, contrasting “stable” blocks, where the mean and/or local sizes of the Gabors were constant, with “unstable” blocks, where the sizes changed unpredictably between trials. Their observers found targets faster if the global mean size was stable over several adjacent search displays, although there was no relationship between this variability and the spatial location of the target. The set of distractors became easier to reject as nontargets as information about its characteristics was picked up over consecutive trials. But while this evidence suggested that probability influences attention, the results of Chetverikov et al. took this further by showing that the representations themselves are probabilistic (see discussion in Tanrikulu et al., 2021a, b).

Further evidence for the learning of feature distributions

The FDL method for studying representations of visual ensembles that Chetverikov et al. developed revealed that observers’ representations are far more detailed than previous studies had suggested. A particularly sharp contrast was drawn with the literature on summary statistics. A large number of studies had emphasized how the visual system relies on summary statistics, i.e., some measure of central tendency and variance, when assessing the characteristics of ensembles (for reviews, see, e.g., Alvarez, 2011; Haberman & Whitney, 2011). Utilizing such summaries seems like a sensible strategy to cut down on the information that needs to be represented. In contrast, the results of Chetverikov et al. (2016) showed how observers can represent surprisingly complex distribution shapes with considerable accuracy. This result opened the possibility that more information than summary statistics is available if needed. Following up on their original findings, Chetverikov et al. (2017a) then showed that observers can even learn whether a distractor distribution of oriented lines is negatively or positively skewed and whether it is bimodal or unimodal. Importantly, Chetverikov et al. (2017b) then found similar results for color distributions, showing that feature distribution learning is not confined to the orientation domain but is more likely a general principle in visual processing.

It is important to note that distribution learning does not, on its own, reveal much about how a coherent visual world might be constructed from such information. Chetverikov and Kristjánsson (2021) revealed how such probabilistic representations for different feature dimensions can be bound together (e.g., orientation probability modulated the learning of color distributions and vice versa), or bound to particular locations, and probabilistic representations can therefore serve as building blocks for perception in a more general sense. Note that Gekas et al. (2013) failed to find integration of distributions of motion and color, but these are different feature dimensions, and other methodological differences could account for this discrepancy. Notably, analyses of neural computation show how more than one feature distribution can be simultaneously encoded in neural population responses (Pouget et al., 2000; Sahani & Dayan, 2003).

A crucial finding was then reported by Hansmann-Roth, Kristjánsson, et al. (2021a), who showed that this detailed knowledge of the distributions of distractor features seems only to be revealed with the implicit FDL method that takes advantage of role-reversal effects, but is not available for explicit report. As in Chetverikov et al. (2016, 2017a, 2017b), observers performed a sequence of visual search trials (for an odd-one-out color target) where the distractors were drawn from a certain distribution. When observers were asked to explicitly select which of two simultaneously presented sets appeared more similar to the sets from the preceding trials, there was a very strong dissociation between judgments of central tendency and variance on the one hand, and distribution shape on the other. One of these test sets consisted of items drawn from the same distribution as the distractors during learning trials, while the distributions used to draw the colors for the comparison set differed in mean, standard deviation, or distribution shape. While observers making explicit judgments could easily distinguish which set contained the mean and standard deviation (summary statistics) of the preceding distractor distribution, they were completely unable to do this for distribution shape. But the results from the FDL method (importantly, on the same observers) showed that observers had robust representations of the shape of the feature distributions of the distractor sets, as Chetverikov et al. had previously demonstrated. These representations were therefore implicit, since they were not available for use on the explicit task. Moreover, when analyzing the noise sources for the two test methods, Hansmann-Roth et al. found that the explicit report data had common sources of noise that differed from those for the implicit measures (Hansmann-Roth, Kristjánsson, et al., 2021a), suggesting that the two measures reflect the operation of different mechanisms. Note, finally, that Rafiei et al. (2021a, b) have since shown how such feature distribution learning directly affects perception (not only search performance) by biasing perceived line orientation.

Learning of feature distributions of targets

While the evidence for distractor set representations that include the actual probability density function is strong, as shown above, another crucial question is whether the same holds for targets, the focus of attention at a given time, not just the distractors that should be ignored. This would be evidence for learning that involves the template for the search target rather than templates for rejection. Chetverikov et al. (2020) reported evidence for probabilistic templates: following similar learning trials as in the FDL studies described previously, they presented two targets simultaneously within a single search trial, where the orientations of the search distractors came from a bimodal distribution, testing whether observers encode the full distribution, only one of the two peaks, or their average. Search times for targets coming from between the peaks of a preceding bimodal distribution were shorter than if the target came from either of the distribution peaks. Consistent with this, targets from the peaks of the previous distractor distribution were reported later than targets outside or between the distribution peaks, showing how the learning lowered the salience of targets sharing the peak feature values relative to other targets. This suggested that the contents of templates guiding visual search are probabilistic, reflecting the statistics of preceding stimuli, and that the bimodality of the underlying distribution was represented.

But this result did not directly reveal whether similar learning applies to target representations. The targets are, after all, the items that are actually searched for and selected in the task, and in theory the learning should be easier since they are the focus of attention. But in a typical visual search there is only one target exemplar on each trial. Hansmann-Roth, Thorsteinsdóttir, et al. (2021b) tested whether observers can learn the shape of target distributions in a paradigm where there was a single target among two distractors (Fig. 3). Observers had to perform a discrimination task on a target drawn on each trial from a color distribution with a certain shape, while the distractors came from another section of linearized color space (Witzel & Gegenfurtner, 2015). Hansmann-Roth et al. found that the response-time functions moved closer to the shape of the target color distribution the more searches observers performed, as more and more information about the distribution behind the targets accumulated. Search times were slower for targets at the edges of a distribution, and this slowing was more pronounced for targets coming from a Gaussian distribution than for targets from a uniform distribution (Fig. 3). This indicates that observers represent the full target distribution, since targets at the extremes of a distribution are rarer for Gaussian than uniform distributions, and observers were able to learn this over a long sequence of adjacent trials, even though only one exemplar from each distribution was presented on a given trial.

Fig. 3
figure 3

Target distribution learning. The paradigm in Hansmann-Roth, Thorsteinsdóttir, et al. (2021b) and the response time (RT) results as a function of the different target color distributions. Panel A shows the different distributions that the targets and distractors were drawn from in color space (five color values from the color space marked on the abscissa). Panels B and C show the normalized response times (B) and proportion correct (C) as a function of the distance of the search target from the mean of the target distribution (for a Gaussian target distribution in blue and a uniform one in green)

Additionally, Hansmann-Roth et al. (2021a, b) noted that there was an upwards slope in the response times the further the target color was from the mean of the distribution (independently of any effect of the difference in distributions). This indicates that the closer to the mean the target is, the faster the search, again suggesting that a probabilistic representation of the target builds up with the increasing number of searches. Overall, the results of Hansmann-Roth et al. (2021a, b) are the first to directly demonstrate learning of target feature distributions and therefore provide an important complement to the learning of distractor distributions shown by Chetverikov et al. (2016, 2017b).

For the question of attentional priming more generally, this opens up the intriguing possibility that priming of single targets may be biased to the exact repeated color simply because there is no distribution to be learned. The results of Hansmann-Roth et al. (2021a, b) reveal how the visual system gradually builds models of the world – but in some cases the information may simply be very sparse and the task demands very basic. So, for example, if the target color is constant across trials, the feature learning may simply collapse to the learning of a distribution with a variance of zero.

Biasing of templates

Other recent evidence suggests that attentional templates for both targets and distractors are biased by attentional history, which is highly pertinent to the issues under discussion here. But this biasing can have some surprising characteristics. While, overall, templates should be tuned towards our goals, interestingly, Navalpakkam and Itti (2007) argued that templates should sometimes be tuned away from the target value for maximal efficiency. This means that under certain conditions, in particular for difficult discriminations, it can be beneficial to tune the search template to nontarget features; the templates may, in other words, be exaggerated to improve discriminability of the target among the information to be ignored (see Fig. 4).

Navalpakkam and Itti (2007) demonstrated this in a clever experiment where observers searched for a target line oriented 55° among distractors oriented 50°. In a subsequent test of the templates underlying the search task, observers had to find the target line in an array of tilted lines. Observers most often selected items oriented 60°, while the actual target was oriented 55°, showing how the template was exaggerated relative to the actual target. This shows that the target value is not represented per se, but that guidance is more strategic and can be described probabilistically (see also Scolari & Serences, 2009). Kerzel (2020) showed converging results by measuring attentional capture by cues. The distribution of cueing effects was asymmetric around the target color in that it was shifted away from the nontarget colors. The largest effects did not occur for cues sharing the target features, but for values shifted away from the actual target value.

Such strategic tuning of attentional mechanisms and templates could be an adaptive strategy, for example to notice things that are unique in a given temporal and spatial context. A particular stimulus or stimulus feature can be expected and unremarkable in one scenario but unexpected and a sign of danger in another. Recent examples of this come from the work of Geng and colleagues. Geng et al. (2017) showed that perceptual and attentional history play a key role in strategic biasing of attentional templates. They found that the probability that a distractor is similar to the target influences the tuning of search templates, with increased similarity leading to more precisely tuned templates. The aforementioned results of Hansmann-Roth, Thorsteinsdóttir, et al. (2021b) are a variant of this, showing how this tuning process may lead to increasingly precise representations of the target distribution. Won and Geng (2018) then used a related approach to demonstrate that distractor templates are more broadly tuned than target templates, and that this allows more generalized suppression than the sharper tuning needed for target templates. Overall, this work of Geng and colleagues provides important clues about how probabilistic information from previous attentional history is utilized during attentional guidance. Lau et al. (2021) then used images of real stimuli, finding that target templates are coarsely tuned when targets and distractors can easily be distinguished, but also that these coarse templates do not transfer well to new scenarios where more precise templates are required.

Note also that attentional priming has been found to occur for faces (Lamy et al., 2008a, b). Given its hierarchical nature (holistic vs. featural; Jozranjbar et al., 2021; McKone et al., 2001), face processing can be an interesting case from the perspective of learning probability distributions and their integration, also with regard to priming of facial emotion (Lamy, Amunts, & Bar-Haim, 2008b). Notably, recent evidence (Schwiedrzik & Freiwald, 2017) shows how face-selective modules in macaques use prediction and error correction in line with predictive coding proposals (see Discussion below).

A related account that argues that target feature values (per se) are not the unit of priming is the relational encoding account of attention (Becker, 2010). While this account is not necessarily probabilistic in the same way as the one proposed here, Becker proposes that features are encoded by the visual system in relational terms (e.g., a red target among orange distractors is encoded as "redder" than the distractors). On the relational encoding account, target-distractor relations may play a key role in the feature tuning of attention (Becker, 2013), and what is primed from one trial to the next are the target-distractor relations. While Becker’s relational encoding view is not a distribution-based view of priming (or attention) per se, it does emphasize how context strongly affects visual attention, and a question for the future is to what degree the relational encoding account can be encapsulated within a probabilistic account.

What is the evidence for probabilistic representations in vision?

The discussion above shows that attentional priming can be probabilistic – that the priming effects reflect the statistical information available in a given visual scene. So, at this point it is timely to review evidence for a distribution-based view of visual perception and attentional templates more generally.

The implicit assumption in the literature has been that templates are static entities – if you are searching for your friend in the green dress, you have a template for green. But slight changes to brightness levels (a cloud passes before the sun) or viewpoint changes (your friend moves into different lighting) can drastically change the physical values of what we see. So, a precise template for a given feature value is not particularly useful in a natural setting, where a distribution-based one may work far better (see Kristjánsson & Draschkow, 2021). Various illusions show this convincingly, where the perceived color of objects can change because of contextual information (Land, 1977; Purves & Lotto, 2003) or when black changes to white through contextual information, as in the Gelb effect (described in Koffka, 1935; see also Adelson, 1993; Anderson & Winawer, 2005; Gilchrist, 2006). Vision scientists need not look much further than the media uproar about "the dress" to see how instructing observers to generate a precise template for blue or gold would be of little help for effective performance in such a scenario (Chetverikov & Ivanchei, 2016; Winkler et al., 2015).

Distribution-based templates (for either selection or rejection, or both) should, on average, fare better with uncertainty such as whether the target is in a shadow or in direct sunshine (see, e.g., Fig. 1) since such distributions are not as strongly affected by the context. A distribution-based view allows for influences of different interpretative mechanisms – depending on what part of the neural signal carries the most pertinent information.

The very influential signal detection theory (Green & Swets, 1966) assumes that optimal perceptual decisions can be made based on probability distributions. And there is indeed growing evidence that sensory representations operate based on probability. Support comes from findings such as those of Jazayeri and Movshon (2007), who showed that during motion perception, signals from neurons that best discriminate the motion are preferentially weighted, reflecting decoding that optimizes the neural population response. They argued that sensory representations recode sensory responses into sensory likelihoods and that the "subjective experience of sensory events arises from the representation of sensory likelihoods, and not directly from the responses of sensory neuronal populations" (p. 915). To take one example consistent with this from a different domain of vision, De Valois et al. (2000) showed how chromatic tuning of neurons is affected by non-linear transformations from the lateral geniculate nucleus to the striate cortex. A crucial assumption of many recent theories of perception, including Bayesian theories, is that the brain represents information probabilistically (e.g., Griffiths et al., 2010; Kersten & Yuille, 2003; Mamassian et al., 2002; Tanrikulu et al., 2021a, b).

Mathematical modelling indicates that for many discrimination judgments the gain brought on by attentional facilitation should not always be applied to the most active unit, especially when the discrimination is difficult (Seung & Sompolinsky, 1993). For different tasks different aspects of the neuronal response should be utilized to maximize the information from the signal. The results of Pasupathy and Connor (2002) show, for example, how multipeak population responses can represent a complex stimulus (in this particular case shape in V4), and that generally, similar stimuli lead to similar population responses.

A lot of psychophysical evidence is consistent with such strategic tuning (e.g., Navalpakkam & Itti, 2007). Regan and Beverley (1985) reported evidence suggesting that the most active neurons determine detection of a stimulus in an orientation task, while more weakly excited neurons determine discrimination performance. Slight changes in orientation are detected by those units that change their activity the most, even if they are not the most active neurons. Hol and Treue (2001) observed analogous results suggesting that detection of visual motion is carried out by the units tuned to the motion direction, while discrimination performance is based on neurons that may be tuned 40–60° away from the test direction. Scolari and Serences (2009) found evidence that attention increases the gain of the most informative sensory neurons, even if they are tuned away from the relevant target features. This suggests that attention maximizes the differential response associated with targets (see Fig. 4). This is a very important point since it indicates that attention may not serve to maximize the response of a neural mechanism tuned to exact task-relevant features but rather of those providing the most information, through the largest change in signal contrast rather than absolute activity. This is a clear example of how statistics play a key role in selection (see also Butts & Goldman, 2006; Purushothaman & Bradley, 2005). Both the peak (or difference in peaks) and the slopes (or differences in slopes) of neural tuning functions carry large amounts of relevant information, and the two can serve different roles depending on context.

Fig. 4
figure 4

The amount of differential information as a function of position on neural tuning functions (see, e.g., Navalpakkam & Itti, 2007; Scolari & Serences, 2009). In the upper panel, the change in activity in neuron B is larger for a difficult discrimination than for neuron A, even though neuron A is precisely tuned to the target but B is not. This is not the case for the easier task in the lower panel
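
The intuition in Fig. 4 can be made concrete with a toy computation. The sketch below assumes a Gaussian tuning curve with an arbitrary width and simply asks which unit's response changes most between a 50° distractor and a 55° target, the difficult discrimination used by Navalpakkam and Itti (2007).

```python
# A toy computation of differential information (cf. Fig. 4): for the fine
# 55-vs-50 deg discrimination of Navalpakkam & Itti (2007), the unit whose
# response changes most is tuned away from the target, because the steepest
# part of a tuning curve lies on its flank. The tuning width is an assumption.
import numpy as np

def response(preferred, stimulus, width=20.0):
    """Gaussian orientation tuning curve (arbitrary response units)."""
    return np.exp(-0.5 * ((stimulus - preferred) / width) ** 2)

preferred = np.arange(0.0, 121.0, 1.0)               # preferred orientations (deg)
delta = np.abs(response(preferred, 55.0) - response(preferred, 50.0))

print("most informative unit prefers", preferred[np.argmax(delta)], "deg")
# Prints a preference well away from the 55-deg target, on the flank of the
# tuning curve, where the response difference between target and distractor
# (and hence the differential information) is maximal.
```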

Consistent with this, Jazayeri and Movshon (2007) highlighted that for coarse direction discrimination of dot motion, neurons tuned to the direction of interest provide the most reliable information, but they also argued that perceptual decisions reflect a decoding strategy of population responses where statistical reliability of sensory responses is taken into account. They found that irrelevant subthreshold motion had a significant effect on decisions, concluding that the visual system relies on pooled information from neurons tuned to a wide range of directions. Importantly, Jazayeri and Movshon (2006) had previously shown how a neuronal population can compute likelihood, arguing that "[m]any perceptual tasks are better viewed as a statistical inference problem." According to this, the brain decides on the most probable stimulus that induces a given neuronal population response.
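
The core computation that Jazayeri and Movshon (2006) proposed is easy to sketch. In the toy version below, the 12-unit population, the tuning widths, and the gains are all illustrative assumptions; the point is only that spike counts weighted by the log tuning functions yield a likelihood function over the stimulus, from which the most probable stimulus can be read out.

```python
# A sketch of likelihood decoding from a population response, in the spirit
# of Jazayeri & Movshon (2006): with independent Poisson units, the log
# likelihood of direction theta is sum_i n_i * log f_i(theta) - sum_i f_i(theta).
# The 12-unit population, gains, and tuning widths are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
preferred = np.linspace(0.0, 330.0, 12)               # preferred directions (deg)

def tuning(theta, gain=10.0, width=40.0):
    """Mean spike count f_i(theta) for each unit (circular Gaussian tuning)."""
    d = (theta - preferred + 180.0) % 360.0 - 180.0    # wrapped angular distance
    return gain * np.exp(-0.5 * (d / width) ** 2) + 0.5  # 0.5 = baseline rate

true_direction = 95.0
counts = rng.poisson(tuning(true_direction))          # one noisy population response

thetas = np.arange(0.0, 360.0, 1.0)
log_like = [np.sum(counts * np.log(tuning(t)) - tuning(t)) for t in thetas]
print("decoded direction:", thetas[np.argmax(log_like)], "deg")
```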

Population encoding and predictive coding

Population encoding means that perceptual input is coded by a set of overlapping neural units. This concept has gained increasing momentum within neuroscience, over accounts where single neurons are assumed to fulfil a given task in perception (e.g., Barlow, 1972; see Yuste, 2015, for a review). As suggested by Lehky and Sejnowski (1999), the first concept of a population code is probably the nineteenth-century theory of trichromacy, attributed to Thomas Young and Hermann von Helmholtz, since the proposal is that perceived color is produced by relative activity in three broadly tuned channels. Another example is the proposal that neuronal mechanisms in primary visual cortex perform operations akin to Fourier analyses of the image (Albrecht, De Valois, et al., 1981). Population encoding involves probabilistic representations, with a transformation from input space to an abstract representation (Lehky & Sejnowski, 1999).

Further arguments for such views come from the concept of predictive coding (Rao & Ballard, 1999). A key component of predictive coding accounts is that the brain strives to minimize free energy (defined in terms of information rather than thermodynamically). This involves optimization where the brain is thought of as a Bayesian inference machine that generates a probabilistic model that yields predictions about the environment. Such a generative model is decomposed into a likelihood of a given event and the prior probability of the causes of these events. As Friston (2010) proposes: "the theme underlying the Bayesian brain and predictive coding is that the brain is an inference engine that is trying to optimize probabilistic representations of what caused its sensory input" (p. 130). A central goal is to maintain homeostasis by avoiding nasty surprises. Girshick et al. (2011) reported an informative application of such ideas in the domain of orientation perception. They reported that internal models (priors) match environmental statistics (as assessed in a database of over 650 natural scenes), showed how such a representation could be implemented mathematically in a neural population, and demonstrated that when these biases were used as priors in a Bayesian-observer model, human performance was accurately predicted (see Zhang et al., 2013, for similar findings for motion).
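
The computational core of such a Bayesian-observer model is simple enough to sketch. In the toy version below, the prior favoring cardinal orientations and the likelihood width are arbitrary stand-ins for the natural-scene statistics that Girshick et al. (2011) actually measured.

```python
# A schematic Bayesian observer of the kind used by Girshick et al. (2011).
# The cardinal-favoring prior and the likelihood width are illustrative
# stand-ins for the measured natural-scene statistics.
import numpy as np

theta = np.arange(0.0, 180.0, 0.5)                    # orientation (deg)
prior = 1.0 + 0.8 * np.cos(np.deg2rad(4.0 * theta))   # peaks at 0 and 90 deg
prior /= prior.sum()

def estimate(measurement, sigma=8.0):
    """Posterior mean after combining a Gaussian likelihood with the prior."""
    likelihood = np.exp(-0.5 * ((theta - measurement) / sigma) ** 2)
    posterior = likelihood * prior
    return np.sum(theta * posterior) / np.sum(posterior)

for m in (10.0, 30.0, 80.0):
    print(f"measurement {m:5.1f} deg -> estimate {estimate(m):5.1f} deg")
# Estimates are pulled toward the nearest cardinal orientation, the bias
# pattern that matched human performance in Girshick et al. (2011).
```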

As Vogels (1990) makes clear, population encoding is mathematically tractable and can account for findings such as hyperacuity (Westheimer, 1981). Hyperacuity refers to findings where acuity is finer than neural receptor density should allow, which, on its own, makes the case that such discrimination must reflect the processing of neural populations (since otherwise performance could not exceed the limits set by receptor density). Secondly, as Averbeck et al. (2006) argue, probabilistic population coding seems logically necessary: "the same pattern of activity never occurs twice, even when the same stimulus is presented. Because of this noise, population coding is necessarily probabilistic" (p. 358, my italics).

If we assume that priming operates on these population representations (and it is hard to see what else it could operate on in light of the neurophysiological evidence), it is likely to operate in this probabilistic way. And if the visual system does indeed utilize probabilistic information about distractor distributions in the detail that the results of Chetverikov et al. suggest, integration of information (such as between trials) should accord with well-known Bayesian integration principles (Knill & Pouget, 2004; Körding & Wolpert, 2006). Notably, a testable prediction from this is that more reliable information should be weighted more strongly (Tanrikulu, Chetverikov & Kristjánsson, 2021; Witkowski & Geng, 2019).
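
The Bayesian integration rule in question is the standard precision-weighted average from the cue-combination literature. The sketch below applies it, purely for illustration, to combining a feature estimate accumulated over previous trials with a noisy estimate from the current trial; all numerical values are arbitrary.

```python
# Precision-weighted integration of two Gaussian estimates (Knill & Pouget,
# 2004; Kording & Wolpert, 2006), applied here to a "history" estimate and a
# "current trial" estimate of a target feature. All numbers are arbitrary.
def integrate(mu_history, var_history, mu_now, var_now):
    """Return the combined mean and variance; the more reliable (lower
    variance) estimate gets the larger weight."""
    w_history = (1.0 / var_history) / (1.0 / var_history + 1.0 / var_now)
    mu = w_history * mu_history + (1.0 - w_history) * mu_now
    var = 1.0 / (1.0 / var_history + 1.0 / var_now)
    return mu, var

# A reliable history dominates a noisy current observation:
print(integrate(mu_history=20.0, var_history=4.0, mu_now=35.0, var_now=36.0))
# combined estimate is approximately (21.5, 3.6): close to the reliable history
```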

Neurophysiology of attentional priming

Findings on population encoding of feature values discussed above indicate that a probabilistic account of priming is plausible from a neural perspective. Each neuron is considered to have a distribution of responses over some set of inputs (e.g., feature values, such as color or orientation), and the responses of many neurons are combined to reach a conclusion about the input. Note that population encoding does not necessarily entail that perceptual representations are probabilistic, but, as Sahani and Dayan (2003) point out, information about any single stimulus is encoded in the activity of a large set of neurons at any given time. And as Averbeck et al. (2006) argue, similar activity patterns never occur twice, even when the same stimulus is presented, meaning that population coding is "necessarily probabilistic."

Turning to neurophysiological evidence, Chen and Wise (1995) proposed that the supplementary eye fields (SEFs) in the frontal lobes are part of a neural system that is capable of learning flexible, nonspatial relations between stimuli. Consistent with this, Olson and Gettner (1995, 1996, 1999; see also Tremblay, Gettner, & Olson, 2002) found SEF neurons that had the highest firing rates when monkeys prepared saccadic eye movements to particular locations on a stimulus irrespective of its absolute position. The importance of this finding in the present context is that the SEFs show activity patterns concerned with the learning of spatial relations between stimuli, in object-centered rather than position-centered coordinates. Kristjánsson et al. (2001; see also Kristjánsson & Nakayama, 2003; Kristjánsson, 2009) found implicit object-centered learning of probabilistic cue-target relations, which is consistent with this. They argued that this learning is based on the statistics of the input, and such learning could occur for a wide range of phenomena, including those manifesting as attentional priming (Nakayama et al., 2004). Note, however, that the evidence on SEF involvement is not all consistent with such a modulatory role of the SEF (Purcell et al., 2012). Also, Westerberg, Maier, Woodman, and Schall (2020b) found that the error-monitoring activity that the SEFs seem to be involved in was not modulated by repetition of search features.

Bichot et al. (1996) showed that frontal eye field neurons do not initially show feature selectivity; rather, they gradually develop such selectivity as monkeys are trained on targets of a single color. This shows how statistics within attentional history are reflected in neural activity. These neural mechanisms are strongly connected with eye movements and attention, consistent with the finding that attentional priming also speeds saccades (Bichot & Schall, 1999; McPeek et al., 1999; Shurygina et al., 2019). A key finding was then reported by Bichot and Schall (2002), who showed that as priming built up (while monkeys performed a saccade task to a color singleton target), frontal eye field neurons discriminated targets earlier and more precisely.

Findings from neuroimaging and neuropsychology show that the frontoparietal attention networks are involved in priming (Brinkhuis et al., 2020; Campana et al., 2007; Kristjánsson et al., 2005, 2007). Liberal exchange of information within the frontal cortices, mainly in medial and orbital regions, allows them to operate as a general learning mechanism of environmental patterns (Duncan, 2010; Duncan & Owen, 2000), including learning the probabilistic priming effects discussed here.

But the frontoparietal network is clearly not the only neural network influenced by search repetition. Such effects are seen in early sensory areas (Adam & Serences, 2021; Won et al., 2020). Visual priming is diminished by lesions of TEO and abolished by lesions of V4 (Walsh et al., 2000). Neural spiking in V4 revealed earlier target selection associated with priming of pop-out (Westerberg, Maier, & Schall, 2020a). This is, overall, consistent with EEG evidence on priming (Eimer et al., 2010), since the N2pc (which is modulated by priming) is likely to originate in occipito-parietal areas.

Findings from neuroimaging show that priming of color, for example, causes BOLD activity modulations in areas related to color processing, such as V4, while location priming is connected with processing in the intraparietal sulcus, among other areas (Brinkhuis et al., 2020; Geng et al., 2006; Kristjánsson et al., 2007; Rorden et al., 2011). Additionally, attentional networks such as the frontal eye fields and the intraparietal sulcus are modulated with priming (Becker et al., 2014; Brinkhuis et al., 2020; Kristjánsson et al., 2007). This is consistent with the idea that priming is a general mechanism operating within different brain modules that serve differing functions, echoed in the highly diverse nature of priming effects (Kristjánsson & Ásgeirsson, 2019). Additionally, Brinkhuis et al. (2020) found parallels between the temporal profiles of across-trial patterns of cortical BOLD signals and search response-time reductions (Brascamp, Pels, & Kristjánsson, 2011b; Kruijne et al., 2015; Maljkovic & Nakayama, 1994; Martini, 2010), strongly supporting the idea that these cortical areas mediate attentional priming.

Finally, learning can sharpen population codes in mice performing orientation discrimination (Failor et al., 2021). Such pruning of neuronal responses could produce priming effects. And recently such learning by prediction has been demonstrated to occur at the level of single cells (Luczak et al., 2022).

The neurophysiological evidence is overall consistent with probabilistic encoding and with the statistics of the input shaping this activity over time, but there are still unanswered questions that involve exciting challenges for future research. A probabilistic notion of attentional priming entails the clear prediction that certain neural structures and populations, such as microcircuits serving predictive functions (Bastos et al., 2012), should show modulation of their activity profiles as probability functions of the input are learned.

What is attentional priming for?

In the preceding discussion I have tried to make clear how a general probabilistic view of perceptual representations is viable from the evidence from psychophysics, neurophysiology (in particular the idea of population encoding), and computational and theoretical neuroscience. I have also discussed how priming operates in a probabilistic way. At this point I will look at attention and attentional priming from a functional viewpoint, focusing on the role priming may play in perception from this perspective.

Priming has often been considered to reflect facilitation of the particular feature identity in each case, but recent evidence (reviewed above) shows how priming effects can be thought of as reflecting learning and encoding of probability distributions, and that these encodings bias attentional selection.

A unifying theme in the literature on attentional priming is that our goals are the same over a given time period (e.g., keeping track of our child in a playground, or picking blueberries), and priming is thought to keep our attention tuned towards the task we have to perform at a given moment. For this purpose, relevant items are highlighted and can be attended to and irrelevant information can be ignored. This can occur for features, as has been shown in many experiments starting with Maljkovic and Nakayama (1994; see review in Kristjánsson & Ásgeirsson, 2019), or for assembled objects (Ásgeirsson & Kristjánsson, 2011; Kristjánsson et al., 2008).

Maljkovic and Nakayama (1996) used the analogy of a capacitor that accumulates electrical charge: "As each charge is added to the input, the storage in the capacitor increases, the duration of the increase being dependent on the capacitance and the resistance through which it is discharged (p. 988)." Maljkovic and Nakayama argued that priming of position and features is mediated by a process roughly analogous to this physical model. There is in fact evidence that templates may become more robust (Carlisle et al., 2011) or their tuning more precise (Geng et al., 2017; Hansmann-Roth et al., 2021a, b) as the same task repeats. This capacitor analogy is also interesting because of evidence for different precision (or fidelity) of attentional templates (as reviewed above) depending on circumstances (see, e.g., Brady & Tenenbaum, 2013). Witkowski and Geng (2019) found that the visual system prioritizes stable features over variable ones, once again consistent with the prediction that more reliable information should be weighted more highly (Tanrikulu, Chetverikov, & Kristjánsson, 2021b; see also Friston, 2009; Körding & Wolpert, 2006) and that, in turn, target feature variability influences templates.
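
The capacitor analogy translates directly into a simple accumulator model. The sketch below is one hypothetical formalization, with arbitrary charging and decay rates; it reproduces the qualitative pattern of priming that builds up with repetition and leaks away once selection moves elsewhere.

```python
# A hypothetical formalization of the capacitor analogy of Maljkovic &
# Nakayama (1996): each selection adds "charge" toward an asymptote, and all
# traces leak between trials. Charging and decay rates are arbitrary choices.
def priming_trace(trial_features, charge_rate=0.4, decay=0.8):
    """Return the priming strength available on each trial, in trial order."""
    charge = {}                                   # feature -> accumulated charge
    strengths = []
    for feature in trial_features:
        strengths.append(charge.get(feature, 0.0))     # benefit on this trial
        for f in charge:                               # traces leak every trial
            charge[f] *= decay
        old = charge.get(feature, 0.0)
        charge[feature] = old + charge_rate * (1.0 - old)  # saturating charge
    return strengths

# Repetition builds the trace; a switch starts from zero but the old trace lingers.
trials = ["red", "red", "red", "red", "green", "red"]
for t, s in zip(trials, priming_trace(trials)):
    print(f"{t:>5}: priming strength {s:.2f}")
```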

Maljkovic and Nakayama (1996) drew up another interesting analogy, in this case from a more functional viewpoint, suggesting that priming "would be most beneficial in situations where the focus of attention must be directed efficiently to temporary repetitions occurring over seconds or minutes." They used an example of cutting carrots for cooking: "Consider the repetitive motor activities involved in food preparation – for example, reaching for carrots, cutting them up, and throwing them into a pot. Automatic color priming for orange resulting from several attentional deployments would help redirect attention to the next uncut item. When switching to another class of vegetables, however, the short duration of color priming would also help us adapt quickly to the new task" (p. 998). Kristjánsson and Campana (2010) drew up a similar analogy although the context is a love interest at a party: "You seem to be constantly aware of where that person is, and your gaze is repeatedly drawn toward the dashing red shirt or dress that he or she is wearing or to their shining black hair, in such fine contrast to their paler face, despite your best efforts to not look too eager (p. 5)."

Although there is a large difference between keeping an eye out for a love interest and cutting fruit or vegetables, the analogies are otherwise similar. In both cases, the argument is that priming maintains the template for a particular color in working memory (and that its fidelity gradually increases as searches repeat). But I argue that a probability-based view would handle this even better, since the carrots are actually unlikely to have precisely identical hues. And the red item of clothing that the person you have a crush on is wearing can vary in its physical properties depending on whether they are outside in sunlight, inside under dim lighting, or under a strong light. Consider a case analogous to Maljkovic and Nakayama’s carrot example: cutting up apples for your apple pie. Each apple has a range of hues from light green through red, and representing them as a probability distribution over hues would obviously be more useful than a single color value (see Fig. 5). A probabilistic template would work better than a precisely tuned one in this scenario.

Fig. 5
figure 5

Apples ready for cutting, and 14 samples from their range of color values. Daylight from the top left and a tungsten lightbulb from the top right illuminate the scene, while an opaque screen casts a shadow on the top-middle apples. The apples therefore have a wide range of hues and brightness (shown in the color samples from the image at bottom right). For a continuous task involving these apples (e.g., cutting them up), representing the different possible hues would be most practical, and although they cover a sizable range, some colors are clearly more probable than others

A finding reported by Ballard, Hayhoe, Li, and Whitehead (1992; see also Draschkow, Kallmeyer & Nobre, 2021) is relevant to this. Their participants were instructed to recreate an adjacent array of colored blocks from a pile of blocks. Ballard et al. expected observers to look at the model area, memorize the blocks, and then place the blocks in the copy area. However, their observers did not seem to memorize the whole area, instead memorizing only a small amount of information at a time, continually checking back and forth between the model and the copy area. Maljkovic and Nakayama (1996) argued that priming enabled participants to use this strategy effortlessly and reorient quickly to the same color. This example is informative, but importantly, in many natural tasks the items we interact with are unlikely to have a similar uniformity of color to these controlled conditions, and templates tuned to precise values would be impractical in natural tasks where there is variation in the stimuli (see Kristjánsson & Draschkow, 2021).

Priming has been thought to reflect facilitation of the identity of the repeated target between trials, and in many cases this has been the obvious explanation, but the recent evidence for probabilistic representations and probabilistic priming reviewed above indicates that the learning is far more interesting and complex than this implies. The representations that develop during visual attention tasks include the probabilities of different feature values. This also means that priming of, say, a particular color or orientation simply boils down to priming of a distribution with no variance.

Priming and attentional capture

But priming may not only be good for cutting up fruit and vegetables or, more generally, reorienting to items during continuous tasks. In the context of attentional capture, a hotly debated topic for some time now, priming may keep us on track in the tasks we perform, preventing attention from being captured by irrelevant stimuli. A distribution-based view of priming makes sense in this context. We might assume that the features that are prevalent in the scene are learned, and that events out of the ordinary within that scene capture attention, as they may denote something unexpected and dangerous. In fact, in a recent discussion of the attentional capture literature (Luck et al., 2021), history effects were invoked to explain key factors in attentional capture. As discussed above, attentional priming can draw attention to task-relevant items, while other findings show how priming effects that are no less potent can be attributed to the nature of the distracting stimuli. Importantly, these effects have been shown to be independent of target facilitation effects (Kristjánsson & Driver, 2008; Lamy et al., 2008a, b). The distribution-based view can make this role of priming in preventing attentional capture more explicit. It makes sense that features close to the primed one also capture attention, but perhaps not to the same degree as the primed feature. The brain therefore faces a probabilistic computational problem, and the degree to which items capture attention depends upon how they fit the distribution. A clear prediction from this is that feature distribution learning of distractors (Chetverikov et al., 2016, 2017a-c, 2020) should tend to reduce attentional capture by irrelevant items from outside the distribution.

Attentional learning that can be attributed to independent effects of targets and distractors is likely to affect the degree to which stimuli capture attention (Kristjánsson & Driver, 2008; Maljkovic & Nakayama, 1994). These priming effects are likely to reflect the intricate learning of detailed distractor feature distributions revealed by Chetverikov et al. (2016, 2017a, 2017b, 2017c, 2020). Together, these mechanisms can highlight task-relevant items and inhibit irrelevant ones.

Attentional capture occurs within a context, both in time and space, and the temporal effects vary as a function of the roles that particular stimuli play, whether they are targets or distractors. Luck et al. (2021) argued that "understanding the learning processes involved in suppression will be an important step towards developing training programs to prevent visual distraction" (p. 13). I wish to argue that answers to questions of this sort are already available in the literature: for example, in the distractor distribution learning reviewed above, where Chetverikov et al. showed that observers can learn remarkably detailed characteristics of the statistical properties of feature distributions, and in evidence that templates for selection and rejection are strategically encoded (Geng et al., 2017; Won & Geng, 2018). This implies that knowledge of the moment-by-moment statistical properties of the visual environment builds up gradually (in the short term, as observers perform a particular task or attend within a certain environment) and provides clues about what to ignore and what to attend to. In turn, this can affect what captures attention at any given moment. Findings that the strength of attentional capture by task-irrelevant items can depend on distractor statistics are consistent with this (Chelazzi et al., 2019; Geng et al., 2019; Sayim et al., 2010; Töllner et al., 2015; Turatto et al., 2018).

A Bayesian, probabilistic view of priming?

So, should we expect to be able to construct a Bayesian account of priming in which probabilistic representations play a key role? Bayesian models assume that perceivers use implicit knowledge of the environment to interpret the visual scene, knowledge that is computationally implemented as priors. These priors can be long term, such as the assumption that light comes from above (Adams, 2007) or that visual objects are static or move slowly (the slow-speed prior; Weiss et al., 2002), but they can also operate over the short term (in the current case through priming, or through knowledge of recent environmental statistics). Seriès and Seitz (2013) proposed that expectations can operate as Bayesian priors, and priming could similarly serve as a prior. Updating of priors can occur very quickly (Raviv et al., 2012), with recent evidence weighted highly; consistent with this, the most recent trials have a very strong influence in attentional priming (a simple sketch of such recency weighting follows below). These short-term priors could interact with longer-term priors for constructing the world (Chalk et al., 2010; Sotiropoulos et al., 2011). And the sheer strength of priming effects may suggest that we have a prior for things staying constant from one moment to the next – that the visual system assumes continuity (Cicchini & Kristjánsson, 2015; Fischer & Whitney, 2014). Interestingly, Chetverikov and Kristjánsson (2015) showed that when expectations based on history are violated through role reversals, observers tend to like the "offending" stimuli less than other stimuli during such a task (see discussion in Chetverikov & Kristjánsson, 2016).
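
The recency weighting described above can be illustrated with a minimal sketch of an exponentially weighted running prior. This is not a model from the cited work: the function update_prior, the learning rate, and the hue values are all hypothetical, chosen only to show how a fixed learning rate makes recent trials dominate the prior.

```python
def update_prior(prior_mean, observation, learning_rate=0.3):
    """Exponentially weighted update of a running prior.

    With a fixed learning rate, an observation from k trials ago
    contributes with weight learning_rate * (1 - learning_rate) ** k,
    so recent evidence dominates, mirroring the strong influence of
    the most recent trials in attentional priming.
    """
    return prior_mean + learning_rate * (observation - prior_mean)

# Target hues (arbitrary units) across successive trials; note how the
# abrupt change on the last trial pulls the prior strongly toward it
prior = 10.0
for hue in (10.0, 12.0, 11.0, 40.0):
    prior = update_prior(prior, hue)
    print(round(prior, 2))
```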

While a Bayesian/probabilistic account of priming as distribution learning is not formally proposed here, the ingredients would seem to be available. Bayesian approaches have fared well in accounting for a number of history effects, such as serial dependence (Fritsche et al., 2020) and adaptation (Kale & Hullman, 2019). Kale and Hullman (2019) present a formulation of Bayes' theorem that describes how the properties of a distribution could be learned. Taking the learning of target feature distributions in Hansmann-Roth, Thorsteinsdóttir, et al. (2021b) as an example, a particular property of a distribution (say, its variance $V_i$) could be estimated recursively over a set of search trials $j \le i$:

$$P\left({V}_i|{tc}_{j\le i}\right)=\frac{P\left({tc}_i|{V}_i,{tc}_{j<i}\right)P\left({V}_i|{tc}_{j<i}\right)}{P\left({tc}_i|{tc}_{j<i}\right)}$$

where $tc_{j<i}$ are the target colors on the trials preceding the current trial's target color $tc_i$. Similar equations could be written for other distribution properties, such as the mean, or expanded to more complex distribution shapes. Interestingly, this equation predicts that learning the mean of a distribution should be easier than learning its variance, and that lower variance should be easier to learn than higher variance (Tanrikulu, Chetverikov, & Kristjánsson, 2021b). On Bayesian accounts, the more uncertain the data, the more the prior influences perceptual interpretation (Seriès & Seitz, 2013; Tanrikulu, Chetverikov, Hansmann-Roth, & Kristjánsson, 2021a), and there is indeed evidence that uncertainty regarding target identity increases the strength of priming (Meeter & Olivers, 2006).
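
Below is a minimal sketch of how this recursive update could be implemented over a discrete grid of candidate variances, assuming Gaussian target-color distributions with a known mean so that $P(tc_i \mid V_i, tc_{j<i})$ reduces to $P(tc_i \mid V_i)$. The grid, the observed colors, and all parameter values are illustrative assumptions, not taken from the cited experiments.

```python
import math

# Discrete grid of candidate variances for the target-color distribution
candidate_vars = [1.0, 4.0, 9.0, 16.0, 25.0]

# Flat prior P(V_i) before any search trials
posterior = [1.0 / len(candidate_vars)] * len(candidate_vars)

TRUE_MEAN = 0.0  # the distribution mean is assumed known, for simplicity

def likelihood(tc, variance):
    """P(tc_i | V_i): Gaussian likelihood of the observed target color."""
    return math.exp(-0.5 * (tc - TRUE_MEAN) ** 2 / variance) / math.sqrt(
        2 * math.pi * variance
    )

# Observed target colors tc_j over a block of search trials (arbitrary units)
for tc in (1.2, -0.8, 2.1, 0.3, -1.5):
    # Numerator of the equation: P(tc_i | V_i) * P(V_i | tc_{j<i})
    posterior = [likelihood(tc, v) * p for v, p in zip(candidate_vars, posterior)]
    # Denominator P(tc_i | tc_{j<i}) is the normalizing constant
    total = sum(posterior)
    posterior = [p / total for p in posterior]

for v, p in zip(candidate_vars, posterior):
    print(f"V = {v:4.1f}: P = {p:.3f}")
```

In this toy model the posterior sharpens as trials accumulate, and it tends to do so faster when the generating distribution is narrow, in line with the prediction above that lower variance should be easier to learn.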

But more generally, a Bayesian view also speaks to attention defined as a process of optimizing predictions during perceptual inference, and as we have seen, priming may aid that process. These priors could operate within short-term memory (Trapp et al., 2021) through the updating of templates. As Dasgupta and Gershman (2021) argue, memory can serve as a resource for computational reuse, enabling efficient computation, and it is tempting to speculate that priming is part of such reuse of prior computations.

Conclusions

So, what does all this tell us about priming of attentional selection, visual representations, and templates for selection and rejection? There is good evidence that the guidance of attention relies on probabilistic representations and that these same representations are the unit of priming. This makes sense: If probabilistic information is the unit of perceptual decisions and attentional selection, this should, by necessity, also be the unit of priming.

This distribution-based view is consistent with accumulating evidence that the brain uses a range of statistical strategies when constructing models of the visual world from neuronal input, consistent with ideas of population coding. The noise inherent in neuronal signals makes dedicated single neurons for single functions (e.g., Barlow, 1972) unlikely, which is part of the reason that probabilistic views have gained popularity.

The success of vision should be measured by its utility. The question is whether we eat or starve, live or die, not whether we get the physics of the visual stimulus right – the actual reflectance, hue, or luminance values. Priming likely helps us reorient to items of importance and keeps us on track. This can only be achieved with some variance in the guiding template, and the most parsimonious account is that this is achieved by approximating the probability distribution underlying the input.

The concept of probabilistic priming follows naturally from recent developments within theoretical neuroscience: predictive coding (Rao & Ballard, 1999), the brain as a Bayesian inference machine (Tanrikulu et al., 2021a, b), and the free energy principle, which holds that organisms strive to minimize the difference between predictions about sensory information and the information itself (Friston, 2009). The free energy principle entails that the visual system encodes visual information probabilistically and that sensory signals are adjusted to represent prediction errors, coded as differences from expectations. Consistent with this, there is evidence that guidance templates are based not on what is optimal for detecting targets but on what is optimal for distinguishing targets from distractors (Becker, 2010; Geng et al., 2017; Kerzel, 2020; Lau et al., 2021; Navalpakkam & Itti, 2007; Scolari & Serences, 2009). This naturally entails a distribution-based view.

A final note is warranted: the proposals made here have important implications for conceptions of attentional templates and, by extension, working memory (Woodman et al., 2013). Templates stored in visual working memory are often thought of as fixed, precisely tuned entities – if you search for a red vertical stimulus, your template includes exactly that. A probabilistic view of attentional guidance entails that conceptions such as the argument that task-important stimuli result in templates with a "special" status in visual working memory, differing from other representations (Olivers et al., 2011), would need revision. It may be a moot point whether only a single template is active at any given time (van Moorselaar et al., 2014) or whether search enhances the fidelity of templates (Rajsic et al., 2017). How narrowly templates are tuned would be a direct function of the variance of the task-relevant stimuli (Chetverikov et al., 2020; Hansmann-Roth et al., 2022).

In sum, the proposal here is that visual representations and templates are not tuned to precise values. Priming of these templates can therefore be thought of as learning of the underlying probability density function of the target or distractor sets in a given continuous task; priming of, say, a particular constant color is simply priming of a distribution with no variance. A corollary of this view is that templates for selection and rejection are dynamic, modulated by the temporal, spatial, and featural context – in other words, by the probability distribution of the input.