Introduction

Things that visually stand out from their surroundings attract attention and appear salient. In everyday life, this is experienced most prominently if salience interferes with the goals we pursue. A relevant object, like a key on a cluttered table, may not be salient enough and is therefore overlooked, which results in the iconic event of finding the key exactly where one was already looking a couple of minutes ago. A mismatch of salience and goals can also occur if one is distracted from a task by a new bright object appearing in the periphery. Apparently, such properties can strongly influence how we process visual stimuli.

Much evidence shows that salience contributes to the deployment of attention (for reviews see Carrasco, 2011; Schütz, Braun, & Gegenfurtner, 2011; Treue, 2003). Salience can be caused by local feature contrasts (e.g., Nothdurft, 1993b; Treue, 2003). Feature contrasts exist in different dimensions such as luminance, orientation, color, and motion (for a review see Wolfe & Horowitz, 2004). The stronger the contrast in a dimension, the more salience affects attention (Avraham, Yeshurun, & Lindenbaum, 2008; Duncan & Humphreys, 1989; Wolfe, Cave, & Franzel, 1989; Wolfe, 2007). Also, salience increases if contrasts from different dimensions are combined (Huang and Pashler 2005; Nothdurft 1993a; 2000), and time modulates the influence of salience gradually (e.g., Donk & van Zoest, 2008; Dombrowe, Olivers, & Donk, 2010). In all these cases, a gradual influence of contrast is present.

Although these results suggest that salience is a useful concept in explaining attentional phenomena, not all models of attention are explicit about the relationship between physical contrast, salience, and attention. One way of modeling visual attention is biased competition in which stimuli struggle for representation (Desimone and Duncan 1995). This competition is affected by limitations and biases. To link these influences and their possible interaction with actual behavior, visual processing can be described mathematically. Such an approach has been worked out in Bundesen’s theory of visual attention (TVA; 1990, 1998) as a set of equations that describe the encoding of visual stimuli into visual short-term memory (VSTM).

According to TVA, a visual stimulus is either encoded into VSTM or not. As time proceeds, the chance that a stimulus is encoded increases. This property of TVA is often explained by the metaphor of stimuli entering a race. The faster the racer, the higher the chance of getting represented although limitations and biases preclude that everything is eventually represented.

One important advantage of TVA is that the postulated limitations and biases correspond to high-level concepts that psychologists are interested in. These concepts include the limits of VSTM, the processing capacity of the visual system, and the spatial distribution of attention. With respect to these concepts, TVA’s parameters and mechanisms provide high explanatory power (Logan 2004): A small number of parameters suffices to precisely describe data from a multitude of experimental conditions (e.g., Shibuya & Bundesen, 1988) and to explain a remarkable number of attention-related phenomena (e.g., Bundesen, 1990; Kyllingsbæk, 2006; Tünnermann, Petersen, & Scharlau, 2015).

TVA allows for quantitative analysis of attention in a wide variety of research topics ranging from temporal expectations (e.g., Vangkilde, Coull, & Bundesen, 2012), over attentional dwell time (e.g., Petersen, Kyllingsbæk, & Bundesen, 2012), feature-based attention (e.g., Hung, Driver, & Walsh, 2005), psychopharmacological influences on attention (e.g., Vangkilde, Bundesen, & Coull, 2011), and cognitive training (e.g., Schubert et al., 2011) to clinical deficits (for a review see Habekost, 2015). The aptness of the theory’s equations has further been substantiated by showing how the proposed mechanisms may be implemented on a neuronal substrate (Bundesen, Habekost, & Kyllingsbæk, 2005, 2011). Although there are different layers of abstraction for analyzing information processing (Marr 1982), we think that, given these merits, TVA is a promising tool for investigating quantitative properties of visual salience as well.

Classical TVA modeling regarded salience as a regular feature of stimuli. Such regular features affect attention only if they are task-relevant. As mentioned above, however, a local contrast may have an influence independent of relevance. To capture this stimulus-driven influence, TVA was recently extended by a new parameter (Nordfang, Dyrholm, & Bundesen 2013). In addition to category- and relevance-based influences of earlier TVA, the authors modeled the effect of feature contrast on visual attention in a new variable called κ. Additionally, the authors showed that feature contrast and feature relevance interact multiplicatively. This makes explicit how salience and attention are related.

However, Nordfang et al. (2013) do not model how physical contrasts cause the salience parameter κ to change. Based on the reviewed literature, it is highly likely that κ depends systematically on the strength of contrast within a dimension and on the combination of contrasts from different dimensions. One possible reason for this limitation is the partial-report design used by Nordfang et al. (2013). Partial report requires naming letters. The ease of recognizing an individual letter is affected by contrasts such as luminance and orientation contrast. Thus, systematically varying contrasts may induce a confound. To relate TVA’s salience parameter κ to contrasts actually used in salience research, TVA needs a new experimental paradigm.

Recently, it has been proposed that the judged temporal order of two events can be understood as the outcome of two TVA races (Tünnermann et al. 2015). Temporal-order judgments (TOJs) allow estimating the distribution of attention for individual elements in cluttered displays (Krüger, Tünnermann, & Scharlau 2016). How this approach can be combined with the TVA extension by Nordfang et al. (2013) will be explained in the General Method section. Afterward, we tackle the main goal of the present article: We extend TVA by modeling how physical contrasts cause TVA’s salience parameter κ to change. Quantitative measurement and modeling of salience extend TVA’s mathematical approach to visual attention, thus broadening the explanandum of TVA.

General method

TVA can be understood as a powerful mechanism that models visual recognition and selection with a small set of parameters. The TVA parameters relevant for the current approach are overall processing capacity, C, and the newly introduced parameter for stimulus-driven influences, κ, which captures the effect of salience (Nordfang et al. 2013). To trace how these two parameters explain and describe the data in the experiments, further TVA concepts are needed: For a particular stimulus x, a processing rate, \(v_{x}\), and an attentional weight, \(w_{x}\), can be specified. All further TVA parameters concern influences that are kept constant in the present paradigm and thus will only be briefly mentioned when necessary.

As mentioned above, the central assumption of TVA is that visual stimuli are processed in parallel, illustrated by the metaphor of a race. In our experimental design, two stimuli compete in the race. To distinguish the two stimuli, we call one probe (index p) and the other reference (index r). The probe stimulus will be subject to experimental manipulations whereas the respective property of the reference stimulus is kept constant. Whichever stimulus wins the race becomes accessible to further cognitive processing first. The speed with which each stimulus is expected to finish the race is modeled as its processing rate, \(v_{p}\) or \(v_{r}\). These rates are not independent because TVA assumes that the visual system has an overall processing capacity. This capacity is denoted formally as C and can be thought of as being distributed to the two stimuli. Thus, summing up both rates yields C. Models in which the condition of a fixed C holds are called fixed-capacity independent race models (e.g., Shibuya & Bundesen, 1988).

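How exactly C is distributed to the stimuli depends on attention. In TVA, attention is represented for each object by an attentional weight, which is denoted as \(w_{p}\) and \(w_{r}\), again for probe and reference, respectively. This yields Eqs. 1a and 1b, in which the weights are understood as relative weights that sum to 1 across the two stimuli, so that the two rates sum to C.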

$$\begin{array}{@{}rcl@{}} &&v_{p} = w_{p} \cdot C \end{array} $$
(1a)
$$\begin{array}{@{}rcl@{}} &&v_{r} = w_{r} \cdot C \end{array} $$
(1b)

The attentional weights are a general quantitative measure of attention. As mentioned above, Nordfang et al. (2013) modeled feature contrast as a separate variable, κ, contributing to these weights. They identified a multiplicative interaction of κ and the category-based top-down influence of TVA’s attentional weight. Their results motivated the CORE (short for contrast-relevance) equation shown in Eqs. 2a and 2b for the probe and the reference stimulus, respectively.

$$\begin{array}{@{}rcl@{}} &&w_{p} = \kappa_{p} \sum\limits_{j \in R} \eta(p,j)\pi_{j} \end{array} $$
(2a)
$$\begin{array}{@{}rcl@{}} &&w_{r} = \kappa_{r} \sum\limits_{j \in R} \eta(r,j)\pi_{j} \end{array} $$
(2b)

Most components of the CORE equation are also part of the classical TVA. These components are the function η and the parameter π. The function η represents the perceptual evidence that a stimulus belongs to a perceptual category, where R is the set of all perceptual categories. Filtering for particular categories is modeled by \(\pi_{j}\). The higher \(\pi_{j}\), the more important the category j is in the filtering process. For a detailed explanation of these category-based biases (called pigeonholing and filtering), which are of no further interest in the present context, the reader is referred to the works of Bundesen (1990, 1998).

If both stimuli, probe and reference, are equally task relevant (π) and the task-relevant feature has the same sensory evidence (η), then the sums in Eqs. 2a and 2b are identical and cancel out in the ratio of the two weights. Note that it is crucial that the task relevance of the perceptual category “salience”, represented by \(\pi_{\text{salience}}\), is zero. Otherwise, the stimulus-driven component κ and the effect of \(\pi_{\text{salience}}\) and \(\eta_{\text{salience}}\) on attention would be indistinguishable. In our experimental design, however, attending to salience does not help to complete the task. Under these circumstances, the salience of the probe can be manipulated to measure its κ without confounds from task relevance.
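The cancellation of the category sums can be checked numerically. The following is a minimal sketch under stated assumptions: the function name `core_weight`, the category labels, and all η, π, and κ values are hypothetical and chosen for illustration only; none of them are taken from the experiments.

```python
def core_weight(kappa, eta, pi):
    """CORE weight: kappa times the sum over categories j of eta_j * pi_j."""
    return kappa * sum(eta[j] * pi[j] for j in pi)


# Hypothetical pertinence values: the flicker target is task relevant,
# whereas the category "salience" has zero relevance (pi = 0).
pi = {"target": 1.0, "salience": 0.0}

# Hypothetical sensory evidence: identical for the task-relevant category,
# different for the salience category (which is multiplied by pi = 0).
eta_probe = {"target": 0.8, "salience": 0.9}
eta_reference = {"target": 0.8, "salience": 0.1}

w_p = core_weight(kappa=2.0, eta=eta_probe, pi=pi)      # salient probe
w_r = core_weight(kappa=1.0, eta=eta_reference, pi=pi)  # neutral reference

# The category sums are equal, so the weight ratio reduces to kappa_p / kappa_r.
weight_ratio = w_p / w_r
print(weight_ratio)
```

If \(\pi_{\text{salience}}\) were set to a nonzero value in this sketch, the ratio would no longer equal the ratio of the κ values, which is the confound described above.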

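Nordfang et al. (2013) proposed that a non-salient stimulus has a κ value of 1. Given this assumption, \(\kappa_{r} = 1\), Eqs. 1a, 1b, 2a, and 2b can be combined to yield Eqs. 3a and 3b: the category sums cancel, and the capacity C is shared between the two stimuli in proportion to \(\kappa_{p}\) and \(\kappa_{r} = 1\).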

$$\begin{array}{@{}rcl@{}} &&v_{p} = \frac{\kappa_{p}}{1+\kappa_{p}} \cdot C \end{array} $$
(3a)
$$\begin{array}{@{}rcl@{}} &&v_{r} = \frac{1}{1+\kappa_{p}} \cdot C \end{array} $$
(3b)

In these equations, both processing rates depend only on the salience of the probe stimulus, \(\kappa_{p}\), and the overall processing capacity of the visual system, C.
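Equations 3a and 3b can be written as a short function. This is a minimal sketch; the function name `processing_rates` and the example values (C = 60 Hz and the three κ values) are illustrative assumptions, not estimates reported here.

```python
def processing_rates(kappa_p, capacity):
    """Return (v_p, v_r) following Eqs. 3a and 3b with kappa_r fixed at 1."""
    v_p = kappa_p / (1.0 + kappa_p) * capacity
    v_r = 1.0 / (1.0 + kappa_p) * capacity
    return v_p, v_r


# Illustrative values only: a capacity of 60 Hz and three salience levels.
for kappa_p in (1.0, 2.0, 3.0):
    v_p, v_r = processing_rates(kappa_p, capacity=60.0)
    print(f"kappa_p={kappa_p}: v_p={v_p:.1f} Hz, v_r={v_r:.1f} Hz, "
          f"sum={v_p + v_r:.1f} Hz")
```

Whatever the value of \(\kappa_{p}\), the two rates sum to C, which is the fixed-capacity property; a neutral probe (\(\kappa_{p} = 1\)) receives exactly half of the capacity.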

As mentioned above, these considerations would not be helpful in the partial-report design of TVA because the manipulation of contrast may affect the categorization. Recently, however, Tünnermann et al. (2015) developed a scheme with which TVA parameters can be inferred without the need to categorize. They used a temporal-order judgment (TOJ) of two stimuli. In this simple task, the participant judges which of two events occurred first. Attentional manipulations cause a well-known illusion in TOJs called prior entry, in which an attended stimulus is perceived as appearing earlier than a comparable but unattended stimulus (for a review see Spence & Parise, 2010). This phenomenon can be interpreted as the outcome of two TVA races: the races of probe and reference start at different points in time (because of the onset difference between the stimuli) and are modeled by their processing rates \(v_{p}\) and \(v_{r}\), respectively. An increased attentional weight influences the processing of the probe. Besides the temporal difference due to the stimulus-onset asynchrony (SOA), there is thus an attentional advantage for one of the races such that the stimulus is perceived earlier. Generally speaking, attentional weights can be inferred from the relative arrival times of the two stimuli. This approach has been tested for several attentional manipulations (Tünnermann et al. 2017; Tünnermann and Scharlau 2016) including cluttered salience displays (Kröger et al., 2016). Based on these findings, we propose a new parameterization of the TVA interpretation, such that the functions describing the TOJ depend on two TVA races that in turn depend on the parameters \(\kappa_{p}\) and C as shown in Eqs. 3a and 3b.
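The text does not spell out the function that links the two races to the judgment. The following sketch shows one way to obtain it, assuming exponentially distributed encoding times for both racers and ignoring any perceptual threshold or response bias. The function name `p_probe_first`, the exponential assumption, and the example values are assumptions made for illustration, not a statement of the exact model that was fitted.

```python
import math


def p_probe_first(soa, kappa_p, capacity):
    """Probability of judging 'probe first' in a two-racer exponential race.

    soa: onset asynchrony in seconds (positive: probe flickered first).
    kappa_p: salience of the probe (reference fixed at kappa_r = 1).
    capacity: overall processing capacity C in Hz.
    """
    v_p = kappa_p / (1.0 + kappa_p) * capacity  # Eq. 3a
    v_r = 1.0 / (1.0 + kappa_p) * capacity      # Eq. 3b
    p_win_common_start = v_p / (v_p + v_r)      # both races start together
    if soa >= 0:
        # Probe has a head start: it either finishes before the reference
        # starts, or both race from the reference onset (memorylessness).
        p_done_early = 1.0 - math.exp(-v_p * soa)
        return p_done_early + (1.0 - p_done_early) * p_win_common_start
    # Reference has a head start: the probe can only win if the reference
    # has not finished before the probe starts.
    return math.exp(-v_r * abs(soa)) * p_win_common_start


# Illustrative parameter values (kappa_p = 2, C = 60 Hz), SOAs in milliseconds.
for soa_ms in (-100, -50, 0, 50, 100):
    p = p_probe_first(soa_ms / 1000.0, kappa_p=2.0, capacity=60.0)
    print(f"SOA {soa_ms:+4d} ms: P(probe first) = {p:.3f}")
```

Under these assumptions, the value at an SOA of 0 equals \(\kappa_{p}/(1+\kappa_{p})\), so a salient probe is reported first in more than half of the simultaneous trials, which corresponds to prior entry.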

Concerning statistical analysis, we used the same approach as in an earlier study (Kröger et al., 2016). Based on the participants’ judgments, the salience parameter \(\kappa_{p}\) and the processing capacity C are estimated by a hierarchical Bayesian model. The Bayesian approach is particularly suitable for inference under a given model (Little 2006). The hierarchical Bayesian model allows individual estimates of the parameters for all participants and at the same time an estimation of the respective group parameters (so-called hyper-parameters).

Bayesian statistics estimates whole probability distributions, not single values. For interpreting these distributions depicted in the results figures, you may imagine that for each point on the x-axis, the probability distribution captures the belief that this parameter value is the correct description of the underlying psychological parameter, given the respective model. The result figures of all experiments reported below contain these distributions, the maximum a posteriori probability (MAP) and the highest density interval (HDI). The MAP describes the mode of such a distribution and the HDI describes the interval that contains the most probable 95% of the parameter values. For technicalities of the analysis, see the Appendix.
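How a MAP and a 95% HDI can be read off a set of posterior samples is sketched below. The sketch assumes that the posterior is available as a list of draws; the helper names `hdi` and `map_estimate`, the histogram-based mode, and the simulated gamma-distributed draws are illustrative assumptions and do not describe the analysis in the Appendix.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Narrowest interval that contains the requested share of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    n_inside = math.ceil(mass * n)
    # Compare all intervals spanning n_inside consecutive ordered samples.
    best_start = min(range(n - n_inside + 1),
                     key=lambda i: ordered[i + n_inside - 1] - ordered[i])
    return ordered[best_start], ordered[best_start + n_inside - 1]


def map_estimate(samples, bins=50):
    """Approximate the mode by the centre of the fullest histogram bin."""
    low, high = min(samples), max(samples)
    width = (high - low) / bins
    counts = [0] * bins
    for value in samples:
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    fullest = counts.index(max(counts))
    return low + (fullest + 0.5) * width


random.seed(1)
# Stand-in for posterior draws; a gamma shape is used only because it is
# skewed, not because such a posterior is reported in the article.
draws = [random.gammavariate(4.0, 0.5) for _ in range(20000)]
hdi_low, hdi_high = hdi(draws)
print(f"MAP ~ {map_estimate(draws):.2f}, "
      f"95% HDI ~ [{hdi_low:.2f}, {hdi_high:.2f}]")
```

For a skewed distribution such as this one, the HDI is shifted toward the mode compared with an equal-tailed interval, which is why it is described as containing the most probable values.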

The following four experiments investigate how feature contrast systematically affects κ and how this can be modeled. Experiment 1 tests the two hypotheses that κ increases with salience whereas the overall processing capacity C does not. These hypotheses were drawn from TVA predictions. Contrast is manipulated by orientation. The pattern in the independently estimated κ values is modeled by functions used in psychophysics. This model is justified by the second experiment with more fine-grained orientation contrasts. Experiment 3 extends the model to the luminance dimension, and Experiment 4 combines both dimensions to test how salience from two dimensions combines.

Experiment 1

Experiment 1 tests the general applicability of a TVA-based model of salience. Three different components are combined in this experiment: cluttered displays with one potentially salient element, a TOJ for which contrast has no task relevance and a TVA measure for the feature contrast’s effect on attention.

These components come together in the following way: The cluttered display with at most one unique element is shown before the TOJ. Two elements in the cluttered display, one on the left and one on the right, flicker with a short interval between them. The participants judge their temporal order. This response is the TOJ and constitutes the data. The relative frequency of reporting one of the options first can be depicted for each of the SOAs. An example can be seen in Fig. 10, where the relative frequency of the judgment “probe stimulus was first” is depicted as dots for each SOA in every single plot.

Whereas such data are usually described by psychometric functions (Kuss, Jäkel, & Wichmann 2005; Wichmann & Hill 2001), we use a TVA interpretation of TOJs (Tünnermann et al., 2015; Kröger et al., 2016). In this interpretation, the parameters directly correspond to psychologically meaningful quantities: the strength of the salience-driven influence on the spatial distribution of attention, κ, and the overall processing capacity of the visual system, C. Roughly speaking, a higher C-value leads to a steeper slope of the function whereas a higher κ-value leads to an asymmetrical shift to the right.

The theoretical meaning of the TVA parameters is not only useful in describing the data but also leads to hypotheses. According to the TVA framework, salience should merely affect attention (κ) and not the processing capacity (C) (Nordfang et al. 2013). Going into more detail, the influence of salience could in principle show up in the data in the following ways: Firstly, if salience has no effect on attentional weights, κ would be independent of local contrast. If an influence exists, there are different ways in which κ may depend on local contrast. Secondly, one theoretically possible (though admittedly unlikely) alternative is that there are differences in attentional weights, but no pattern. Thirdly, there could be a systematic interrelation. For instance, κ could increase linearly, exponentially, or in other non-linear ways with contrast.

To decide which of the alternatives describes the data well, we formulate three models. In all models, the data from the TOJ are described by the two parameters κ and C. The models differ only in how the two parameters depend on the experimental conditions. The first model corresponds to “no effect”, that is, it assumes that the parameters are equal for all conditions so that both κ and C are fixed. This model will be called the no-effect model. The second model corresponds to an “unstructured effect”, that is, each experimental condition has an individual κ and C parameter, which are not connected. Hence, this model is called the independent-κ model. The third model corresponds to a “structured effect” in which κ is a systematic function of the local contrast and C stays constant. This model is called the power-function model. By comparing the three models, Experiment 1 thus answers three questions: whether there is a substantial effect worth modeling at all, whether κ and C values are in line with the hypotheses, and whether it makes sense to assume a function connecting physical contrast and κ value. In Experiment 1, we use orientation contrast.

Method

Participants

Thirty persons (13 male and 17 female; \(M_{age}\) = 22.67, range 18–37) participated in Experiment 1. The sample size was fixed in advance based on earlier studies (Kröger et al., 2016) and a simulation using the independent-κ model. All participants were students or members of Paderborn University. Each participant gave informed written consent, completed one session, reported normal or corrected-to-normal visual acuity and received course credit or a payment of 8 euros per hour.

Apparatus

The experiment was conducted on a Microsoft Windows XP PC with a 22-inch Iiyama Vision Master Pro512 CRT monitor (40.4 cm × 30.3 cm). A resolution of 1024 × 768 pixels was used with 32-bit colors and a refresh rate of 100 Hz. The experiment was programmed using OpenSesame (Mathôt, Schreij, & Theeuwes, 2012) and PsychoPy (Peirce 2007). The viewing distance was 50 cm. The left ctrl key and the right enter key on the number pad were used for collecting judgments. Responses were given with the corresponding hand. The experiment took place in a dimly lit experimental booth.

Stimuli

Each trial began with a fixation cross. It appeared in the center of the screen for 900 ms. Afterwards, an array of 17 × 17 bars was shown. The array encompassed 34.99° × 34.99° of visual angle. Length and width of the bars were 1.07° and 0.18°, respectively. The central element of the array was the fixation cross. Background color was gray, RGB (96,96,96), with a luminance of \(6.98 \frac {cd}{m^{2}}\). Bars and fixation cross were white, RGB (224,224,224), equivalent to \(65.2 \frac {cd}{m^{2}}\).

Two positions, one to the left and one to the right of the fixation cross at an eccentricity of 8.24° of visual angle, were filled with a reference and a probe stimulus. While the probe stimulus was potentially salient, varying in four steps (0° to 90° orientation contrast in steps of 30°), the reference stimulus was not salient in any way: It had the same orientation and luminance as the background elements. The difference between the orientation of the potentially salient stimulus and the orientation of the background stimuli is denoted as Δo. The two positions were the same throughout the experiment.

The whole display was first presented for 150 ms plus a small jitter (0, 30, or 70 ms). This is an appropriate temporal window for strong salience effects (Dombrowe et al., 2010; Donk & Soesman, 2011; Van Zoest, Donk, & Van der Stigchel 2012). Afterwards, the probe and reference stimuli flickered briefly (an offset followed by an onset after 80 ms), and the two flicker events were separated by a stimulus onset asynchrony (SOA) of 0 ms, 50 ms, or 100 ms in two orders (probe first, reference second: positive SOA; reference first, probe second: negative SOA). Each SOA was presented in 15 trials except for the 0 ms SOA, which was repeated 30 times. Background orientation was chosen randomly in each trial. The procedure is sketched in Fig. 1.
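The design described above can be turned into a trial list as follows. This is a sketch under loudly stated assumptions: it assumes that the repetition counts apply per orientation-contrast condition, that the jitter is drawn with equal probability, that the background orientation is uniform over 0° to 180°, and that all trials are fully randomized. None of these details are confirmed by the text, and the names `build_trials`, `REPETITIONS`, and the dictionary keys are hypothetical.

```python
import itertools
import random

ORIENTATION_CONTRASTS = (0, 30, 60, 90)   # degrees, probe vs. background
SOAS_MS = (-100, -50, 0, 50, 100)         # negative: reference flickers first
REPETITIONS = {-100: 15, -50: 15, 0: 30, 50: 15, 100: 15}
JITTERS_MS = (0, 30, 70)


def build_trials(seed=None):
    """Build a shuffled trial list for one session (assumed design)."""
    rng = random.Random(seed)
    trials = []
    for contrast, soa in itertools.product(ORIENTATION_CONTRASTS, SOAS_MS):
        for _ in range(REPETITIONS[soa]):  # assumed: counts per contrast level
            trials.append({
                "contrast_deg": contrast,
                "soa_ms": soa,
                # Display duration before the flicker: 150 ms plus jitter.
                "preview_ms": 150 + rng.choice(JITTERS_MS),
                # Assumed: background orientation uniform over 0 to 180 degrees.
                "background_deg": rng.uniform(0.0, 180.0),
            })
    rng.shuffle(trials)
    return trials


trials = build_trials(seed=0)
print(len(trials))
```

Under the per-condition assumption, each contrast level contributes 90 trials, so the four levels of Experiment 1 would add up to 360 experimental trials.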

Fig. 1 Schematic procedure of each trial. The white corona symbolizes a rapid off- and onset that was perceived as a flicker

Procedure

The participants were instructed to look at the fixation cross that was visible from the beginning of each trial. They judged which of the two flicker events appeared earlier, the event on the left or the event on the right. Responses were given with the left ctrl or the right enter key. The next trial started automatically. A break was offered after every 50 trials. A short training of 40 trials with feedback allowed participants to become familiar with the task and to learn the locations of the two relevant stimuli. After the training, there was no more feedback about the correctness. The whole experiment lasted approximately 45 min.

Results and discussion

The results are computed using the no-effect model, the independent- κ model, and the power-function model. They are used in this order to answer whether there is a substantial effect to model, whether it is in line with the hypotheses derived from TVA, and how physical contrast and κ can be modeled. The latter question has not yet been answered within the TVA framework.

To compare the models, we computed the deviance information criterion (DIC; Spiegelhalter, Best, Carlin, & Van Der Linde, 2002) for each of them. This criterion provides a penalized deviance for the models. Smaller values indicate better models. Differences of less than 3 can be considered not substantial (Spiegelhalter et al. 2002). Because the DIC can be computed for each of the three models, they can all be compared. The penalized deviance of the no-effect model was 2480 against 2311 for the independent-κ model. Thus, assuming an effect worth modeling is empirically justified.

The independent-κ model allows analyzing the influence of contrast on salience, κ, and on overall processing capacity, C, for each condition individually. Although these parameters are allowed to vary freely by the model, results violating our hypotheses would be difficult to reconcile with the TVA meaning of the parameters. The contrast increasing over the experimental conditions was expected to cause an increase in κ but should not systematically influence the overall processing capacity C. The Bayesian parameter estimation with the independent-κ model yields values that are in line with these hypotheses. The estimates are depicted in Fig. 2 for the κ parameter and in Fig. 3 for the C parameter. Both figures show the probability distribution over the respective parameters. The parameter’s MAP is drawn as a continuous line. Dashed lines represent the highest-density interval (HDI) boundaries, within which the most likely 95% of parameter values are contained. Also, the condition to which the variable belongs is visualized on the right.

Fig. 2 Plot of the salience κ for each of the four orientation contrasts in Experiment 1. Each κ was modeled as an independent parameter

Fig. 3 Plot of the processing capacity C for each of the four orientation contrasts in Experiment 1. Each C was modeled as an independent parameter

The neutral condition yields a κ value close to 1, which is the theoretical value for a non-salient stimulus (Nordfang et al., 2013). For the increasing contrast of the three remaining conditions, a strong initial gain in κ is observed. The HDIs of the nonsalient and the salient conditions clearly do not overlap, which indicates a distinct difference. Furthermore, the experimental conditions show that the gain in κ becomes smaller with increasing orientation contrast.

The overall processing capacity C is around 60 Hz. The respective HDIs largely overlap in all conditions. This result is in line with the results of Kröger et al. (2016), Tünnermann et al. (2017), and TVA studies in general (e.g., Finke et al., 2005). It fits the theoretical meaning of C as the overall rate with which stimuli are processed by the visual system.

Although the independent- κ model is better than assuming no difference between conditions, the critical reader may wonder whether the TVA-based approach fits the data at all. Practically, the best model among alternatives can still be quite bad at describing the data adequately. Therefore we inspected the data predicted by the model, the posterior predictive, and the actual data so that the fit can be assessed. The posterior predictive should be able to adequately represent the patterns in the observations. Plots of the predicted values and the original data allow judging whether these patterns resemble each participant’s data.

Deviations could be, for example, a plateau in the data that is missing in the posterior predictive or a combination of a steep slope and strong shift that in principle cannot be fitted by the psychometric function derived from TVA. These are examples of deviations we looked out for. Such essential deviations were not found, agreeing well with earlier results (Kröger et al., 2016). To avoid cluttering this article, plots of the posterior predictives are presented for the most complex of our models and not for each model individually. These plots can be examined in Fig. 10 for four randomly chosen participants. The complete plot for all participants is provided in Fig. 14 in the Appendix.

Returning to the results of the independent-κ model, the parameter estimates confirm the hypotheses that an increase in feature contrast causes an increase in κ while C remains unaffected. How can this structured difference be described more formally? Stevens (1957) proposed a power law to capture the relation between physical stimuli and subjective intensity. This law allows for a wide range of gains: linear, declining, and increasing gains. Although the generality of Stevens’ approach is somewhat controversial (e.g., Ellermeier & Faulhammer, 2000), we suggest testing his power law as a logical model because of its theoretical generality. A logical quantitative model has desirable mathematical properties derived from theory and explanatory value (Taagepera 2008). Adapted to salience, the power function used by Stevens states

$$ \kappa = 1+ k\cdot {\Delta} o^{n} $$
(4)

Equation 4 states that the intensity of salience, κ, is a power function of the orientation contrast Δo, where n is the exponent describing the shape of the growth process. The proportionality constant k is a scaling constant (note that k is different from the TVA parameter κ). The 1 was added because in Nordfang et al.’s TVA extension (2013), the neutral salience value is 1, not 0.

According to Eq. 4, two parameters should suffice to determine the κ for an arbitrary orientation contrast. We estimated the proportionality constant for orientation \(k_{o}\) = 0.15 [0.06,0.24] and the exponent \(n_{o}\) = 0.43 [0.28,0.59]. Besides the salience κ, C is needed to explain the observed data. Based on the homogeneous empirical results and the theoretical meaning of C, we modeled C as a single variable explaining the data of all conditions. The parameter C was estimated on the group level to equal 62.7 Hz [52.63,73.23].
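Equation 4 with the reported point estimates can be evaluated for the four orientation contrasts of Experiment 1. This is a plug-in sketch only: it inserts the reported point estimates of k and n into Eq. 4 and thereby ignores the posterior uncertainty and any correlation between the two parameters. The function name `kappa_power` is a hypothetical helper.

```python
def kappa_power(delta_o, k=0.15, n=0.43):
    """Eq. 4: salience as a power function of orientation contrast (degrees)."""
    return 1.0 + k * delta_o ** n


previous = None
for delta_o in (0, 30, 60, 90):
    kappa = kappa_power(delta_o)
    gain = "" if previous is None else f" (gain {kappa - previous:+.2f})"
    print(f"delta_o = {delta_o:2d} deg: kappa = {kappa:.2f}{gain}")
    previous = kappa
```

Because the exponent is below 1, each further 30° step adds less to κ than the previous one, which mirrors the diminishing gains described for the independent estimates.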

How appropriate is this third model? We answer this question again by model comparison. The penalized deviance is 2311 for the independent- κ model and 2190 for the power-function model inspired by Stevens’ power law. The DIC thus indicates that, given the observed data, the power-function model is the better model. This result further supports the notion of a systematic connection of feature contrast and κ.
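The model comparison for Experiment 1 can be summarized in a few lines. The penalized deviances are the values reported above; the dictionary keys and the variable names are illustrative, and the threshold of 3 follows the rule of thumb cited from Spiegelhalter et al. (2002).

```python
# Penalized deviances reported for Experiment 1 (smaller is better).
penalized_deviance = {
    "no-effect": 2480,
    "independent-kappa": 2311,
    "power-function": 2190,
}
SUBSTANTIAL_DIFFERENCE = 3  # differences below this are not substantial

ranked = sorted(penalized_deviance.items(), key=lambda item: item[1])
best_name, best_value = ranked[0]
print(f"best model: {best_name} ({best_value})")
for name, value in ranked[1:]:
    difference = value - best_value
    verdict = ("substantial" if difference >= SUBSTANTIAL_DIFFERENCE
               else "not substantial")
    print(f"{name}: +{difference} relative to {best_name} ({verdict})")
```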

To sum up, Experiment 1 reveals that gradual contrast changes affect salience, can be measured by κ, and hence described and explained within the TVA framework. Furthermore, the estimated κ values suggest that the more contrast is already present, the less salience is gained. The idea that the increase systematically depends on feature contrast is further sustained by a model comparison revealing that the model explicitly describing the growth process is the better model. If such a systematic increase is indeed present, it should be replicable. Therefore, Experiment 2 replicates Experiment 1 with more contrast conditions. The model is also tested against further alternatives.

Experiment 2

Experiment 1 showed that a gradual increase in contrast leads to a gradual increase in the κ value and that this increase can be modeled by our power-function model. Experiment 2 is designed to justify this model in a replication and test it against an alternative. Seven different orientation contrasts provide a more precise view on how feature contrast causes a change in the salience parameter κ. We used the power-function model to characterize the growth of κ. If the systematic gain of salience, κ, based on a particular kind of contrast is captured by this function, we should be able to replicate the results from Experiment 1. The hypothesis was hence that the growth described by \(\kappa_{p} = 1 + 0.15 \cdot {\Delta} o^{0.43}\) describes the new experimental conditions well. Because the growth of this function is visually similar to a logarithmic function, we explicitly tested this alternative.

Method

Participants

A planned number of thirty persons (14 male and 16 female; \(M_{age}\) = 22.06, range 19–26) participated in Experiment 2. All were students or members of Paderborn University. Each participant gave informed written consent, completed one session, reported normal or corrected-to-normal visual acuity and received course credit or a payment of 8 euros per hour.

Apparatus

The apparatus was the same as in Experiment 1.

Stimuli

Stimulus material was the same as in Experiment 1 except that seven instead of four experimental conditions were used. The seven steps comprised successive increases of 15° orientation contrast spanning the maximal range of 0° to 90° of contrast.

Procedure

The procedure was the same as in Experiment 1.

Results and discussion

The results again are in line with the general expectation that the κ parameter increases systematically with local contrast while C does not vary systematically over conditions. The individually estimated κ and C values are shown in Figs. 4 and 5. The figures show the probability distribution over the respective parameter, their MAP (continuous line) and HDI (dashed line), and a sketch of the stimulus contrast in the condition.

Fig. 4 Plot of the salience κ for each of the seven orientation contrasts in Experiment 2. Each κ was modeled as an independent parameter

Fig. 5 Plot of the processing capacity C for each of the seven orientation contrasts in Experiment 2. Each C was modeled as an independent parameter

Furthermore, we fitted the power-function model. The proportionality constant for orientation \(k_{o}\) = 0.15 [0.08, 0.23] and the exponent \(n_{o}\) = 0.40 [0.26,0.54] were nearly the same as in the initial experiment. C amounted to 71.9 Hz [59.2,85.6].

Huang and Pashler (2005) suggested that salience gains are best described by the Weber-Fechner Law, which is based on a logarithmic relation between physical strength and subjective intensity. This option is visually hard to distinguish from the power-function growth that we proposed. Besides assuming a different quantitative relationship, the logarithmic increase has the advantage that it needs one parameter less. Therefore, we tested a model in which the power function was replaced by a natural logarithm. The only difference was that, instead of Eq. 4, the salience value was modeled as \(\kappa = 1 + k \cdot \log ({\Delta} o + 1)\). The parameter k was estimated to equal 0.37 [0.19, 0.57]. C was again estimated at 72.9 Hz [59.3, 86.7].

To compare the three models (independent-κ, power function, logarithmic), the DIC was computed for each of them. This criterion favors the power-function model with a penalized deviance of 3257 against 3290 for the logarithm-based model and 3555 for the independent-κ model. As mentioned above, differences of more than 3 can be considered substantial (Spiegelhalter et al. 2002). One may still wonder whether the small differences between the large penalized deviances can be regarded as meaningful. The large numbers stem from the complex hierarchical model. Because the overall hierarchical structure remains unaffected by the changes that were compared, even differences that are small compared to the overall number are meaningful. A visualization of the κ values of the three models can be found in Fig. 6. The figure illustrates how the increase in salience depends on the contrast in the three compared models. The simple view of the independent-κ model does not assume a general pattern behind contrast and salience. This view is represented by the dots. (It is important to note that the dots in the figure are not data but represent the parameter estimates of the independent model.) The alternative models, of course, include such patterns.
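The comparison logic can be retraced with a few lines of code. This is only an illustrative sketch using the penalized deviances reported above and the threshold of 3 mentioned in the text; the variable names are hypothetical, and the sketch does not compute the DIC itself.

```python
# Penalized deviances reported for Experiment 2 (lower values are preferred).
penalized_deviance = {
    "power function": 3257,
    "logarithmic": 3290,
    "independent kappa": 3555,
}

SUBSTANTIAL = 3  # differences above this value are treated as substantial

# Model with the lowest penalized deviance.
best = min(penalized_deviance, key=penalized_deviance.get)

# Difference of every other model to the preferred model.
differences = {
    name: value - penalized_deviance[best]
    for name, value in penalized_deviance.items()
    if name != best
}

for name, diff in differences.items():
    verdict = "substantial" if diff > SUBSTANTIAL else "not substantial"
    print(f"{best} vs. {name}: difference {diff} ({verdict})")
```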

Fig. 6

Visualization of the three models based on their parameters’ MAPs. Dots mark the individual κ estimates of the model which assumes no connection between the experimental conditions. Note that these dots are parameter estimates, not data. The two models based on the power function and the logarithmic growth propose a growth based on contrast. They are visualized by the graphs

Experiment 3

Experiments 1 and 2 showed that orientation contrast affects TVA’s κ parameter according to a power function. Experiment 3 tests this salience growth model for a further simple dimension, luminance contrast. Unlike orientation contrast, however, luminance contrast may vary in two distinct ways: The contrast can be induced by decreasing the luminance of the unique element or by increasing it. If the overall luminance of the stimulus does not affect salience, its growth parameters should depend solely on the contrast between background elements and the salient element. This hypothesis was tested in Experiment 3. To this end, four conditions with increasing stimulus contrast were introduced. However, this was done in two ways: In four conditions, the contrast increased by increasing the probe’s luminance in comparison to the surrounding elements, and in another four conditions, the salient element was less luminant than the surrounding elements.

Method

Participants

A planned number of 25 persons (12 male and 13 female; \(M_{age} = 23.5\), range 19–35) participated in Experiment 3. The number of participants was reduced in comparison to the previous experiments and the following experiment because Experiment 3 was conducted as a pre-study to Experiment 4. All participants were students or members of Paderborn University. Each participant gave informed written consent, completed one session, reported normal or corrected-to-normal visual acuity, and received course credit or a payment of 8 euros per hour.

Apparatus

The apparatus was the same as in Experiment 1.

Stimuli

Stimuli were the same as in Experiment 1, except that there were two factors, stimulus energy and feature contrast. Stimulus energy either increased or decreased, that is, the unique element was either lighter or darker than the other elements. For each energy condition, there were four steps of feature contrast to the surrounding elements, varying from no to high contrast in equal steps, such that the luminance difference, denoted \({\Delta} l\) in the following, was either \(0 \frac {cd}{m^{2}}\) or approximately \(20 \frac {cd}{m^{2}}\), \(40 \frac {cd}{m^{2}}\), or \(60 \frac {cd}{m^{2}}\). The background was black, \(0.31 \frac {cd}{m^{2}}\) at RGB (0,0,0), in all conditions. The differences originate from the use of four luminance values: \(5.36 \frac {cd}{m^{2}}\) at RGB (88,88,88), \(24.7 \frac {cd}{m^{2}}\) at RGB (152,152,152), \(45.1 \frac {cd}{m^{2}}\) at RGB (192,192,192), and \(65.2 \frac {cd}{m^{2}}\) at RGB (224,224,224). The Weber contrasts are 16.29, 81.9, 144.48, and 209.32 for the respective luminance levels, given the background luminance. For the increasing intensity conditions, the surrounding stimuli were drawn in \(5.36 \frac {cd}{m^{2}}\) and for the decreasing intensity conditions in \(65.2 \frac {cd}{m^{2}}\). All bars had the same orientation, which was chosen randomly in each trial. All conditions are illustrated in the respective sections of the results in Figs. 7 and 8. The number of trials per condition was the same as in Experiment 1. The flicker events were the same as in the previous experiments.
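As a brief illustration of how a Weber contrast relative to the black background can be obtained, the sketch below assumes the common definition (L − L_b) / L_b; the text does not spell out the formula, so this definition and the function name `weber_contrast` are assumptions. It is applied to the lowest and the highest bar luminance.

```python
def weber_contrast(luminance, background):
    """Weber contrast (L - L_b) / L_b of a stimulus against its background."""
    return (luminance - background) / background


BACKGROUND = 0.31  # cd/m^2, black background at RGB (0,0,0)

# Lowest and highest bar luminance used in the experiment (cd/m^2).
lowest = weber_contrast(5.36, BACKGROUND)
highest = weber_contrast(65.2, BACKGROUND)

print(f"lowest bar luminance:  Weber contrast {lowest:.2f}")
print(f"highest bar luminance: Weber contrast {highest:.2f}")
```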

Fig. 7

Plot of the salience κ for the four increasing luminance intensity conditions on the left and the four decreasing luminance intensity conditions on the right in Experiment 3. Each κ was modeled as an independent parameter

Fig. 8

Plot of the processing capacity C for the four increasing luminance intensity conditions on the left and the four decreasing luminance intensity conditions on the right in Experiment 3. Each C was modeled as an independent parameter

Procedure

The procedure was the same as in Experiment 1.

Results and Discussion

For the increasing intensity conditions, κ increased similarly to Experiments 1 and 2 (Fig. 7). The power-function model was fitted to the TOJ data. Again, the MAP is marked by a continuous line, HDI boundaries are depicted by dashed lines, and the respective condition is depicted as a sketch on the right. The parameters of the power-function model were estimated as k = 0.49 [0.23, 0.8] and n = 0.53 [0.36, 0.72], and its processing rate as C = 53.51 Hz [43.07, 64.71].

In the decreasing intensity conditions, κ estimates were clearly different, although the contrasts were equal to those of the increasing intensity conditions: Initially, κ did not change at all. Only for the most extreme contrast condition, κ weakly increased. Because of this pattern, the growth parameters were not estimated: As the element in the first three steps does not seem to be salient at all, a growth estimate within this region is pointless.

Summarizing these results, Experiment 3 reveals that stimulus contrast is not the only influence on salience as modeled by TVA’s κ parameter. For increasing luminance contrasts, the power-function model was confirmed. For decreasing contrast, there was only a small increase for the strongest contrast so that the growth model could neither be confirmed nor refuted.

However, although an influence of stimulus energy is present, it cannot explain the whole effect on κ. This is because the stimulus energy of the potentially salient target is equal in some of the increasing and decreasing energy conditions, but does not have the same influence on κ: The luminance value of Condition 0 matches the luminance of Condition − 60, Condition 20 the luminance of Condition − 40, Condition 40 is the same as − 20, and Condition 60 matches − 0. Obviously, these conditions do not result in the same κ estimate. Therefore, the energy or luminance of the potentially salient stimulus does not suffice to explain the results; the luminance of the surrounding elements affects salience as well.

A possible explanation of the weak κ in the decreasing energy condition is the black background: In this situation, two contrasts oppose each other (i.e., the salient stimulus becomes different from the surrounding stimuli but more similar to the background), whereas they work in parallel in the increasing energy condition (i.e., the salient stimulus becomes more different from the background as well as from the surrounding stimuli). To test this explanation, the luminance of the background would have to be changed systematically. This is beyond the scope of the present study.

The overall processing capacity C varies more than in the orientation experiments (see Fig. 8). Given the HDIs, differences appear likely. To test statistically whether they are substantial, we subtracted the C distributions of the experimental conditions from the C distribution of the neutral condition. If a difference of 0 lies outside the HDI of the difference of two C distributions, we reject the assumption that the processing capacity did not vary. This comparison revealed only one difference between the processing capacities, \(C^{\mu }_{0} - C^{\mu }_{60} = 21.33 \text { Hz}\) [5.28, 38.1]. This is the visually most distinct difference, the one between the neutral condition (upper left in Fig. 8) and the maximal contrast of the increasing intensity conditions (lower left of Fig. 8). Note that the parameter comparison by computing the difference between the most extreme values is more conservative than a model comparison with a fixed-C model. Roughly speaking, whereas the model comparison asks whether it is reasonable to assume the same C for all conditions, we tested, under the assumption that C varies freely, whether there is a single pair of C values that are unlikely to be similar.
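The difference test described above can be sketched as follows. The samples are hypothetical normally distributed draws standing in for posterior samples of C; they are not the posteriors of this study. The helper `hdi` is an assumed, simple implementation that returns the narrowest interval containing 95% of the samples.

```python
import math
import random


def hdi(samples, mass=0.95):
    """Return the narrowest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = math.ceil(mass * n)  # number of samples inside the interval
    # Slide a window of that size over the sorted samples and keep the narrowest.
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]


rng = random.Random(0)
# Hypothetical posterior samples of C (in Hz) for two conditions.
c_neutral = [rng.gauss(70.0, 5.0) for _ in range(20_000)]
c_contrast = [rng.gauss(50.0, 5.0) for _ in range(20_000)]

# Distribution of the difference, computed sample by sample.
difference = [a - b for a, b in zip(c_neutral, c_contrast)]
low, high = hdi(difference)

# If 0 lies outside the HDI, equal capacities are rejected.
zero_credible = low <= 0.0 <= high
print(f"95% HDI of the difference: [{low:.2f}, {high:.2f}]")
print(f"zero is a credible difference: {zero_credible}")
```

With real MCMC output, the two sample lists would be replaced by the posterior draws of C for the neutral and the experimental condition.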

Finding such a change in C does not agree with our hypothesis that only κ is needed to describe the variance in the TOJ data generated by feature contrast. The capacity is the sum of the processing rates of the probe and the reference stimulus. If we compare these processing rates, the rate of the reference stimulus in the high contrast condition seems to be reduced to 8.98 Hz [8.0, 10.18], in comparison to 30.37 Hz [26.83, 33.9] in the neutral condition. A similar result has been reported earlier for color contrasts (Tünnermann et al. 2017). We should, however, be aware that the assumption of a constant processing capacity C has to be rejected in only one of the six contrast conditions. Hence, the possibility that contrast manipulations may also affect the processing capacity C should not be ignored, but such an effect is far from frequent.

The interpretation of the diminished visual processing capacity is complicated because not much is known about the mechanisms that can affect processing capacity. Successful manipulations of C include alertness (Matthias et al. 2010), psychopharmacological interventions (Vangkilde et al. 2011; Finke et al. 2010), and temporal expectations (Vangkilde et al. 2012). These studies do not offer a readily accessible interpretation for the finding reported above. Thus, we can merely state that introducing strong salience can reduce the effectiveness of processing of a nonsalient stimulus, resulting in a diminished overall processing capacity. Although this was unexpected, it does not make the κ estimate useless because κ still quantifies the distribution of attention caused by feature contrast.

In summary, Experiment 3 revealed a limitation of the present approach. On the one hand, salience as measured by κ was again affected by feature contrast. A stimulus increasing in luminance in comparison to its surroundings resulted in a pattern similar to Experiment 1 and hence suggests a power-function growth mechanism. On the other hand, decreasing the luminance of the unique stimulus required much more contrast to cause an equal effect on attention and thus seems to be influenced by at least one further variable. Furthermore, strong luminance salience affected the processing of the nonsalient stimulus negatively. This limits the explanatory power of the approach. Although not a central problem, it is an effect that cannot be explained as of now. We investigate the effects of luminance and orientation contrast again in Experiment 4 to see whether their growth characteristics persist when contrasts in two dimensions are combined.

Experiment 4

In Experiments 1, 2, and 3, the growth of salience within a single dimension was described by a model based on a power function. Experiment 4 turns to a combination of dimensions, orientation and (increasing) luminance contrast. Previous studies that provided a quantitative perspective on the salience of combined contrasts found that although contrasts add up, they do so with a discount. Depending on which dimensions were combined, the discount varied between 25% and 80% (Nothdurft 2000; Huang and Pashler 2005). In other words, an already salient stimulus gains salience from adding a further salient feature, but this gain may be smaller than the added salience of the features measured independently. Furthermore, Koene and Zhaoping (2007) used a model comparison to examine (without a quantification of salience) whether two combined contrasts interact or are rather processed independently. This study showed that combined contrasts are at least as salient as individual contrasts. Together, this evidence suggests that two combined contrasts are at least as salient as the most salient individual contrast (measured without a combination) and usually less salient than the sum of the salience values of both individual contrasts.

We formalized this thought by a further parameter a which changes the summed salience by a percent value. For a highly general approach, we allowed this parameter to be positive as well as negative which corresponds to a discount or bonus for adding dimensions. Following earlier results, a bonus is rather unlikely: According to Nothdurft’s (2000) findings on combinations of luminance, orientation, motion, and color, the discount varies between approximately 25% for luminance and orientation and 80% for color and motion. Following Huang and Pashler (2005) who differentiated between a high-salient main component and a low-salient sub-component it holds that \(\frac {\text {combined salience}-\text {main-salience}}{\text {sub-salience}} = 0.68\), which also indicates a discount. Hence, a penalty is more likely than a bonus.

One important limitation of Nothdurft’s (2000) approach is that he compared only one particular orientation and luminance combination to several luminance steps that served as a reference scale. Huang and Pashler (2005) had four different combinations of size and luminance contrasts. However, one feature was always clearly dominant. By contrast, Experiment 4 uses a full factorial design with four values for each feature difference, resulting in 16 individual combinations of features. This design should provide conclusive evidence for or against a particular mechanism of salience, more precisely, a discount or bonus.

Method

Participants

A total of 28 persons (10 male and 18 female; \(M_{age} = 22.85\), range 19–35) participated in Experiment 4. Thirty participants were planned, but two did not complete the experiment. All were students or members of Paderborn University. Each participant gave informed written consent, completed one session, reported normal or corrected-to-normal visual acuity, and again received course credit or a payment of 8 euros per hour.

Apparatus

The apparatus was the same as in Experiment 1.

Stimuli

Stimuli were the same as in Experiment 1 except for the following: The background color was set to black, RGB (0,0,0), equivalent to \(0.31 \frac {cd}{m^{2}}\), while bars were gray, RGB (88,88,88), equivalent to \(5.36 \frac {cd}{m^{2}}\), and the fixation cross was white, RGB (224,224,224), equivalent to \(65.2 \frac {cd}{m^{2}}\). The orientation manipulation was reduced to four steps of 0°, 15°, 30°, and 60°. Additionally, the luminance of the potentially salient probe stimulus took one of four values: \(5.36 \frac {cd}{m^{2}}\) at RGB (88,88,88), difference to background stimuli \(0\frac {cd}{m^{2}}\); \(13.08 \frac {cd}{m^{2}}\) at RGB (120,120,120), difference to background stimuli \(10\frac {cd}{m^{2}}\); \(24.7 \frac {cd}{m^{2}}\) at RGB (152,152,152), difference to background stimuli \(20\frac {cd}{m^{2}}\); and \(65.2 \frac {cd}{m^{2}}\) at RGB (224,224,224), difference to background stimuli \(60\frac {cd}{m^{2}}\). The Weber contrasts are 16.29, 41.19, 81.9, and 209.32 for the respective luminance levels, given the background luminance. A factorial design resulted in 16 distinct conditions. Each SOA was presented in 15 trials except for the 0 ms SOA, which was repeated 30 times. The SOAs were the same as in Experiment 1.

Procedure

The procedure was the same as in Experiment 1.

Results and Discussion

In a first step and analogously to Experiment 1, the 16 conditions were analyzed as independent conditions. This resulted in 16 κ and C estimates as shown in Table 1. As in Experiment 3, the likelihood of a difference in C was checked by subtracting the respective C distribution from the distribution of the neutral condition. This comparison revealed that 0 was always a credible parameter value. (We would like to note that the condition of Experiment 3 which deviated from this pattern is included in the factorial design of Experiment 4 and did not differ substantially from the neutral condition.)

Table 1 Parameters estimated by the independent κ model

The κ estimates of the independent- κ model are plotted, depending on orientation and luminance contrast, as dots in 3D space in Fig. 9. Recall that the independent κ model does not presuppose any relation between the conditions and estimates κ and C independently.

Fig. 9

Visualization of salience (κ) depending on luminance difference (Δl) and orientation difference (Δo). The 16 individual estimates of salience are marked as dots. The prediction of the power-function model is visualized as grid

The independent κ estimates should not be viewed as the real salience values. Rather, they should be interpreted as reference points which visualize the discrepancy between a model assuming no connection between conditions and a model capturing the systematic increase with feature contrast. The latter is represented by the grid in Fig. 9 and will be further explained below.

The salience model tested in this experiment is described by a set of equations. These equations express a set of ideas formally: Salience increases as a power function of feature contrast within a dimension as shown in Eq. 1, adds up if features from different dimensions are combined, and there may be a penalty or bonus, called a, for the combination of salience. This yields Eq. 5.

$$ \small \kappa \,=\,\left\{\!\begin{array}{l} 1+ k_{o}\cdot {\Delta} o^{n_{o}} + k_{l}\cdot {\Delta} l^{n_{l}} \text{\;if either \({\Delta} o = 0\) or \({\Delta} l= 0\)} \\ 1+ k_{o}\cdot {\Delta} o^{n_{o}} + k_{l}\cdot {\Delta} l^{n_{l}} - a \cdot (k_{o}\cdot {\Delta} o^{n_{o}} + k_{l}\cdot {\Delta} l^{n_{l}}) \text{\;else}. \end{array}\right. $$
(5)
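Equation 5 translates into a few lines of code. The following is a minimal sketch rather than the estimation procedure used in the study; the function name `kappa_combined` is hypothetical, and the example call uses the MAP estimates reported for Experiment 4.

```python
def kappa_combined(delta_o, delta_l, k_o, n_o, k_l, n_l, a):
    """Salience for combined orientation and luminance contrast (Eq. 5).

    delta_o: orientation contrast, delta_l: luminance difference,
    a: penalty (positive) or bonus (negative) for combining both contrasts.
    """
    summed = k_o * delta_o ** n_o + k_l * delta_l ** n_l
    if delta_o == 0 or delta_l == 0:
        # At most one contrast is present: no penalty or bonus applies.
        return 1.0 + summed
    # Both contrasts are present: the summed salience is reduced by a * summed.
    return 1.0 + summed - a * summed


# MAP estimates reported for Experiment 4.
params = dict(k_o=0.08, n_o=0.63, k_l=0.22, n_l=0.66, a=-0.03)

neutral = kappa_combined(0, 0, **params)
orientation_only = kappa_combined(60, 0, **params)
luminance_only = kappa_combined(0, 60, **params)
both = kappa_combined(60, 60, **params)

print(f"neutral: {neutral:.2f}, orientation only: {orientation_only:.2f}")
print(f"luminance only: {luminance_only:.2f}, both: {both:.2f}")
```

Setting `a=0` in this sketch yields the purely additive model, in which the combined salience gain is the sum of the two single-dimension gains.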

This model again strongly reduces the complexity in comparison to a model that assumes all conditions to be independent: Instead of 16 independent κ and C values, only two proportionality constants \(k_{o}\) and \(k_{l}\), two exponents \(n_{o}\) and \(n_{l}\), a penalty a for the addition of two features, and the processing capacity C are used as model parameters. These parameters were estimated to be \(k_{o} = 0.08\) [0.04, 0.15], \(k_{l} = 0.22\) [0.13, 0.34], \(n_{o} = 0.63\) [0.45, 0.82], \(n_{l} = 0.66\) [0.53, 0.81], a = −0.03 [−0.15, 0.085], and C = 58.5 Hz [43.96, 72.85]. These estimates predict the increase in salience visualized as a three-dimensional grid in Fig. 9. There is not much deviation from the individual estimates. Together with the DIC, which is in favor of the new model (penalized deviance: 6333 for the power-function model with penalty, 6905 for the independent-κ model), this means that the simplified model provides a better way of describing salience caused by the combination of two dimensions.

Because the power-function model with two contrasts is considerably more complex than the earlier models, the critical reader may question the fit to the data. As explained in Experiment 1, we checked this fit by inspecting the posterior predictive plots. Because these plots total 448 (16 conditions multiplied by 28 participants), only 4 randomly selected participants are shown within the article. The plots for all 28 participants are presented in the Appendix for completeness. The posterior predictive was computed with the same SOAs and the same number of repetitions as in the original experiment. Figure 10 shows the posterior predictive as a ribbon, whereas the original data, “probe first” responses per repetition, are shown as black dots. These plots show that the complex hierarchical model produces data comparable to the data patterns of the participants; the model hence passed this sanity check.

Fig. 10

Plot of the posterior predictives of the model of Experiment 4 shown as ribbons. The relative frequencies of ’probe first’ judgments are shown as black dots. The x-axis represents the SOA. As examples, participants 8, 10, 13, and 24 are shown left to right, top to bottom. The 16 conditions include the neutral condition in the lower left. Luminance contrast increases with the y-axis and orientation contrast with the x-axis. Plots for all 28 participants are shown in the Appendix

Turning back to the data, the parameters can be compared to those of Experiments 1, 2, and 3. In general, the estimation of κ allows comparing salience between experiments. The tendency that luminance contrast was more salient than orientation contrast is consistent in all experiments, and the κ values of the independent-κ model in Experiment 4 are comparable to the single-contrast results of the previous experiments. This tendency is reflected in the power-function model as well and is apparent in the leftwards slope of the grid in Fig. 9, which is much steeper than its rightwards slope. However, the parameter values of the power-function model for the single contrasts of the previous experiments differ from the power-function model parameters of Experiment 4. This may indicate an interaction.

When all relevant factors are included in a coherent model, the penalty parameter a has a MAP of −0.03, which corresponds to a penalty of −3% of the summed salience of the individual features. The negative value indicates a slight bonus instead of a penalty. The HDI of −0.15 to 0.085 shows that a penalty larger than 8.5% is highly unlikely. However, because the distribution is centered close to zero, the best explanation is that a perceptual bonus or penalty, corresponding to a superadditive or underadditive model, is not supported by the data. This clearly conflicts with the estimate by Nothdurft (2000) of at least a 25% penalty.

This parameter estimation can be interpreted as a nested-model comparison (e.g., Kruschke, 2014). The model that assumes a penalty of 25% is far less likely than the model that assumes no penalty: The value of 0 is contained in the HDI and is therefore much more credible than the value of 25%, which is not contained in the HDI. Also, a comparison of the DIC reveals that the model without penalty is preferable (penalized deviance: 6333 for the model with penalty, 6330 for the model without penalty). The DIC comparison is not as distinct as the nested-model comparison. However, this is expected because the data can be fitted nearly equally well regardless of whether the parameter a is estimated or fixed to 0. Also, except for this single parameter, the hierarchical structure stays the same. Because the majority of the model remains unchanged, the majority of the numerical value remains unchanged. Thus, only the penalty for a single unnecessary additional parameter affects the DIC, which explains the small difference. The parameter estimation shows that a is highly likely to be 0 (meaning that it has no effect in the model), and the DIC comparison shows that the model without this parameter is to be preferred. Taken together, a penalty or bonus is not necessary to explain salience combined from two dimensions and should not be included in the model. Note that Huang and Pashler (2005) also reported a nearly additive overall salience when the sub-salience was rather small. Although this additivity changed for larger feature contrasts, our results can be understood as partially in line with their findings.

A possible explanation of the difference of the present study to that of Nothdurft (2000) is that his participants judged the conspicuousness of stimuli explicitly which involved a report of their perceived salience. It is still unclear how far the perception of salience (or conspicuousness) relies on the same mechanisms as the distribution of attention. For instance, Kerzel, Schönhammer, Burra, Born, & Souto (2011) reported that salience changes the appearance of a stimulus. In their experiments, features that objectively were the same were judged to be more intense if the salience of the stimulus was high. The authors conclude that recurrent processes may be involved in the brain’s salience computation. If this is the case, the initial distribution of attention (e.g., Dombrowe et al., 2010; Donk & van Zoest, 2008; Donk & Soesman, 2011) may be different from subsequent and recurrent processes that cause perception of salience.

Finally, we want to draw attention to a difference in salience growth between the experiments. In Experiments 1 and 2, the growth caused by orientation contrast was nearly identical, with \(k_{o} = 0.15\) [0.06, 0.24] and the exponent \(n_{o} = 0.43\) [0.28, 0.59] in Experiment 1 and \(k_{o} = 0.15\) [0.08, 0.23] and the exponent \(n_{o} = 0.40\) [0.26, 0.54] in Experiment 2. In Experiment 4, however, the growth differed, with \(k_{o} = 0.08\) [0.04, 0.15] and \(n_{o} = 0.63\) [0.45, 0.82]. The same holds for the comparison of luminance contrast in Experiments 3 and 4. Experiment 3 showed k = 0.49 [0.23, 0.8] and n = 0.53 [0.36, 0.72], and Experiment 4 showed \(k_{l} = 0.22\) [0.13, 0.34] and \(n_{l} = 0.66\) [0.53, 0.81]. In all cases, the HDIs were equally broad, which indicates that each pair of estimates has about the same certainty. Furthermore, C values (which did not vary with feature contrast) were rather similar in all experiments.

It thus seems that salience in conditions in which two dimensions are involved differs from salience caused by a contrast in a single dimension. There is no straightforward explanation for this difference. Because we did not find a penalty, we assume that more complex interactions cause this difference, in accord with research that found different mechanisms for the analysis of combined features (e.g., Chan & Hayward, 2009; Li, 2002). This is an interesting question to answer because if the effect of features on salience changes according to which feature they are presented with, a simple metric of salience (addition of both salience values) is bound to fail.

To sum up, a penalty cannot explain the difference in salience when the salience of single features and combined features is compared. However, the presented results suggest that the growth of salience within a dimension changes when features are added which suggests more complex interactions.

General discussion

For almost three decades, Bundesen’s TVA has offered a precise mathematical formulation of psychological concepts underlying visual attention, recognition, and selection. For an even longer period, visual salience has been studied as a distinct cause of visual attention. In the present paper, we modeled how physical contrasts cause TVA’s parameter κ to change. In Experiments 1 to 3, we showed that κ varies systematically with feature contrast and that this relationship is adequately described by a power function. In Experiment 4, this approach was applied to the question of how different feature contrasts combine. This experiment revealed that a penalty or bonus for combinations has to be rejected: Salience from combined contrasts adds up linearly. With this measuring and modeling approach, we showed that TVA and visual salience estimation are not mutually exclusive.

The motivation for the present research was that the κ parameter introduced by Nordfang et al. (2013) could be used to approach salience from the TVA perspective. An experimental design was derived from theory in which all TVA parameters except for κ and the overall processing capacity C were kept constant. Based on the meaning of both parameters in the TVA context, we expected κ to be a quantitative measure of salience affected by physical contrasts, while the overall processing capacity C should remain unaffected by those contrasts. As expected, C did not vary with the salience manipulation except in one of overall 30 experimental conditions. Given the 29 other tested conditions and the different result for the exact same condition in Experiment 4, this deviation appears to be an outlier. However, if further data or hypotheses motivated by theory suggest that C may change with feature contrast, these hypotheses can be explicitly modeled with Bayesian techniques and directly compared to models that assume a constant C.

In contrast to C, κ systematically increased with orientation and luminance salience. This finding confirms the theoretical reasoning behind the approach and is promising evidence for its soundness because four distinct experiments (except for the decreasing energy condition of Experiment 3) are in line with the hypotheses.

Beyond measuring salience, we proposed a function that describes the growth of salience within and across feature dimensions, captured in a quantitative model, namely a power-function model. This allows estimating two parameters that explain the κ value in each condition instead of measuring each κ independently of the other conditions. Thus, TVA’s quantitative view allows not only measuring the effect of feature contrasts on attention but also modeling how specific feature contrasts affect attention.

Adding feature contrasts as TVA parameters is a reasonable extension of TVA because salience undoubtedly contributes to visual attention (Nordfang et al. 2013). A resemblance of TVA’s attentional weight and the topographic salience maps as proposed by, for instance, Koch and Ullman (1985) has been noted several times (Bundesen et al., 2005; Bundesen, Vangkilde, & Petersen 2015). However, Koch and Ullman’s notion of salience has no direct equivalent in TVA. Furthermore, the present approach links TVA and salience much more closely than our own previous work on salience and TVA, in which salience was merely related to TVA’s attentional weight parameter (Krüger et al., 2016).

Because TVA explains visual processing as the result of recognition and selection mechanisms, its mathematical form allows distinguishing the effects of different influences (including salience) and hence provides an ideal framework to study the strength of effects and their interactions. When compared, for instance, to approaches based on signal detection theory, TVA provides a better explanation of why visual phenomena occur (Logan 2004). TVA allows explaining effects in a theoretical context and testing quantitative predictions about visual attention, including visual salience.

The present research also emphasizes the potential importance of our combined TVA-TOJ approach, which renders TVA applicable to almost any kind of stimulus material and capitalizes on a very simple task, the TOJ. The stimulus material used so far includes feature contrast (Krüger et al., 2016), color, pictures of natural scenes, and spatial cueing (Tünnermann et al. 2017); the merits of the approach include a new TVA-based account of spatial cueing (Tünnermann and Scharlau 2016). All of this stimulus material caused attentional parameters to vary as theoretically expected.

Beyond the extension of TVA, we also reported results on salience itself. In the four experiments, the power-function model, a compact model with two parameters (see Eq. 4), was successfully applied to the growth of salience. The DIC revealed that this model was more plausible than a model assuming independent conditions. Also, it was successfully tested against a simpler model, one-parametric logarithmic growth, in Experiment 2. These results indicate that a power function adequately describes the visual impression from Fig. 4 that less salience is gained the more contrast already exists. This relation might also prove relevant for computational modeling of salience.
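
The diminishing gain described above can be illustrated with a minimal numerical sketch. It does not reproduce Eq. 4 or any fitted values: the function forms (scale * contrast ** exponent and scale * log(1 + contrast)), the parameter names, and the parameter values are assumptions chosen purely for illustration.

```python
import math


def power_growth(contrast, scale, exponent):
    """Hypothetical two-parameter power function (assumed form, not Eq. 4)."""
    return scale * contrast ** exponent


def log_growth(contrast, scale):
    """Hypothetical one-parameter logarithmic growth (assumed form)."""
    return scale * math.log1p(contrast)


def increments(values):
    """Gain obtained by each additional step of contrast."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Arbitrary, equally spaced contrast steps (illustrative units).
contrasts = [0.0, 1.0, 2.0, 3.0, 4.0]

# Illustrative parameter values; an exponent below 1 yields diminishing gains.
power_values = [power_growth(c, scale=1.0, exponent=0.5) for c in contrasts]
log_values = [log_growth(c, scale=1.0) for c in contrasts]

power_gains = increments(power_values)
log_gains = increments(log_values)

for step, (p, l) in enumerate(zip(power_gains, log_gains), start=1):
    print(f"step {step}: power gain = {p:.3f}, log gain = {l:.3f}")
```

With these assumed forms, both functions add less with every further step of contrast. They differ in that the power function has a free exponent controlling how quickly the gains shrink, whereas the logarithmic function has only its scale.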

The decreasing energy conditions of Experiment 3, however, revealed that there are at present limits to the understanding that this approach provides. In these conditions, κ did not merely depend on the feature difference between the salient stimulus and its surroundings. Possibly, stimulus energy or contrast to the background or both had an impact, too. However, this limitation might be regarded not as a problem, but rather as motivation to understand quantitative aspects of visual salience more fully. There is a body of evidence that suggests that salience does not work as predicted by state-of-the-art computational salience models. For instance, Einhäuser and König (2003) refuted the assumption that higher luminance contrast necessarily leads to more attention. Another example is color. Frey, Honey, & König (2008) reported that the effect of color on salience heavily depends on the image type. In general, it appears that high-level and low-level features contribute jointly to the deployment of attention in real scenes (Onat, Açik, Schumann, & König, 2014). Also, it is not entirely clear what exactly the computational salience models predict. As shown recently by Betz, Kietzmann, Wilming, & König (2010), only 68% of fixations are correctly predicted by the models for free viewing of real scenes, and Koehler, Guo, Zhang, & Eckstein (2014) found that many models predict explicit salience judgments better than performance in behavioral tasks. One way to explain this is that the salience of a feature combination is represented separately and is not a simple combination of the intensities of its parts. This fits the qualitative prediction of Li’s (2002) V1 salience model, which assumes that particular populations of neurons may be tuned to feature combinations and that these neurons are involved in the computation of salience only if both feature contrasts are present. Up to now, these predictions have been tested by checking whether interactions exist at all for feature combinations (Koene and Zhaoping 2007). To sum up, there are many aspects of salience that are not fully understood, but a precise common measure of salience can help to analyze which contrasts cause salience and how they interact with each other.

We also aimed to answer a further question: Are combinations of features less salient than their individual salience added up? This question was motivated by the few behavioral studies on the topic. All of them assume a penalty to combined salience (Huang and Pashler 2005; Nothdurft 2000). The results of Experiment 4 cast doubt on a simple penalty or bonus for multiple contrasts. Furthermore, salience grew differently within both dimensions when compared to Experiments 1 to 3. These results indicate that context plays a role in the growth of salience. It is difficult to put this directly into perspective because only a few studies measure quantitative aspects of salience directly, whereas many more studies suggest how such quantitative aspects might be modeled (Itti and Koch 2001b; Zhao and Koch 2011). Because of their algorithmic nature, such models have to specify precisely how individual features combine. Based on the evidence from the simplified artificial stimuli in the present research, it is likely that the contribution of a feature already depends on the combination in which it occurs. That is, contrary to Koch and Ullman (1985), features may not be analyzed independently at first and combined later.

Also, the work of Chan and Hayward (2009, 2014) deviates from the classical perspective that features are initially analyzed separately and then combined on a master salience map. Based on their studies of visual search, they find evidence that some searches are based on independent dimensional modules while others rely on a master salience map. Whereas they explain these results by a special mechanism for feature combinations, the model we presented here may in the future provide a detailed view of the strength of attentional guidance, suggesting that adding a feature may affect the capability for attentional guidance by the original feature.

To conclude, while the scope of the power-function model presented here (i.e., capturing the growth of salience) is surely narrow in comparison to other salience models, ranging from neurophysiological explanations of salience (e.g., Koch & Ullman, 1985; Li, 2002) to computational models (Itti and Koch 2001a; Wolfe 2007), we integrated salience modeling into a tried and tested mathematical theory of visual attention, the TVA. TVA’s precise mathematical reasoning not only allows a theoretical explanation of our dependent measure, the TOJ, but also provides a precise quantification of the effects of feature contrasts on attention via the salience parameter κ. The flexible experimental design provided by the TOJ task allows testing a broad range of hypotheses drawn from theory, and Bayesian statistical methods allow these hypotheses to be compared directly in a model comparison. The integration of salience into TVA, a highly developed and well-tested model of attention, is a step towards exploring quantitative changes due to, and interactions of, different attentional phenomena.