1 Introduction

A number of perceptual experiences in different modalities can be described with the Divisive Normalization interaction among the outputs of linear neurons (Carandini and Heeger 1994, 2012). In particular, in vision, the perception of color (Hillis and Brainard 2005), texture (Watson and Solomon 1997), and motion (Simoncelli and Heeger 1998) seems to be mediated by this nonlinear interaction. Intuitively, this divisive interaction modifies the response of a sensor by normalizing it with the responses of the neighboring neurons, thus explaining inhibition by the surround.

The discussion of the circuits underlying Divisive Normalization in Carandini and Heeger (2012) suggests that there may be different architectures leading to this specific computation. Recent results suggest specific mechanisms for Divisive Normalization in certain situations (Sato et al. 2016), but the debate on the physiological implementations is still open. On the other hand, a number of functional advantages (Schwartz et al. 2009; Schwartz and Rieke 2011; Coen-Cagli et al. 2012; Coen-Cagli and Schwartz 2013) suggest that the kernel that describes the interaction in Divisive Normalization should be adaptive (i.e., signal or context dependent). Moreover, the match between the linear receptive fields and the interaction kernel in Divisive Normalization is not trivial: The conventional Gaussian kernels in Watson and Solomon (1997) and Malo and Laparra (2010) had to be tuned by hand to reproduce contrast responses (Martinez-Garcia et al. 2019).

These open questions make it interesting to relate Divisive Normalization to other models of neural interaction for a better understanding of its implementation, the structure of the interaction kernel, and its possible dependence on the signal. Interesting possibilities to consider are the classical dynamic neural field models of Wilson–Cowan (Wilson and Cowan 1972, 1973; Bressloff and Cowan 2003), Amari (Amari 1977), or Grossberg (Grossberg 1968, 1988), all of which have a similar subtractive nature (Chow and Karimipanah 2020): In subtractive models, the surround modifies the activity of a given sensor by subtracting a weighted average of its neighbors' responses, as opposed to the division made in normalization models.

Subtractive and divisive adaptation models have been qualitatively related before (Wilson and Humanski 1993; Bressloff et al. 2002). Both models have been shown to have similar advantages in information-theoretic terms: The Wilson–Cowan interaction in a neural layer uniformizes the probability density of the responses (Bertalmio 2014) and reduces the redundancy among them (Gomez-Villa et al. 2020). Similarly, Divisive Normalization layers reduce the statistical relations between the responses (Malo et al. 2006), factorize the joint probability of the responses (Malo and Laparra 2010), and maximize the information transmitted from the input (Malo 2020, 2022). Additionally, both models provide similar descriptions of pattern discrimination (Wilson and Humanski 1993; Bertalmio et al. 2017). However, despite all these similarities, no direct analytical correspondence between these models has been established yet.

In this paper, we assume that the psychophysical behavior described by Divisive Normalization comes from underlying neural interactions that follow the Wilson–Cowan equations. In particular, we identify the Divisive Normalization response with the stationary regime of a Wilson–Cowan system. From this identification, we derive an expression for the Divisive Normalization kernel in terms of the interaction kernel of the Wilson–Cowan equations.

This analytically derived relation has the following interesting consequences:

  1. It has been suggested that Divisive Normalization should depend on the input for functional reasons (Coen-Cagli et al. 2012; Coen-Cagli and Schwartz 2013), but no physiological mechanism had been proposed to implement this statistical adaptation. The proposed relation of the Divisive Normalization with a dynamical system with fixed wiring among neurons provides a mechanistic explanation for this dependence on the input.

  2. The relation explains the modifications that had to be introduced ad hoc in the kernel of Divisive Normalization in Martinez-Garcia et al. (2019) to reproduce contrast responses. This implies that the Wilson–Cowan dynamics reproduce visual masking, which until now had been explained mainly via Divisive Normalization (Foley 1994; Watson and Solomon 1997).

  3. The relation makes it possible to build effective image quality metrics based on the Wilson–Cowan model, something that, to the best of our knowledge, has not been considered before in the literature, in contrast to the many metrics based on Divisive Normalization (Teo and Heeger 1994; Laparra et al. 2010; Malo and Laparra 2010; Laparra et al. 2017; Hepburn et al. 2020).

  4. A standard stability analysis of a Wilson–Cowan model with the parameters obtained from our relation shows that the Divisive Normalization solution is a stable node of the dynamical model. This supports the appropriateness of our steady-state assumption. Moreover, this stability justifies the straightforward use of Divisive Normalization with time-varying stimuli, as in Simoncelli and Heeger (1998).

The structure of the paper is as follows. The Materials and Methods section reviews the retina-V1 neural path and the contrast perception of visual patterns. We also introduce the notation of the models: the Divisive Normalization and the Wilson–Cowan equations. In addition, we recall some experimental facts that will be used to illustrate the performance of the proposed relation: (1) contrast response curves imply certain interactions between subbands (Cavanaugh 2000; Watson and Solomon 1997), (2) the Divisive Normalization kernel must have a specific structure [identified in Martinez-Garcia et al. (2019)] to reproduce contrast response curves, and (3) the shape of the Divisive Normalization kernel should have a specific dependence on the surrounding signal (Cavanaugh et al. 2002a, b). In the Analytical Results section, we derive the relation between the Divisive Normalization and the Wilson–Cowan equations based on the steady-state assumption. The Numerical Experiments section illustrates with simulations the mathematical properties and the perceptual consequences of the proposed relation. First, we experimentally check the convergence of the Wilson–Cowan solution to the Divisive Normalization response over a wide range of model parameterizations, and we quantify the error introduced by the approximations made to obtain the analytical results. Moreover, we illustrate the appropriateness of the steady-state assumption by showing that the Divisive Normalization is a stable node of the Wilson–Cowan system. Then, we address contrast perception facts using the proposed relation to build a psychophysically meaningful Wilson–Cowan model: We theoretically derive the specific structure of the kernel that was previously inferred empirically (Martinez-Garcia et al. 2019), we show that the proposed interaction kernel adapts to the signal, and, as a result, we reproduce general trends of contrast response curves. Finally, we discuss the use of the derived kernel in predicting the perceptual metric of the image space. The Final Remarks section concludes the paper.

2 Materials and Methods

In this work, the theory is illustrated in the context of models of the retina-cortex pathway. The considered framework follows the approach suggested in Carandini and Heeger (2012): a cascade of four isomorphic linear+nonlinear modules. These four modules sequentially address brightness, contrast, frequency-filtered contrast masked in the spatial domain, and orientation/scale masking. An example of the transforms of the input in such models is shown in Fig. 1.

In this general context, we focus on the cortical (fourth) layer: a set of linear sensors with wavelet-like receptive fields modeling simple cells in V1, and the nonlinear interaction between the responses of these linear sensors. Divisive Normalization has been the conventional model used for the nonlinearity to describe contrast perception psychophysics (Watson and Solomon 1997), but here we will explore the application of the Wilson–Cowan model in the contrast perception context.

Below we introduce the notation of both models of neural interaction and the facts on contrast perception that should be explained by the models.

Fig. 1

Signal transforms in the retina-cortex pathway: a cascade of linear + nonlinear modules [example from Martinez-Garcia et al. (2018)]. The input is the spatial distribution of the spectral irradiance at the retina. (1) The linear part of the first layer consists of three positive spectral sensitivities (tuned to Long, Medium, and Short wavelengths, LMS) and a linear recombination of the LMS values with positive/negative weights. This leads to three tristimulus values at each spatial location: One of them is proportional to the luminance, and the other two have opponent chromatic meaning (red-green and yellow-blue). These linear tristimulus responses undergo adaptive saturation transforms. Perception of brightness is mediated by an adaptive Weber-like nonlinearity applied to the luminance at each location. This nonlinearity enhances the response in the regions with small linear input (low luminance). (2) The linear part of the second layer computes the deviation of the brightness at each location from the local brightness. Then, this deviation is nonlinearly normalized by the local brightness to give the local contrast. (3) The responses to local contrast are convolved with center-surround receptive fields (or filtered by the Contrast Sensitivity Function). Then, the linearly filtered contrast is nonlinearly normalized by the local contrast. Again, normalization increases the response in the regions with small input (low contrast). (4) After a linear wavelet transform, each response is nonlinearly normalized by the activity of the neurons in the surround. Again, the activity relatively increases in the regions with low input. The common effect of the nonlinear modules throughout the network is response equalization (Color figure online)

2.1 Modeling Cortical Interactions

Our focus here is the last linear+nonlinear module of the retina-V1 cascade in Fig. 1, and specifically the nonlinear layer that describes the interactions in the primary visual cortex V1. The driving input of this final nonlinear layer is the vector of energies, \(\varvec{e}\), of the responses of linear wavelet-like simple cells, and the output of this interaction is the vector of nonlinear responses \(\varvec{x}\):

$$\begin{aligned} \varvec{x} = {\mathcal {N}}(\varvec{e}) \end{aligned}$$
(1)

In this work, the two models considered describe the interaction \({\mathcal {N}}\) between the linear simple cells in V1. The Wilson–Cowan equations model neural firing rate dynamics that may converge to a steady state. If that is the case, the long-term behavior of the Wilson–Cowan equations may be similar to the Divisive Normalization model, since the latter models static neural firing rates.

2.2 The Divisive Normalization Model

Forward transform.

The conventional expression of the canonical Divisive Normalization (Carandini and Heeger 2012) uses an element-wise formulation:

$$\begin{aligned} {x_i = k_i \,\, \frac{e_i}{b_i + \sum _j H_{ij} e_j}} \end{aligned}$$
(2)

where the output vector of nonlinear activations, \(\varvec{x}\), depends on the energy of the input linear wavelet responses, \(\varvec{e}\), which is normalized dimension-wise by a sum of neighboring energies of the input. For convenience in the derivations below, the transform can be rewritten in matrix form (Martinez-Garcia et al. 2018, 2019):

$$\begin{aligned} \varvec{x} = {\mathbb {D}}_{\varvec{k}} \cdot {\mathbb {D}}^{-1}_{\left( \varvec{b} + \varvec{H} \cdot \varvec{e} \right) } \cdot \varvec{e} \end{aligned}$$
(3)

where \({\mathbb {D}}_{\varvec{v}}\) are diagonal matrices with the vector \(\varvec{v}\) in the diagonal. The non-diagonal nature of the interaction kernel \(\varvec{H}\), which appears in the denominator \(\varvec{b} + \varvec{H} \cdot \varvec{e}\), implies that the i-th element of the response is attenuated by the activity of the neighboring sensors, \(e_j\) with \(j\ne i\). Each row of the kernel \(\varvec{H}\) describes how the energies of the neighboring simple cells attenuate each simple cell after the interaction. Each element of the vectors \(\varvec{b}\) and \(\varvec{k}\) represents the semisaturation and the dynamic range of the nonlinear response of each sensor, respectively. This nonlinear interaction only affects the amplitude of the responses, not their sign. As a result, for simplicity in the notation, throughout this work \(\varvec{x}\) refers to the vector of absolute values of the responses. The sign of the normalized responses is inherited from the sign of the linear wavelet responses.
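For concreteness, Eqs. 2-3 amount to a few lines of linear algebra. The following NumPy sketch (with illustrative variable names, not taken from the code of the paper) assumes that \(\varvec{e}\), \(\varvec{k}\), and \(\varvec{b}\) are vectors of length N and \(\varvec{H}\) is an \(N\times N\) matrix:

```python
import numpy as np

def divisive_normalization(e, k, b, H):
    """Forward Divisive Normalization, Eqs. 2-3: x = D_k . D^{-1}_{(b + H.e)} . e.

    e, k, b: 1-D arrays of length N (energies, dynamic ranges, semisaturations).
    H: N x N interaction kernel.
    """
    pool = b + H @ e     # signal-dependent normalization pool, b + H.e
    return k * e / pool  # elementwise: x_i = k_i e_i / (b_i + sum_j H_ij e_j)
```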

Inverse transform.

The matrix notation (Martinez-Garcia et al. 2018, 2019) is convenient to derive the analytical inverse of the Divisive Normalization, which will be used to obtain the relation between the two models considered in this work. The inverse is given by (Malo et al. 2006; Martinez-Garcia et al. 2018, 2019):

$$\begin{aligned} \varvec{e} = \left( I - {\mathbb {D}}^{-1}_{\varvec{k}}\cdot {\mathbb {D}}_{\varvec{x}}\cdot \varvec{H} \right) ^{-1} \cdot {\mathbb {D}}_{\varvec{b}} \cdot {\mathbb {D}}^{-1}_{\varvec{k}} \cdot \varvec{x} \end{aligned}$$
(4)

This inverse, originally proposed in the context of image coding (Malo et al. 2006), has been used in other applications that require the reconstruction of the image (Camps et al. 2008; Laparra et al. 2017).
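Under the same conventions as the forward sketch above, Eq. 4 can be evaluated by solving a linear system rather than forming the explicit matrix inverse (a standard numerical choice, not prescribed by the paper):

```python
import numpy as np

def divisive_normalization_inverse(x, k, b, H):
    """Analytical inverse of Divisive Normalization, Eq. 4:
    e = (I - D^{-1}_k . D_x . H)^{-1} . D_b . D^{-1}_k . x,
    with x the vector of absolute values of the normalized responses."""
    A = np.eye(x.size) - np.diag(x / k) @ H
    return np.linalg.solve(A, (b / k) * x)
```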

2.3 The Wilson–Cowan Model

The Wilson–Cowan dynamical system was proposed as a general model of the inhibitory/excitatory interactions between neural populations, and, in particular, it can be used to model the signal at specific stages of the visual pathway (Wilson and Cowan 1972, 1973; Bressloff and Cowan 2003). In Wilson–Cowan models, part of the neural population (part of the coefficients in the vectors \(\varvec{e}\) and \(\varvec{x}\)) is excitatory and part is inhibitory, depending on how its activity affects the neighbors (in an additive or subtractive way, respectively). Excitatory and inhibitory populations will be referred to as \(\varvec{e^e}\), \(\varvec{x^e}\), and \(\varvec{e^i}\), \(\varvec{x^i}\), respectively. We will consider that these excitatory and inhibitory neurons (or coefficients) are interleaved in the vectors that describe the responses. Alternatively, for simplicity, one may represent them as separate rows in the response vectors:

$$\begin{aligned} \varvec{e} = \begin{pmatrix} \varvec{e^e} \\ \varvec{e^i} \end{pmatrix}, \,\,\,\,\,\,\,\,\,\,\, \varvec{x} = \begin{pmatrix} \varvec{x^e} \\ \varvec{x^i} \end{pmatrix} \end{aligned}$$
(5)

In any case, the arbitrary arrangement of the neurons in the vectors does not restrict the generality of the formulation. The only effect of this choice is the interpretation of the elements of the matrices that will represent the interaction between the neurons.

Dynamical system.

In Wilson–Cowan models (Wilson and Cowan 1972, 1973; Bressloff and Cowan 2003), the transform \({\mathcal {N}}\) is defined by differential equations that describe the temporal variation of the activity of the populations. In particular, this variation is driven by three factors:

  • An external driving input (either \(\varvec{e^e}\) or \(\varvec{e^i}\)); in our case, the responses of the linear V1 cells.

  • An auto-attenuation of the response of each population due to its own activity.

  • An amplification of the variation by the excitatory responses and a moderation by the inhibitory responses.

Specifically, following the notation of Bressloff et al. (2002), Bressloff and Cowan (2002, 2003), which considers no refractory period in V1 neurons, and explicitly identifying the excitatory and inhibitory populations as done originally in Wilson and Cowan (1973), a neuron tuned to the feature p obeys one of these (excitatory or inhibitory) equations:

$$\begin{aligned} \hspace{-1cm}\frac{\partial \, x^e_p(t)}{\partial t}= & {} e^e_p(t) - \alpha ^e_p \, x^e_p(t) + \int W^{ee}_{pp'} \, f(x^e_{p'}(t)) \, dp' - \int W^{ei}_{pp'} \, f(x^i_{p'}(t)) \, dp' \nonumber \\ \hspace{-1cm}\frac{\partial \, x^i_p(t)}{\partial t}= & {} e^i_p(t) - \alpha ^i_p \, x^i_p(t) + \int W^{ie}_{pp'} \, f(x^e_{p'}(t)) \, dp' - \int W^{ii}_{pp'} \, f(x^i_{p'}(t)) \, dp' \end{aligned}$$
(6)

or, in matrix notation:

$$\begin{aligned} \varvec{\dot{x^e}}= & {} \varvec{e^e} - {\mathbb {D}}_{\varvec{\alpha ^e}} \cdot \varvec{x^e} + \varvec{W^{ee}} \cdot f(\varvec{x^e}) - \varvec{W^{ei}} \cdot f(\varvec{x^i}) \nonumber \\ \varvec{\dot{x^i}}= & {} \varvec{e^i} - {\mathbb {D}}_{\varvec{\alpha }^i} \cdot \varvec{x^i} + \varvec{W^{ie}} \cdot f(\varvec{x^e}) - \varvec{W^{ii}} \cdot f(\varvec{x^i}) \end{aligned}$$
(7)

where \(\varvec{W^{ee}}\), \(\varvec{W^{ei}}\), \(\varvec{W^{ie}}\), \(\varvec{W^{ii}}\) are the matrices that describe the excitatory and inhibitory relations between sensors, the activation function \(f(\cdot )\) is a dimension-wise saturating nonlinearity, and the elements of the vectors \(\varvec{\alpha ^e}\) and \(\varvec{\alpha ^i}\) are the auto-attenuation parameters. The above matrices are usually considered to be a fixed set of connections (wired relations), made of positive and negative Gaussian neighborhoods, that represent the local interaction between sensors (Bressloff and Cowan 2002, 2003; Chossat and Faugeras 2009). Also, note that if, in Eq. 7, the inhibitory and excitatory components are stacked together into a single vector (with some arrangement as in Eq. 5), the two equations of the traditional Wilson–Cowan formulation can be represented by a single expression, as in Bressloff and Cowan (2003) and Chow and Karimipanah (2020), here in matrix form:

$$\begin{aligned} \varvec{\dot{x}}= & {} \varvec{e} - {\mathbb {D}}_{\varvec{\alpha }} \cdot \varvec{x} - \varvec{W} \cdot f(\varvec{x}) \end{aligned}$$
(8)

where

$$\begin{aligned} \varvec{\alpha } = \begin{pmatrix} \varvec{\alpha ^e}\\ \varvec{\alpha ^i} \end{pmatrix}, \,\,\,\,\,\,\,\,\,\,\, f(\varvec{x}) = f\begin{pmatrix} \varvec{x^e}\\ \varvec{x^i} \end{pmatrix}, \,\,\,\,\,\,\,\,\,\,\, \varvec{W} = \begin{pmatrix} -\varvec{W^{ee}} &{} \varvec{W^{ei}} \\ -\varvec{W^{ie}} &{} \varvec{W^{ii}} \end{pmatrix} \end{aligned}$$

The above single-equation matrix formulation of the Wilson–Cowan model, Eq. 8, is convenient to get the relation between the models and clearly shows the subtractive nature of the interactions in the kernel \(\varvec{W}\) as opposed to the divisive nature of the interactions due to the kernel \(\varvec{H}\) in Eq. 3.
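In code, the right-hand side of Eq. 8 is equally compact. A minimal sketch, leaving the activation as a callable (the activations actually used are given in Appendix A; tanh below is only a generic saturating stand-in):

```python
import numpy as np

def wc_rhs(x, e, alpha, W, f=np.tanh):
    """Right-hand side of the single-equation Wilson-Cowan model, Eq. 8:
    dx/dt = e - D_alpha . x - W . f(x),
    with alpha the auto-attenuation vector and W the stacked kernel."""
    return e - alpha * x - W @ f(x)
```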

Steady state and inverse.

The stationary solution of the above differential equation (obtained by taking \(\varvec{\dot{x}} =0\) in Eq. 8) leads to the following analytical inverse for static inputs:

$$\begin{aligned} \varvec{e} = {\mathbb {D}}_{\varvec{\alpha }} \cdot \varvec{x} + \varvec{W} \cdot f(\varvec{x}) \end{aligned}$$
(9)

As we will see in the Analytical Results section, the identification of the decoding equations of both models, Eqs. 4 and 9, is the key to obtaining simple relations between their corresponding parameters.

2.4 Experimental Facts

2.4.1 Adaptive Contrast Response Curves

In the considered spatial vision context, the models should reproduce the fundamental trends of contrast perception. Thus, the slope of the contrast response curves should depend on the spatial frequency, so that the sensitivity at threshold contrast is different for different spatial frequencies according to the Contrast Sensitivity Function (CSF) (Campbell and Robson 1968). Also, the response curves should saturate with contrast (Legge and Foley 1980; Legge 1981). Finally, the responses should attenuate with the energy of the background or surround, and this additional saturation should depend on the texture of the background (Foley 1994; Watson and Solomon 1997): If the frequency/orientation of the test is similar to the frequency/orientation of the background, the decay should be stronger. This background-dependent adaptive saturation, or masking, is mediated by cortical sensors tuned to spatial frequency with responses that saturate depending on the background, as illustrated in Fig. 2.

The above trends are key to rule out overly simple models, and also to propose the appropriate modifications in the model architecture to get reasonable results (Martinez-Garcia et al. 2019).

Fig. 2

Adaptive contrast response curves. Mean firing rate (response) of V1 neurons tuned to the signal as a function of the signal contrast in two masking situations [adapted from Schwartz and Simoncelli (2001), Cavanaugh (2000)]. Note the decay in the response when signal and mask have the same spatio-frequency characteristics (a), as opposed to the case where they do not (b). For visualization, the differences in the curves are highlighted by the green circles (Color figure online)

2.4.2 Unexplained Kernel Structure in Divisive Normalization

In the Divisive Normalization setting, the masking interaction between tests and backgrounds of different textures is classically described by using a Gaussian kernel in the denominator of Eq. 3 in wavelet-like domains: The effect of the j-th wavelet sensor on the attenuation of the i-th wavelet sensor decays with the distance between the i-th and j-th sensors in space, but also with their distance in spatial frequency and orientation (Watson and Solomon 1997). We will refer to these unit-norm Gaussian kernels as Watson and Solomon kernels (Watson and Solomon 1997), and they will be represented by \(\varvec{H}^{\varvec{ws}}\). Gaussian kernels are useful to describe the general behavior shown in Fig. 2: Activity in close neighbors leads to strong decays in the response, while activity in neighbors tuned to more distant features has a smaller effect.

However, in order to have well-behaved responses in every subband with every possible background, a special balance between the wavelet representation and the Gaussian kernels is required. When using reasonable log-polar Gabor bases or steerable filters to model V1 receptive fields, as in Watson and Solomon (1997), Schwartz and Simoncelli (2001), the energies of the sensors tuned to low frequencies are notably higher than the energies of high-frequency sensors. Moreover, the smaller number of sensors in low-frequency subbands in this kind of wavelet representation implies that unit-norm Gaussian kernels have larger values in coarse subbands. These two facts overemphasize the impact of low-frequency responses on high-frequency responses. Thus, in Martinez-Garcia et al. (2019) we found that classical unit-norm Gaussian kernels require ad hoc extra modulation to avoid an excessive effect of low-frequency backgrounds on high-frequency tests. The appropriate wavelet-kernel balance was reestablished by introducing extra high-pass filters in the Gaussian kernel \(\varvec{H}^{\varvec{ws}}\), with the aim of moderating the effect of low frequencies (Martinez-Garcia et al. 2019):

$$\begin{aligned} \varvec{H} = {\mathbb {D}}_{\varvec{l}} \cdot \varvec{H}^{\varvec{ws}} \cdot {\mathbb {D}}_{\varvec{r}} \end{aligned}$$
(10)

In this new definition of the kernel: (1) the diagonal matrix on the right, \({\mathbb {D}}_{\varvec{r}}\), pre-weights the subbands of \(\varvec{e}\) to moderate the effect of low frequencies before computing the interaction; and (2) the diagonal matrix on the left, \({\mathbb {D}}_{\varvec{l}}\), sets the relative weight of the masking for each sensor, moderating low frequencies again. The vectors \(\varvec{r}\) and \(\varvec{l}\) were tuned ad hoc in Martinez-Garcia et al. (2019) to get reasonable contrast response curves, both for low- and high-frequency tests, as sketched below.
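In matrix terms, this modulation is just a row and column scaling of the Gaussian kernel. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def modulated_kernel(H_ws, l, r):
    """Kernel of Eq. 10: H = D_l . H^ws . D_r, i.e., H_ij = l_i H^ws_ij r_j.

    H_ws: unit-norm Watson-Solomon Gaussian kernel (N x N).
    l, r: high-pass weight vectors tuned in Martinez-Garcia et al. (2019).
    """
    return l[:, None] * H_ws * r[None, :]
```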

However, what is the explanation for this specific structure of the kernel matrix in Eq. 10? And where do these two high-pass diagonal matrices come from?

2.4.3 Adaptive Nature of Kernel in Divisive Normalization

Previous physiological experiments on cats and macaques demonstrated that the effect of the surround on each cell does not come equally from all peripheral regions, revealing a spatially asymmetric surround (Nelson and Frost 1985; Deangelis et al. 1994; Walker et al. 1999; Cavanaugh et al. 2002a, b). As shown in Fig. 3a, the experimental cell response is suppressed by the surround, and this attenuation is greater when the grating patches are iso-oriented and located at the ends of the receptive field (as defined by the axis of preferred orientation) (Cavanaugh et al. 2002b).

Fig. 3

Experimental context-dependent interaction (Cavanaugh et al. 2002b) and statistical model (Coen-Cagli et al. 2012). a Results of Cavanaugh et al. (2002b): images with a gray background represent the stimuli. Cell relative responses are shown as points inside the black normalization circle. The distance from the origin indicates the magnitude of the response, while its angle represents the location of the surrounding stimulus. b Cell response predicted from the statistical model of Coen-Cagli et al. (2012), and probability that center and surround receptive fields are co-assigned to the same normalization pool and contribute to the Divisive Normalization of the model response. The probability of co-assignment depends on the covariance with the surround, as shown below. c Different center-surround visual neighborhoods in a natural scene. In each case, the activity of the sensors in the surround can be co-assigned to the activity in the center (i.e., considered in the normalization pool) if the orientation of maximally responding sensors is consistent (which is the case for four of the considered regions and not the case for the first). The horizontal surround that is co-assigned with the corresponding center is highlighted in bold. d Covariance matrices learned from natural images determine co-assignment: The orientation and relative position of the receptive fields are represented by the black bars. (The thickness of the bar is proportional to the variance, while the thickness of the red lines is proportional to the covariance between the two connected bars.) This figure has been adapted from Coen-Cagli et al. (2012) (Color figure online)

In the Divisive Normalization context, this asymmetry could be explained with non-isotropic interaction kernels. Depending on the texture of the surround, the interaction strength in a certain direction may change. This would change the denominator, and hence the gain of the response.

Coen-Cagli et al. (2012) proposed a specific statistical model to account for these contextual dependencies. This model includes grouping and segmentation of neighboring oriented features and leads to a flexible generalization of the Divisive Normalization. Representative center-surround configurations considered in the statistical model are shown in Fig. 3c. A surround orientation can be either co-assigned with the center group or not co-assigned. In the first case, the model assumes dependence between center and surround and includes both in the normalization pool for the center. In the second case, the model assumes center-surround independence and does not include the surround in the normalization pool. Figure 3d shows the covariance matrices learned from natural images between the variables associated with the center and the surround in the proposed statistical model. As expected, the variances of the center and its co-linear neighbors, and also the covariance between them, are larger, due to the predominance of co-linear structures in natural images. The cell response computationally obtained assuming this statistical model is shown in Fig. 3b, together with the probability that center and surround receptive fields are co-assigned to the same normalization pool and thus contribute to the Divisive Normalization of the model response. Note that the higher the probability of co-assignment between the center and the surround, the higher the suppression (i.e., the lower the signal) in the cell response.

This flexible (or adaptive) Divisive Normalization model based on image statistics (Coen-Cagli et al. 2012) makes it possible to explain the experimental asymmetry in the center-surround modulation (Cavanaugh et al. 2002b). However, no direct mechanistic approach has been proposed yet to describe how this adaptation of the Divisive Normalization kernel may be implemented.

3 Analytical Results: Relation Between Models

The kernels that describe the relation between sensors in the Divisive Normalization and the Wilson–Cowan models, \(\varvec{H}\) and \(\varvec{W}\), have similar qualitative roles: both moderate the response, either by division or subtraction, taking into account the activity of the neighbor sensors.

In this section, we derive the relation between both models assuming that the Divisive Normalization behavior corresponds to the steady-state solution of the Wilson–Cowan dynamics. This leads to an interesting analytical relation between both kernels, \(\varvec{H}\) and \(\varvec{W}\).

Under the steady-state assumption, it is possible to identify the different terms in the decoding equations in both cases (Eqs. 4 and 9). However, just to get a simpler analytical relation between both kernels, we make one extra simplification on each model. Numerical experiments in the next section with natural inputs and a wide range of model parameterizations show that these simplifications are acceptable in practice.

First, in the Divisive Normalization model, Eq. 4, the identification is simpler if we take the series expansion of the inverse. This expansion was used in Malo et al. (2006) because it clarifies the condition for invertibility:

$$\begin{aligned} \left( I - {\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H} \right) ^{-1} = I + \sum _{n=1}^{\infty } \left( {\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H} \right) ^n \end{aligned}$$

The inverse exists if the eigenvalues of \({\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H}\) are smaller than one, so that the series converges. In fact, if the eigenvalues are small, the inverse can be well approximated by a small number of terms of the series. Taking this approximation into account, Eq. 4 may be written as:

$$\begin{aligned} \varvec{e}= & {} {\mathbb {D}}_{\varvec{b}} \cdot {\mathbb {D}}^{-1}_{\varvec{k}} \cdot \varvec{x} + \left( {\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H} \right) \cdot {\mathbb {D}}_{\varvec{b}} \cdot {\mathbb {D}}^{-1}_{\varvec{k}} \cdot \varvec{x} + \left( {\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H} \right) ^2 \cdot {\mathbb {D}}_{\varvec{b}} \cdot {\mathbb {D}}^{-1}_{\varvec{k}} \cdot \varvec{x} + \cdots \nonumber \\ \varvec{e}\approx & {} \left( {\mathbb {D}}_{\varvec{b}} \cdot {\mathbb {D}}^{-1}_{\varvec{k}} + {\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H} \cdot {\mathbb {D}}_{\varvec{b}} \cdot {\mathbb {D}}^{-1}_{\varvec{k}} \right) \cdot \varvec{x} \end{aligned}$$
(11)
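This truncation is easy to check numerically. A minimal sketch, with a random matrix of small norm standing in for \({\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H}\) (size and scale are illustrative, not the wavelet-domain values of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
A = 0.01 * rng.random((N, N))               # stand-in for D^{-1}_k . D_x . H

rho = np.max(np.abs(np.linalg.eigvals(A)))  # series converges iff rho < 1
exact = np.linalg.inv(np.eye(N) - A)
truncated = np.eye(N) + A                   # first-order truncation kept in Eq. 11
rel_err = np.linalg.norm(exact - truncated) / np.linalg.norm(exact)
print(f"spectral radius = {rho:.3f}, relative error = {rel_err:.1e}")
```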

Second, in the case of the Wilson–Cowan model (Eq. 9), we approximate the saturation function \(f(\varvec{x})\) so that we can isolate the vector \(\varvec{x}\). This can be done by expressing \(f(\varvec{x})\) through an Euler integration of n terms: \(f(\varvec{x}) = f(\varvec{0}) + \int _{\varvec{0}}^{\varvec{x}} \frac{df}{dx}(\varvec{x'}) d\varvec{x'} \approx f(\varvec{0}) + \sum _{\beta =0}^{n-1} \frac{df}{dx}(\frac{\beta }{n}\varvec{x}) \frac{\varvec{x}}{n}\). Note that along the integration the derivatives are computed at different points, from \(\varvec{0}\) up to \(\frac{n-1}{n}\varvec{x}\). If \(n=1\), we have the (in principle poor) Maclaurin approximation, and if \(n \rightarrow \infty \), the result is exact. In between, for finite n, we have an approximation with a certain accuracy. In this case, taking into account that for the considered activation functions \(f(\varvec{0}) = \varvec{0}\), and calling \(g_n(\varvec{x}) = \frac{1}{n}\sum _{\beta =0}^{n-1} \frac{df}{dx}(\frac{\beta }{n}\varvec{x})\), we can write:

$$\begin{aligned} {\varvec{e} \approx \left( {\mathbb {D}}_{\varvec{\alpha }} + \varvec{W} \cdot {\mathbb {D}}_{g_n(\varvec{x})} \right) \cdot \varvec{x}} \end{aligned}$$
(12)
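The factor \(g_n(\varvec{x})\) is straightforward to compute for any differentiable activation. A minimal sketch, using tanh as a stand-in for the activations of Appendix A:

```python
import numpy as np

def g_n(x, dfdx, n=10):
    """g_n(x) = (1/n) sum_{beta=0}^{n-1} f'(beta x / n), so that f(x) ~ g_n(x) x."""
    return sum(dfdx(beta * x / n) for beta in range(n)) / n

dfdx = lambda x: 1.0 - np.tanh(x)**2   # derivative of f = tanh (note f(0) = 0)
x = np.linspace(0.0, 3.0, 7)
print(np.abs(np.tanh(x) - g_n(x, dfdx, n=100) * x).max())  # small for large n
```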

Now, the identification between the approximated versions of the decoding equations, Eqs. 11 and 12, is straightforward. As a result, we get the following relations between the parameters of both models:

$$\begin{aligned} \varvec{\alpha }= & {} \frac{\varvec{b}}{\varvec{k}} \nonumber \\ \varvec{W}= & {} {\mathbb {D}}_{\left( \frac{\varvec{x}}{\varvec{k}}\right) } \cdot \varvec{H} \cdot {\mathbb {D}}^{-1}_{\left( \frac{\varvec{k}}{\varvec{b}} \odot {g_n(\varvec{x})}\right) } \end{aligned}$$
(13)

where the symbol \(\odot \) denotes the element-wise (or Hadamard) product, and the ratios between vectors are also Hadamard divisions.
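A minimal sketch of this mapping (given the Divisive Normalization parameters, a response \(\varvec{x}\), and the factor \(g_n(\varvec{x})\), return the Wilson–Cowan parameters):

```python
import numpy as np

def dn_to_wc(H, k, b, x, gn_x):
    """Eq. 13: alpha = b/k and W = D_{x/k} . H . D^{-1}_{(k/b) * gn_x}.
    All vector ratios and products are elementwise (Hadamard)."""
    alpha = b / k
    W = np.diag(x / k) @ H @ np.diag(1.0 / ((k / b) * gn_x))
    return alpha, W
```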

Note that the Divisive Normalization kernel which is compatible with Eq. 13, \(\varvec{H} = {\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{x}}\right) } \cdot \varvec{W} \cdot {\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{b}} \odot {g_n(\varvec{x})}\right) }\), has exactly the same structure as the one in Eq. 10. Therefore, both models agree if the Divisive Normalization kernel inherits the structure from the Wilson–Cowan kernel left- and right-multiplied by these diagonal matrices, \({\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{x}}\right) }\) and \({\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{b}} \odot {g_n(\varvec{x})}\right) }\), respectively.

This theoretical result suggests an explanation for the structure that had to be introduced ad hoc in Martinez-Garcia et al. (2019) just to reproduce contrast masking. Note that the interaction in the Wilson–Cowan case may be understood as wiring between sensors tuned to similar features, so a unit-norm Gaussian, \(\varvec{W} = \varvec{H}^{\varvec{ws}}\), is a reasonable choice (Wilson and Cowan 1973; Chossat and Faugeras 2009). Note also that the weights before and after \(\varvec{W}\) (the diagonal matrices) are signal dependent. Therefore, a fixed wiring \(\varvec{W}\) implies that the kernel in Divisive Normalization should be adaptive. The one on the left, \({\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{x}}\right) }\), has a direct dependence on the inverse of the signal, while the one on the right, \({\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{b}} \odot {g_n(\varvec{x})}\right) }\), depends on the derivatives of the activation \(f(\varvec{x})\). The next section shows that these vectors, \(\frac{\varvec{k}}{\varvec{x}}\) and \(\frac{\varvec{k}}{\varvec{b}} \odot {g_n(\varvec{x})}\), do have the high-pass frequency nature that explains why the low frequencies in \(\varvec{e}\) had to be attenuated ad hoc by introducing \({\mathbb {D}}_{\varvec{l}}\) and \({\mathbb {D}}_{\varvec{r}}\). We also show that the term on the right, \({\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{b}} \odot {g_n(\varvec{x})}\right) }\), produces the required changes in the shape of the interactions.

It is important to stress that the simplifications made in the decoding equations to obtain the analytical relations in Eq. 13 were done only for the sake of simplicity in the final expressions. In summary, the expressions in Eq. 13 are exact for the simplified versions of the models; for the full versions of the models, Eq. 13 is an approximation. However, the experiments below support the validity of this approximation: (a) we explicitly check that the errors are small in a range of scenarios, and (b) we check that plugging these expressions into the full versions of the models also leads to consistent results.

4 Numerical Experiments

The analysis of the proposed relation between the Divisive Normalization (DN) and the Wilson–Cowan (WC) models is a three-stage process. First, one should take biologically plausible parameters (either in DN, in WC, or in both) and use the proposed expressions to build versions of the models that are expected to behave similarly. Second, one should check whether the models obtained in this way actually behave similarly. Finally, third, one can elaborate on the consequences of this correspondence.

In this experimental analysis, in Sect. 4.1, we build a psychophysically inspired Wilson–Cowan model for V1 from a Divisive Normalization with psychophysically tuned parameters (Malo and Simoncelli 2015; Martinez-Garcia et al. 2018, 2019). This model also preserves the basic properties of the interaction kernel and the saturation function of the Wilson–Cowan literature (Wilson and Cowan 1973; Bressloff and Cowan 2003; Chossat and Faugeras 2009). This Wilson–Cowan model should behave similarly to the corresponding Divisive Normalization model.

Then, Sect. 4.2 experimentally checks the mathematical relation between the models. In particular, for a wide range of parameters: (a) we show that the integration of the Wilson–Cowan equation indeed converges to a solution which is close to the corresponding Divisive Normalization response; (b) we quantify the accuracy of the approximations required to get the relation between the models; and (c) we show that the Divisive Normalization solution is a stable node of the dynamical system governed by the Wilson–Cowan equations.

Finally, in Sect. 4.3, we address different consequences on contrast perception using the proposed relation: (a) We analyze the signal-dependent behavior of the theoretically derived kernel and the benefits of the high-pass behavior to moderate the weight of the low-frequency components; (b) we show that the shape of the interactions between sensors changes depending on the surround; (c) we reproduce the contrast response curves with the proposed signal-dependent kernel; and (d) we discuss the use of the derived kernel in predicting the subjective metric of the image space.

4.1 Psychophysically Plausible Parameters for a Wilson–Cowan Model in V1

A possible way to check the relation between the models in V1 consists of starting from the (lower-level/mechanistic/physiological) Wilson–Cowan model and letting it evolve to see whether it converges to the (psychophysical) Divisive Normalization response. To this end, for our Wilson–Cowan model, we need reasonable \(\varvec{\alpha }\), \(\varvec{W}\), and \(f(\varvec{x})\), with \(\varvec{e}\) and \(\varvec{x}\) defined in a certain wavelet representation.

For the wavelet representation here, we assume 4-orientation steerable transforms (Simoncelli et al. 1992) as a convenient model of the simple cells [as done in Schwartz and Simoncelli (2001), Martinez-Garcia et al. (2018, 2019)]. In the experiments involving the (computationally intensive) integration of the Wilson–Cowan differential equation, Sect. 4.2, we used wavelets with 3 scales in \(40\times 40\) images to speed up the computation. But in the psychophysical illustrations, Sect. 4.3, we used 4 scales in \(64\times 64\) images.

The reference parameters for the nonlinearity are taken from the Divisive Normalization model in Martinez-Garcia et al. (2019). In that case, the parameters corresponding to contrast computation, contrast sensitivity, and masking in the spatial domain were directly measured using Maximum Differentiation psychophysics (Malo and Simoncelli 2015), while the parameters related to brightness and to masking in the wavelet domain were tuned to reproduce subjective image quality data (Martinez-Garcia et al. 2018) and contrast perception curves (Martinez-Garcia et al. 2019).

As stated after Eq. 13, we took \(\varvec{W}\) as a Watson–Solomon separable Gaussian kernel (Watson and Solomon 1997) with widths in space/frequency/orientation taken from the psychophysically plausible values in Martinez-Garcia et al. (2019). In order to include both excitatory and inhibitory populations, we complemented this initial kernel with narrow excitatory neighborhoods whose width was a fraction of that of the original inhibitory neighborhoods. Finally, we normalized the absolute amplitude of the neighborhoods to have unit-norm center-surround interactions. Figure 4a–c illustrates the psychophysically sensible separable kernels \(\varvec{W}\). These unit-norm kernels \(\varvec{W}\), scaled as in Watson and Solomon (1997) and Martinez-Garcia et al. (2019), are consistent with the shapes used in the Wilson–Cowan literature (Bressloff and Cowan 2003; Chossat and Faugeras 2009).

Fig. 4

Psychophysically inspired parameters for the Wilson–Cowan model. Illustration of the Gaussian neighborhoods \(\varvec{W}\) with excitatory and inhibitory parts: separable interactions depending on departure in space (a), frequency (b), and orientation (c). The spatial component of \(\varvec{W}\) is shown for the central location of the considered visual field and the 24 cpd vertical subband. Following Watson and Solomon (1997), the Gaussian kernels (here differences of Gaussians) are separable in space, frequency, and orientation. Therefore, lower-frequency subbands have coarser sampling (and thus higher amplitude) but the same shape in space. The shape is also the same for subbands tuned to different orientations. Equivalent separability applies for variations in frequency and orientation: The interactions in plots (b) and (c) are the same for every spatial location (and orientation and frequency, respectively). The above plots display the (more intuitive) \(-{\textbf{W}}\), where positive and negative values mean excitatory and inhibitory interaction, respectively; note that, according to the sign in Eq. 8, positive weights are inhibitory. (d) Auto-attenuation factor \(\varvec{\alpha }\). In this plot, the horizontal axis (wavelet coefficients in reverse order) can be qualitatively understood as frequency, so high frequencies display bigger attenuation. Panels (e) and (f) show different options for the pointwise activation nonlinearity, \(f(\varvec{x})\). In pink, we have the \(\gamma \)-function proposed in Martinez-Garcia et al. (2018), and in red, the original proposal of Wilson–Cowan (Wilson and Cowan 1973). In both cases, we show the approximations of these functions using \(g_n(\varvec{x})\) (for increasing number of terms n) and the corresponding derivatives, which have an impact on the relation between the kernels of the two models (Eq. 13) (Color figure online)

Regarding the auto-attenuation, we simply took the constants \(\varvec{k}\) and \(\varvec{b}\) from Martinez-Garcia et al. (2019) and used the first equation of the proposed relation, Eq. 13, to obtain \(\varvec{\alpha }\). Figure 4d shows the \(\varvec{\alpha }\) vector for the 3-scale wavelet (coefficients ordered from low to high frequency). Note that the response of sensors tuned to higher frequencies is more attenuated in the evolution of the differential equation, while low frequencies have lower auto-attenuation.

Finally, Fig. 4e, f displays the different activations \(f(\varvec{x})\) that we used in the experiments together with representative Euler approximations, \(g_n(\varvec{x}) \, \varvec{x}\), and the functions related to their derivatives, \(\frac{df}{dx}(\varvec{x})\) and \(g_n(\varvec{x})\). These activation functions include the original activation in Wilson and Cowan (1973), Bressloff and Cowan (2003), and the so-called \(\gamma \)-activation inspired by retinal transduction (Martinez-Garcia et al. 2018). Appendix A gives the expressions of these activation functions. In our wavelet case, the horizontal and vertical axes of the function \(f(\cdot )\) applied to each coefficient x of a certain subband are scaled by the average amplitude of the responses of the corresponding linear sensors to natural images. With that scaling, the nonlinearities preserve the relative scales of the input subbands in the vector \(\varvec{e}\) that comes from the linear filters.

In the next experiments, the above psychophysically sensible parameters (the reference values) are modified in several ways to show that the proposed relation works for a wide range of model parameterizations. Specifically, (a) we explored different widths of the interaction kernels by using five scaling factors applied to the reference widths: from unrealistically narrow (zero-width, identity kernels that disregard interactions) to unrealistically wide kernels (where the reference widths are increased by an order of magnitude); (b) we considered kernels with the above-mentioned excitatory-inhibitory nature, and kernels with a just-inhibitory nature; and (c) we considered two possible activations (the original activation and the \(\gamma \)-activation). We considered a total of 12 parameterizations of the models: 5 kernel widths \(\times \) 2 excit-inhib configurations \(\times \) 1 activation (original activ.) \(+\) 1 kernel width (the reference one) \(\times \) 2 excit-inhib configurations \(\times \) 1 activation (the \(\gamma \)-activ.).

The interested reader has access to the specific values of the parameters in Appendix A for the activations, and in the code that reproduces all the simulations of the paper (described in Appendix 1).

4.2 Experimental Check of Mathematical Properties

4.2.1 Wilson–Cowan Systems Converge to the Divisive Normalization

The Wilson–Cowan expression, Eq. 8, defines an initial value problem where the response at time zero evolves (or is updated) according to the right-hand side of the differential equation. In our case, we assume that the initial value of the output is just the input, \(\varvec{x}(0) = \varvec{e}\). Moreover, as we deal with static images, we assume that the input is constant. Then, we solve this first-order differential equation with the simplest (Euler) integration method:

$$\begin{aligned} \varvec{x}(t+{\varDelta t}) = \varvec{x}(t) + \Bigl ( \varvec{e} - {\mathbb {D}}_{\varvec{\alpha }} \cdot \varvec{x}(t) - \varvec{W} \cdot f(\varvec{x}(t)) \Bigr ) {\varDelta t} \end{aligned}$$
(14)
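A minimal sketch of this update loop (initialization and step size follow the description in the text; the activation f is a generic stand-in):

```python
import numpy as np

def integrate_wc(e, alpha, W, f=np.tanh, dt=1e-5, n_steps=650):
    """Euler integration of the Wilson-Cowan equation, Eq. 14,
    for a constant input e, starting from x(0) = e."""
    x = e.copy()
    for _ in range(n_steps):
        x = x + (e - alpha * x - W @ f(x)) * dt
    return x
```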

Figure 5 shows the evolution of the response obtained from this integration, applied to 45 natural images taken from calibrated databases (Hateren and Schaaf 1998; Laparra et al. 2012), using the biologically sensible parameters \(\varvec{\alpha }\), \(\varvec{W}\), and \(f(\cdot )\) presented in Fig. 4 (Malo and Simoncelli 2015; Martinez-Garcia et al. 2018, 2019; Wilson and Cowan 1973; Bressloff and Cowan 2003; Chossat and Faugeras 2009), and the mentioned variations to cover a wide range of model parameterizations. Our Euler integration used a small enough discrete time step, \({\varDelta t}=10^{-5}\), and the initial responses, the vectors \(\varvec{e}\), were computed using the first 3 layers of the model in Fig. 1 (Martinez-Garcia et al. 2018, 2019) followed by a linear steerable wavelet transform of 4 orientations and 3 scales. The integration requires no approximation of the WC model, and the evolving solution is checked against the corresponding DN response that uses the proposed Eq. 13.

As can be seen, the solution of the Wilson–Cowan integration converges to the Divisive Normalization solution: their difference (percentage of relative mean squared error) decreases as the system evolves in all 12 considered parameterizations. The relative MSE in the psychophysically meaningful situations (\(\times \)1 width) is below \(3\%\) (lines in pink and red), and for the other configurations, it is always below \(6\%\). Therefore, the Divisive Normalization always explains more than \(94\%\) of the energy of the Wilson–Cowan solution. Moreover, these results represent steady states because the updates of the solutions in the integration always tend to zero (results not shown).

Fig. 5

Convergence to the Divisive Normalization solution (I): quantitative error. Percentage of relative MSE between the solution of the Wilson–Cowan equations and the Divisive Normalization response as a function of discrete time (steps in the Euler integration). Left: relative MSE for different kernel widths and activation/saturation functions in the case of just-inhibitory interactions. Right: relative MSE for different kernel widths and saturation functions in the case of excitatory+inhibitory interactions. The configurations in pink and red are the psychophysically meaningful ones, which achieve 0.6% and 2.7% relative MSE, respectively. In all cases, the update of the solution tends to zero (results not shown), indicating the approach to a steady state. The curves in bold style represent the median (50th percentile) over 45 natural images, and the curves in light style (very close to the median) represent the 25th and 75th percentiles. This result suggests the successful convergence of WC to DN for natural images over a wide range of model parameterizations (Color figure online)

Fig. 6

Convergence to the Divisive Normalization solution (II): qualitative similarity. a Input image (stimulus at the retina). b Responses of the linear simple cells of V1. Responses are spatially non-stationary; see the regions of very low amplitude highlighted in yellow and blue. c Steady state of the Wilson–Cowan equation after 650 iterations of Eq. 14 with an inhibitory kernel W of psychophysically sensible width and \(\gamma \)-activation. d Corresponding Divisive Normalization response using the kernel given by Eq. 10. In this example, the relative MSE between \(\varvec{x}_{\text {WC}}\) and \(\varvec{x}_{\text {DN}}\) is 0.93%, but, more importantly, note how the amplitude of the response of the highlighted neurons has increased similarly in the two nonlinear cases, leading to a more stationary response (Color figure online)

Figure 6 illustrates the qualitative similarity of the responses of the two models and their comparable equalization effect in the wavelet domain. The nonlinear response \(\varvec{x}_{\text {WC}}\) was computed by integrating Eq. 14, and \(\varvec{x}_{\text {DN}}\) was computed with Eq. 3. We used the parameters introduced in Sect. 4.1 and the corresponding parameters for Divisive Normalization using Eq. 13.

Note how the nonlinearities substantially increase the amplitude of the signal in the regions where the linear response is low. The regions highlighted in blue and orange in \(\varvec{e}\) display low activity compared to their neighbors because there are no edges in those regions of the image. However, the corresponding neurons after Wilson–Cowan or Divisive Normalization have increased their activity. The amplitude of the signal after the nonlinearities is more stationary across the subbands. Moreover, the nonlinearities lead to responses where the image structure is less apparent: The activity of a neuron is more independent of the activity of its neighbors. The equalization and increased independence qualitatively suggested in Fig. 6 are consistent with previous (quantitative) studies that report redundancy reduction both in Divisive Normalization (Schwartz and Simoncelli 2001; Malo and Laparra 2010; Malo 2020) and in the Wilson–Cowan model (Gomez-Villa et al. 2020).

4.2.2 Quantification of the Accuracy of the Approximations

The proposed relation, Eq. 13, is based on two approximations:

  • The approximation of the inverse of Divisive Normalization to obtain Eq. 11, namely: \(\left( I - {\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H} \right) ^{-1} \approx I + {\mathbb {D}}^{-1}_{\varvec{k}} \cdot {\mathbb {D}}_{\varvec{x}} \cdot \varvec{H} \).

  • The approximation of \(f(\varvec{x})\) in Wilson–Cowan to obtain Eq. 12, namely: \(f(\varvec{x}) \approx g_n(\varvec{x}) \, \varvec{x} = \left( \frac{1}{n}\sum _{\beta =0}^{n-1} \frac{df}{dx}(\beta \frac{\varvec{x}}{n}) \right) \varvec{x}\).

The accuracy of such approximations depends on the model parameters, e.g., the shape and magnitude of \(\varvec{H}\) or \(f(\cdot )\), and on the responses \(\varvec{x}\) to natural images. The low amplitude of the coefficients of natural images in wavelet representations (Olshausen and Field 1996; Malo et al. 2000) and the accuracy of similar approximations for psychophysically sensible parameters (Malo et al. 2006) suggest that the errors will be small. However, in this section we explicitly compute both sides of the above expressions (with and without the approximation) for a range of representative images and model parameterizations, and we compute the difference between both sides. This difference is the error due to the approximation. We express the energy of the difference (the mean squared error) as a percentage of the energy of the function computed without the approximation: the relative MSE in %.
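For reference, a sketch of this error measure, under the (assumed) convention that "energy" denotes the mean squared value:

```python
import numpy as np

def relative_mse_percent(approx, exact):
    """Relative MSE in %: energy of the difference over the energy
    of the magnitude computed without approximation."""
    return 100.0 * np.mean((approx - exact)**2) / np.mean(exact**2)
```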

In Table 1, we show the relative MSE (in %) for both approximations (inverse and \(f(\varvec{x})\)) together with the error in convergence, also in relative MSE, for 12 different model parameterizations. The approximations of \(f(\varvec{x})\) were done using \(g_{10}(\varvec{x})\), i.e., computing derivatives at 10 points.

The approximations generally explain more than \(90\%\) of the energy of the original magnitudes. The only exception is the approximation of the inverse of the Divisive Normalization for the (unrealistic) zero-width kernel, where the relative MSE amounts to \(\sim 30\%\). The deviation in this unrealistic case makes sense: reducing the width of unit-volume kernels increases their height, and hence the magnitude of \(\varvec{H}\), so the term added to the identity in the expression under consideration is no longer small. This leads to an increased error in the approximation.

Interestingly, over the whole range of parameterizations considered, the approximations do not have a big impact on the convergence error, which is the actual measure of correspondence between the two models.

Table 1 Accuracy of the approximations and convergence error.

4.2.3 Stability Analysis of the Divisive Normalization Response

The stability of a dynamical system at the steady state is determined by the Jacobian with regard to perturbations in the response: If the eigenvalues of this Jacobian are all negative for this response, it is a stable node of the system (Logan 2015). In that situation, the evolution of the perturbations is a vector field oriented toward the stable node.

In our case, the Jacobian of the right-hand side of the Wilson–Cowan differential equation, Eq. 8, with respect to the output signal is:

$$\begin{aligned} J = - ({\mathbb {D}}_{\alpha } + \varvec{W} \cdot {\mathbb {D}}_{\frac{df}{dx}(\varvec{x})}) \end{aligned}$$
(15)

Figure 7 shows the eigenvalues of this Jacobian using a wide range of parameters (the 12 configurations obtained through variations of the reference values presented in Sect. 4.1), with responses from a set of 45 representative natural images from colorimetrically calibrated datasets (Hateren and Schaaf 1998; Laparra et al. 2012). This result shows that all the eigenvalues are negative, thus suggesting that the Divisive Normalization solution is a stable node of the dynamical system.
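The check itself is direct: build the Jacobian of Eq. 15 at the steady-state response and inspect the sign of its eigenvalues. A minimal sketch with stand-in parameters (not the psychophysically tuned ones used for Fig. 7):

```python
import numpy as np

def wc_jacobian(alpha, W, dfdx, x):
    """Jacobian of the Wilson-Cowan right-hand side at response x, Eq. 15:
    J = -(D_alpha + W . D_{f'(x)})."""
    return -(np.diag(alpha) + W @ np.diag(dfdx(x)))

rng = np.random.default_rng(1)
N = 100
x = rng.random(N)                      # stand-in steady-state response
alpha = 0.5 * np.ones(N)               # stand-in auto-attenuation
W = 0.05 * rng.random((N, N))          # stand-in (just-inhibitory) kernel
dfdx = lambda x: 1.0 - np.tanh(x)**2   # derivative of a tanh stand-in
eig = np.linalg.eigvals(wc_jacobian(alpha, W, dfdx, x))
print(np.all(eig.real < 0))            # stable node if all real parts negative
```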

Fig. 7

Stability of the Divisive Normalization solution (I). Eigenspectrum of the Jacobian of the right-hand side of the Wilson–Cowan differential equation with psychophysically tuned parameters on natural images (curves in pink and red), and different additional configurations (different widths and activation functions), including just-inhibitory interactions (left) and excitatory–inhibitory interactions (right). The curves refer to the median of the eigenvalues over 45 representative images extracted from the calibrated datasets (Hateren and Schaaf 1998; Laparra et al. 2012). The standard deviation and quartile distance are so small that they cannot be seen in the plot. The result shows that the eigenvalues are all negative. This suggests that the Divisive Normalization is a stable node of the system for natural images over a wide range of model parameterizations (Color figure online)

The stability of the system can be further illustrated by the visualization of the vector field of perturbations in the phase space of the system (Logan 2015). In this case, we visualize this vector field for the Divisive Normalization solution. As the signals in our problem live in very high-dimensional spaces (the wavelet vectors in this section have dimension 10,025), it is not possible to visualize the complete phase space, so we just select some illustrative 3-dimensional and 2-dimensional examples.

Figure 8 (left) shows an example taking just 3 neurons of the V1 layer. In this case, we took a particular image (the standard image Lena) and we focused on the response of 3 specific sensors of the low-frequency scale of the Divisive Normalization vector: the 9700th, 9800th, and 9900th responses. In that way, we get the red dot in Fig. 8 (left). Arbitrary perturbations of the responses of these neurons lead to the dynamics shown in the phase space: The vector field induced by the Jacobian implies that any perturbation is sent back to the original (no-perturbation) response, which is, then, a stable node of the system.

Fig. 8
figure 8

Stability of the Divisive Normalization solution (II). Vector fields in the phase space generated by the Jacobian of the psychophysically tuned Wilson–Cowan model. The example at the left (red dot) corresponds to the response of 3 low-frequency sensors at the Divisive Normalization solution. This vector field describes the evolution of the response if it is perturbed in arbitrary directions (the cardinal directions, in red, or any combination of them). The result is general: Examples at the right show similar results for pairs of sensors tuned to different frequencies for different input stimuli (Color figure online)

Similar behavior is obtained for coefficients of other subbands or other images. See Fig. 8 (right), where, for simplicity, we consider perturbations in pairs of neurons.
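The return of perturbations to the steady state seen in Fig. 8 can also be sketched by integrating the linearized dynamics, \(\frac{d(\delta \varvec{x})}{dt} = J \cdot \delta \varvec{x}\). The example below uses a hypothetical 3-neuron Jacobian with negative eigenvalues as a stand-in for Eq. 15:

```python
import numpy as np

# Hedged sketch: Euler integration of a perturbation under the linearized
# dynamics d(dx)/dt = J @ dx, mirroring the phase portraits of Fig. 8.
def perturbation_decay(J, dx0, dt=1e-2, steps=1000):
    """Integrate the linearized system; return the norm of the perturbation."""
    dx = dx0.copy()
    norms = []
    for _ in range(steps):
        dx = dx + dt * (J @ dx)
        norms.append(np.linalg.norm(dx))
    return np.array(norms)

# Toy 3-neuron Jacobian with negative eigenvalues (stand-in for Eq. 15).
J = -np.diag([1.0, 0.8, 1.2]) - 0.05 * np.ones((3, 3))
norms = perturbation_decay(J, dx0=np.array([0.1, -0.2, 0.15]))
print(norms[0], norms[-1])  # the perturbation shrinks back toward zero
```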

In summary, the Divisive Normalization solutions are stable nodes of the corresponding Wilson–Cowan systems. This conclusion confirms the assumption underlying the proposed relation: Divisive Normalization as a steady state of the Wilson–Cowan dynamics.

4.3 Consequences on Contrast Perception

The proposed relation implies that the Divisive Normalization kernel inherits the structure of the Wilson–Cowan interaction matrix (typically Gaussian (Wilson and Cowan 1973; Chossat and Faugeras 2009)), modified by specific signal-dependent diagonal matrices, as seen after Eq. 13. This structure allows us to explain a range of contrast perception phenomena.

First, regarding the structure of the kernel, we show that our prediction is consistent with previously required modifications of the Gaussian kernel in Divisive Normalization to reproduce contrast perception (Martinez-Garcia et al. 2019). Second, we show that the kernel in Divisive Normalization modifies its shape depending on the signal, thus explaining the behavior previously reported in Cavanaugh et al. (2002b), Coen-Cagli et al. (2012). Third, we use the predicted signal-dependent kernel to simulate contrast response curves consistent with Foley (1994), Watson and Solomon (1997). And finally, the proposed relation is also applied to reproduce the experimental visibility of spatial patterns in more general contexts, such as subjective image quality assessment (Ghadiyaram and Bovik 2016; Ponomarenko et al. 2008, 2009).

In this section, we do not integrate the Wilson–Cowan differential equation; instead, we use the expression for the steady-state solution with the kernel obtained from the proposed relation. This alleviates the computation, so, in contrast to the previous section, the following examples use a wavelet representation of higher dimensionality, with 4 scales and 4 orientations, applied to bigger images of \(64\times 64\) pixels. Regarding the parameters, we use unit-norm Gaussian kernels in \(\varvec{H}^{ws}\) or \(\varvec{W}\), and constants \(\varvec{k}\) and \(\varvec{b}\) also defined over 4 scales and 4 orientations, directly taken from Martinez-Garcia et al. (2019).

4.3.1 Structure of the Kernel in Divisive Normalization

Here, we compare the empirical filters \({\mathbb {D}}_{\varvec{l}}\) and \({\mathbb {D}}_{\varvec{r}}\), which had to be introduced ad hoc in Martinez-Garcia et al. (2019), with the theoretical ones obtained through Eq. 13.

Fig. 9
figure 9

Linear and nonlinear responses in V1 for an illustrative stimulus. a Retinal image composed of a natural image and two synthetic patches of frequencies 24 and 12 cpd. This image goes through the first stages of the model (see Fig. 1) up to the cortical layer, where a set of linear wavelet filters leads to the responses with energy \(\varvec{e}\), which are nonlinearly transformed into the responses \(\varvec{x}\). b.1 Wavelet panel that represents \(\varvec{e}\). c.1 Wavelet panel that represents \(\varvec{x}\). The highlighted sensors in red and blue (tuned to different locations of the 24 cpd scale, horizontal orientation) have characteristic responses given the image patterns in those locations. The plots b.2 and c.2 show the vector representation of the wavelet responses arranged according to the MatlabPyrTools convention (Simoncelli et al. 1992). These plots show how natural images typically have bigger energy in the low-frequency sensors. d Input–output scatter plots at different spatial frequencies, in cycles per degree (cpd), demonstrating that Divisive Normalization (and the Wilson–Cowan solution) imply adaptive saturating nonlinearities depending on the neighbors (i.e., a family of sigmoid functions) (Color figure online)

Before going into the details of the kernel, let’s get some intuition on the typical structure of the vectors \(\varvec{x}\) and \(g_n(\varvec{x})\). Figure 9 shows an illustrative stimulus with oriented textures and the corresponding responses of linear and nonlinear V1-like sensors based on steerable wavelets. Typical responses for natural images are low-pass signals (see the vectors in Fig. 9b.2, c.2). The response in each subband is an adaptive (context dependent) nonlinear transduction (Fig. 9d). Each point in Fig. 9d represents the input–output relation for each neuron in the subbands of the different scales (from coarse to fine). As each neuron has a different neighborhood, there is no simple input–output transduction function, but a scatter plot representing different instances of an adaptive transduction.

The considered image is designed to lead to specific excitations in certain sensors (subbands and locations in the wavelet domain). Note, for instance, the high- and low-frequency synthetic patterns (24 and 12 cycles per degree, cpd, horizontal and vertical, respectively) in the image regions highlighted with the red and blue dots. In the wavelet representations, we also highlighted some specific sensors in red and blue corresponding to the same spatial locations and the horizontal subband tuned to 24 cpd. Given the tuning properties of the neurons highlighted in red and blue, it makes sense that the wavelet sensor in red has a bigger response than the sensor in blue.

Fig. 10
figure 10

Empirical and theoretical modulation of the Divisive Normalization kernel. Vectors in the diagonal matrices \(D_{\varvec{l}}\) and \(D_{\varvec{r}}\) that multiply the Gaussian kernel in the empirical tuning represented by Eq. 10 (top), and in the theoretically derived Eq. 13 (bottom). These theoretical filters correspond to a specific natural image, using the \(g_{10}(\varvec{x})\) approximation of the \(\gamma \)-activation. The median relative RMSEs for the predicted filters over 45 natural images are 10.2% (for the left filter) and 5.4% (for the right filter). The difference for the right filter using the original Wilson–Cowan activation, also with \(g_{10}(\varvec{x})\), is 5.1% (curve not shown). However, as the empirical filters were just adjusted ad hoc in Martinez-Garcia et al. (2019), the relevant point here is the reproduction of the required high-pass structure, not the MSE

With this knowledge of the signal in mind, namely (1) the low-pass trend in \(\varvec{x}\) shown in Fig. 9, (2) the bigger derivative \(g_n(\varvec{x})\) at high frequencies, because the derivative is higher for low-amplitude signals (see Fig. 4), and (3) the bigger values of the vector \(\varvec{b}\) at low frequencies (Martinez-Garcia et al. 2019), we can understand the high-pass nature of the vectors included in the diagonal matrices that appear at the left and right sides of the theoretically derived kernel \(\varvec{H} = {\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{x}}\right) } \cdot \varvec{W} \cdot {\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{b}} \odot {g_n(\varvec{x})}\right) }\).
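As a concrete sketch of how Eq. 13 assembles the kernel, the two signal-dependent diagonal terms can be applied to the Gaussian wiring by simple broadcasting. All arrays below are hypothetical toy values; `g_prime` stands for the slope of the activation, approximated in the paper by \(g_{10}(\varvec{x})\):

```python
import numpy as np

# Hedged sketch of Eq. 13: H = D_(k/x) @ W @ D_((k/b) * g'(x)).
def signal_dependent_kernel(W, k, b, x, g_prime):
    left = k / x                  # high-pass left vector (cf. Fig. 10, left)
    right = (k / b) * g_prime     # high-pass right vector (cf. Fig. 10, right)
    # Diagonal-matrix products expressed as row/column scalings of W.
    return left[:, None] * W * right[None, :]

# Hypothetical toy values: first two entries play the role of low frequencies.
W = np.full((4, 4), 0.25)                 # stand-in Gaussian wiring
k = np.array([0.5, 0.5, 0.8, 0.8])
b = np.array([0.3, 0.3, 0.1, 0.1])        # b bigger at low frequencies
x = np.array([0.9, 0.8, 0.2, 0.1])        # low-pass trend of the response
g_prime = np.array([0.4, 0.5, 1.2, 1.5])  # bigger slope at low amplitudes
H = signal_dependent_kernel(W, k, b, x, g_prime)
```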

Figure 10 compares the empirical left and right vectors, \(\varvec{l}\) and \(\varvec{r}\), that were adjusted ad hoc to reproduce contrast curves in Martinez-Garcia et al. (2019), with those based on the theoretical relation proposed here. In this case, we only consider the comparison with the psychophysically sensible parameterization, since the ad hoc tuning was done for that specific scenario. As these empirical filters were just qualitatively adjusted in Martinez-Garcia et al. (2019), the reproduction of their high-pass nature and their order of magnitude is more important than the specific MSE values.

The similarity of the structure of the empirical and theoretical interaction matrices (Eqs. 10 and 13) and the coincidence of empirical and theoretical filters (Fig. 10) suggest that the proposed theory explains the modifications that had to be introduced in classical unit-norm kernels in Divisive Normalization to explain contrast response.

4.3.2 Shape Adaptation of the Kernel Depending on the Signal

Once we have shown the global high-pass nature of the vectors \(\frac{\varvec{k}}{\varvec{x}}\) and \(\frac{\varvec{k}}{\varvec{b}} \odot {g_n(\varvec{x})}\), let's examine in more detail the signal-dependent adaptivity of the kernel. In order to do so, let's consider the interaction neighborhood of two particular sensors in the wavelet representation of an illustrative stimulus with easy-to-understand features. Specifically, the sensors are highlighted in red and blue in Fig. 9.

Figure 11 compares different versions of the two individual neighborhoods displayed in the same wavelet representation: on the left, the unit-norm Gaussian kernels, \(\varvec{H}^{ws}\); on the right, the empirical kernel modulated by ad hoc pre- and post-filters, Eq. 10. In these diagrams, lighter gray in each j-th sensor corresponds to bigger interaction with the considered i-th sensor (highlighted in color). The gray values are normalized to the global maximum in each case. Each subband displays two Gaussians, each corresponding to only one of the sensors (the one highlighted in red or in blue, depending on the spatial location of the Gaussian). We used a single wavelet diagram since the two neighborhoods do not overlap, and there is no possible confusion between them.

Fig. 11
figure 11

Gaussian and empirical interaction kernels for the sensors highlighted in red and light blue in Fig. 9. Gaussian kernel (left) with overestimated contribution of low-frequency subbands (highlighted in orange). Handcrafted kernel (right) to reduce the influence of low-frequency subbands (highlighted in green) (Color figure online)

In the baseline unit-norm Gaussian case, \(\varvec{H}^{ws}\), a unit-volume Gaussian in space is defined, centered at the spatial location preferred by the i-th sensor. Then, the corresponding Gaussians at every subband are weighted by a factor that decays as a Gaussian over scale and orientation from the maximum, centered at the subband of the i-th sensor.

The problem with the unit-norm Gaussian in every scale is that the reduced set of sensors in the low-frequency scales leads to higher values of the kernel so that it keeps the required volume. In that situation, the impact of activity in low-frequency subbands is substantially higher. This fact, combined with the low-pass trend of wavelet signals, implies a strong bias of the response and ruins the contrast masking curves. This problem is represented by the relatively high values of the neighborhoods in the low-frequency subbands highlighted in orange.

This overemphasis on the low-frequency scales was corrected ad hoc in Eq. 10 through left and right multiplication by handcrafted high-pass filters. The effect of these filters is to reduce the values of the Gaussian neighborhoods at the low-frequency subbands (highlighted in green), as seen in the empirical kernel in Fig. 11-right.

In both cases (the classical \(\varvec{H}^{\varvec{ws}}\), and the handcrafted \(\varvec{H} = {\mathbb {D}}_{\varvec{l}} \cdot \varvec{H}^{\varvec{ws}} \cdot {\mathbb {D}}_{\varvec{r}}\)), the size of the interaction neighborhood (the interaction length) is signal independent. Note that the neighborhoods for both sensors (red and blue) are the same, regardless of the different stimulation that can be seen in Fig. 9.

Figure 12 shows the kernels obtained from Eq. 13. The three components of \(\varvec{H}\) are: in Fig. 12a the term proportional to \(\frac{1}{\varvec{x}}\), in Fig. 12b the term based on Gaussian neighborhoods \(\varvec{W}\), and in Fig. 12c the term proportional to \(g_n(\varvec{x})\). Finally, Fig. 12d shows the global result of the product of the three terms and Fig. 12e zooms on the high-frequency horizontal subband that contains the co-linear situation considered in the physiological experiments (Cavanaugh et al. 2002b).

Fig. 12
figure 12

Changes in the shape of the interaction in the theoretically derived kernel. Panels a–c show the isolated factors in the kernel matrix \(\varvec{H}\), assuming a Gaussian wiring in \(\varvec{W}\). The Gaussian component implies interactions with low frequencies (highlighted in orange). d Shows the interaction kernel resulting from the product of the three factors \(\varvec{H} = {\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{x}}\right) } \cdot \varvec{W} \cdot {\mathbb {D}}_{\left( \frac{\varvec{k}}{\varvec{b}} \odot g_n(\varvec{x})\right) }\), for the two highlighted points. Here, we used \(g_{10}(\varvec{x})\). Note the high-pass effect of the left- and right-matrix product over \(\varvec{W}\), which removes the interaction in the low-frequency subbands, now highlighted in dark green. e Zoom on the high-frequency horizontal subband. The term depending on the derivatives implies changes of the shape of the kernel (from circular to horizontal ellipses) when the context is a high-contrast horizontal pattern. This is compatible with the probabilities of co-assignment (Coen-Cagli et al. 2012) recalled in Fig. 3b (Color figure online)

These three terms have the following positive effects: (1) the product by the high-pass terms moderates the effect of the unit-norm Gaussian at low-frequency subbands, as in the empirical kernel tuned in Martinez-Garcia et al. (2019) shown in Fig. 11-right; (2) the term proportional to \(\frac{1}{\varvec{x}}\) scales the interaction length according to the signal; and (3) the shape of the kernel depends on the signal because \(H_{ij}\) is modulated by \((g_n(\varvec{x}))_j\), which implies that when the surround is aligned with the sensor, the kernel elongates in that direction (matching the probability of co-assignment in Fig. 3b). This leads to smaller responses when the sensor is flanked by co-linear stimuli, as in the results of Cavanaugh et al. (2002b).

In summary, deriving the Divisive Normalization as the steady state of a Wilson–Cowan system with Gaussian unit-norm wiring explains two experimental facts: (1) the high-pass filters that had to be added to the structure of the kernel in Divisive Normalization to reproduce contrast responses (Martinez-Garcia et al. 2019), and (2) the adaptive asymmetry of the kernel that changes its shape depending on the background texture (Nelson and Frost 1985; Deangelis et al. 1994; Walker et al. 1999; Cavanaugh et al. 2002a, b).

4.3.3 Contrast Response Curves from the Wilson–Cowan Model

The above results suggest that the Wilson–Cowan model could successfully reproduce contrast response curves and masking, phenomena that had not been addressed with this model before. Here, we explicitly check this hypothesis.

We can use the proposed relation, Eq. 13, to plug successful parameters of Divisive Normalization tuned for contrast perception into the corresponding Wilson–Cowan model. We can avoid the integration of the differential equation by using the knowledge of the steady state. The only problem in computing the response through the steady-state solution is that the kernel of the Divisive Normalization depends on the (still unknown) response.

In this case, we compute a first guess of the response, \(\hat{\varvec{x}}\), using the fixed handcrafted kernel tuned in Martinez-Garcia et al. (2019), and then we use this first guess to compute the proposed signal-dependent kernel, which in turn is used to compute the actual response, \(\varvec{x}\).
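A minimal sketch of this two-pass computation, assuming the standard Divisive Normalization steady state \(\varvec{x} = \varvec{k} \odot \varvec{e} / (\varvec{b} + \varvec{H} \cdot \varvec{e})\) (element-wise product and division); the function `kernel_from_wc` is a hypothetical wrapper around the kernel construction of Eq. 13:

```python
import numpy as np

# Hedged sketch of the two-pass response computation described above.
def dn_response(e, k, b, H):
    """Divisive Normalization steady state (element-wise over sensors)."""
    return k * e / (b + H @ e)

def two_pass_response(e, k, b, H_handcrafted, kernel_from_wc):
    x_hat = dn_response(e, k, b, H_handcrafted)  # first guess, fixed kernel
    H = kernel_from_wc(x_hat)                    # signal-dependent kernel, Eq. 13
    return dn_response(e, k, b, H)               # actual response

# Toy usage with a stand-in kernel builder (in practice, Eq. 13).
n = 4
e = np.array([1.0, 0.8, 0.3, 0.2])
k = np.ones(n)
b = 0.1 * np.ones(n)
H0 = np.full((n, n), 0.25)
x = two_pass_response(e, k, b, H0, lambda x_hat: H0 / x_hat[:, None])
```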

Fig. 13
figure 13

Contrast response curves obtained from the Wilson–Cowan model. Contrast response curves for low spatial frequency vertical tests (left) and high spatial frequency horizontal tests (right) seen on top of backgrounds of different spatial frequencies, orientations, and contrasts (see representative stimuli in the insets). The backgrounds include: (1) two spatial frequencies (low and high, corresponding to the top and bottom rows, respectively); (2) two orientations (vertical and horizontal, as seen in the insets); and (3) four different contrasts represented by the line styles (0.0, 0.15, 0.30, and 0.45, corresponding to the black solid line, blue solid line, dotted blue line, and dashed blue line, respectively). The responses display the qualitative trends of contrast perception: frequency selectivity, saturation with contrast, and cross-masking depending on spatio-frequency similarity between test and background (Color figure online)

Figure 13 shows the response curves corresponding to neurons that are tuned to low and high spatial frequency tests, as a function of the contrast of these tests located on top of backgrounds of different contrast, spatial frequency, and orientation. In each case, we considered four different contrasts for the background (represented by the different line styles). Representative stimuli are shown as image patches inside each plot. The results in this figure display the expected qualitative properties of contrast perception:

Frequency selectivity.

The magnitude of the response depends on the frequency of the test: Responses for the low-frequency test are bigger than the responses for the high-frequency test. This frequency-dependent behavior in Fig. 13 is consistent with the Contrast Sensitivity Function (Campbell and Robson 1968).

Saturation.

The responses increase with the contrast of the test, but this increase is nonlinear (saturates), and the responses decrease with the contrast of the background. This behavior in Fig. 13 is consistent with the contrast discrimination results in Legge and Foley (1980), Legge (1981).

Cross-masking.

Reduction of the responses depends on the frequency similarity between test and background. Note that the low-frequency test is more attenuated by the low-frequency background of the same orientation than by the high-frequency background of orthogonal orientation. Similarly, the high-frequency test is more affected by the high-frequency background of the same orientation. This behavior in Fig. 13 is consistent with cross-masking results in Foley (1994), Watson and Solomon (1997).

4.3.4 Metric in the Image Space from the Wilson–Cowan Model

As a result of the derived relation between models, Eq. 13, the Wilson–Cowan model may also be used to predict subjective image distortion scores. In this section, we explicitly check the ability of the Wilson–Cowan response to predict the visibility of distortions from neural differences, following the same approach as in the previous section: the signal-dependent kernel is computed from a first guess of the response and then used to obtain the steady state.

Fig. 14
figure 14

Subjective image distortion using the handcrafted kernel (Martinez-Garcia et al. 2019) (left), and the kernel based on the Wilson–Cowan equations (right). For each scatter plot, the Pearson correlation between Mean Opinion Scores (ordinates) and predicted image distortions (abscissas) is given. Differences in the correlations are not statistically significant, indicating the validity of the proposed relation

The TID database (Ponomarenko et al. 2008, 2009) contains natural images modified with many kinds of degradation, together with the experimental subjective distortion for each degraded image. Given a model, the theoretical prediction of the subjective distortion is obtained from the modulus \(|\varvec{x}_{\text {orig}}-\varvec{x}_{\text {distort}}|\), i.e., the Euclidean difference of the model responses to the original and to the degraded images.
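A minimal sketch of this prediction, where `model_response` is a hypothetical stand-in for the full retina-to-V1 cascade with the signal-adaptive kernel:

```python
import numpy as np
from scipy.stats import pearsonr

# Hedged sketch: predicted distortion as the Euclidean difference between the
# model responses to the original and the degraded image (cf. Fig. 14).
def predicted_distortion(img_orig, img_dist, model_response):
    x_orig = model_response(img_orig)
    x_dist = model_response(img_dist)
    return np.linalg.norm(x_orig - x_dist)

# Given image pairs and their Mean Opinion Scores (e.g., from TID):
# d = [predicted_distortion(o, g, model_response) for o, g in pairs]
# rho, _ = pearsonr(d, mean_opinion_scores)   # correlation reported in Fig. 14
```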

Figure 14 compares these predictions (abscissas) with the experimental distortions (ordinates) for the responses with a fixed interaction kernel (the conventional Divisive Normalization approach, in blue) and for the responses with the proposed signal-adaptive kernel obtained from the Wilson–Cowan model (in red).

The high values of the Pearson correlation coefficients in both cases, and the close similarity between the plots, demonstrate the good performance of both models and support the validity of the proposed relation between them.

5 Final Remarks

In this paper, we derived an analytical relation between two well-known models of nonlinear neural interaction: the Wilson–Cowan model (Wilson and Cowan 1972, 1973) and the Divisive Normalization model (Carandini and Heeger 1994, 2012). Specifically, assuming that the Divisive Normalization is the steady-state solution of the Wilson–Cowan differential equations, the Divisive Normalization interaction kernel may be derived from the Wilson–Cowan kernel weighted by two signal-dependent contributions.

We showed the appropriateness of the proposed relation in a range of model parameterizations by checking the convergence of the Wilson–Cowan solution to the Divisive Normalization solution, and by proving that the Divisive Normalization solution is a stable node of the Wilson–Cowan system.

Moreover, the derived relation has the following implications in contrast perception: (a) the specific structure obtained for the interaction kernel of Divisive Normalization explains the need for high-pass filters on unit-norm Gaussian interactions to describe the contrast masking found in Martinez-Garcia et al. (2019); (b) the signal-dependent kernel predicts elongations of the interaction neighborhood in backgrounds aligned with the sensor, thus providing a mechanistic explanation of the adaptation facts found in Cavanaugh et al. (2002a, 2002b); and (c) low-level Wilson–Cowan dynamics may also explain behavioral aspects that have been classically explained through Divisive Normalization, such as contrast response curves (Foley 1994; Watson and Solomon 1997) or image distortion metrics (Laparra et al. 2010; Berardino et al. 2017). This is the first work that justifies why the Wilson–Cowan interaction successfully reproduces image distortion metrics and contrast response curves. As stated in Bertalmío et al. (2020a), there are not many works that explore the use of the Wilson–Cowan equations to model psychophysics, so the examples presented in this work are relevant to fill this gap.

The choice of the discrete time step \(\varDelta t\) in the Euler integration of the Wilson–Cowan equations, Eq. 14, has implications that were not considered in this work. Here, the specific value of \(\varDelta t\) was just an arbitrary choice made for computational convenience. If one could assume that this \(\varDelta t = 10^{-5}\) is measured in seconds, then, as the process converges in about 400–500 Euler steps (as shown in Fig. 5), the steady state would be reached in 4–5 ms. As an illustration, 4–5 ms would not be a relevant time delay for visual processing of motion, because the cutoff frequency of the temporal Contrast Sensitivity is about 70–100 Hz (so events below 10–15 ms are disregarded), see Kelly (1979). However, the choice of this \(\varDelta t\) is not based on the biophysics of the neural interactions, and it may indeed be larger (Zeraati 2023). If the interaction time is in fact larger, the convergence will take longer than 4–5 ms. This would imply that the actual behavior would be given by the dynamic Wilson–Cowan model and not by the Divisive Normalization approximation. This may imply that the use of static models like Divisive Normalization should be limited to slowly varying stimuli, or that their use is more correct in a certain region of spatiotemporal frequencies (or speeds). Nevertheless, the detailed analysis of that region from a sensible biophysical estimation of \(\varDelta t\) is out of the scope here and a matter for further work.
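For completeness, under this (arbitrary) assumption the convergence time is simply the number of Euler steps times the step size:

$$\begin{aligned} t_{\text {conv}} = N_{\text {steps}} \cdot \varDelta t \approx (400\text {--}500) \times 10^{-5}\,\text {s} = 4\text {--}5\,\text {ms} \end{aligned}$$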

Finally, the relation between models proposed here opens the possibility of analyzing Divisive Normalization from new perspectives, following methods that have been developed for Wilson–Cowan systems (Destexhe and Sejnowski 2009). Similarly, mechanisms that generalize the Wilson–Cowan equation, such as the neurons with intrinsically nonlinear receptive fields (Bertalmío et al. 2020b), could be analyzed via the information-theoretic tools that have been used to quantify the performance of Divisive Normalization (Malo 2020, 2022; Saproo and Serences 2014).