Cortical Divisive Normalization from Wilson-Cowan Neural Dynamics

Divisive Normalization and the Wilson-Cowan equations are well-known and influential models of nonlinear neural interaction [Carandini and Heeger, Nature Rev. Neurosci. 2012; Wilson and Cowan, Kybernetik 1973]. However, they have always been treated as different approaches and have not yet been analytically related. In this work we show that Divisive Normalization can be derived from the Wilson-Cowan dynamics. Specifically, assuming that Divisive Normalization is the steady state of the Wilson-Cowan differential equations, we find that the kernel that controls neural interactions in Divisive Normalization depends not only on the Wilson-Cowan kernel but also on the signal. A standard stability analysis of a Wilson-Cowan model with the parameters obtained from our relation shows that the Divisive Normalization solution is a stable node. This stability supports the appropriateness of our steady-state assumption. The proposed theory provides a mechanistic foundation for previous suggestions on the need for signal-dependent Divisive Normalization [Coen-Cagli et al., PLoS Comp. Biol. 2012]. Moreover, it explains the modifications that had to be introduced ad hoc in the Gaussian kernels of Divisive Normalization in [Martinez et al., Front. Neurosci. 2019] to reproduce contrast responses in V1 cortex. Finally, the derived relation implies that the Wilson-Cowan dynamics also reproduce visual masking and subjective image distortion, which up to now had been explained mainly via Divisive Normalization.


Introduction
A number of perceptual experiences in different modalities can be described with the Divisive Normalization interaction among the outputs of linear neurons [1,2]. In particular, in vision, the perception of color [3], texture [4], and motion [5] seems to be mediated by this nonlinear interaction. Intuitively, this divisive interaction modifies the response of a sensor by normalizing it with the responses of the neighboring neurons, thus explaining inhibition by the surround.
The discussion of the circuits underlying Divisive Normalization in [2] suggests that there may be different architectures leading to this specific computation. Recent results suggest specific mechanisms for Divisive Normalization in certain situations [6], but the debate on the physiological implementations is still open. On the other hand, a number of functional advantages [7-10] suggest that the kernel that describes the interaction in Divisive Normalization should be adaptive (i.e., signal or context dependent). Moreover, the match between the linear receptive fields and the interaction kernel in Divisive Normalization is not trivial: the conventional Gaussian kernels in [4,11] had to be tuned by hand to reproduce contrast responses [12].
These open questions make it interesting to relate Divisive Normalization to other models of neural interaction for a better understanding of its implementation, the structure of the interaction kernel, and its eventual dependence on the signal. Interesting possibilities to consider are the classical dynamic neural field models of Wilson-Cowan [13-15], Amari [16], or Grossberg [17,18], all of which have a similar subtractive nature [19]: in subtractive models, the surround modifies the activity of a given sensor by subtracting a weighted average of its neighbors' responses, as opposed to the division made in normalization models.
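As a minimal numeric illustration of the two interaction styles (with made-up responses and kernel values, not fitted parameters), the following sketch contrasts a subtractive surround with a divisive one:

```python
import numpy as np

# Toy responses of 5 linear sensors and a made-up interaction kernel
# with zero self-interaction (only neighbors modulate each sensor).
e = np.array([1.0, 0.8, 0.5, 0.3, 0.2])
W = 0.1 * (np.ones((5, 5)) - np.eye(5))   # subtractive weights (hypothetical)
H = W.copy()                               # divisive weights (hypothetical)
b = 0.1 * np.ones(5)                       # semisaturation constants (hypothetical)

x_subtractive = e - W @ e                  # surround subtracts a weighted average
x_divisive    = e / (b + H @ e)            # surround divides (normalizes) the response
```

In both cases the activity of the neighbors reduces the response of each sensor; the difference is whether the surround enters as a subtracted term or as part of the denominator.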
Subtractive and divisive adaptation models have been qualitatively related before [20,21]. Both models have been shown to have similar advantages in information-theoretic terms: the Wilson-Cowan interaction in a neural layer uniformizes the probability density of the responses [22], and there is less redundancy among them [23]. Similarly, Divisive Normalization layers reduce the relations between the responses [24], factorize the joint probability of the responses [11], and maximize the transmitted information from the input [25,26]. Additionally, both models provide similar descriptions of pattern discrimination [20,27]. However, despite all these similarities, no direct analytical correspondence has been established between these models yet.
In this paper, we assume that the psychophysical behavior described by Divisive Normalization comes from underlying neural interactions that follow the Wilson-Cowan equations. In particular, we identify the Divisive Normalization response with the stationary regime of a Wilson-Cowan system. From this identification we derive an expression for the Divisive Normalization kernel in terms of the interaction kernel of the Wilson-Cowan equations.
This analytically derived relation has the following interesting consequences: (1) It has been suggested that Divisive Normalization should depend on the input for functional reasons [9,10], but no physiological mechanism had been proposed to implement this statistical adaptation. The proposed relation of Divisive Normalization with a dynamical system with fixed wiring among neurons provides a mechanistic explanation for this dependence on the input.
(2) The relation explains the modifications that had to be introduced ad hoc in the kernel of Divisive Normalization in [12] to reproduce contrast responses. This implies that the Wilson-Cowan dynamics reproduce visual masking, which up to now had been mainly explained via Divisive Normalization [4,28].
(3) The relation makes it possible to build effective image quality metrics based on the Wilson-Cowan model, something which, to the best of our knowledge, has not been considered before in the literature, as opposed to the many examples of metrics based on Divisive Normalization [11,29-32].
(4) A standard stability analysis of a Wilson-Cowan model with the parameters obtained from our relation shows that the Divisive Normalization solution is a stable node of the dynamic model. This shows the appropriateness of our steady-state assumption. Moreover, this stability justifies the straightforward use of Divisive Normalization with time-varying stimuli, as in [5].
The structure of the paper is as follows. The Materials and Methods section reviews the retina-V1 neural path and the contrast perception of visual patterns. We also introduce the notation of the models: the Divisive Normalization and the Wilson-Cowan equations. Besides, we recall some experimental facts that will be used to illustrate the performance of the proposed relation: (1) contrast response curves imply certain interactions between subbands [4,33], (2) the Divisive Normalization kernel must have a specific structure (identified in [12]) to reproduce contrast response curves, and (3) the shape of the Divisive Normalization kernel should have a specific dependence on the surrounding signal [34,35]. In the Analytical Results section we derive the relation between the Divisive Normalization and the Wilson-Cowan equations based on the steady-state assumption. The Numerical Experiments section illustrates with simulations the mathematical properties and the perceptual consequences of the proposed relation. First, we experimentally check the convergence of the Wilson-Cowan solution to the Divisive Normalization response in a wide range of model parameterizations. We quantify the error introduced by the approximations made to obtain the analytical results. Moreover, we illustrate the appropriateness of the steady-state assumption by showing that the Divisive Normalization is a stable node of the Wilson-Cowan system. Then, we address contrast perception facts using the proposed relation to build a psychophysically meaningful Wilson-Cowan model: we theoretically derive the specific structure of the kernel that was previously inferred empirically [12], we show that the proposed interaction kernel adapts with the signal, and, as a result, we reproduce general trends of contrast response curves. Finally, we discuss the use of the derived kernel in predicting the perceptual metric of the image space. The Final Remarks section concludes the paper.

Materials and Methods
In this work the theory is illustrated in the context of models of the retina-cortex pathway. The considered framework follows the approach suggested in [2]: a cascade of four isomorphic linear+nonlinear modules. These four modules sequentially address brightness, contrast, frequency-filtered contrast masked in the spatial domain, and orientation/scale masking. An example of the transforms of the input in such models is shown in Fig. 1.
In this general context we focus on the cortical (fourth) layer: a set of linear sensors with wavelet-like receptive fields modelling simple cells in V1, and the nonlinear interaction between the responses of these linear sensors. Divisive Normalization has been the conventional model for this nonlinearity when describing contrast perception psychophysics [4], but here we explore the application of the Wilson-Cowan model in the contrast perception context.
Below we introduce the notation of both models of neural interaction and the facts on contrast perception that should be explained by the models.

Modelling cortical interactions
Our focus here is the last linear+nonlinear module of the retina-V1 cascade in Fig. 1, and specifically the nonlinear layer that describes the interactions in the primary visual cortex V1. The driving input of this final nonlinear layer is the vector of energies, e, of the responses of linear wavelet-like simple cells, and the output of this interaction is the vector of nonlinear responses, x = N(e). In this work, the two models considered describe the interaction N between the linear simple cells in V1. The Wilson-Cowan equations model neural firing-rate dynamics that may converge to a steady state. If that is the case, the long-term behavior of the Wilson-Cowan equations may be similar to the Divisive Normalization model, since the latter models static neural firing rates.

The Divisive Normalization model
Forward transform. The conventional expression of the canonical Divisive Normalization [2] uses an element-wise formulation:

    x_i = k_i e_i / (b_i + Σ_j H_ij e_j)

where the output vector of nonlinear activations, x, depends on the energy of the input linear wavelet responses, e, which is dimension-wise normalized by a sum of neighbor energies of the input. For convenience in the derivations below, the transform can be rewritten in matrix form [12,36]:

    x = D_k · D_{(b + H·e)}^{-1} · e        (3)

where D_v denotes a diagonal matrix with the vector v in the diagonal. The nondiagonal nature of the interaction kernel H in the denominator, b + H·e, implies that the i-th element of the response is attenuated by the activity of the neighbor sensors, e_j with j ≠ i. Each row of the kernel H describes how the energies of the neighboring simple cells attenuate each simple cell after the interaction. Each element of the vectors b and k represents, respectively, the semisaturation and the dynamic range of the nonlinear response of each sensor. This nonlinear interaction only affects the amplitude of the responses, not their sign. As a result, for simplicity in the notation, throughout the work x refers to the vector of absolute values of the responses. The sign of the normalized responses is inherited from the sign of the linear wavelet responses.

[Fig. 1 caption: Signal transforms in the retina-cortex pathway: a cascade of linear+nonlinear modules (example from [36]). The input is the spatial distribution of the spectral irradiance at the retina. (1) The linear part of the first layer consists of three positive spectral sensitivities (Long, Medium, Short, LMS, wavelengths) and a linear recombination of the LMS values with positive/negative weights. This leads to three tristimulus values at each spatial location: one of them is proportional to the luminance, and the other two have opponent chromatic meaning (red-green and yellow-blue). These linear tristimulus responses undergo adaptive saturation transforms. Perception of brightness is mediated by an adaptive Weber-like nonlinearity applied to the luminance at each location. This nonlinearity enhances the response in regions with small linear input (low luminance). (2) …]
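The equivalence between the element-wise and the matrix formulations can be checked numerically; the energies, kernel, and constants below are arbitrary toy values, not fitted parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
e = rng.uniform(0.1, 1.0, n)            # energies of linear responses (toy values)
k = np.ones(n)                           # dynamic range of each sensor (hypothetical)
b = 0.2 * np.ones(n)                     # semisaturation constants (hypothetical)
H = 0.05 * rng.uniform(size=(n, n))      # small interaction kernel (hypothetical)
D = np.diag

# Element-wise formulation: x_i = k_i e_i / (b_i + sum_j H_ij e_j)
x_elementwise = k * e / (b + H @ e)

# Matrix formulation: x = D_k . D_{(b + H e)}^{-1} . e
x_matrix = D(k) @ np.linalg.inv(D(b + H @ e)) @ e
```

Both formulations produce the same response vector; the matrix form is only a notational convenience for the derivations that follow.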
Inverse transform. The matrix notation [12,36] is convenient to derive the analytical inverse of Divisive Normalization, which will be used to obtain the relation between the two models considered in this work. The inverse is given by [12,24,36]:

    e = (D_k − D_x · H)^{-1} · D_b · x        (4)

This inverse, originally proposed in the context of image coding [24], has been used in other applications that require the reconstruction of the image [31,37].
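A quick round-trip sketch with toy values (a small random kernel, so the matrix to invert is well conditioned) confirms that the analytical inverse e = (D_k − D_x·H)^{-1}·D_b·x recovers the input energies:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
e = rng.uniform(0.1, 1.0, n)             # toy input energies
k = np.ones(n)
b = 0.2 * np.ones(n)
H = 0.05 * rng.uniform(size=(n, n))      # small kernel (hypothetical values)
D = np.diag

# Forward Divisive Normalization
x = k * e / (b + H @ e)

# Analytical inverse: e = (D_k - D_x H)^{-1} . D_b . x
e_rec = np.linalg.inv(D(k) - D(x) @ H) @ (b * x)
```

The reconstruction is exact (up to floating point), since the forward relation x·(b + H·e) = k·e is linear in e once x is known.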

The Wilson-Cowan model
The Wilson-Cowan dynamical system was proposed as a general model of the inhibitory/excitatory interactions between neural populations and, in particular, it can be used to model the signal at specific stages of the visual pathway [13-15]. In Wilson-Cowan models, part of the neural population (part of the coefficients in the vectors e and x) is excitatory and part is inhibitory, which determines how their magnitude affects the neighbors (in an additive or subtractive way, respectively). Excitatory and inhibitory populations will be referred to as e_e, x_e, and e_i, x_i, respectively. We will consider that these excitatory and inhibitory neurons (or coefficients) are interleaved in the vectors that describe the responses. Alternatively, for simplicity, one may represent them as separate rows in the response vectors:

    e = [e_e ; e_i],    x = [x_e ; x_i]        (5)

In any case, the arbitrary arrangement of the neurons in the vectors does not restrict the generality of the formulation. The only effect of this choice is the interpretation of the elements of the matrices that will represent the interaction between the neurons.
Dynamical system. In Wilson-Cowan models [13-15] the transform N is defined by differential equations that describe the temporal variation of the activity of the populations. In particular, this variation is driven by three factors:
- An external driving input (either e_e or e_i), in our case the responses of the linear V1 cells.
- The variation of the response of a population is auto-attenuated by its own activity.
- The variation of the response is amplified by the excitatory responses and moderated by the inhibitory responses.
Specifically, if in the notation of [15,38,39], which considers no refractory period in V1 neurons, we explicitly identify the excitatory and inhibitory populations as done originally in [14], for a neuron tuned to the feature p we have one of these (excitatory or inhibitory) equations:

    ẋ^e_p = e^e_p − α^e_p x^e_p + Σ_{p'} W^{ee}_{pp'} f(x^e_{p'}) − Σ_{p'} W^{ei}_{pp'} f(x^i_{p'})        (6)

or, in matrix notation:

    ẋ_e = e_e − D_{α_e} · x_e + W_ee · f(x_e) − W_ei · f(x_i)
    ẋ_i = e_i − D_{α_i} · x_i + W_ie · f(x_e) − W_ii · f(x_i)        (7)

where W_ee, W_ei, W_ie, W_ii are the matrices that describe the excitatory and inhibitory relations between sensors, the activation function f(·) is a dimension-wise saturating nonlinearity, and the elements of the vectors α_e and α_i are the auto-attenuation parameters. The above matrices are usually considered to be a fixed set of connections (wired relations), made of positive and negative Gaussian neighborhoods, that represent the local interaction between sensors [15,39,40]. Also note that, if in Eqs. 7 the inhibitory and the excitatory components are stacked together into a single vector (with some arrangement, as in Eq. 5), the two equations in the traditional Wilson-Cowan formulation can be represented by a single expression, as in [15,19], here in matrix form:

    ẋ = e − D_α · x − W · f(x)        (8)

where

    W = ( −W_ee   W_ei )
        ( −W_ie   W_ii )

The above single-equation matrix formulation of the Wilson-Cowan model, Eq. 8, is convenient to obtain the relation between the models, and it clearly shows the subtractive nature of the interactions in the kernel W, as opposed to the divisive nature of the interactions due to the kernel H in Eq. 3.
Steady state and inverse. The stationary solution of the above differential equation (obtained by taking ẋ = 0 in Eq. 8) leads to the following analytical inverse for static inputs:

    e = D_α · x + W · f(x)        (9)

As we will see in the Analytical Results section, the identification of the decoding equations corresponding to both models, Eq. 4 and Eq. 9, is the key to obtaining simple relations between their corresponding parameters.
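A minimal numerical sketch (with a generic saturating activation and arbitrary small weights, not the fitted V1 parameters) integrates Eq. 8 with explicit Euler steps and checks that the steady state satisfies the decoding identity of Eq. 9:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
e = rng.uniform(0.2, 1.0, n)             # driving input (toy values)
alpha = 0.8 * np.ones(n)                  # auto-attenuation (hypothetical)
W = 0.05 * rng.uniform(size=(n, n))       # weak subtractive kernel (hypothetical)
f = lambda x: x / (1.0 + x)               # a generic saturating activation (assumption)

# Euler integration of  dx/dt = e - D_alpha x - W f(x)
x = np.zeros(n)
dt = 0.05
for _ in range(5000):
    x = x + dt * (e - alpha * x - W @ f(x))

residual = e - (alpha * x + W @ f(x))     # should vanish at the steady state
```

Note that a fixed point of the Euler map is exactly a fixed point of the continuous dynamics, so at convergence the decoding equation e = D_α·x + W·f(x) holds to machine precision.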

Adaptive contrast response curves
In the considered spatial vision context, the models should reproduce the fundamental trends of contrast perception. Thus, the slope of the contrast response curves should depend on the spatial frequency, so that the sensitivity at threshold contrast is different for different spatial frequencies, according to the Contrast Sensitivity Function (CSF) [41]. Also, the response curves should saturate with contrast [42,43]. Finally, the responses should attenuate with the energy of the background or surround, and this additional saturation should depend on the texture of the background [4,28]: if the frequency/orientation of the test is similar to the frequency/orientation of the background, the decay should be stronger. This background-dependent adaptive saturation, or masking, is mediated by cortical sensors tuned to spatial frequency whose responses saturate depending on the background, as illustrated in Fig. 2.
The above trends are key to discarding overly simple models, and also to proposing the appropriate modifications in the model architecture to obtain reasonable results [12].
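These trends can be sketched with a two-sensor caricature of Divisive Normalization (all constants below are illustrative, not the psychophysically fitted values of [12]): the response of a test sensor saturates with contrast, and is attenuated more by a mask of similar spatio-frequency content (larger cross-interaction) than by a very different one:

```python
import numpy as np

# Toy sensors: one tuned to the test, one to a masking background.
k, b = 1.0, 0.05                    # gain and semisaturation (made-up)
H_same, H_far = 0.5, 0.05           # cross-interaction: strong for similar textures
contrasts = np.linspace(0.01, 1.0, 50)

def response(c, mask_energy, h_cross):
    """DN response of the test sensor: k e / (b + h_self e + h_cross m)."""
    e_test = c ** 2
    return k * e_test / (b + 0.1 * e_test + h_cross * mask_energy)

r_no_mask   = response(contrasts, 0.0, H_same)
r_same_mask = response(contrasts, 0.5, H_same)   # iso-oriented background
r_far_mask  = response(contrasts, 0.5, H_far)    # very different background
```

The response grows monotonically and saturates (it is bounded by k divided by the self-interaction weight), and the iso-oriented mask produces the strongest attenuation, mimicking the qualitative behavior of Fig. 2.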

Unexplained kernel structure in Divisive Normalization
In the Divisive Normalization setting, the masking interaction between tests and backgrounds of different textures is classically described by using a Gaussian kernel in the denominator of Eq. 3 in wavelet-like domains: the effect of the j-th wavelet sensor on the attenuation of the i-th wavelet sensor decays with the distance in space between the i-th and j-th sensors, but also with the distance in spatial frequency and orientation [4]. We will refer to these unit-norm Gaussian kernels as Watson and Solomon kernels [4], represented by H_ws. Gaussian kernels are useful to describe the general behavior shown in Fig. 2: activity in close neighbors leads to strong decays in the response, while activity in neighbors tuned to more distant features has a smaller effect.

[Fig. 2 caption (fragment): … [33,44]). Note the decay in the response when signal and mask have the same spatio-frequency characteristics (a), as opposed to the case where they do not (b). For visualization, the differences in the curves are highlighted by the green circles.]
However, in order to have well-behaved responses in every subband with every possible background, a special balance between the wavelet representation and the Gaussian kernels is required. When using reasonable log-polar Gabor bases or steerable filters to model V1 receptive fields, as in [4,44], the energies of the sensors tuned to low frequencies are notably higher than the energies of the high-frequency sensors. Moreover, the smaller number of sensors in the low-frequency subbands of this kind of wavelet representation implies that unit-norm Gaussian kernels have bigger values in coarse subbands. These two facts overemphasize the impact of low-frequency responses on high-frequency responses. Thus, in [12] we found that classical unit-norm Gaussian kernels require ad hoc extra modulation to avoid an excessive effect of low-frequency backgrounds on high-frequency tests. The appropriate wavelet-kernel balance was then reestablished by introducing extra high-pass filters in the Gaussian kernel H_ws, with the aim of moderating the effect of low frequencies [12]:

    H = D_l · H_ws · D_r        (10)

In this new definition of the kernel: (1) the diagonal matrix at the right, D_r, pre-weights the subbands of e to moderate the effect of low frequencies before computing the interaction; and (2) the diagonal matrix at the left, D_l, sets the relative weight of the masking for each sensor, moderating low frequencies again. The vectors r and l were tuned ad hoc in [12] to obtain reasonable contrast response curves, both for low- and high-frequency tests.
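The effect of this modulated kernel can be sketched on a toy 1-D frequency axis; the Gaussian width and the high-pass profiles l and r below are illustrative, not the values tuned in [12]:

```python
import numpy as np

# Toy 1-D "frequency" axis: index 0 = lowest subband, index n-1 = highest.
n = 8
freq = np.arange(n)

# Row-normalized Gaussian interaction along frequency (a stand-in for H_ws)
H_ws = np.exp(-0.5 * ((freq[:, None] - freq[None, :]) / 2.0) ** 2)
H_ws = H_ws / H_ws.sum(axis=1, keepdims=True)

# High-pass diagonal modulations (illustrative profiles): small weights at
# low frequencies moderate their influence before and after the interaction.
l = np.linspace(0.2, 1.0, n)      # left modulation  D_l
r = np.linspace(0.2, 1.0, n)      # right modulation D_r

H = np.diag(l) @ H_ws @ np.diag(r)
```

With this construction, the influence of a low-frequency input (first column) on a high-frequency sensor (last row) is reduced with respect to the plain Gaussian kernel, which is the rebalancing effect described above.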
However, what is the explanation for this specific structure of the kernel matrix in Eq. 10? And where do these two high-pass diagonal matrices come from?

Adaptive nature of kernel in Divisive Normalization
Previous physiological experiments on cats and macaques demonstrated that the effect of the surround on each cell does not come equally from all peripheral regions, revealing the existence of a spatially asymmetric surround [34,35,45-47]. As shown in Fig. 3.a, the experimental cell response is suppressed by the surround, and this attenuation is greater when the grating patches are iso-oriented and located at the ends of the receptive field (as defined by the axis of preferred orientation) [35].
In the Divisive Normalization context, this asymmetry could be explained with non-isotropic interaction kernels. Depending on the texture of the surround, the interaction strength in certain directions may change. This would change the denominator, and hence the gain of the response.
Coen-Cagli et al. [9] proposed a specific statistical model to account for these contextual dependencies. This model includes grouping and segmentation of neighboring oriented features, and leads to a flexible generalization of Divisive Normalization. Representative center-surround configurations considered in the statistical model are shown in Fig. 3.c. A surround orientation can either be co-assigned with the center group or not co-assigned. In the first case, the model assumes dependence between center and surround, and includes them both in the normalization pool for the center. In the second case, the model assumes center-surround independence, and does not include the surround in the normalization pool. Fig. 3.d shows the covariance matrices learned from natural images between the variables associated with center and surround in the proposed statistical model. As expected, the variances of the center and its co-linear neighbors, and also the covariance between them, are larger, due to the predominance of co-linear structures in natural images. The cell response computationally obtained assuming their statistical model is shown in Fig. 3.b, together with the probability that center and surround receptive fields are co-assigned to the same normalization pool and hence contribute to the divisive normalization of the model response. Note that the higher the probability of co-assignment between center and surround, the higher the suppression (or the lower the signal) in the cell response.

[Fig. 3 caption (fragment): … [9], and probability that center and surround receptive fields are co-assigned to the same normalization pool and contribute to the divisive normalization of the model response. The probability of co-assignment depends on the covariance with the surround, as shown below. (c) Different center-surround visual neighborhoods in a natural scene. In each case, the activity of the sensors in the surround can be co-assigned to the activity in the center (i.e., considered in the normalization pool) if the orientation of maximally responding sensors is consistent (which is the case for four of the considered regions and not the case for the first). The horizontal surround that is co-assigned with the corresponding center is highlighted in bold. (d) Covariance matrices learned from natural images determine co-assignment: the orientation and relative position of the receptive fields are represented by the black bars (the thickness of the bar is proportional to the variance, while the thickness of the red lines is proportional to the covariance between the two connected bars). This figure has been adapted from [9].]
This flexible (or adaptive) Divisive Normalization model based on image statistics [9] makes it possible to explain the experimental asymmetry in the center-surround modulation [35]. However, no direct mechanistic approach has been proposed yet to describe how this adaptation of the Divisive Normalization kernel may be implemented.

Analytical Results: relation between models
The kernels that describe the relation between sensors in the Divisive Normalization and the Wilson-Cowan models, H and W, have similar qualitative roles: both moderate the response, either by division or by subtraction, taking into account the activity of the neighboring sensors.
In this section, we derive the relation between both models assuming that the Divisive Normalization behavior corresponds to the steady state solution of the Wilson-Cowan dynamics.This leads to an interesting analytical relation between both kernels, H and W .
Under the steady-state assumption, it is possible to identify the different terms of the decoding equations in both cases (Eq. 4 and Eq. 9). However, to obtain a simpler analytical relation between both kernels, we make one extra simplification in each model. Numerical experiments in the next section, with natural inputs and a wide range of model parameterizations, show that these simplifications are acceptable in practice.
First, in the Divisive Normalization model, Eq. 4, the identification becomes simpler by taking the series expansion of the inverse. This expansion was used in [24] because it clarifies the condition for invertibility: the eigenvalues of D_{(x/k)} · H have to be smaller than one so that the series converges. In fact, if the eigenvalues are small, the inverse can be well approximated by a small number of terms of the series. Taking this approximation into account, Eq. 4 may be written as:

    e ≈ D_{(b/k)} · x + D_{(x/k)} · H · D_{(b/k)} · x        (11)

Second, in the case of the Wilson-Cowan model (Eq. 9), we approximate the saturation function f(x) so that we can isolate the vector x. This can be done by expressing f(x) through an Euler integration of n terms:

    f(x) ≈ Σ_{m=0}^{n-1} (x/n) ⊙ (df/dx)(m x/n)

Note that along the integration the derivatives are computed at different points, from 0 up to ((n−1)/n) x. If n = 1 we have the (in principle poor) Maclaurin approximation, and if n → ∞ the result is exact. In between, for finite n, we have an approximation with a certain accuracy. In this case, taking into account that in the activation functions f(0) = 0, and calling g_n(x) = (1/n) Σ_{m=0}^{n-1} (df/dx)(m x/n), so that f(x) ≈ g_n(x) ⊙ x, we can write:

    e = D_α · x + W · D_{g_n(x)} · x        (12)

Now, the identification between the approximated versions of the decoding equations, Eq. 11 and Eq. 12, is straightforward. As a result, we get the following relations between the parameters of both models:

    α = b/k
    H = D_{(k/x)} · W · D_{((k/b) ⊙ g_n(x))}        (13)

where the symbol ⊙ denotes the element-wise (or Hadamard) product, and the ratios between vectors are also element-wise divisions. Note that the Divisive Normalization kernel compatible with Eq. 13, H = D_{(k/x)} · W · D_{((k/b) ⊙ g_n(x))}, has exactly the same structure as the one in Eq. 10. Therefore, both models agree if the Divisive Normalization kernel inherits the structure of the Wilson-Cowan kernel left- and right-multiplied by the diagonal matrices D_{(k/x)} and D_{((k/b) ⊙ g_n(x))}, respectively. This theoretical result suggests an explanation for the structure that had to be introduced ad hoc in [12] to reproduce contrast masking. Note that the interaction in the Wilson-Cowan case may be understood as wiring between sensors tuned to similar features, so a unit-norm Gaussian, W = H_ws, is a reasonable choice [14,40]. Note also that the weights before and after W (the diagonal matrices) are signal dependent. Therefore, a fixed wiring W implies that the kernel in Divisive Normalization should be adaptive. The matrix on the left, D_{(k/x)}, has a direct dependence on the inverse of the signal, while the one on the right, D_{((k/b) ⊙ g_n(x))}, depends on the derivatives of the activation f(x). The next section shows that the vectors k/x and (k/b) ⊙ g_n(x) do have the high-pass frequency nature that explains why the low frequencies in e had to be attenuated ad hoc by introducing D_l and D_r. We also show that the term on the right, D_{((k/b) ⊙ g_n(x))}, produces the shape changes needed in the interactions. It is important to stress that the simplifications made in the decoding equations to obtain the analytical relations in Eq. 13 were done only for the sake of simplicity in the final expressions. In summary, the expressions in Eq. 13 are exact for the simplified versions of the models. Considering the full versions of the models, Eq. 13 is an approximation. However, the experiments below support the validity of this approximation because: (a) we explicitly check that the errors are small in a range of scenarios, and (b) we check that plugging these expressions into the full versions of the models also leads to consistent results.
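The exactness of Eq. 13 for the simplified decoders can be checked numerically: building H from a fixed W with the signal-dependent diagonal matrices makes the two approximate decoding equations coincide. The values below are toys, and the activation f(x) = x/(1+x) is an assumption used only to compute a concrete g_n(x):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
x = rng.uniform(0.2, 1.0, n)              # toy response vector
k = np.ones(n)
b = 0.2 * np.ones(n)
W = 0.05 * rng.uniform(size=(n, n))       # fixed Wilson-Cowan wiring (hypothetical)

# g_n(x): Euler factor of the activation; with f(x) = x/(1+x) (assumption),
# df/dx(t) = 1/(1+t)^2, averaged over n_terms points between 0 and x.
n_terms = 20
g = np.mean([1.0 / (1.0 + m * x / n_terms) ** 2 for m in range(n_terms)], axis=0)

# Proposed relation (Eq. 13): alpha = b/k and H = D_{k/x} . W . D_{(k/b) g(x)}
alpha = b / k
H = np.diag(k / x) @ W @ np.diag((k / b) * g)

# Simplified decoders of both models, Eq. 11 and Eq. 12:
e_dn = (b / k) * x + (x / k) * (H @ ((b / k) * x))
e_wc = alpha * x + W @ (g * x)
```

By construction, the diagonal matrices cancel exactly, so e_dn and e_wc are identical: the discrepancy between the full models comes only from the truncations leading to Eq. 11 and Eq. 12.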

Numerical Experiments
The analysis of the proposed relation between the Divisive Normalization (DN) and the Wilson-Cowan (WC) models is a three-stage process. First, one should take biologically plausible parameters (either in DN, in WC, or in both) and then use the proposed expressions to build versions of the models that are supposed to behave similarly. Second, one should check whether the models obtained in this way actually behave similarly. Finally, third, one can elaborate on the consequences of this correspondence.
In this experimental analysis, in Section 4.1, we build a psychophysically-inspired Wilson-Cowan model for V1 from a Divisive Normalization with psychophysically-tuned parameters [12,36,48]. This model also preserves the basic properties of the interaction kernel and the saturation function of the Wilson-Cowan literature [14,15,40]. This Wilson-Cowan model should behave similarly to the corresponding Divisive Normalization model.
Then, Section 4.2 experimentally checks the mathematical relation between the models. In particular, for a wide range of parameters: (a) we show that the integration of the Wilson-Cowan equation certainly converges to a solution which is close to its corresponding Divisive Normalization; (b) we quantify the accuracy of the approximations required to get the relation between the models; and (c) we show that the Divisive Normalization solution is a stable node of the dynamical system governed by the Wilson-Cowan equations.
Finally, in Section 4.3, we address different consequences on contrast perception using the proposed relation: (a) we analyze the signal-dependent behavior of the theoretically derived kernel and the benefits of the high-pass behavior to moderate the weight of the low-frequency components; (b) we show that the shape of the interactions between sensors changes depending on the surround; (c) we reproduce the contrast response curves with the proposed signal-dependent kernel; and (d) we discuss the use of the derived kernel in predicting the subjective metric of the image space.

Psychophysically plausible parameters for a Wilson-Cowan model in V1
A possible way to check the relation between the models in V1 consists of starting from the (lower-level / mechanistic / physiological) Wilson-Cowan model and letting it evolve to see if it converges to the (psychophysical) Divisive Normalization response. To this end, for our Wilson-Cowan model we need reasonable α, W, and f(x), for e and x defined in a certain wavelet representation.
For the wavelet representation here we assume 4-orientation steerable transforms [49] as a convenient model of the simple cells (as done in [12,36,44]).
In the experiments involving the (computationally intensive) integration of the Wilson-Cowan differential equation, Section 4.2, we used wavelets with 3 scales in 40 × 40 images to speed up the computation. In the psychophysical illustrations, Section 4.3, we used 4 scales in 64 × 64 images. Of course, these choices have to be considered illustrative options. The selected model [12,36] has the elements of previous image-computable models [4,50,51], but of course the different instances differ in details although they share the same qualitative flavour.
The reference parameters for the nonlinearity are taken from the Divisive Normalization model in [12]. In that case, the parameters corresponding to contrast computation, contrast sensitivity, and masking in the spatial domain were directly measured using Maximum Differentiation psychophysics [48], while the parameters related to brightness and to masking in the wavelet domain were tuned to reproduce subjective image quality data [36] and contrast perception curves [12].
As stated after Eq. 13, we took W as a Watson-Solomon separable Gaussian kernel [4] with widths in space/frequency/orientation taken from the psychophysically plausible values in [12]. In order to include both excitatory and inhibitory populations, we complemented this initial kernel with narrow excitatory neighborhoods whose width was a fraction of the original inhibitory neighborhoods. Finally, we normalized the absolute amplitude of the neighborhoods to have unit-norm center-surround interactions. Figures 4.a-c illustrate the psychophysically sensible separable kernels W. These unit-norm kernels W, scaled as in [4,12], are consistent with the shapes used in the Wilson-Cowan literature [15,40].
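A 1-D sketch of such a center-surround row of W (with illustrative widths and amplitudes, not the psychophysical values of [12]) combines a narrow excitatory Gaussian with a wider inhibitory one and normalizes the result. With the sign convention of Eq. 8, positive weights in W are inhibitory:

```python
import numpy as np

# 1-D spatial axis for a toy population of sensors
pos = np.linspace(-5, 5, 101)

# Narrow excitatory and wider inhibitory Gaussian neighborhoods
# (widths and the 1.5 amplitude are illustrative choices).
sigma_i = 1.0
sigma_e = 0.3 * sigma_i                      # excitatory width: a fraction of inhibitory
gauss = lambda s: np.exp(-0.5 * (pos / s) ** 2)

# Row of W: inhibitory surround minus excitatory center (positive = inhibitory)
w_row = gauss(sigma_i) - 1.5 * gauss(sigma_e)
w_row = w_row / np.linalg.norm(w_row)        # unit-norm interaction
```

The resulting profile is negative (net excitatory) at the center and positive (net inhibitory) in the surround, i.e., the familiar center-surround shape of the Wilson-Cowan literature.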
Regarding the auto-attenuation, we simply took the constants k and b from [12] and used the first equation of the proposed relation, Eq. 13, to obtain α. Fig. 4.d shows the α vector for the 3-scale wavelet (coefficients ordered from low to high frequency). Note that the response of sensors tuned to higher frequencies is more attenuated in the evolution of the differential equation, while low frequencies have lower auto-attenuation.
Finally, Figs. 4.e and 4.f display the different activations f(x) that we used in the experiments, together with representative Euler approximations g_n(x)·x, and the functions related to their derivatives, df/dx(x) and g_n(x). These activation functions include the original activation in [14,15] and the so-called γ-activation inspired by retinal transduction [36]. Appendix A gives the expressions of these activation functions. In our wavelet case, the horizontal and vertical axes of the function f(·) to be applied to each coefficient x of a certain subband are scaled by the average amplitude of the responses of the corresponding linear sensors to natural images. With that scaling the nonlinearities preserve the relative scales of the input subbands in the vector e that comes from the linear filters.

Figure 4 caption (fragment): As in [4], the Gaussian kernels (here differences of Gaussians) are separable in space, frequency, and orientation. Therefore, lower-frequency subbands have coarser sampling (and thus higher amplitude) but the same shape in space. The shape is also the same for subbands tuned to different orientations. Equivalent separability applies for variations in frequency and orientation: the interactions in plots (b) and (c) are the same for every spatial location (and orientation and frequency, respectively). The plots display the (more intuitive) -W, where positive and negative values mean excitatory and inhibitory interaction respectively; note that, according to the sign in Eq. 8, positive weights in W are inhibitory. (d) Auto-attenuation factor α: here the horizontal axis (wavelet coefficients in reverse order) can be qualitatively understood as frequency, so high frequencies display bigger attenuation. Panels (e) and (f) show different options for the pointwise activation nonlinearity f(x): in pink, the γ-function proposed in [36]; in red, the original proposal of Wilson-Cowan [14]. In both cases we show the approximations of these functions using g_n(x) (for increasing number of terms n), and the corresponding derivatives, which have an impact on the relation between the kernels in the different models (Eq. 13).
In the next experiments, the above psychophysically sensible parameters (the reference values) are modified in several ways to show that the proposed relation works for a wide range of model parameterizations. Specifically, (a) we explored different widths of the interaction kernels by using five scaling factors applied to the reference widths, from unrealistically narrow (zero-width, identity kernels that disregard interactions) to unrealistically wide kernels (where the reference widths are increased by an order of magnitude); (b) we considered kernels with the above-mentioned excitatory-inhibitory nature, and kernels with a just-inhibitory nature; and (c) we considered two possible activations (the original activation and the γ-activation). In total we considered 12 parameterizations of the models: 5 kernel widths × 2 excitatory-inhibitory configurations × 1 activation (the original activation) + 1 kernel width (the reference one) × 2 excitatory-inhibitory configurations × 1 activation (the γ-activation).
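The enumeration of the 12 parameterizations can be sketched as follows; the width factors other than 0, 1, and 10 are placeholders, since the text does not list the intermediate scaling values.

```python
from itertools import product

# Five scalings of the reference widths: from zero-width (identity kernel)
# to an order of magnitude wider (0.5 and 3.0 are placeholder intermediates).
widths = [0.0, 0.5, 1.0, 3.0, 10.0]
configs = ["excitatory-inhibitory", "inhibitory-only"]

# 5 widths x 2 configurations with the original activation ...
params = [{"width": w, "config": c, "activation": "original"}
          for w, c in product(widths, configs)]
# ... plus the reference width x 2 configurations with the gamma-activation.
params += [{"width": 1.0, "config": c, "activation": "gamma"} for c in configs]
```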
The interested reader can find the specific values of the parameters in Appendix A for the activations, and in the code that reproduces all the simulations of the paper (described in Appendix B).

Wilson-Cowan systems converge to Divisive Normalization
The Wilson-Cowan expression, Eq. 8, defines an initial value problem where the response at time zero evolves (or is updated) according to the right-hand side of the differential equation. In our case, we assume that the initial value of the output is just the input, x(0) = e. Moreover, as we deal with static images, we assume that the input is constant. Then, we solve this first-order differential equation by the simplest (Euler) integration method:

x(t + ∆t) = x(t) + ∆t [ e − α ⊙ f(x(t)) − W · f(x(t)) ]    (14)

Figure 5 shows the evolution of the response obtained from this integration, applied to 45 natural images taken from calibrated databases [52,53], using the biologically sensible parameters α, W, and f(·) presented in Fig. 4 [12,14,15,36,40,48], and the mentioned variations to cover a wide range of model parameterizations. Our Euler integration used a small enough discrete time step, ∆t = 10^-5, and the initial responses, the vectors e, were computed using the first 3 layers of the model in Fig. 1 [12,36] followed by a linear steerable wavelet of 4 orientations and 3 scales. The integration requires no approximation of the WC model, and the evolving solution is checked against the corresponding DN response that uses the proposed Eq. 13.
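A minimal sketch of this integration scheme, with toy α, W, and f standing in for the psychophysically tuned parameters (tanh is a stand-in sigmoid, and the kernel and input are random placeholders):

```python
import numpy as np

def wilson_cowan_euler(e, alpha, W, f, dt=1e-3, steps=5000):
    # Euler integration of  dx/dt = e - alpha*f(x) - W @ f(x),
    # starting from x(0) = e, with a constant input e (static image).
    x = e.copy()
    for _ in range(steps):
        fx = f(x)
        x = x + dt * (e - alpha * fx - W @ fx)
    return x

rng = np.random.default_rng(0)
n = 32
e = rng.standard_normal(n)                    # toy "linear wavelet response"
alpha = np.full(n, 5.0)                       # toy auto-attenuation
W = np.abs(rng.standard_normal((n, n))) / n   # toy non-negative interaction kernel
x_ss = wilson_cowan_euler(e, alpha, W, np.tanh)
```

At the steady state the update term vanishes, so e ≈ α ⊙ f(x) + W · f(x), which is the condition from which the Divisive Normalization form is derived.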
As can be seen, the solution of the Wilson-Cowan integration converges to the Divisive Normalization solution: their difference (percentage of relative Mean Squared Error) decreases as the response evolves in all 12 considered parameterizations. The relative MSE in the psychophysically meaningful situations (×1 width) is below 3% (lines in pink and red), and for the other configurations it is always below 6%. Therefore, the Divisive Normalization always explains more than 94% of the energy of the Wilson-Cowan solution. Moreover, these results represent the steady states because the updates of the solutions in the integration always tend to zero (results not shown).
Fig. 6 illustrates the qualitative similarity of the responses of the two models and their comparable equalization effect in the wavelet domain. The nonlinear response x_WC was computed by integrating Eq. 14, and x_DN was computed with Eq. 3. We used the parameters introduced in Section 4.1 and the corresponding parameters for Divisive Normalization using Eq. 13.
Note how the nonlinearities substantially increase the amplitude of the signal in the regions where the linear response is low. The regions highlighted in blue and orange in panel (e) display low activity compared to their neighbors because there are no edges in those regions of the image. However, the corresponding neurons after Wilson-Cowan or Normalization have increased their activity. The amplitude of the signal after the nonlinearities is more stationary across the subbands. Moreover, the nonlinearities lead to responses where the image structure is less apparent: the activity of a neuron is more independent from the activity of its neighbors. The equalization and increased independence qualitatively suggested in Fig. 6 are consistent with previous (quantitative) studies that report redundancy reduction both in Divisive Normalization [11,25,44] and in the Wilson-Cowan model [23].

Figure 6 caption (fragment): Corresponding Divisive Normalization response using the kernel given by Eq. 10. In this example the relative MSE between x_WC and x_DN is 0.93%, but, more importantly, note how the amplitude of the response of the highlighted neurons has increased similarly in the nonlinear cases, leading to a more stationary response.

Quantification of the accuracy of the approximations
The proposed relation, Eq. 13, is based on two approximations: the approximation of the inverse of Divisive Normalization used to obtain Eq. 11, and the approximation of f(x) in Wilson-Cowan used to obtain Eq. 12. The accuracy of these approximations depends on the model parameters, e.g. the shape and magnitude of H or f(·), and on the responses x to natural images. The low amplitude of the coefficients of natural images in wavelet representations [54,55], and the accuracy of similar approximations for psychophysically sensible parameters [24], suggest that the errors will be small. However, in this section we explicitly compute both sides of the above expressions (with and without approximation) for a range of representative images and model parameterizations, and we compute the difference between both sides. This difference is the error due to the approximation. We express the energy of the difference (the Mean Squared Error) as a percentage of the energy of the function with no approximation: the relative MSE in %.
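The error measure used throughout can be written explicitly; a small helper (with a hypothetical name), where the "reference" is the magnitude computed with no approximation:

```python
import numpy as np

def relative_mse_percent(reference, approximation):
    # Energy of the difference as a percentage of the energy of the
    # magnitude computed with no approximation (relative MSE in %).
    return 100.0 * np.sum((reference - approximation) ** 2) / np.sum(reference ** 2)
```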
In Table 1 we show the relative MSE (in %) for both approximations (inverse and f(x)), together with the error in convergence, also in relative MSE, for the 12 different model parameterizations. The approximations of f(x) were done using g_10(x), i.e. computing derivatives in 10 points.
The approximations generally explain more than 90% of the energy of the original magnitudes. The only exception is the approximation of the inverse of the Divisive Normalization for the (unrealistic) zero-width kernel, where the relative MSE amounts to ∼30%. The deviation in this unrealistic case makes sense: reducing the width of unit-volume kernels increases their height, so the magnitude of H increases and the term summed to the identity in the expression under consideration is no longer small. This leads to an increased error in the approximation.
Interestingly, in the whole range of parameterizations considered, the approximations do not have a big impact on the convergence error, which is the actual measure of correspondence between the two models.

Stability analysis of the Divisive Normalization response
The stability of a dynamical system at a steady state is determined by the Jacobian with regard to perturbations of the response: if the eigenvalues of this Jacobian are all negative at this response, it is a stable node of the system [56].
In that situation the evolution of the perturbations is a vector field oriented towards the stable node.
In our case, the Jacobian of the right-hand side of the Wilson-Cowan differential equation, Eq. 8, with regard to the output signal is

∇_x ẋ = −(D_α + W) · D_{f′(x)},

where D_{f′(x)} is the diagonal matrix with the pointwise derivatives of the activation. Fig. 7 shows the eigenvalues of this Jacobian using a wide range of parameters (the 12 configurations obtained through variations of the reference values presented in Section 4.1), with responses from a set of 45 representative natural images from colorimetrically calibrated datasets [52,53]. This result shows that all the eigenvalues are negative, thus suggesting that the Divisive Normalization solution is a stable node of the dynamical system. The stability of the system can be further illustrated by visualizing the vector field of perturbations in the phase space of the system [56]. Here we visualize this vector field for the Divisive Normalization solution. As the signals in our problem live in very high-dimensional spaces (the wavelet vectors in this section have dimension 10025), it is not possible to visualize the complete phase space, so we just select some illustrative 3-dimensional and 2-dimensional examples.

Figure 7 caption (fragment): The standard deviation and quartile distance are so small that they cannot be seen in the plot. The result shows that the eigenvalues are all negative. This suggests that the Divisive Normalization is a stable node of the system for natural images for a wide range of model parameterizations.
Fig. 8 (left) shows an example taking just 3 neurons of the V1 layer. In this case we took a particular image (the standard image Lena) and focused on the response of 3 specific sensors of the low-frequency scale of the Divisive Normalization vector: the 9700-th, 9800-th, and 9900-th responses. In that way we get the red circle in Fig. 8 (left). Arbitrary perturbations of the responses of these neurons lead to the dynamics shown in the phase space: the vector field induced by the Jacobian implies that any perturbation is sent back to the original (no-perturbation) response, which is, therefore, a stable node of the system.
Similar behavior is obtained for coefficients of other subbands or other images. See Fig. 8 (right), where, for simplicity, we consider perturbations in pairs of neurons.
In summary, the Divisive Normalization solutions are stable nodes of the corresponding Wilson-Cowan systems. This conclusion confirms the assumption underlying the proposed relation: Divisive Normalization as a steady state of the Wilson-Cowan dynamics.
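The stability check can be sketched numerically. Since f acts coefficient-wise, the Jacobian of the right-hand side of dx/dt = e − α ⊙ f(x) − W·f(x) is −(D_α + W)·diag(f′(x)); the toy parameters below are placeholders for the tuned ones.

```python
import numpy as np

def wc_jacobian(x, alpha, W, df):
    # Jacobian of  e - alpha*f(x) - W @ f(x)  with respect to x,
    # for a pointwise activation f with derivative df.
    return -(np.diag(alpha) + W) @ np.diag(df(x))

def is_stable_node(x, alpha, W, df):
    # Stable node: all eigenvalues of the Jacobian have negative real part.
    return bool(np.all(np.linalg.eigvals(wc_jacobian(x, alpha, W, df)).real < 0))

rng = np.random.default_rng(1)
n = 16
B = np.abs(rng.standard_normal((n, n))) / n
W = 0.5 * (B + B.T)                      # symmetric non-negative toy kernel
alpha = np.full(n, 5.0)                  # toy auto-attenuation
x = rng.standard_normal(n)               # toy response
df = lambda x: 1.0 - np.tanh(x) ** 2     # derivative of a tanh activation
```

With a positive pointwise derivative and an attenuation that dominates the interaction, all eigenvalues come out negative, as in Fig. 7.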

Consequences on contrast perception
The proposed relation implies that the Divisive Normalization kernel inherits the structure of the Wilson-Cowan interaction matrix (typically Gaussian [14,40]), modified by specific signal-dependent diagonal matrices, as seen after Eq. 13. This allows us to explain a range of contrast perception phenomena.
First, regarding the structure of the kernel, we show that our prediction is consistent with previously required modifications of the Gaussian kernel in Divisive Normalization to reproduce contrast perception [12]. Second, we show that the kernel in Divisive Normalization modifies its shape depending on the signal, thus explaining the behavior previously reported in [9,35]. Third, we use the predicted signal-dependent kernel to simulate contrast response curves consistent with [4,28]. And finally, the proposed relation is also applied to reproduce the experimental visibility of spatial patterns in more general contexts, such as subjective image quality assessment [57-59].
In this section we do not integrate the Wilson-Cowan differential equation; instead, we use the expression for the steady-state solution with the kernel obtained from the proposed relation. This alleviates the computation, so, as opposed to the previous section, in the following examples we use a wavelet representation of higher dimensionality, with 4 scales and 4 orientations, applied to bigger images of 64 × 64 pixels. Regarding the parameters, we use unit-norm Gaussian kernels in H_ws or W, and constants k and b also defined over 4 scales and 4 orientations, directly taken from [12].

Structure of the kernel in Divisive Normalization
Here we compare the empirical filters D_l and D_r, which had to be introduced ad hoc in [12], with the theoretical ones obtained through Eq. 13.
Before going into the details of the kernel, let us get some intuition on the typical structure of the vectors x and g_n(x). Fig. 9 shows an illustrative stimulus with oriented textures and the corresponding responses of linear and nonlinear V1-like sensors based on steerable wavelets. Typical responses for natural images are low-pass signals (see the vectors in Figs. 9.b.2 and 9.c.2). The response in each subband is an adaptive (context-dependent) nonlinear transduction (Fig. 9.d). Each point in Fig. 9.d represents the input-output relation for each neuron in the subbands of the different scales (from coarse to fine). As each neuron has a different neighborhood, there is no single input-output transduction function, but a scatter plot representing different instances of an adaptive transduction.
The considered image is designed to lead to specific excitations in certain sensors (subbands and locations in the wavelet domain). Note, for instance, the high- and low-frequency synthetic patterns (24 and 12 cycles per degree, cpd, horizontal and vertical, respectively) in the image regions highlighted with the red and blue dots. In the wavelet representations we also highlighted some specific sensors in red and blue, corresponding to the same spatial locations and the horizontal subband tuned to 24 cpd. Given the tuning properties of the neurons highlighted in red and blue, it makes sense that the wavelet sensor in red has a bigger response than the sensor in blue. With this knowledge of the signal in mind, namely (1) the low-pass trend in x shown in Fig. 9, (2) the bigger derivative g_n(x) for high frequencies, because the derivative is higher for low-amplitude signals (see Fig. 4), and (3) the fact that the vector b is bigger for low frequencies [12], we can understand the high-pass nature of the vectors included in the diagonal matrices that appear at the left and right sides of the theoretically derived kernel.

Fig. 10 compares the empirical left and right vectors, l and r, that were adjusted ad hoc to reproduce contrast curves in [12], with those based on the theoretical relation proposed here. In this case we only consider the comparison with the psychophysically sensible parameterization, since the ad-hoc tuning was done for that specific scenario. As these empirical filters were just qualitatively adjusted in [12], the reproduction of their high-pass nature and their order of magnitude is more important than the specific MSE values.

Figure 10 caption: Vectors in the diagonal matrices D_l and D_r that multiply the Gaussian kernel in the empirical tuning represented by Eq. 10 (top), and in the theoretically derived Eq. 13 (bottom). These theoretical filters correspond to a specific natural image and use the g_10(x) approximation of the γ-activation. The median relative RMSEs for the predicted filters over 45 natural images are 10.2% (for the left filter) and 5.4% (for the right filter). The difference for the right filter using the original Wilson-Cowan activation, also with g_10(x), is 5.1% (curve not shown). However, as the empirical filters were just adjusted ad hoc in [12], the relevant result here is the reproduction of the required high-pass structure, not the MSE.
The similarity of the structure of the empirical and theoretical interaction matrices (Eq. 10 and Eq. 13), and the coincidence of the empirical and theoretical filters (Fig. 10), suggest that the proposed theory explains the modifications that had to be introduced in classical unit-norm kernels in Divisive Normalization to explain contrast responses.

Shape adaptation of the kernel depending on the signal
Once we have shown the global high-pass nature of the vectors k/x and (k/b) ⊙ g_n(x), let us see in more detail the signal-dependent adaptivity of the kernel. In order to do so, let us consider the interaction neighborhoods of two particular sensors in the wavelet representation of an illustrative stimulus with easy-to-understand features: specifically, the sensors highlighted in red and blue in Fig. 9.
Fig. 11 compares two versions of the two individual neighborhoods displayed in the same wavelet representation: on the left, the unit-norm Gaussian kernels H_ws, and on the right, the empirical kernel modulated by the ad-hoc pre- and post-filters, Eq. 10. In these diagrams, a lighter gray in each j-th sensor corresponds to a bigger interaction with the considered i-th sensor (highlighted in color). The gray values are normalized to the global maximum in each case. Each subband displays two Gaussians; each Gaussian corresponds to only one of the sensors (the one highlighted in red or in blue, depending on the spatial location of the Gaussian). We used a single wavelet diagram since the two neighborhoods do not overlap and there is no possible confusion between them.
In the baseline unit-norm Gaussian case, H_ws, a unit-volume Gaussian in space is defined centered at the spatial location preferred by the i-th sensor. Then, the corresponding Gaussians at every subband are weighted by a factor that decays as a Gaussian over scale and orientation from the maximum, centered at the subband of the i-th sensor.
The problem with the unit-norm Gaussian in every scale is that the reduced set of sensors in the low-frequency scales leads to higher values of the kernel so that it keeps the required volume. In that situation, the impact of activity in the low-frequency subbands is substantially higher. This fact, combined with the low-pass trend of wavelet signals, implies a strong bias of the response and ruins the contrast masking curves. This problem is represented by the relatively high values of the neighborhoods in the low-frequency subbands highlighted in orange.
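This effect can be illustrated numerically: a fixed-shape neighborhood normalized over the few sensors of a coarse subband necessarily takes larger per-sensor values than the same shape normalized over the many sensors of a fine subband (the sampling sizes below are illustrative):

```python
import numpy as np

def unit_norm_gaussian(n_sensors, sigma=0.1):
    # Same Gaussian shape on [-1, 1], sampled with n_sensors points and
    # scaled to unit norm: fewer samples force larger per-sensor values.
    t = np.linspace(-1.0, 1.0, n_sensors)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return g / np.linalg.norm(g)

coarse = unit_norm_gaussian(16)    # low-frequency subband: few sensors
fine = unit_norm_gaussian(256)     # high-frequency subband: many sensors
```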
This overemphasis on the low-frequency scales was corrected ad hoc using right- and left-multiplication in Eq. 10 by hand-crafted high-pass filters. The effect of these filters is to reduce the values of the Gaussian neighborhoods at the low-frequency scales, as seen in the empirical kernel in Fig. 11-right. The positive effect of the high-pass filters is the reduced impact of the neighborhoods at the low-frequency subbands (highlighted in green).
In both cases (the classical H_ws and the hand-crafted H = D_l · H_ws · D_r) the size of the interaction neighborhood (the interaction length) is signal-independent: note that the neighborhoods for both sensors (red and blue) are the same, regardless of the different stimulation that can be seen in Fig. 9.

Fig. 12 shows the kernels obtained from Eq. 13. The three components of H are: in Fig. 12.a, the term proportional to 1/x; in Fig. 12.b, the term based on the Gaussian neighborhoods W; and in Fig. 12.c, the term proportional to g_n(x). Finally, Fig. 12.d shows the global result of the product of the three terms, and Fig. 12.e zooms in on the high-frequency horizontal subband that contains the co-linear situation considered in the physiological experiments [35].
These three terms have the following positive consequences: (1) the product by the high-pass terms moderates the effect of the unit-norm Gaussian at the low-frequency subbands, as in the empirical kernel tuned in [12] and shown in Fig. 11-right; (2) the term proportional to 1/x scales the interaction length according to the signal; and (3) the shape of the kernel depends on the signal because H_ij is modulated by (g_n(x))_j, which implies that when the surround is aligned with the sensor, the kernel elongates in that direction (as the probability of co-assignment in Fig. 3.b). This leads to smaller responses when the sensor is flanked by co-linear stimuli (as in the results of Cavanaugh et al. [35]).
In summary, deriving Divisive Normalization as the steady state of a Wilson-Cowan system with Gaussian unit-norm wiring explains two experimental facts: (1) the high-pass filters that had to be added to the structure of the kernel in Divisive Normalization to reproduce contrast responses [12], and (2) the adaptive asymmetry of the kernel that changes its shape depending on the background texture [34,35,45-47].

Figure 12 caption (fragment): ..., for the two highlighted points. Here we used g_10(x). Note the high-pass effect of the left- and right-matrix products over W, which removed the interaction in the low-frequency subbands, now highlighted in dark green. (e) Zoom on the high-frequency horizontal subband. The term depending on the derivatives implies changes in the shape of the kernel (from circular to horizontal ellipses) when the context is a high-contrast horizontal pattern. This is compatible with the probabilities of co-assignment [9] recalled in Fig. 3.b.

Contrast response curves from the Wilson-Cowan model
The above results suggest that the Wilson-Cowan model could successfully reproduce contrast response curves and masking, which had not yet been addressed with this model. Here we explicitly check this hypothesis.
We can use the proposed relation, Eq. 13, to plug the successful parameters of Divisive Normalization tuned for contrast perception into the corresponding Wilson-Cowan model, and we can avoid the integration of the differential equation using our knowledge of the steady state. The only difficulty in computing the response through the steady-state solution is that the kernel of the Divisive Normalization depends on the (still unknown) response.
In this case we compute a first guess of the response, x, using the fixed hand-crafted kernel tuned in [12]; this first guess is then used to compute the proposed signal-dependent kernel, which in turn is used to compute the actual response x.
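The two-step scheme can be sketched as follows. Here `dn_response` uses a simplified feedforward form of the normalization (drive divided by b plus kernel-weighted activity, not the paper's exact Eq. 3), and `kernel_from_response` is a placeholder for the actual signal-dependent construction of Eq. 13; all names and values are illustrative.

```python
import numpy as np

def dn_response(e, k, b, H):
    # Simplified Divisive Normalization: each response is its drive
    # normalized by b plus a kernel-weighted sum of neighboring activity.
    return k * e / (b + H @ np.abs(e))

def two_step_response(e, k, b, H_fixed, kernel_from_response):
    x_guess = dn_response(e, k, b, H_fixed)   # step 1: fixed hand-crafted kernel
    H_signal = kernel_from_response(x_guess)  # step 2: signal-dependent kernel (role of Eq. 13)
    return dn_response(e, k, b, H_signal)     # final response with the adapted kernel

n = 8
e = np.linspace(-1.0, 1.0, n)
k = np.ones(n)
b = np.ones(n)
H_fixed = np.full((n, n), 0.1)
# Placeholder signal-dependent rule: stronger normalization from stronger guesses.
x = two_step_response(e, k, b, H_fixed,
                      lambda xg: H_fixed * (1.0 + np.abs(xg))[None, :])
```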
Fig. 13 shows the response curves of neurons tuned to low- and high-spatial-frequency tests, as a function of the contrast of these tests located on top of backgrounds of different contrast, spatial frequency, and orientation. In each case we considered four different contrasts for the background (represented by the different line styles). Representative stimuli are shown as image patches inside each plot. The results in this figure display the expected qualitative properties of contrast perception:

Frequency selectivity. The magnitude of the response depends on the frequency of the test: responses for the low-frequency test are bigger than the responses for the high-frequency test. This frequency-dependent behavior in Fig. 13 is consistent with the Contrast Sensitivity Function [41].
Saturation. The responses increase with the contrast of the test, but this increase is nonlinear (it saturates), and the responses decrease with the contrast of the background. This behavior in Fig. 13 is consistent with the contrast discrimination results in [42,43].
Cross-masking. The reduction of the responses depends on the frequency similarity between test and background. Note that the low-frequency test is more attenuated by the low-frequency background of the same orientation than by the high-frequency background of orthogonal orientation. Similarly, the high-frequency test is more affected by the high-frequency background of the same orientation. This behavior in Fig. 13 is consistent with the cross-masking results in [4,28].

Metric in the image space from the Wilson-Cowan model
As a result of the derived relation between the models, Eq. 13, the Wilson-Cowan model may also be used to predict subjective image distortion scores. In this section we explicitly check the performance of the Wilson-Cowan response in computing the visibility of distortions from neural differences, following the same approach detailed in the previous section regarding the computation of the signal-dependent kernel and its use to obtain the steady state.
The TID database [58,59] contains natural images modified with many kinds of degradations, together with the experimental subjective distortion for each degraded image. Given a model, the theoretical prediction of the subjective distortion is obtained from the modulus |x_orig − x_distort|, i.e. the Euclidean difference between the model responses to the original and to the degraded image.
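The prediction and its evaluation reduce to a Euclidean distance in the response domain and a Pearson correlation against the subjective scores; a minimal sketch (function names are illustrative):

```python
import numpy as np

def predicted_distortion(x_orig, x_dist):
    # Visibility of the distortion: Euclidean difference between the
    # model responses to the original and to the degraded image.
    return float(np.linalg.norm(x_orig - x_dist))

def pearson_r(pred, subjective):
    # Pearson correlation between predicted and experimental distortions.
    p = (pred - pred.mean()) / pred.std()
    s = (subjective - subjective.mean()) / subjective.std()
    return float(np.mean(p * s))
```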
Figure 14 compares these predictions (abscissas) with the experimental distortions (ordinates) for the responses with a fixed interaction kernel (the conventional Divisive Normalization approach, in blue) and with the proposed signal-adaptive kernel obtained from the Wilson-Cowan model (in red).
The high Pearson correlation coefficients obtained in both cases, and the close similarity between the plots, demonstrate the good performance of both models and support the validity of the proposed relation between them.

Final remarks
In this paper we derived an analytical relation between two well-known models of nonlinear neural interaction: the Wilson-Cowan model [13,14] and the Divisive Normalization model [1,2]. Specifically, assuming that Divisive Normalization is the steady-state solution of the Wilson-Cowan differential equations, the Divisive Normalization interaction kernel may be derived from the Wilson-Cowan kernel weighted by two signal-dependent contributions.
We showed the appropriateness of the proposed relation in a range of model parameterizations by checking the convergence of the Wilson-Cowan solution to the Divisive Normalization solution, and by proving that the Divisive Normalization solution is a stable node of the Wilson-Cowan system.
Moreover, the derived relation has the following implications for contrast perception: (a) the specific structure obtained for the interaction kernel of Divisive Normalization explains the need for high-pass filters over unit-norm Gaussian interactions to describe the contrast masking found in [12]; (b) the signal-dependent kernel predicts elongations of the interaction neighborhood in backgrounds aligned with the sensor, thus providing a mechanistic explanation for the adaptation effects found in [34,35]; and (c) low-level Wilson-Cowan dynamics may also explain behavioral aspects that have been classically explained through Divisive Normalization, such as contrast response curves [4,28] or image distortion metrics [30,60]. This is the first work that justifies why the Wilson-Cowan interaction successfully reproduces image distortion metrics and contrast response curves. As stated in [61], there are not many works exploring the use of the Wilson-Cowan equations to model psychophysics, so the examples presented in this work are relevant to fill this gap.
The choice of the discrete time step ∆t in the Euler integration of the Wilson-Cowan equations, Eq. 14, has implications which were not considered in this work. Here the specific value of ∆t was just an arbitrary choice made for computational convenience. If one could assume that ∆t = 10^-5 is measured in seconds, then, as the process converges in about 400-500 Euler steps (as seen in Fig. 5), it would take 4-5 ms to arrive at the steady state. As an illustration, 4-5 ms would not be a relevant time delay for the visual processing of motion, because the cutoff frequency of the temporal Contrast Sensitivity is about 70-100 Hz (so events below 10-15 ms are disregarded), see [62]. However, the choice of this ∆t is not based on the biophysics of the neural interactions, and it may indeed be larger [63]. If the interaction time is in fact larger, the convergence will take longer than 4-5 ms. In that case the actual behavior would be given by the dynamic Wilson-Cowan model and not by the Divisive Normalization approximation. This may imply that the use of static models like Divisive Normalization should be limited to slowly varying stimuli, or that their use is more correct in a certain region of spatio-temporal frequencies (or speeds). Nevertheless, the detailed analysis of that region from a sensible biophysical estimation of ∆t is out of the scope of this work and a matter for further research.
Finally, the relation between the models proposed here opens the possibility of analyzing Divisive Normalization from new perspectives, following methods that have been developed for Wilson-Cowan systems [64]. Similarly, mechanisms that generalize the Wilson-Cowan equation, such as neurons with intrinsically nonlinear receptive fields [65], could be analyzed via the information-theoretic tools used to quantify the transmission performance of Divisive Normalization [25,26,66], or the functional connectivity [67].

A Activation functions
A.1 Original activation. Wilson & Cowan [13,14] proposed the use of typical sigmoids, such as the logistic function, as appropriate activation functions, where e* is just a scaling factor of the response and the domain so that f(e*) = e*. In this work we selected the constants e*, i.e. we scaled f(·), using the standard deviation of the linear responses of the V1 cells over the subbands. In this way the activation preserves the scale of the different subbands. The derivative of this original activation function, invoked n times in g_n(x), follows from the derivative of the logistic sigmoid.

A.2 γ-activation. Following [68], which uses power laws to model activation functions, we also used another sigmoid, inspired by the luminance-brightness nonlinearities of the retinal photoreceptors [36]. Due to its similarity to the exponential γ-correction transforms, it is called γ-activation or γ-saturation throughout the work. The regular exponential function was modified around the origin in [36] in order to avoid and control the singularity of the derivative at the origin:

f(x) = C · x^γ   for x ≥ ε,

where γ < 1 so that the function saturates, and the constant C = (e*)^(1−γ) ensures that f(e*) = e*. In this work γ = 0.6 and, as stated for the original activation, the scaling e* was selected to preserve the relation between the standard deviations of the different subbands in the linear responses. The neighborhood ε was selected close to the origin, ε = 10^-3 · e*. Below ε, the constants a and b of the polynomial branch are chosen to ensure continuity of the function and of its derivative at ε. The derivative of this function is used to compute g_n(x). Figure 4 in the main text shows that both functions share the same qualitative properties: saturating sigmoids with derivatives that peak at the origin and decrease with the signal.
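A sketch of the γ-activation for signed wavelet coefficients follows. The power-law branch and the constant C follow the expressions above, while the behavior below ε is simplified here to a straight line through the origin (the paper uses a polynomial patch with constants a and b instead, whose exact form is not reproduced here):

```python
import numpy as np

def gamma_activation(x, e_star=1.0, gamma=0.6, eps_frac=1e-3):
    # f(x) = C * sign(x) * |x|**gamma  for |x| >= eps, with
    # C = e_star**(1 - gamma) so that f(e_star) = e_star.
    eps = eps_frac * e_star
    C = e_star ** (1.0 - gamma)
    power = C * np.sign(x) * np.abs(x) ** gamma
    linear = (C * eps ** gamma / eps) * x   # simplified patch near the origin (assumption)
    return np.where(np.abs(x) >= eps, power, linear)
```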

B Matlab code
This appendix lists the main Matlab routines associated with each experiment described in the main text. All the material is available at http://isp.uv.es/docs/DivNorm from Wilson Cowan.zip. Detailed parameters of the models and instructions on how to use these routines are given in the corresponding *.m files.
- The Divisive Normalization retina-cortex model: the Matlab toolbox that implements the 4-layer network for spectral or color images considered in Fig. 1.
- Experiments on convergence: the connectivity W and the activation f are applied together in integrability_WC_with_f_after_KindReview.m to check the convergence of the system. That script applies Euler integration and shows the convergence of the dynamic solution to the corresponding Divisive Normalization solution.
- Experiments on stability: the stability of the dynamic Wilson-Cowan system is studied through the eigen-decomposition of the Jacobian that controls the amplification of perturbations in Converg_Stability_ND_excit_inhibit.m, which includes visualizations of the phase diagram.
- Signal-dependent kernel: the script signal_dependent_kernel_with_f.m generates an illustrative image made of high-contrast patterns with selected frequencies to stimulate specific subbands of the models. Then, it computes the responses to such stimuli and the corresponding signal-dependent filters according to the relations derived in the main text, Eq. 13. These theoretical filters are compared to the empirical filters found in [12]. Finally, in environments where the surround is aligned with the wavelet sensors, the shape of the interaction kernel is found to change as in [35].
- Contrast response curves: the script contrast_response_WC.m generates a series of noisy Gabor patterns of controlled frequency and contrast displayed on top of noisy sinusoids of different frequencies, orientations, and contrasts. It computes the visibility of these patterns seen on top of the backgrounds by applying the Divisive Normalization model with the signal-dependent kernel derived from the Wilson-Cowan model. The visibility is computed from the response of the neurons tuned to the tests.
- Image distortion metric: the series of scripts images_TID_atd_through_WC_model_x.m compute the Divisive Normalization response, with the signal-dependent kernel derived from the Wilson-Cowan model, for the original and distorted images of the TID database (previously expressed in the appropriate ATD color space). Then, the Euclidean distance is applied to compute the visibility of the distortions. The distances are computed with metric_deep_DN_iso_colorWC.m, which obtains the responses by calling deep_model_DN_iso_colorWC.m.
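The convergence and stability experiments above can be sketched with a toy Python stand-in for the Matlab routines (illustrative sizes and values, not the calibrated parameters of the toolbox): Euler integration of ẋ = e − α⊙x − W·f(x) toward its steady state, followed by the eigen-decomposition of the Jacobian at that point:

```python
import numpy as np

# Toy Wilson-Cowan system: dx/dt = e - alpha*x - W @ f(x).
# A tanh stands in for the saturating activation; all values are illustrative.
n = 3
e = np.array([1.0, 0.5, 0.8])            # excitation from the linear sensors
alpha = np.ones(n)                       # auto-attenuation
W = 0.1 * np.ones((n, n))                # inhibitory interaction kernel
f = np.tanh
fprime = lambda x: 1.0 / np.cosh(x) ** 2

# Euler integration toward the steady state (the assumed DN solution):
x, dt = np.zeros(n), 0.1
for _ in range(2000):
    x = x + dt * (e - alpha * x - W @ f(x))
residual = np.linalg.norm(e - alpha * x - W @ f(x))    # ~0 at the steady state

# Stability: Jacobian of the right-hand side, J = -diag(alpha) - W*diag(f'(x)).
J = -np.diag(alpha) - W * fprime(x)[None, :]
stable = bool(np.all(np.linalg.eigvals(J).real < 0))   # stable node if True
```

With these values the update vanishes and all eigenvalues are negative, mirroring the behavior reported in Figs. 5 and 7 for the full model.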

C Small-scale Divisive Normalization
Overview. Similarly to [23], the code includes a reduced-scale model that can be applied to 3-pixel images for full visualization of the responses and the phase space. This reduced-scale model consists of two linear+nonlinear layers: (1) a linear radiance-to-luminance transform using a standard Spectral Sensitivity Function, Vλ, in the spectral integration [69], followed by a simple exponential for the luminance-to-brightness nonlinearity applied pixel-wise in the spatial domain, which simulates the Weber-Fechner response to luminance [70]; and (2) a linear+nonlinear layer in which the linear transform is a discrete cosine transform (an orthonormal rotation) followed by a low-pass weighting function that simulates frequency-tuned sensors and the Contrast Sensitivity Function (CSF) [41]. Then, the outputs of the frequency sensors undergo a nonlinear interaction that may be a Divisive Normalization [1,2,12,36] or its equivalent Wilson-Cowan network, with parameters computed according to Eq. 13.
Transform. The actual inputs of our code are the responses of the linear photoreceptors: 3-pixel image vectors with normalized luminance values, i.e. r1 ∈ R³. The normalized luminance was computed by dividing the absolute luminance in cd/m² by the value corresponding to the 95% percentile of the luminance, in our case 260 cd/m².
- The luminance-to-brightness transform, N(1), is just x1 = (r1)^γ, where γ = 0.6.
- The linear transform of frequency-tuned sensors with CSF gain, L(2), is a discrete cosine transform followed by the CSF weights.
- The Divisive Normalization of the frequency-tuned sensors, N(2)DN, uses the energy e = |r2|^γ, where γ = 0.7.
- The equivalent Wilson-Cowan interaction, N(2)WC, is defined by the differential equation 8, with auto-attenuation α and interaction matrix W.
Note that the interaction neighborhoods have unit volume, Σj Wij = 1 ∀j, as suggested in [4], and then the Divisive Normalization kernel is given by the product of this unit-volume neighborhood and two left and right filters in the diagonal matrices, Dl and Dr [12]. The values for the semisaturation, b, and the diagonal matrices Dl and Dr were inspired by the contrast response results in [12]: we set the semisaturation according to the average response to natural images (low-pass in nature), and we initialized the left and right filters to be high-pass. However, afterwards, in order to make N(2)DN and N(2)WC consistent, we applied the Divisive Normalization over natural images and iteratively updated the values of the right and left filters according to Eq. 13. In the end, we arrived at the values in the above expressions (where the filter at the left is high-pass, but the filter at the right is not). Note that the attenuation in Wilson-Cowan is also computed using Eq. 13.
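The transform above can be sketched in a few lines of Python (our illustrative stand-in for the Matlab code; the CSF weights, semisaturation, and input luminances are hypothetical placeholders, and the left/right diagonal filters Dl, Dr are omitted for brevity):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: the 'orthonormal rotation' of layer 2."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    M = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    M[0, :] *= np.sqrt(1.0 / n)
    M[1:, :] *= np.sqrt(2.0 / n)
    return M

r1 = np.array([120.0, 200.0, 80.0]) / 260.0   # luminance / 95th-percentile value
x1 = r1 ** 0.6                                # luminance-to-brightness, gamma = 0.6
csf = np.array([1.0, 0.8, 0.5])               # hypothetical low-pass CSF weights
r2 = csf * (dct_matrix(3) @ x1)               # frequency-tuned linear responses
e = np.abs(r2) ** 0.7                         # energy, gamma = 0.7
b = 0.1 * np.ones(3)                          # hypothetical semisaturation
H = np.ones((3, 3)) / 3.0                     # unit-volume interaction kernel
x2 = np.sign(r2) * e / (b + H @ e)            # Divisive Normalization response
```

The signs of the linear responses are carried separately so that the normalization acts on the energies, and the pooling kernel H has unit volume as in the text.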
Jacobian. The perceptual metric [36] and information transmission [23,25] strongly depend on how the system (locally) deforms the signal representation. This is described by the Jacobian of the transform with respect to the signal, ∇r1 S = ∇r2 N(2) · ∇x1 L(2) · ∇r1 N(1). In this reduced-scale model, this Jacobian (for the Wilson-Cowan case) is given in Eq. 27.

Fig 1. Signal transforms in the retina-cortex pathway: a cascade of linear+nonlinear modules (example from [36]). (2) The linear part of the second layer computes the deviation of the brightness at each location from the local brightness. Then, this deviation is nonlinearly normalized by the local brightness to give the local contrast. (3) The responses to local contrast are convolved with center-surround receptive fields (or filtered by the Contrast Sensitivity Function). Then the linearly filtered contrast is nonlinearly normalized by the local contrast. Again, normalization increases the response in the regions with small input (low contrast). (4) After a linear wavelet transform, each response is nonlinearly normalized by the activity of the neurons in the surround. Again, the activity relatively increases in the regions with low input. The common effect of the nonlinear modules throughout the network is response equalization.

Fig 2. Adaptive contrast response curves. Mean firing rate (response) of V1 neurons tuned to the signal as a function of the signal contrast in two masking situations (adapted from [33,44]). Note the decay in the response when signal and mask have the same spatio-frequency characteristics (a), as opposed to the case where they do not (b). For visualization, the differences in the curves are highlighted by the green circles.

Fig 3. Experimental context-dependent interaction [35] and statistical model [9]. (a) Results of Cavanaugh et al. [35]: images with a gray background represent the stimuli. Cell relative responses are shown as points inside the black normalization circle. The distance from the origin indicates the magnitude of the response, while its angle represents the location of the surrounding stimulus. (b) Cell response predicted from the statistical model of Coen-Cagli et al. [9], and probability that center and surround receptive fields are co-assigned to the same normalization pool and contribute to the divisive normalization of the model response. The probability of co-assignment depends on the covariance with the surround, as shown below. (c) Different center-surround visual neighborhoods in a natural scene. In each case, the activity of the sensors in the surround can be co-assigned to the activity in the center (i.e. considered in the normalization pool) if the orientation of maximally responding sensors is consistent (which is the case for four of the considered regions and not the case for the first). The horizontal surround that is co-assigned with the corresponding center is highlighted in bold. (d) Covariance matrices learned from natural images determine co-assignment: the orientation and relative position of the receptive fields are represented by the black bars (the thickness of the bar is proportional to the variance, while the thickness of the red lines is proportional to the covariance between the two connected bars). This figure has been adapted from [9].

Fig 4. Psychophysically-inspired parameters for the Wilson-Cowan model. Illustration of the Gaussian neighborhoods W with excitatory and inhibitory parts: separable interactions depending on departure in space (a), frequency (b), and orientation (c). The spatial component of W is shown for the central location of the considered visual field and the 24 cpd vertical subband. Following Watson and Solomon [4], the Gaussian kernels (here differences of Gaussians) are separable in space, frequency, and orientation. Therefore, lower-frequency subbands have coarser sampling (and thus higher amplitude) but the same shape in space. The shape is also the same for subbands tuned to different orientations. Equivalent separability applies for variations in frequency and orientation: the interactions in plots (b) and (c) are the same for every spatial location (and orientation and frequency, respectively). The above plots display the (more intuitive) −W, where positive and negative values mean excitatory and inhibitory interaction respectively, but note that, according to the sign in Eq. 8, positive weights are inhibitory. (d) Auto-attenuation factor α. In this plot the horizontal axis (wavelet coefficients in reverse order) can be qualitatively understood as frequency, so high frequencies display bigger attenuation. Panels (e) and (f) show different options for the pointwise activation nonlinearity, f(x). In pink we have the γ-function proposed in [36] and in red we have the original proposal of Wilson-Cowan [14]. In both cases we show the approximations of these functions using gn(x) (for an increasing number of terms n), and the corresponding derivatives, which have an impact on the relation between the kernels in the different models (Eq. 13).

Fig 5. Convergence to the Divisive Normalization solution (I): quantitative error. Percentage of relative MSE between the solution of the Wilson-Cowan equations and the Divisive Normalization response as a function of discrete time (steps in Euler integration). Left: relative MSE for different kernel widths and activation/saturation functions in the case of just inhibitory interactions. Right: relative MSE for different kernel widths and saturation functions in the case of excitatory+inhibitory interactions. The configurations in pink and red are the psychophysically meaningful ones, which achieve 0.6% and 2.7% relative MSE respectively. In all cases, the update of the solution tends to zero (results not shown), indicating the approach to a steady state. The curves in bold style represent the median (50th quantile) over 45 natural images, and the curves in light style (very close to the median) represent the 25th and 75th quantiles. This result suggests the successful convergence of WC to DN for natural images over a wide range of model parameterizations.

Fig 6. Convergence to the Divisive Normalization solution (II): qualitative similarity. (a) Input image (stimulus at the retina). (b) Responses of the linear simple cells of V1. Responses are spatially non-stationary; see regions of very low amplitude highlighted in yellow and blue. (c) Steady state of the Wilson-Cowan equation after 650 iterations of Eq. 14 with an inhibitory kernel W of psychophysically sensible width and γ-activation. (d) Corresponding Divisive Normalization response using the kernel given by Eq. 10. In this example the relative MSE between x WC and x DN is 0.93%, but, more importantly, note how the amplitude of the response of the highlighted neurons has increased similarly in the nonlinear cases, leading to a more stationary response.

Fig 7. Stability of the Divisive Normalization solution (I). Eigenspectrum of the Jacobian of the right-hand side of the Wilson-Cowan differential equation with psychophysically-tuned parameters on natural images (curves in pink and red), and different additional configurations (different widths and activation functions), including just inhibitory interactions (left) and excitatory-inhibitory interactions (right). The curves refer to the median of the eigenvalues over 45 representative images extracted from the calibrated datasets [52,53]. The standard deviation and quartile distance are so small that they cannot be seen in the plot. The result shows that the eigenvalues are all negative. This suggests that the Divisive Normalization is a stable node of the system for natural images over a wide range of model parameterizations.

Fig 8. Stability of the Divisive Normalization solution (II). Vector fields in the phase space generated by the Jacobian of the psychophysically-tuned Wilson-Cowan model. The example on the left (red dot) corresponds to the response of 3 low-frequency sensors at the Divisive Normalization solution. This vector field describes the evolution of the response if it is perturbed in arbitrary directions (the cardinal directions, in red, or any combination of them). The result is general: the examples on the right show similar results for pairs of sensors tuned to different frequencies for different input stimuli.

Fig 9. Linear and nonlinear responses in V1 for an illustrative stimulus. (a) Retinal image composed of a natural image and two synthetic patches of frequencies 24 and 12 cpd. This image goes through the first stages of the model (see Fig. 1) up to the cortical layer, where a set of linear wavelet filters leads to the responses with energy e, which are nonlinearly transformed into the responses x. (b.1) Wavelet panel that represents e. (c.1) Wavelet panel that represents x. The highlighted sensors in red and blue (tuned to different locations of the 24 cpd scale, horizontal orientation) have characteristic responses given the image patterns in those locations. Plots (b.2) and (c.2) show the vector representation of the wavelet responses arranged according to the MatlabPyrTools convention [49]. These plots show how natural images typically have bigger energy in the low-frequency sensors. (d) Input-output scatter plots at different spatial frequencies, in cycles per degree (cpd), demonstrating that Divisive Normalization (and the Wilson-Cowan solution) implies adaptive saturating nonlinearities depending on the neighbors (i.e. a family of sigmoid functions).

Fig 10. Empirical and theoretical modulation of the Divisive Normalization kernel. Vectors in the diagonal matrices Dl and Dr that multiply the Gaussian kernel in the empirical tuning represented by Eq. 10 (top), and in the theoretically derived Eq. 13 (bottom). These theoretical filters correspond to a specific natural image and use the g10(x) approximation of the γ-activation. The median relative RMSEs for the predicted filters over 45 natural images are 10.2% (for the left filter) and 5.4% (for the right filter). The difference for the right filter using the original Wilson-Cowan activation, also with g10(x), is 5.1% (curve not shown). However, as the empirical filters were just adjusted ad hoc in [12], what is relevant here is the reproduction of the required high-pass structure, not the MSE.

Fig 11. Gaussian and empirical interaction kernels for the sensors highlighted in red and light blue in Fig. 9. Gaussian kernel (left) with overestimated contribution of the low-frequency subbands (highlighted in orange). Hand-crafted kernel (right) to reduce the influence of the low-frequency subbands (highlighted in green).

Fig 12. Changes in the shape of the interaction in the theoretically-derived kernel. Panels (a), (b), and (c) show the isolated factors in the kernel matrix H, assuming a Gaussian wiring in W. The Gaussian component implies interactions with low frequencies (highlighted in orange). (d) Shows the interaction kernel resulting from the product of the three factors, H = D(k x) · W · D(k b ⊙ gn(x)), for the two highlighted points. Here we used g10(x). Note the high-pass effect of the left- and right-matrix product over W, which removed the interaction in the low-frequency subbands, now highlighted in dark green. (e) Zoom on the high-frequency horizontal subband. The term depending on the derivatives implies changes in the shape of the kernel (from circular to horizontal ellipses) when the context is a high-contrast horizontal pattern. This is compatible with the probabilities of co-assignment [9] recalled in Fig. 3.b.

Fig 13. Contrast response curves obtained from the Wilson-Cowan model. Contrast response curves for low spatial frequency vertical tests (left) and high spatial frequency horizontal tests (right) seen on top of backgrounds of different spatial frequencies, orientations, and contrasts (see representative stimuli in the insets). The backgrounds include: (1) two spatial frequencies (low and high, corresponding to the top and bottom rows, respectively); (2) two orientations (vertical and horizontal, as seen in the insets); and (3) four different contrasts represented by the line styles (0.0, 0.15, 0.30, and 0.45, corresponding to the black solid line, blue solid line, dotted blue line, and dashed blue line, respectively). The responses display the qualitative trends of contrast perception: frequency selectivity, saturation with contrast, and cross-masking depending on the spatio-frequency similarity between test and background.
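The qualitative trends in these curves (saturation with contrast and cross-masking) can be reproduced with a minimal Naka-Rushton-style divisive toy in Python; the exponent, semisaturation, and similarity weighting are hypothetical, not the calibrated model:

```python
import numpy as np

# Toy contrast response of a test-tuned sensor under divisive normalization:
# the mask contributes to the normalization pool, scaled by its
# spatio-frequency similarity to the test (all values hypothetical).
def response(c_test, c_mask=0.0, similarity=1.0, gamma=0.7, b=0.1):
    e_test = c_test ** gamma
    pool = e_test + similarity * c_mask ** gamma
    return e_test / (b + pool)

contrasts = np.linspace(0.0, 1.0, 50)
r_alone = response(contrasts)                             # saturating curve
r_same = response(contrasts, c_mask=0.3)                  # similar mask: strong masking
r_diff = response(contrasts, c_mask=0.3, similarity=0.2)  # dissimilar mask: weaker
```

The similar mask depresses the whole curve more than the dissimilar one, as in panels (a) vs (b) of Fig. 2 and the trends of Fig. 13.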

Fig 14. Subjective image distortion using the hand-crafted kernel [12] (left), and the kernel based on the Wilson-Cowan equations (right). For each scatter plot, the Pearson correlation between Mean Opinion Scores (ordinates) and predicted image distortions (abscissas) is given. Differences in the correlations are not statistically significant, indicating the validity of the proposed relation.
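A minimal sketch of this kind of response-space metric in Python: the distortion is the Euclidean distance between nonlinear responses, with a generic divisive-normalization stage standing in for the calibrated model (all values illustrative):

```python
import numpy as np

def dn_response(r, b=0.1, gamma=0.7):
    """Generic divisive normalization with a unit-volume pooling kernel
    (a stand-in for the calibrated retina-cortex model)."""
    e = np.abs(r) ** gamma
    H = np.ones((r.size, r.size)) / r.size
    return np.sign(r) * e / (b + H @ e)

rng = np.random.default_rng(1)
original = rng.normal(size=16)                    # stand-in wavelet-like responses
distorted = original + 0.1 * rng.normal(size=16)
# Perceived distortion: Euclidean distance in the nonlinear response space.
visibility = float(np.linalg.norm(dn_response(original) - dn_response(distorted)))
```

Identical images give zero distance, and larger perturbations of the responses give larger predicted visibility.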

- Psychophysically-sensible parameters for the Wilson-Cowan model: the toolbox includes the functions saturation_f.m and inv_saturation_f.m that compute and invert the dimension-wise saturating response of the Wilson-Cowan model depicted in Fig. 4.1 of the main text. These functions also compute the corresponding derivative with respect to the stimuli. The routine Converg_Stability_ND_excit_inhibit.m defines and plots the parameters of the Wilson-Cowan model based on the psychophysically-tuned Divisive Normalization.

- The Divisive Normalization of the frequency-tuned sensors, N(2)DN.
- The equivalent Wilson-Cowan interaction, N(2)WC.


the saturation function is f(x) = c·x^γ, where γ = 0.4 (Eq. 26); the scaling constant is c = x̄^(1−γ), where x̄ is the average response over natural images (for the Divisive Normalization transform). This exponent is also used for the definition of energy in Wilson-Cowan, e = |r|^γ.
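A quick Python check of the choice c = x̄^(1−γ) (with a hypothetical x̄): the average response x̄ is a fixed point of the saturation, so average responses pass through unchanged while larger responses are compressed:

```python
import numpy as np

gamma = 0.4
xbar = 0.25                        # hypothetical average response over natural images
c = xbar ** (1.0 - gamma)          # scaling constant of the saturation
f = lambda x: c * np.sign(x) * np.abs(x) ** gamma

fixed = f(xbar)                    # equals xbar by construction
```

Since f(x̄) = x̄^(1−γ)·x̄^γ = x̄, the saturation preserves the operating level while compressing responses above it (f(2x̄) < 2x̄ for γ < 1).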

Table 1. Accuracy of the approximations and convergence error. Relative Mean Squared Error (rel. MSE in %) of three magnitudes: (a) the error in the approximation of the inverse of Divisive Normalization, (b) the error in the approximation of f(x) in Wilson-Cowan, and (c) the final error in the convergence shown in Fig. 5. The convergence errors for the psychophysically sensible configurations are highlighted in bold style and color. Values represent the median relative MSE in % over 45 images, and uncertainty intervals represent the 50% interquantile distance.
The toolbox for the Divisive Normalization retina-cortex model of Fig. 1 is in BioMultiLayer_L_NL_color.zip. This toolbox includes the model, its inverse and Jacobians, and a distortion metric based on the model. The file demo_deep_DN_iso_color_spectral.m shows how to choose the parameters of the model, how to apply it to spectral images and images in opponent color representations, and how to compute the responses, the Jacobians, and the inverse. The demo function demo_metric_deep_DN_iso_color.m shows how to represent conventional digital images in the appropriate opponent color representation.