A computational theory of visual receptive fields
Abstract
A receptive field constitutes a region in the visual field where a visual cell or a visual operator responds to visual stimuli. This paper presents a theory for what types of receptive field profiles can be regarded as natural for an idealized vision system, given a set of structural requirements on the first stages of visual processing that reflect symmetry properties of the surrounding world. These symmetry properties include (i) covariance properties under scale changes, affine image deformations, and Galilean transformations of space–time as occur for real-world image data, as well as specific requirements of (ii) temporal causality, implying that the future cannot be accessed, and (iii) a time-recursive updating mechanism of a limited temporal buffer of the past, as is necessary for a genuine real-time system. Fundamental structural requirements are also imposed to ensure (iv) mutual consistency and a proper handling of internal representations at different spatial and temporal scales. It is shown how a set of families of idealized receptive field profiles can be derived by necessity regarding spatial, spatiochromatic, and spatiotemporal receptive fields in terms of Gaussian kernels, Gaussian derivatives, or closely related operators. Such image filters have been successfully used as a basis for expressing a large number of visual operations in computer vision, regarding feature detection, feature classification, motion estimation, object recognition, spatiotemporal recognition, and shape estimation. Hence, the associated so-called scale-space theory constitutes a both theoretically well-founded and general framework for expressing visual operations. There are very close similarities between receptive field profiles predicted from this scale-space theory and receptive field profiles found by cell recordings in biological vision.
Among the family of receptive field profiles derived by necessity from the assumptions, idealized models with very good qualitative agreement are obtained for (i) spatial on-center/off-surround and off-center/on-surround receptive fields in the fovea and the LGN, (ii) simple cells with spatial directional preference in V1, (iii) spatiochromatic double-opponent neurons in V1, (iv) space–time separable spatiotemporal receptive fields in the LGN and V1, and (v) non-separable space–time tilted receptive fields in V1, all within the same unified theory. In addition, the paper presents a more general framework for relating and interpreting these receptive fields conceptually and possibly predicting new receptive field profiles, as well as for pre-wiring covariance under scaling, affine, and Galilean transformations into the representations of visual stimuli. This paper describes the basic structure of the necessity results concerning receptive field profiles regarding the mathematical foundation of the theory and outlines how the proposed theory could be used in further studies and modelling of biological vision. It is also shown how receptive field responses can be interpreted physically, as the superposition of relative variations of surface structure and illumination variations, given a logarithmic brightness scale, and how receptive field measurements will be invariant under multiplicative illumination variations and exposure control mechanisms.
Keywords
Receptive field · Scale space · Gaussian derivative · Scale covariance · Affine covariance · Galilean covariance · Illumination invariance · LGN · Primary visual cortex · Visual area V1 · Functional model · Simple cell · Double-opponent cell · Complex cell · Vision · Theoretical neuroscience · Theoretical biology

1 Introduction
If one considers the theoretical and algorithmic problems of designing a vision system that is going to make use of incoming reflected light to infer properties of the surrounding world, one may ask what types of image operations should be performed on the image data. Would any type of image operation be reasonable? Specifically, regarding the notion of receptive fields, one may ask what types of receptive field profiles would be reasonable? Is it possible to derive a theoretical model of how receptive fields “ought to” respond to visual data?
Initially, such a problem might be regarded as intractable unless the question can be further specified. It is, however, possible to study this problem systematically using approaches that have been developed in the area of computer vision known as scale-space theory (Iijima 1962; Witkin 1983; Koenderink 1984; Koenderink and van Doorn 1992; Lindeberg 1994a, b, 2008; Sporring et al. 1996; Florack 1997; ter Haar Romeny 2003). A paradigm that has been developed in this field is to impose structural constraints on the first stages of visual processing that reflect symmetry properties of the environment. Interestingly, it turns out to be possible to substantially reduce the class of permissible image operations from such arguments.
The subject of this article is to describe how structural requirements on the first stages of visual processing as formulated in scalespace theory can be used for deriving idealized models of receptive fields and implications of how these theoretical results can be used when modelling biological vision. A main theoretical argument is that idealized models for linear receptive fields can be derived by necessity given a small set of symmetry requirements that reflect properties of the world that one may naturally require an idealized vision system to be adapted to. In this respect, the treatment bears similarities to approaches in theoretical physics, where symmetry properties are often used as main arguments in the formulation of physical theories of the world. The treatment that will follow will be general in the sense that spatial, spatiochromatic, and spatiotemporal receptive fields are encompassed by the same unified theory.
Specifically, explicit functional models will be given of spatial and spatiotemporal response properties of LGN neurons and simple cells in V1, which will be compared to related models in terms of Gabor functions (Marcelja 1980; Jones and Palmer 1987a, b), differences of Gaussians (Rodieck 1965), and Gaussian derivatives (Koenderink and van Doorn 1987; Young 1987; Young et al. 2001; Young and Lesperance 2001; Lindeberg 1994a, b, 1997, 2011). For chromatic input, the model also accounts for color-opponent spatiochromatic cells in V1. Notably, the diffusion equations that describe the evolution properties over scale of these linear receptive field models are suitable for implementation on a biological architecture, since the computations can be expressed in terms of communications between neighboring computational units, where either a single computational unit or a group of computational units may be interpreted as corresponding to a neuron or a group of neurons.
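As a concrete numerical illustration of the Gaussian-derivative receptive field models referred to above, the following minimal sketch (assuming Python with NumPy and SciPy, which is not part of the original paper) computes the response of a first-order Gaussian-derivative receptive field to a simple test image:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def receptive_field_response(image, sigma, order):
    """Response of an idealized Gaussian-derivative receptive field.

    order = (order_y, order_x); e.g. (0, 1) gives a first-order derivative
    in the horizontal direction, in the spirit of the Gaussian-derivative
    receptive field models discussed in the text.
    """
    return gaussian_filter(image.astype(float), sigma=sigma, order=order)

# On a linear ramp of slope 1, a first-derivative receptive field responds
# with approximately the slope (up to discretization of the kernel).
ramp = np.tile(np.arange(64, dtype=float), (64, 1))
dx = receptive_field_response(ramp, sigma=2.0, order=(0, 1))
print(dx[32, 32])   # close to 1.0 in the interior
```

The ramp test is a simple consistency check on the discretized derivative kernels; real usage would apply the same operator to natural image data.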
Compared to previous approaches of learning receptive field properties and visual models from the statistics of natural image data (Field 1987; van der Schaaf and van Hateren 1996; Olshausen and Field 1996; Rao and Ballard 1998; Simoncelli and Olshausen 2001; Geisler 2008; Hyvärinen et al. 2009; Lörincz et al. 2012), the proposed theoretical model makes it possible to determine spatial and spatiotemporal receptive fields from first principles and thus without need for any explicit training stage or gathering of representative image data. In relation to such learningbased models, the proposed theory provides a normative approach that can be seen as describing the solutions that an ideal learningbased system may converge to, if exposed to a sufficiently large and representative set of natural image data. For these reasons, the presented approach should be of interest when modelling biological vision.
We will also show how receptive field responses can be interpreted physically as a superposition of relative variations of surface structure and illumination variations, given a logarithmic brightness scale, and how receptive field measurements will be invariant under multiplicative illumination variations and exposure control mechanisms. Despite the image measurements fundamentally being of an indirect nature, in terms of reflected light from external objects subject to unknown or uncontrolled illumination, this result shows how receptive field measurements can nevertheless be related to inherent physical properties of objects in the environment. This result therefore provides a formal justification for using receptive field responses as a basis for visual processes, analogous to the way linear receptive fields in the fovea, LGN and V1 provide the basic input to higher visual areas in biological vision.
We propose that these theoretical results contribute to an increased understanding of the role of early receptive fields in vision. Specifically, if one aims at building a neuroinspired artificial vision system that solves actual visual tasks, we argue that an approach based on the proposed idealized models of linear receptive fields should require a significantly lower amount of training data compared to approaches that involve specific learning of receptive fields or compared to approaches that are not based on covariant receptive field models. We also argue that the proposed families of covariant receptive fields will be better at handling natural image transformations as resulting from variabilities in relation to the surrounding world.
In their survey of our knowledge of the early visual system, Carandini et al. (2005) emphasize the need for functional models to establish a link between neural biology and perception. Einhäuser and König (2010) argue for the need for normative approaches in vision. This paper can be seen as developing the consequences of such ways of reasoning by deriving functional models of linear receptive fields using a normative approach. Due to the formulation of the resulting receptive fields in terms of spatial and spatiotemporal derivatives of convolution kernels, it furthermore becomes feasible to analyze how receptive field responses can be related to properties of the environment using mathematical tools from differential geometry, and thereby to analyze possibilities as well as constraints for visual perception.
1.1 Outline of the presentation
The treatment will be organized as follows: Sect. 2 formulates a set of structural requirements on the first stages of visual processing with respect to symmetry properties of the surrounding world and in relation to internal representations that are to be computed by an idealized vision system. Then, Sect. 3 describes the consequences of these assumptions with regard to intensity images defined over a spatial domain, with extensions to color information in Sect. 4. Section 5 develops a corresponding theory for spatiotemporal image data, taking into account the special nature of time-dependent image information.
Section 6 presents a comparison between spatial and spatiotemporal receptive fields measured from biological vision and the receptive field profiles generated by the presented spatial, spatiochromatic, and spatiotemporal scale-space theories, showing a very good qualitative agreement. Section 7 describes how a corresponding foveal scale-space model can be formulated for a foveal sensor to account for a spatially dependent lowest resolution, with suggestions for extensions in Sect. 8.
Section 9 relates the contributions in the paper to previous work in the area in a retrospective manner, and Sect. 10 concludes with a summary and discussion, including an outline of further applications of how the presented theory can be used for modelling biological vision.
2 Structural requirements of an idealized visual front end
The notion of a visual front end refers to a set of processes at the first stages of visual processing, which are assumed to be of a general nature and whose output can be used as input to different later-stage processes, without being too specifically adapted to a particular task that would limit the applicability to other tasks. Major arguments for the definition of a visual front end are that the first stages of visual processing should be as uncommitted as possible and allow initial processing steps to be shared between different later-stage visual modules, thus implying a uniform structure on the first stages of visual computations (Koenderink et al. 1992; Lindeberg 1994b, Sect. 1.1).
In the following, we will describe a set of structural requirements that can be stated concerning (i) spatial geometry, (ii) spatiotemporal geometry, (iii) the image measurement process with its close relationship to the notion of scale, (iv) internal representations of image data that are to be computed by a general purpose vision system, and (v) the parameterization of image intensity with regard to the influence of illumination variations.
The treatment that will follow can be seen as a unification, abstraction, and extension of developments in the area of scale-space theory (Iijima 1962; Witkin 1983; Koenderink 1984; Koenderink and van Doorn 1992; Lindeberg 1994a, b, 2008; Sporring et al. 1996; Florack 1997; ter Haar Romeny 2003) as obtained during the last decades; see Sect. 9.2 and (Lindeberg 1996, 2011; Weickert et al. 1999; Duits et al. 2004) for complementary surveys. It will then be shown how a generalization of this theory, to be presented next, can be used for deriving idealized models of receptive fields by necessity, including new extensions for modelling illumination variations in the intensity domain. Specifically, we will describe how these results can be used for computational neuroscience modelling of receptive fields with regard to biological vision.
2.1 Static image data over spatial domain
2.1.1 Linearity and convolution structure
2.1.2 Image measurements at different scales
2.1.3 Structural requirements on a scale-space representation
Together, the requirements of a semigroup structure and self-similarity over scales imply that the parameter \(s\) gets both (i) a qualitative interpretation of the notion of scale in terms of an abstract ordering relation due to the cascade property in Eq. (9) and (ii) a quantitative interpretation of scale, in terms of the scale-dependent spatial transformations in Eqs. (10) and (11). When these conditions are simultaneously satisfied, we say that the intermediate representation \(L(\cdot ;\; s)\) constitutes a candidate for being regarded as a (weak) scale-space representation.
Smoothing property: non-enhancement of local extrema A further requirement on a scale-space representation is that convolution with the scale-space kernel \(T(\cdot ;\; s)\) should correspond to a smoothing transformation, in the sense that coarser-scale representations should be guaranteed to constitute simplifications of corresponding finer-scale representations and that new image structures must not be created at coarser scales \(L(\cdot ;\; s)\) that do not correspond to simplifications of corresponding structures in the original data \(f\).
For one-dimensional signals \(f :{\mathbb R}\rightarrow {\mathbb R}\), such a condition can be formalized as the requirement that the number of local extrema, or equivalently the number of zero-crossings, in the data must not increase with scale, and is referred to as non-creation of local extrema (Lindeberg 1990). For higher-dimensional signals, however, it can be shown that there are no nontrivial linear transformations guaranteed to never increase the number of local extrema in an image (Lifshitz and Pizer 1990; Lindeberg 1990).
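The one-dimensional non-creation property can be probed numerically. Below is a small sketch (assuming NumPy/SciPy; sampled Gaussian kernels only satisfy the property up to discretization, so this is an illustration, not a proof):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def count_local_extrema(signal):
    """Count strict local extrema via sign changes of the discrete derivative."""
    d = np.sign(np.diff(signal))
    d = d[d != 0]                      # ignore flat plateaus
    return int(np.sum(d[1:] != d[:-1]))

rng = np.random.default_rng(0)
f = rng.standard_normal(512)           # a noisy 1-D test signal
scales = [1.0, 2.0, 4.0, 8.0]
counts = [count_local_extrema(gaussian_filter1d(f, sigma=s)) for s in scales]
print(count_local_extrema(f), counts)  # smoothing reduces the extrema count
```

In practice the counts decrease monotonically with scale for Gaussian smoothing, in agreement with the non-creation property for one-dimensional signals.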
2.1.4 Requirements regarding spatial geometry
For a scalespace representation based on a multidimensional scale parameter, one may also consider a weaker requirement of rotational invariance at the level of a family of kernels, for example regarding a set of elongated kernels with different orientations in image space. Then, although the individual kernels in the filter family are not rotationally symmetric as individual filters, a collection or a group of such kernels may nevertheless capture image data of different orientation in a rotationally invariant manner, for example if all image orientations are explicitly represented or if the receptive fields corresponding to different orientations in image space can be related by linear combinations.
A natural requirement on an idealized vision system that observes objects whose projections on the image plane are deformed in different ways depending on the viewing conditions is that the vision system should be able to relate or match the different internal representations of external objects that are acquired under different viewing conditions. Such a requirement is natural for enabling a stable interpretation of objects in the world under variations of the orientation of the object relative to the observer, and for enabling invariance under variations of the viewing direction.
2.2 Time-dependent image data over a spatiotemporal domain
Regarding spatiotemporal image data \(f(x, t)\), which we assume to be defined on a 2+1-D spatiotemporal domain \({\mathbb R}^2 \times {\mathbb R}\) with \(x = (x_1, x_2)^T\) denoting image space and \(t\) denoting time, it is natural to inherit the above-mentioned symmetry requirements expressed for the spatial domain. Hence, corresponding structural requirements as stated in Sects. 2.1.1, 2.1.2, and 2.1.3 should be imposed on a spatiotemporal scale space, with space \(x \in {\mathbb R}^2\) replaced by space–time \((x, t) \in {\mathbb R}^2 \times {\mathbb R}\) and with the scale parameter now encompassing also a notion of temporal scale \(\tau \), such that the multidimensional scale parameter \(s\) will be of the form \(s = (s_1, \ldots , s_N, \tau )\).
2.2.1 Additional requirements regarding spatiotemporal geometry
Again, within the class of linear transformations \(\mathcal{T}_s\), it is not possible to realize such a Galilean covariance property within a spatiotemporal scale concept based solely on a scalar spatial scale parameter \(s \in {\mathbb R}\) and a scalar temporal scale parameter \(\tau \in {\mathbb R}\). As will be shown later, Galilean covariance can, however, be achieved within a four-parameter linear spatiotemporal scale space.
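The idea of Galilean (velocity) adaptation can be illustrated for a 1+1-D non-causal Gaussian kernel, where the spatial coordinate is sheared by an image velocity \(v\), which tilts the kernel in space–time. This is an illustrative sketch (assuming NumPy), not the full four-parameter scale space referred to above:

```python
import numpy as np

def velocity_adapted_kernel_1d(xs, ts, s, tau, v):
    """Velocity-adapted separable kernel g(x - v t; s) g(t; tau) on a 1+1-D grid.

    Shearing the spatial coordinate by the image velocity v tilts the
    kernel in space-time, in the spirit of Galilean adaptation.
    """
    X, T = np.meshgrid(xs, ts, indexing='ij')
    gx = np.exp(-(X - v * T)**2 / (2 * s)) / np.sqrt(2 * np.pi * s)
    gt = np.exp(-T**2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)
    return gx * gt

xs = np.linspace(-10, 10, 101)
ts = np.linspace(-10, 10, 101)
k0 = velocity_adapted_kernel_1d(xs, ts, s=2.0, tau=2.0, v=0.0)   # untilted
k1 = velocity_adapted_kernel_1d(xs, ts, s=2.0, tau=2.0, v=0.5)   # tilted
print(k0.shape, k1.shape)
```

For \(v = 0\) the kernel is mirror-symmetric in space, whereas a nonzero velocity breaks this symmetry, visualizing the space–time tilt.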
2.2.2 Specific constraints regarding a real-time system
Time recursivity and temporal memory When dealing with spatiotemporal image data in a real-time setting, we cannot expect the vision system to have direct access to all information from the past, since we cannot assume a computer vision system or a biological organism to store a complete recording of all visual information it has seen.
If we assume that the vision system should compute internal image representations at different temporal scales, the only reasonable approach will therefore be that these computations have to be expressed in terms of computations on some internal temporal buffer \(M(x, t)\), which we assume to be much more condensed than a complete video recording of the past. Such an internal representation is referred to as a temporal memory, and the restriction of the set of possible computations to a combination of the current image data \(f(x, t)\) with such a compact temporal memory \(M(x, t)\) is referred to as time recursivity. Specifically, this temporal memory \(M(x, t)\) must be updated over time \(t\) according to some time-recursive model.

- the kernel \(U\) performs the update on the internal representation \(L\), while simultaneously respecting a cascade property for \(L\) over spatial scales \(s\), and

- the kernel \(h\) incorporates new information from the new image data \(f(x, t)\) that arrive between \(t = t_1\) and \(t = t_2\).
Notably, this formulation of a temporal evolution property has an interesting interpretation of enforcing a smooth (stabilizing) temporal behavior of the internal representation \(L(x, t;\; s, \tau )\) of the surrounding world as the spatiotemporal data \(f(x, t)\) varies over time \(t\).
2.3 Influence of illumination variations
The above-mentioned symmetry requirements essentially refer to the geometry of space and space–time and its relation to image measurements over non-infinitesimal regions over space or space–time, as formalized into the notion of a scale-space representation. Regarding the actual image intensities, these have so far been assumed to be given beforehand.
We may, however, consider different ways of parameterizing the intensity domain. Essentially, any monotonic intensity transformation will preserve the ordering of the intensity values from dark to bright. The perceptual impression of an image may, however, be substantially different after a nonlinear intensity transformation. Hence, one may ask whether we should assume the image data \(f\) to be proportional to image irradiance \(f \sim I\) (in units of power per unit area), to some self-similar power of image irradiance \(f \sim I^{\gamma }\), or whether there is a better choice.
2.3.1 Behavior under illumination variations: spatial image data
In this section, we will express properties of a logarithmic brightness scale in relation to a physical illumination model and image measurements in terms of receptive fields.
This model behaves similarly to a Lambertian surface model, with the extension that the surface may be regarded as “gray” by not reflecting all incident light. Please note, however, that this reflectance model constitutes a substantial simplification of the bidirectional reflectance distribution function and does not comprise, e.g., specularities or materials with diffraction grating effects.
(i) properties of surfaces of objects in the world as condensed into the spatially dependent albedo factor \(\rho (x)\), with the implicit understanding that this entity may in general refer to different surfaces in the world depending on the viewing direction \((x_1, x_2, 1)^T\) and thus the image position \(x = (x_1, x_2)^T\),

(ii) properties of the illumination field as reflected in the spatially dependent illumination \(i(x)\), which also may refer to the amount of incoming light on different surfaces in the world depending on the value of \(x\),

(iii) geometric properties of the camera as condensed into a dependency on the effective \(f\)-number \(\tilde{f}\) captured by \(C_\mathrm{cam}(\tilde{f})\), and

(iv) a geometric natural vignetting effect of the explicit form \(V(x) = V(x_1, x_2) = -2 \log (1 + x_1^2 + x_2^2)\).

- For a smooth surface with a spatially dependent surface pattern \(\rho (X)\), the first term \(\partial _{x_k} \rho /\rho \) reflects inherent relative spatial variations of this surface pattern, as deformed by the perspective projection model, in analogy with the affine deformation model (24).

- The second term \(\partial _{x_k} i/i\) reflects relative spatial variations in the illumination field \(i\), as arising from the interaction between the external illumination field \(i(X, \theta (X), \varphi (X))\) and the local surface geometry \((\theta (X), \varphi (X))\) at every surface point \(X\) according to (42).

- The third term \((\partial _{x_k} V)(x) = (\partial _{x_k} V)(x_1, x_2) = -4 x_k/(1 + x_1^2 + x_2^2)\) constitutes a geometric bias due to vignetting effects inherent to the camera. (Please note that the image coordinates in this treatment are expressed in units of the focal length, with \(|x| = \sqrt{x_1^2 + x_2^2} \ll 1\) in the central field of view.) This term will disappear for a spherical camera geometry.
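The invariance of Gaussian-derivative responses under multiplicative illumination changes, given a logarithmic brightness scale, can be checked numerically: since \(\log (c \, f) = \log c + \log f\) and spatial differentiation annihilates the constant, the responses coincide. A sketch assuming NumPy/SciPy and a synthetic positive image:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)
f = rng.uniform(0.1, 1.0, size=(64, 64))    # synthetic positive image irradiance

def log_receptive_field_response(image, sigma=2.0):
    # first-order Gaussian derivative applied to log intensity
    return gaussian_filter(np.log(image), sigma=sigma, order=(0, 1))

r1 = log_receptive_field_response(f)
r2 = log_receptive_field_response(3.7 * f)  # multiplicative illumination change
print(np.allclose(r1, r2))                  # True: responses are unchanged
```

The same cancellation underlies the invariance under exposure control mechanisms discussed in the text, since a global multiplicative rescaling only shifts the logarithmic intensities by a constant.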
2.3.2 Behavior under illumination variations: spatiotemporal image data
Regarding temporal derivatives, it follows that the influence of the vignetting effect \(V(x)\) will be cancelled by any temporal derivative operator with \(\beta \ge 1\). The temporal derivative operator will also suppress the effect of any other solely spatial illumination variation.
2.3.3 Summary regarding intensity and illumination variations

- relative variations in the albedo of the observed surface patterns, corresponding to the term \(\partial _{x^{\alpha } t^{\beta }} \left( \mathcal{T}_s \, \log \rho (x) \right) \) in (54), and

- relative variations in the illumination field, corresponding to the term \(\partial _{x^{\alpha } t^{\beta }} \left( \mathcal{T}_s \, \log i(x) \right) \) in (54).
3 Spatial domain with pure intensity information
We shall now describe how the structural requirements on an idealized vision system as formulated in Sect. 2.1 restrict the class of possible image operations at the first stages of visual processing. For image data \(f :{\mathbb R}^2 \rightarrow {\mathbb R}\) defined over a two-dimensional spatial domain, let us assume that the first stage of visual processing as represented by the operator \(\mathcal{T}_s\) should be (i) linear, (ii) shift invariant, and (iii) obey a semigroup structure over spatial scales \(s\), where we also have to assume (iv) certain regularity properties of the semigroup \(\mathcal{T}_s\) over scale \(s\) in terms of Sobolev norms^{10} to guarantee sufficient differentiability properties with respect to space \(x \in {\mathbb R}^2\) and scale \(s\). Let us furthermore require (v) non-enhancement of local extrema to hold for any smooth image function \(f \in C^{\infty }({\mathbb R}^2) \cap L^1({\mathbb R}^2)\).
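The semigroup (cascade) structure over spatial scales can be verified numerically for Gaussian smoothing: smoothing at scale \(s_1\) followed by \(s_2\) should agree with direct smoothing at \(s_1 + s_2\), with the variance-based scale parameter \(s = \sigma^2\). A sketch assuming NumPy/SciPy (sampled Gaussians satisfy the semigroup property only approximately):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(2)
f = rng.standard_normal((64, 64))

# Cascade smoothing at scales s1 then s2 versus direct smoothing at s1 + s2,
# where s = sigma**2 is the variance-based scale parameter used in the text.
s1, s2 = 2.0, 3.0
cascade = gaussian_filter(gaussian_filter(f, sigma=np.sqrt(s1)), sigma=np.sqrt(s2))
direct = gaussian_filter(f, sigma=np.sqrt(s1 + s2))
print(np.max(np.abs(cascade - direct)))   # small residual from discretization
```

The residual reflects only the discretization of the kernels; in the continuous theory the two computations agree exactly.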
3.1 Gaussian receptive fields
3.2 Affine-adapted Gaussian receptive fields
With respect to biological vision, the affine Gaussian kernels as well as directional derivatives of these can be used for modelling receptive fields that are oriented in the spatial domain, as will be described in connection with Eq. (111) in Sect. 6. For computational vision, they can be used for computing affine invariant image features and image descriptors for, e.g., cues to surface shape, image-based matching, and recognition (Lindeberg 1994b; Lindeberg and Gårding 1997; Baumberg 2000; Mikolajczyk and Schmid 2004; Tuytelaars and van Gool 2004; Lazebnik et al. 2005; Rothganger et al. 2006).
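An affine Gaussian kernel \(g(x;\; \varSigma)\) with spatial covariance matrix \(\varSigma\) can be sampled directly from its closed form. The following sketch (assuming NumPy; the grid size and orientation are illustrative choices) constructs an elongated kernel oriented at 30 degrees:

```python
import numpy as np

def affine_gaussian_kernel(Sigma, radius=15):
    """Sampled affine Gaussian kernel g(x; Sigma) on a (2r+1) x (2r+1) grid."""
    Sigma = np.asarray(Sigma, dtype=float)
    inv = np.linalg.inv(Sigma)
    xs = np.arange(-radius, radius + 1)
    X1, X2 = np.meshgrid(xs, xs)
    quad = inv[0, 0] * X1**2 + 2 * inv[0, 1] * X1 * X2 + inv[1, 1] * X2**2
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

# Elongated kernel at 30 degrees: Sigma = R diag(l1, l2) R^T with eigenvalues
# l1, l2 giving the variances along the major and minor axes.
theta, l1, l2 = np.deg2rad(30.0), 16.0, 4.0
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Sigma = R @ np.diag([l1, l2]) @ R.T
g = affine_gaussian_kernel(Sigma)
print(g.shape, g.sum())   # sums to ~1 when well contained in the grid
```

Directional derivatives of such kernels, aligned with the kernel orientation, then give the oriented receptive field models discussed in Sect. 6.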
3.3 Necessity of derived receptive fields in terms of derivatives
For directional derivatives that have been derived from elongated kernels whose underlying zeroorder convolution kernels are not rotationally symmetric, it should be noted that we have aligned the directions of the directional derivative operators to the orientations of the underlying kernels. A structural motivation for making such an alignment can be obtained from a requirement of a weaker form of rotational symmetry at the group level. If we would like the family of receptive fields to be rotationally symmetric as a group, then it is natural to require the directional derivative operators to be transformed in a similar way as the underlying kernels.
4 Spatial domain with color information
To define a corresponding scale-space concept for color images, the simplest approach would be to compute a Gaussian scale-space representation for each color channel individually. Since the values of the color channels will usually be highly correlated, it is, however, preferable to decorrelate the dependencies by computing a color-opponent representation. Such a representation is also in good agreement with human vision, where a separation into red/green and yellow/blue color-opponent channels takes place at an early stage in the visual pathways.
4.1 Gaussian color-opponent receptive fields
In Hall et al. (2000), Linde and Lindeberg (2004, 2012), and van de Sande et al. (2010), it is shown how such spatiochromatic receptive fields in combination with regular spatial receptive fields can constitute an effective basis for object recognition.
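A minimal sketch of Gaussian color-opponent receptive fields follows (assuming NumPy/SciPy; the opponent combinations below are one common choice and are an assumption, not necessarily the exact coefficients of the text):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def opponent_receptive_fields(rgb, sigma=2.0):
    """Gaussian-smoothed color-opponent channels from an RGB image.

    Red/green and yellow/blue opponent combinations, each smoothed with a
    rotationally symmetric Gaussian of scale sigma.
    """
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = gaussian_filter(R - G, sigma)               # red/green opponency
    yb = gaussian_filter((R + G) / 2.0 - B, sigma)   # yellow/blue opponency
    return rg, yb

rgb = np.random.default_rng(3).uniform(size=(32, 32, 3))
rg, yb = opponent_receptive_fields(rgb)
print(rg.shape, yb.shape)
```

By construction, both channels respond with zero to achromatic (gray) input, which is the decorrelation property motivating the opponent representation.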
Another type of Gaussian color model has been proposed by Koenderink and later used by Geusebroek and his coworkers (Burghouts and Geusebroek 2009), with receptive fields defined over the spectrum of wavelengths in the color spectrum, corresponding to zero-, first-, and second-order derivatives with respect to wavelength.
5 Spatiotemporal image data
5.1 Non-causal spatiotemporal receptive fields
Let us first apply a similar way of reasoning as in Sect. 3, with space \(x \in {\mathbb R}^2\) replaced by space–time \((x, t)^T \in {\mathbb R}^2 \times {\mathbb R}\) and disregarding temporal causality, thereby allowing unlimited access to information over both space and time. Given image data \(f :{\mathbb R}^2 \times {\mathbb R}\rightarrow {\mathbb R}\) defined over a 2+1-D spatiotemporal domain, let us therefore again assume that the first stage of visual processing as represented by the operator \(\mathcal{T}_s\) should be (i) linear, (ii) shift invariant, and (iii) obey a semigroup structure over both spatial and temporal scales \(s\), where we also assume (iv) certain regularity properties of the semigroup \(\mathcal{T}_s\) over scale \(s\) in terms of Sobolev norms^{12} to guarantee sufficient differentiability properties with respect to space \(x\), time \(t\), and spatiotemporal scales \(s\). Let us furthermore require (v) non-enhancement of local extrema to hold for any smooth image function \(f \in C^{\infty }({\mathbb R}^2 \times {\mathbb R}) \cap L^1({\mathbb R}^2 \times {\mathbb R})\) and for any positive scale direction \(s\).
5.2 Time-causal spatiotemporal receptive fields
If, on the other hand, with regard to real-time biological vision, we want to respect both temporal causality and temporal recursivity, we obtain different families of receptive fields. Specifically, two different families of time-causal receptive fields can be derived depending on whether we require (i) a continuous semigroup structure over a continuum of temporal scales or (ii) fix the temporal scale levels to a discrete set a priori.

- \(g(x - v t;\; s;\; \varSigma )\) is a velocity-adapted 2D affine Gaussian kernel with spatial covariance matrix \(\varSigma \), and

- \(\phi (t;\; \tau )\) is a time-causal smoothing kernel over time with temporal scale parameter \(\tau \), which is related to the regular one-dimensional Gaussian kernel according to \(\phi (t;\; \tau ) = -\partial _{\tau } g(\tau ;\; t)\). (Please note the shift of the order of the arguments between \(\phi \) and \(g\).)
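The stated relation \(\phi (t;\; \tau ) = -\partial _{\tau } g(\tau ;\; t)\) can be verified numerically with a central difference in \(\tau\). A short sketch assuming NumPy:

```python
import numpy as np

def phi(t, tau):
    """Time-causal kernel phi(t; tau) = tau/(sqrt(2 pi) t^{3/2}) exp(-tau^2/(2 t))."""
    return tau / (np.sqrt(2 * np.pi) * t**1.5) * np.exp(-tau**2 / (2 * t))

def g1d(u, s):
    """1-D Gaussian with variance s (note the argument order used in the text)."""
    return np.exp(-u**2 / (2 * s)) / np.sqrt(2 * np.pi * s)

# Check phi(t; tau) = -d/d tau g(tau; t) by a central difference in tau.
t, tau, h = 3.0, 1.5, 1e-5
numeric = -(g1d(tau + h, t) - g1d(tau - h, t)) / (2 * h)
print(abs(numeric - phi(t, tau)))   # close to zero
```

Analytically, \(\partial_\tau g(\tau; t) = -(\tau/t)\, g(\tau; t)\), so the negated derivative reproduces \(\phi\) exactly; the finite-difference residual is of order \(h^2\).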
5.3 Distributions of spatiotemporal receptive fields
5.4 Geometric covariance properties

- image data acquired with different spatial and/or temporal sampling rates,

- image structures of different spatial and/or temporal extent,

- objects at different distances from the camera,

- the linear component of relative motions between objects in the world and the observer, and

- the linear component of perspective deformations.
6 Computational modelling of biological receptive fields
In two comprehensive reviews, DeAngelis et al. (1995) and DeAngelis and Anzai (2004) present overviews of spatial and temporal response properties of (classical) receptive fields in the central visual pathways. Specifically, the authors point out the limitations of defining receptive fields in the spatial domain only and emphasize the need to characterize receptive fields in the joint space–time domain, to describe how a neuron processes the visual image. Conway and Livingstone (2006) show the result of a corresponding investigation concerning color receptive fields.
In the following, we will describe how the above-mentioned spatial and spatiotemporal scale-space concepts can be used for modelling the spatial, spatiochromatic, and spatiotemporal response properties of biological receptive fields. Indeed, it will be shown that the Gaussian and time-causal scale-space concepts lead to predictions of receptive field profiles that are qualitatively very similar to all the receptive field types presented in DeAngelis et al. (1995), DeAngelis and Anzai (2004), and schematic simplifications of most of the receptive fields shown in Conway and Livingstone (2006).
6.1 LGN neurons

\(\pm \) determines the polarity (oncenter/offsurround versus offcenter/onsurround),

\(\partial _{x_1 x_1} + \partial _{x_2 x_2}\) denotes the spatial Laplacian operator,

\(g(x_1, x_2;\; s)\) denotes a rotationally symmetric spatial Gaussian,

\(\partial _{{t'}}\) denotes a temporal derivative operator with respect to a possibly selfsimilar transformation of time \(t' = t^{\alpha }\) or \(t' = \log t\) such that \(\partial _{{t'}} = t^{\kappa } \, \partial _t\) for some constant \(\kappa \in [0, 1]\) ( Lindeberg 2011, Sect. 5.1, pages 59–61)^{13},

\(h(t;\; \tau )\) is a temporal smoothing kernel over time corresponding to the time-causal smoothing kernel \(\phi (t;\; \tau ) = \tfrac{1}{\sqrt{2 \pi } \, t^{3/2}} \, \tau \, e^{-\tau ^2/2t}\) in (95), a non-causal time-shifted Gaussian kernel \(g(t;\; \tau , \delta ) = \tfrac{1}{\sqrt{2 \pi \tau }} e^{-(t - \delta )^2/2 \tau }\) according to (76) or a time-causal kernel corresponding to a set of first-order integrators over time coupled in cascade having a Laplace transform \(H_\mathrm{composed}(q;\; \mu ) = \prod _{i=1}^{k} \frac{1}{1 + \mu _i q}\) according to (99),

\(n\) is the order of temporal differentiation,

\(s\) is the spatial scale parameter and

\(\tau \) is the temporal scale parameter.
Concerning the application of the Laplacian of Gaussian model for on-center/off-surround and off-center/on-surround receptive fields in the retina, it should be emphasized that the retina also contains other types of receptive fields that are not modelled here, such as brisk transient (Y) ganglion cells that respond to rapid transients and directional selective ganglion cells that respond to visual motion (Wässle 2004).
Note: In all illustrations in Sect. 6, where spatial and spatiotemporal derivative expressions are aligned to biological data, the unit for the spatial scale parameter \(s\) corresponds to \([\text{ degrees }^2]\) of visual angle and the units for the temporal scale parameter \(\tau \) in the Gaussian spatiotemporal scale-space representation are \([\text{ milliseconds }^2]\), whereas the units for the temporal scale parameter \(\tau \) in the time-causal spatiotemporal scale-space representation are \([\sqrt{\text{ milliseconds }}]\). For image velocities \(v\) of velocity-adapted filters, the units are \([\text{ degrees/millisecond }]\). The reason why the units are different for the three types of spatiotemporal scale spaces is that the dimensionality of the temporal scale parameter is different in each of these spatiotemporal scale-space concepts.
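To make the spatial component of this LGN model concrete, the following sketch samples the (negated) Laplacian of a rotationally symmetric Gaussian on a grid, which yields an on-center/off-surround profile. This is a minimal numerical illustration under our own discretization; the temporal smoothing factor \(h(t;\; \tau)\) is omitted and all function names are ours, not the paper's.

```python
import numpy as np

def gaussian2d(x, y, s):
    """Rotationally symmetric spatial Gaussian g(x1, x2; s) with variance s."""
    return np.exp(-(x**2 + y**2) / (2.0 * s)) / (2.0 * np.pi * s)

def log_receptive_field(x, y, s, polarity=+1):
    """Idealized center-surround receptive field +/- (Laplacian of Gaussian).
    With polarity=+1 the center is inhibitory (off-center/on-surround);
    with polarity=-1 it is excitatory (on-center/off-surround)."""
    # closed-form Laplacian: (d_xx + d_yy) g = ((x^2 + y^2 - 2 s) / s^2) g
    return polarity * ((x**2 + y**2 - 2.0 * s) / s**2) * gaussian2d(x, y, s)

# sample an on-center/off-surround field on a grid
dx = 0.05
ax = np.arange(-6.0, 6.0 + dx, dx)
X, Y = np.meshgrid(ax, ax)
rf = log_receptive_field(X, Y, s=1.0, polarity=-1)
```

Since the Laplacian of Gaussian integrates to zero, the sampled field is (approximately) DC-free, consistent with the insensitivity of such receptive fields to uniform illumination levels.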
6.2 Double-opponent spatiochromatic cells
6.3 Simple cells

The receptive fields of simple cells in V1 are typically both
oriented in the spatial domain and

sensitive to specific stimulus velocities.
6.3.1 Spatial dependencies

The spatial component of simple cell receptive fields can be modelled by directional derivatives of affine Gaussian kernels of the form$$\begin{aligned} h(x_1, x_2;\; \varSigma ) = \partial _{\varphi }^{m} \, g(x_1, x_2;\; \varSigma ) \end{aligned}$$(111)where
\(\partial _{\varphi } = \cos \varphi \, \partial _{x_1} + \sin \varphi \, \partial _{x_2}\) is a directional derivative operator,

\(m\) is the order of spatial differentiation, and

\(g(x_1, x_2;\; \varSigma )\) is an affine Gaussian kernel with spatial covariance matrix \(\varSigma \), as can be parameterized according to (68).
In the specific case when the covariance matrix is proportional to a unit matrix, \(\varSigma = s \, I\), with \(s\) denoting the spatial scale parameter, these directional derivatives correspond to regular Gaussian derivatives, as proposed as a model for spatial receptive fields by Koenderink and van Doorn (1987, 1992). The use of non-isotropic covariance matrices, on the other hand, allows for a higher degree of orientation selectivity and additionally allows for closedness under affine transformations (affine covariance).
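A small numerical sketch of such first-order directional derivatives of affine Gaussian kernels (the case \(m = 1\)) follows. The function names, the parameterization by eigenvalues and orientation, and the use of the closed-form gradient \(\nabla g = -\varSigma^{-1} x \, g\) are our own choices for illustration, not the paper's implementation.

```python
import numpy as np

def affine_gaussian(x, Sigma):
    """Affine Gaussian kernel g(x; Sigma) at a 2-D point x."""
    det = np.linalg.det(Sigma)
    quad = x @ np.linalg.solve(Sigma, x)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(det))

def dir_deriv_affine_gaussian(x, Sigma, phi):
    """First-order directional derivative d_phi g(x; Sigma) along
    u = (cos phi, sin phi), using grad g = -Sigma^{-1} x g."""
    u = np.array([np.cos(phi), np.sin(phi)])
    return -(u @ np.linalg.solve(Sigma, x)) * affine_gaussian(x, Sigma)

def covariance(l1, l2, theta):
    """Spatial covariance matrix with eigenvalues l1, l2 and orientation theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([l1, l2]) @ R.T
```

Choosing \(l_1 \gg l_2\) with \(\varphi = \theta\) gives an elongated, oriented odd-symmetric profile of the kind associated with simple cells, whereas \(l_1 = l_2 = s\) reduces the kernel to an ordinary Gaussian derivative.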
Conceptually, the ripples of the Gabor functions, which are given by complex sine waves, are related to the ripples of Gaussian derivatives, which are given by Hermite functions. A Gabor function, however, requires the specification of a scale parameter and a spatial frequency, whereas a Gaussian derivative requires a scale parameter and the order of differentiation (per spatial dimension). With the Gaussian derivative model, receptive fields of different orders can be mutually related by derivative operations and be computed from each other by nearest-neighbor operations. The zero-order receptive fields as well as the derivative-based receptive fields can be modelled by diffusion equations and can therefore be implemented by computations between neighboring computational units.
In relation to invariance properties, the family of affine Gaussian kernels is closed under affine image deformations, whereas the family of Gabor functions obtained by multiplying rotationally symmetric Gaussians with sine and cosine waves is not closed under affine image deformations. This means that it is not possible to compute truly affine invariant image representations from such Gabor functions. Instead, given a pair of images that are related by a non-uniform image deformation, the lack of affine covariance implies that there will be a systematic bias in the image representations derived from such Gabor functions, corresponding to the difference between the back-projected Gabor functions in the two image domains. If instead using receptive field profiles defined from directional derivatives of affine Gaussian kernels, it will be possible to compute provably affine invariant image representations.
With regard to invariance under multiplicative illumination variations, the even cosine component of a Gabor function does, in general, not have its integral equal to zero, which means that the invariance properties under multiplicative illumination variations or exposure control mechanisms described in Sect. 2.3 do not hold for Gabor functions.
In this respect, the Gaussian derivative model is simpler: it can be related to image measurements by differential geometry, be derived axiomatically from symmetry principles, be computed from a minimal set of connections, and allows for provable invariance properties under locally linearized image deformations (affine transformations) as well as local multiplicative illumination variations and exposure control mechanisms. Young (1987) has more generally shown how spatial receptive fields in cats and monkeys can be well modelled by Gaussian derivatives up to order four.
In the area of computer vision, a multi-scale differential geometric framework in terms of Gaussian derivatives and closely related operators has become an accepted and de facto standard for defining image features for feature detection, feature classification, stereo matching, motion estimation, object recognition, spatiotemporal recognition, shape analysis, and image enhancement. Specifically, the formulation of image primitives in terms of scale-space derivatives makes it possible to use tools from differential geometry for deriving relationships between image features and physical properties of objects in the environment, allowing for computationally operational and theoretically well-founded modelling of possibilities or constraints for visual perception.
Given the model (111) of orientation selective receptive fields as depending on a spatial covariance matrix \(\varSigma \), this property is in good qualitative agreement with a distribution of receptive fields over a population of covariance matrices, with different preferred orientations as determined from the eigenvectors of the covariance matrix and different ratios between the scale parameters along the preferred orientations as determined by the square root of the ratio between the eigenvalues. Specifically, the property of the orientation selectivity being lowest at the centers of the pinwheels would be compatible with the covariance matrix there being close to, or closer to, a unit matrix, implying that the orientations of the eigenvectors are sensitive to minor perturbations of the covariance matrix, with the ratio between the eigenvalues being close to, or closer to, one at the center of the pinwheel.
6.3.2 Spatiotemporal dependencies
In terms of temporal derivatives, a biphasic behavior arises from first-order derivatives, a monophasic behavior from zero-order derivatives, and a triphasic behavior from second-order derivatives. Concerning the oriented spatial response characteristics, there is a high similarity with directional derivatives of Gaussian kernels (Young 1987).
 non-causal Gaussian spatiotemporal derivative kernels$$\begin{aligned}&h_\mathrm{Gaussian}(x_1, x_2, t;\; s, \tau , v, \delta ) \nonumber \\&\quad = \partial _{\varphi }^{m_1} \, \partial _{\bot \varphi }^{m_2} \, \partial _{\bar{t}}^{n} \, g(x_1, x_2, t;\; s, \tau , v, \delta ) \end{aligned}$$(114)
 time-causal spatiotemporal derivative kernels$$\begin{aligned}&h_\mathrm{timecausal}(x_1, x_2, t;\; s, \tau , v)\nonumber \\&\quad = (\partial _{\bar{x}_1^{\alpha _1} \bar{x}_2^{\alpha _2}} \partial _{\bar{t}^{\beta }} h)(x_1, x_2, t;\; s, \tau , v) \end{aligned}$$(115)

\(\partial _{\varphi } = \cos \varphi \, \partial _{x_1} + \sin \varphi \, \partial _{x_2}\) and \(\partial _{\bot \varphi } = \sin \varphi \, \partial _{x_1} - \cos \varphi \, \partial _{x_2}\) denote spatial directional derivative operators according to (69) in two orthogonal directions \(\varphi \) and \(\bot \varphi \),

\(m_1 \ge 0\) and \(m_2 \ge 0\) denote the orders of differentiation in the two orthogonal directions in the spatial domain with the overall spatial order of differentiation \(m = m_1 + m_2\),

\(v_1 \, \partial _{x_1} + v_2 \, \partial _{x_2} + \partial _t\) denotes a velocityadapted temporal derivative operator,

\(v = (v_1, v_2)^T\) denotes the image velocity,

\(n\) denotes the order of temporal differentiation,

\(g(x_1 - v_1 t, x_2 - v_2 t;\; \varSigma )\) denotes a spatial affine Gaussian kernel according to (63) that moves with image velocity \(v = (v_1, v_2)^T\) in space–time,

\(\varSigma \) denotes a spatial covariance matrix that can be parameterized by two eigenvalues \(\lambda _1\) and \(\lambda _2\) as well as a spatial orientation \(\theta \) of the form (68),

\(h(t;\; \tau )\) is a temporal smoothing kernel over time corresponding to the time-causal smoothing kernel \(\phi (t;\; \tau ) = \tfrac{1}{\sqrt{2 \pi } \, t^{3/2}} \, \tau \, e^{-\tau ^2/2t}\) in (95), a non-causal time-shifted Gaussian kernel \(g(t;\; \tau , \delta ) = \tfrac{1}{\sqrt{2 \pi \tau }} e^{-(t - \delta )^2/2 \tau }\) according to (76) or a time-causal kernel corresponding to a set of first-order integrators over time coupled in cascade having a Laplace transform \(H_\mathrm{composed}(q;\; \mu ) = \prod _{i=1}^{k} \frac{1}{1 + \mu _i q}\) according to (99),

\(s\) denotes the spatial scale and

\(\tau \) denotes the temporal scale.
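The time-causal smoothing option above, first-order integrators coupled in cascade with Laplace transform \(H_\mathrm{composed}(q;\; \mu) = \prod_i 1/(1 + \mu_i q)\), can be illustrated by convolving truncated exponential kernels. This is a sketch under our own discretization; the composed kernel should have temporal mean \(\sum_i \mu_i\) and temporal variance \(\sum_i \mu_i^2\).

```python
import numpy as np

def exp_kernel(mu, t):
    """Truncated exponential kernel (1/mu) exp(-t/mu) for t >= 0:
    the impulse response of one first-order integrator."""
    return np.exp(-t / mu) / mu

def cascade_kernel(mus, t, dt):
    """Impulse response of first-order integrators coupled in cascade,
    H_composed(q) = prod_i 1/(1 + mu_i q), by repeated convolution."""
    h = exp_kernel(mus[0], t)
    for mu in mus[1:]:
        h = np.convolve(h, exp_kernel(mu, t))[:len(t)] * dt
    return h

dt = 0.005
t = np.arange(0.0, 25.0, dt)
h = cascade_kernel([0.5, 1.0, 2.0], t, dt)
```

Each additional stage increases the effective temporal scale while keeping the kernel supported entirely in the past, which is the property that makes this family suitable for a genuine real-time system.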
Young et al. (2001) and Young and Lesperance (2001) have also shown how spatiotemporal receptive fields can be modelled by Gaussian derivatives over a spatiotemporal domain, corresponding to the Gaussian spatiotemporal scale-space concept described here, although with a different type of parameterization; see also Lindeberg (1997, 2001) for closely related earlier work. These scale-space models can therefore be regarded as idealized functional and phenomenological models of receptive fields, whose actual realization can then be implemented in different ways depending on available hardware or wetware.
Relations to approaches for learning receptive fields from natural image statistics Work has also been performed on learning receptive field properties and visual models from the statistics of natural image data (Field 1987; van der Schaaf and van Hateren 1996; Olshausen and Field 1996; Rao and Ballard 1998; Simoncelli and Olshausen 2001; Geisler 2008; Hyvärinen et al. 2009; Lörincz et al. 2012), which has been shown to lead to the formation of similar receptive fields as found in biological vision. The proposed theory of receptive fields can be seen as describing basic physical constraints under which a learning-based method for the development of receptive fields will operate, and the solutions to which an optimal adaptive system may converge, if exposed to a sufficiently large and representative set of natural image data. Field (1987) as well as Doi and Lewicki (2005) have described how “natural images are not random, instead they exhibit statistical regularities” and have used such statistical regularities for constraining the properties of receptive fields. The theory presented in this paper can be seen as a theory at a higher level of abstraction, in terms of basic principles that reflect properties of the environment, which in turn determine properties of the image data, without need for explicitly constructing specific statistical models of the image statistics. Specifically, the proposed theory can be used for explaining why the above-mentioned statistical models lead to qualitatively similar types of receptive fields as the idealized receptive fields obtained from our theory.
An interesting observation that can be made from the similarities between the receptive field families derived by necessity from the assumptions and receptive field profiles found by cell recordings in biological vision is that receptive fields in the retina, LGN, and V1 of higher mammals are very close to ideal in view of the stated structural requirements/symmetry properties. In this sense, biological vision can be seen as having adapted very well to the transformation properties of the surrounding world and the transformations that occur when a three-dimensional world is projected to a two-dimensional image domain.
6.4 Spatio-chromo-temporal receptive fields
6.5 Motion selectivity
Concerning motion selectivity, DeAngelis et al. (1995), DeAngelis and Anzai (2004) report that most cortical neurons are quite sensitive to stimulus velocity and that the speed tuning is narrower than for LGN cells. Simple cells with inseparable receptive fields have directional preference, while cells with space–time separable receptive fields do not. Moreover, the preferred direction of motion corresponds to the orientation of the filter in space–time.

space–time separable receptive fields correspond to spatiotemporal scalespace kernels without velocity adaptation, whereas

inseparable receptive fields correspond to kernels that are explicitly adapted to nonzero velocities.
The above-mentioned fact that a majority of the cells are inseparable in space–time is indeed nicely compatible with a description in terms of a multi-parameter scale space as outlined in Sect. 2.1.3. If the vision system is to give a reasonable coverage of a set of filter parameters \(\varSigma \) and \(v\), then the set of filters corresponding to space–time separable receptive fields (corresponding to the filter parameter \(v = 0\)) will be much smaller than the set of filters allowing for nonzero values of the mixed parameters \(\varSigma \) and \(v\) over space and time.
6.6 Complex cells
Besides the above-mentioned linear receptive fields, there are a large number of early nonlinear receptive fields that do not obey the superposition principle and whose response properties are rather insensitive to the phase of the visual stimuli. The response profile of such a cell in the spatial domain is typically of the form illustrated in Fig. 21c. Such cells, for which the response properties are independent of the polarity of the stimuli, are referred to as complex cells (Hubel and Wiesel 1959, 1962).
In a detailed study of the response properties of complex cells, Touryan et al. (2002) observed an additive interaction between the eigenvectors of a quadratic nonlinear model supporting the energy model (Adelson and Bergen 1985; Heeger 1992). In a more recent study, Rust et al. (2005) found that complex cell responses are better described by more linear filters than the one or two used in previous models. The above-mentioned quasi-quadrature models are in qualitative agreement with such computational structures. Specifically, the second-stage smoothing (125) of the pointwise quasi-quadrature measure is in good agreement with the model of complex cell responses in (Rust et al. 2005, Fig. 8, page 953) based on weighted averaging of a set of quadrature pairs.
Suppressive influence can also be obtained by allowing for (ii) nonlinear feedback that alters the conductivities in the diffusion equation (112), or in the corresponding spatiotemporal extension, based on local image measurements, or by considering (iii) recurrent feedback from higher levels that influences the gain control of the feature detectors. With these extensions, the resulting model corresponds to an integration of hierarchical and recurrent models as advocated by Martinez and Alonso (2003).
In contrast to the previous treatment of linear receptive field models, which were determined by necessity from first principles, it should be emphasized that the structure of the quasi-quadrature model is not at all determined by necessity. Instead, it is presented as one possible nonlinear extension that reproduces some of the qualitative properties of complex cells.
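To illustrate the phase insensitivity that such a quasi-quadrature construction shares with the energy models, the following 1-D sketch combines squared first- and second-order Gaussian derivative responses into an energy measure \(Q = L_x^2 + C \, L_{xx}^2\). The discretization and the parameter choices are our own; with \(C = 1\), \(s = 1\), and unit angular frequency the measure becomes exactly phase independent.

```python
import numpy as np

def gauss_derivs(x, s):
    """First- and second-order Gaussian derivative kernels at scale s."""
    g = np.exp(-x**2 / (2 * s)) / np.sqrt(2 * np.pi * s)
    return -(x / s) * g, ((x**2 - s) / s**2) * g

dx = 0.01
xk = np.arange(-8.0, 8.0 + dx, dx)
g1, g2 = gauss_derivs(xk, s=1.0)

def quasi_quadrature(f, C=1.0):
    """Pointwise energy measure Q = L_x^2 + C * L_xx^2
    from Gaussian derivative responses L_x and L_xx."""
    Lx = np.convolve(f, g1, mode='same') * dx
    Lxx = np.convolve(f, g2, mode='same') * dx
    return Lx**2 + C * Lxx**2

xs = np.arange(-40.0, 40.0, dx)
Q1 = quasi_quadrature(np.sin(xs))        # two gratings differing only in phase
Q2 = quasi_quadrature(np.sin(xs + 1.3))
```

For a sine grating, the odd (first-order) and even (second-order) responses trade off against each other, so the combined energy stays essentially constant over space and is unchanged when the phase of the stimulus is shifted.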
7 Foveated vision
Given these assumptions, it follows that the minimum receptive field size will increase linearly with the distance from the fovea, a distribution that is compatible with neurophysiological and psychophysical findings (Lindeberg and Florack 1992). Given such a spatially varying resolution limit, internal representations at coarser scales can then be constructed from these image measurements based on the semigroup property or the diffusion equation. Specifically, with a log-polar retinotopic mapping, the diffusion equation that governs the evolution properties over scale can equivalently be expressed on a log-polar domain (Lindeberg and Florack 1994). In all other respects, the receptive field profiles will be similar as for a translationally invariant spatial domain.
This foveal scale-space model has been used for computing scale-invariant image descriptors for object recognition by Kokkinos and Yuille (2008). A closely related model for foveal vision in terms of an inverted pyramid has been proposed by Crowley and his co-workers (1994), with close relations to the spotlight model for visual attention by Tsotsos (1995).
A notable property of the receptive field measurements taken in the retina as shown in Fig. 35 is that the receptive field sizes are clustered along linear functions, whereas the foveal scale-space model in Fig. 36 is based on the assumption that all receptive field sizes above a linearly increasing minimum receptive field size should be present. Given the semigroup property (8), it follows, however, that receptive fields at scales coarser than those displayed in Fig. 35 can be constructed by combining receptive fields at finer scales. The distribution in Fig. 35 would therefore correspond to a sampling of the outer layer of the inverted cone of receptive field sizes in the foveal scale-space model shown in Fig. 36. Receptive fields in the interior of this cone can therefore be constructed from linear combinations of receptive field responses in the outer layer.
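The semigroup property used in this argument, \(g(\cdot;\; s_1) * g(\cdot;\; s_2) = g(\cdot;\; s_1 + s_2)\), can be checked numerically in one dimension. This is a sketch under our own discretization of how a coarser-scale receptive field can be assembled from finer-scale smoothing operations.

```python
import numpy as np

def gauss(x, s):
    """1-D Gaussian kernel with variance s."""
    return np.exp(-x**2 / (2 * s)) / np.sqrt(2 * np.pi * s)

dx = 0.01
x = np.arange(-20.0, 20.0 + dx, dx)
s1, s2 = 1.0, 4.0
# coarse-scale kernel computed directly ...
coarse_direct = gauss(x, s2)
# ... and assembled by cascading finer-scale smoothing: g(.; 1) * g(.; 3)
coarse_cascade = np.convolve(gauss(x, s1), gauss(x, s2 - s1), mode='same') * dx
```

The same cascade argument applies to Gaussian derivative kernels, since differentiation commutes with the smoothing operation.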
An interesting question concerns whether the existence of coarserscale receptive fields corresponding to the interior of this cone could be established by cell recording of linear receptive fields in the LGN or in V1. An alternative possibility could be to investigate whether receptive fields corresponding to the outer layer of this cone could be directly combined into nonlinear receptive fields corresponding to the interior of this cone, without representing the intermediate linear receptive fields explicitly in terms of simple cells. Such investigations could then answer whether and how shift invariance is explicitly represented at the earliest levels of linear receptive fields or at higher nonlinear levels in the visual hierarchy.
8 Extensions
With regard to camera geometry, we have throughout based the analysis on a planar perspective projection model with a flat image plane. This choice has been made to simplify the mathematical treatment, since the translational group properties and the diffusion equations are much easier to express for a flat image geometry. To model biological vision more accurately, it would, however, be more appropriate to express a corresponding model based on a spherical camera geometry with a spherical image surface, which will lead to a scale-space concept based on diffusion equations on a sphere. Such a model would also have attractive theoretical properties in the sense that geometric distortions toward the periphery, such as vignetting, will disappear, and certain properties of global motion fields will become simpler. From such a background, the present model can be regarded as a local linearization applied in the tangent plane of the spherical camera model at the center of the visual sensor.
9 Relations to previous work
9.1 Biological vision
The notion of receptive field was originally defined by Sherrington (1906) to describe the somatosensory area of a body surface where a stimulus could cause a reflex. Hartline (1938) extended this notion to light stimuli and defined a visual receptive field as the area of the retina that must receive illumination in order to cause a discharge in a particular ganglion cell or nerve fiber. Kuffler (1953) studied the substructure of retinal receptive fields and found that they are concentric with specific “on” or “off” zones. He also coined the term “on–off” receptive fields. The Nobel laureates Hubel and Wiesel (1959, 1962, 2005) investigated and characterized the response properties of cells in the primary visual cortex (V1), discovered their orientation tuning, and proposed a taxonomy in terms of simple or complex cells based on how the cells respond to the polarity of visual stimuli. In the first wave of studies, specific stimuli such as points, bars, or sine wave gratings were used for probing the visual cells.
Later, a new methodology for receptive field mappings was developed based on white noise stimuli, which allow for a complete characterization of the response properties of visual neurons if they can be assumed to be linear. Based on this technique, DeAngelis et al. (1995) were able to derive more detailed maps of receptive fields, including their response properties in the joint space–time domain; see DeAngelis and Anzai (2004) for a comprehensive overview of these developments. Conway and Livingstone (2006) performed a corresponding investigation of spatiochromatic and spatio-chromo-temporal response properties of receptive fields in the macaque monkey. Ringach et al. (2002) showed how receptive field profiles of neurons can be derived using natural image sequences as stimuli. Felsen et al. (2005) have presented comparisons between response properties of neurons to natural image features versus noise stimuli and found that the sensitivity of complex cells, but not of simple cells, is markedly higher for natural image data than for random stimuli.
Adelson and Bergen (1985) developed a spatiotemporal energy model for motion perception based on oriented filters in the space–time domain. The quasi-quadrature approach in (118) and (119), in combination with a multi-parameter scale space, can be seen as an analogue and extension of such a representation within the Gaussian derivative framework. More recently, Young et al. (2001) showed how spatiotemporal receptive fields can be modelled by Gaussian derivatives over a spatiotemporal domain, corresponding to the Gaussian spatiotemporal concept described here, although with a different type of parameterization.
The scale-space models described in this article and our earlier work (Lindeberg 1997, 2001, 2011) unify these treatments into a joint framework and also comprise new extensions in the following ways: (i) a new continuous time-causal scale-space model that respects forbidden access to future information, (ii) a time-recursive update mechanism based on a limited temporal buffer, (iii) a better parameterization of the spatiotemporal filters with respect to image velocities and image deformations, and (iv) necessity results showing how these scale-space models can be uniquely determined from a small set of structural assumptions regarding an idealized vision system.
It should be emphasized, however, that the theoretical necessity results presented in this paper concern linear receptive fields. Characterizing nonlinear receptive fields is a much more complex issue; see Ringach (2004) for an overview of different approaches for mapping receptive fields. Nonlinear gain control mechanisms in the retina have been modelled and related to biological cell recordings by Schwartz et al. (2002). Nonlinear receptive fields in V1 have been investigated and modelled in more detail by Mechler and Ringach (2002), Touryan et al. (2002), Priebe et al. (2004), and Rust et al. (2005). During recent years, there has been some questioning of whether the taxonomy by Hubel and Wiesel into simple and complex cells corresponds to distinct classes or whether V1 cells have response properties along a continuum (Mechler and Ringach 2002). Bardy et al. (2006) have shown that the response properties of some classes of complex cells can be converted to putative simple cells depending on influences originating from outside the classical receptive field. The experimental results can, however, be strongly dependent on the experimental conditions (Kagan et al. 2002; Mata and Ringach 2005; Chen et al. 2002), and bimodal distributions have been found by Kagan et al. (2002), Ibbotson et al. (2005), and Chen et al. (2002). Moreover, Martinez and Alonso (2003) argue that a large body of neurophysiological evidence indicates that simple cells constitute a separate population of cortical cells in cat visual cortex. In relation to the classification of complex cells, Kagan et al. (2002) have suggested that distinctions in the classification of complex cells should be made based on whether the cells are dominated by magnocellular or parvocellular input. Martinez and Alonso (2003) have suggested that complex cells should be divided into first-order complex cells that receive direct input from the LGN and second-order complex cells that receive input from simple cells.
More recently, Williams and Shapley (2007) have found spatial phasesensitive detectors in V1 that respond to contrast boundaries of one sign but not the opposite. Our knowledge about nonlinear cells in area V1 is therefore far from complete (Olshausen and Field 2004; Carandini et al. 2005).
The notion of a logarithmic brightness scale goes back to the Greek astronomer Hipparchus, who constructed a subjective scale for the brightness of stars in six steps labelled 1…6, where the brightest stars were said to be of the first magnitude (\(m = 1\)) while the faintest stars near the limit of human perception were of the sixth magnitude. Later, when quantitative physical measurements of the intensities of different stars became possible, it was noted that Hipparchus' subjective scale did indeed correspond to a logarithmic scale. In astronomy today, the apparent brightness of stars is still measured on a logarithmic scale, although extended over a much wider span of intensity values. A logarithmic transformation of image intensities is also used in the retinex theory (Land 1974, 1986).
For a strictly positive entity \(z\), there are also information-theoretic arguments for regarding \(\log z\) as a default parameterization (Jaynes 1968). This property is essentially related to the fact that the ratio \(dz/z\) then becomes a dimensionless integration measure. Care should, however, be taken when using such reasoning based on dimensionality arguments, since important phenomena could be missed, e.g., in the presence of hidden variables. The physical modelling of the effect of illumination variations on receptive field measurements in Sect. 2.3 provides a formal justification for using a logarithmic brightness scale in this context, as well as an additional contribution by showing how receptive field measurements can be related to inherent physical properties of object surfaces in the environment.
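The invariance under multiplicative illumination variations referred to here can be demonstrated in one dimension: a Gaussian derivative applied to a log-transformed image is unaffected by a multiplicative illumination change \(I \mapsto c \, I\), since \(\log (c I) = \log c + \log I\) and the derivative kernel annihilates constants. The 1-D image profile below is hypothetical and the function names are ours, chosen only for illustration.

```python
import numpy as np

def gauss_deriv(x, s):
    """First-order Gaussian derivative kernel at scale s."""
    g = np.exp(-x**2 / (2 * s)) / np.sqrt(2 * np.pi * s)
    return -(x / s) * g

dx = 0.01
xk = np.arange(-10.0, 10.0 + dx, dx)
kernel = gauss_deriv(xk, s=1.0)

xs = np.arange(-30.0, 30.0, dx)
I = 1.5 + 0.5 * np.cos(0.7 * xs)   # hypothetical strictly positive image profile
c = 3.7                            # multiplicative illumination change
resp_orig = np.convolve(np.log(I), kernel, mode='same') * dx
resp_illu = np.convolve(np.log(c * I), kernel, mode='same') * dx
```

Away from the boundaries of the sampled domain, the two responses agree, since the constant offset \(\log c\) is removed exactly by the zero-mean derivative kernel.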
9.2 Computer vision
In the area of computer vision, multiscale representations were first constructed by repeated smoothing and subsampling, leading to the notion of pyramids (Burt 1981; Crowley 1981; Burt and Adelson 1983; Crowley and Stern 1984; Crowley and Parker 1984; Crowley and Sanderson 1987).
Concerning the development of scale-space theory, Witkin (1983) proposed to treat scale as a continuous parameter and noted that Gaussian convolution leads to a decreasing number of zero-crossings or local extrema for a one-dimensional signal. The first necessity results in the Western literature concerning the uniqueness of the Gaussian kernel for generating a linear scale-space representation were derived by Koenderink (1984) based on the assumption of causality, which means that isosurfaces in scale space should point with their convex side toward coarser scales. Related uniqueness results were presented by Babaud et al. (1986) and by Yuille and Poggio (1986).
Lindeberg (1990) showed how a reformulation of Koenderink’s causality requirement in terms of non-enhancement of local extrema, in combination with the requirement of a semigroup structure, could be used for deriving a scale-space theory for discrete signals. Corresponding necessity results concerning scale-space representations of continuous image data were then presented in Lindeberg (1996). A cascade property was also used in the construction of binomial pyramids by Crowley (1981), Crowley and Stern (1984).
Florack and ter Haar Romeny (1992) proposed the use of scale invariance as a basic scale-space axiom, and Pauwels et al. (1995) showed that, in combination with a semigroup structure, there exists a more general one-parameter family of (weak) scale-space kernels that obey these axioms, including the Poisson scale space studied by Felsberg and Sommer (2004). Duits et al. (2004) have investigated the properties of these scale spaces in detail and shown that the so-called \(\alpha \)-scale spaces can be modelled by pseudo-partial differential equations. Except for the Gaussian scale space contained in this class, these self-similar scale spaces do, however, not obey non-enhancement of local extrema.
Closely related axiomatic derivations of image processing operators based on scale invariance have also been given in the earlier Japanese literature (Iijima 1962; Weickert et al. 1999). Koenderink and van Doorn (1992) showed that Gaussian derivative operators are natural operators to derive from a scale-space representation, given the assumption of scale invariance.
The connections between the strong regularizing properties of Gaussian convolution and Schwartz distribution theory have been pointed out by Florack et al. (1992).
Generalizations of rotationally symmetric smoothing operations to the affine Gaussian scale-space concept were introduced in (Lindeberg 1994b) and applied in (Lindeberg and Gårding 1997) for the computation of affine invariant image descriptors. Specifically, a mechanism of affine shape adaptation was proposed for reaching affine covariant interest points in affine scale space, and it was shown that the computation of such affine-adapted image measurements improved the accuracy of later-stage processes in situations where there are significant perspective image deformations outside the similarity group. Baumberg (2000) and Schaffalitzky and Zisserman (2001) extended this approach to wide baseline image matching. Mikolajczyk and Schmid (2004) proposed a more efficient algorithm and quantified its performance experimentally. Tuytelaars and van Gool (2004) performed corresponding matching of widely separated views with application to object modelling. Related investigations of elongated directional filters over the spatial domain have been presented by Freeman and Adelson (1991), Simoncelli et al. (1992) and Perona (1992).
Scale-space representations of color information have been developed by Geusebroek et al. (2001), based on a Gaussian color model proposed by Koenderink, from which a set of differential color invariants were defined, and by Hall et al. (2000), who computed first-order partial derivatives of color-opponent channels and demonstrated the applicability of such features for object recognition. Linde and Lindeberg (2004, 2012) extended this idea by showing that highly discriminative image descriptors for object recognition can be obtained from spatiochromatic derivatives and differential invariants up to order two. More recently, van de Sande et al. (2010) have presented an evaluation of different color-based image descriptors for recognition.
Concerning temporal scale spaces, Koenderink (1988) proposed the first scale-space concept that respects temporal causality, based on a logarithmic transformation of the time axis with the present moment as the origin. Such temporal smoothing filters have been considered in follow-up works by Florack (1997) and ter Haar Romeny et al. (2001). These approaches, however, appear to require an infinite memory of the past and have so far not been developed for computational applications.
To handle temporal causality in a manner more suitable for real-time implementation, Lindeberg and Fagerström (1996) expressed a strictly time-recursive space–time separable spatio-temporal scale-space model based on cascades of temporal scale-space kernels in terms of either truncated exponential functions or first-order recursive filters, based on a characterization of one-dimensional scale-space filters that guarantee non-creation of local extrema with increasing scale (Lindeberg 1990). These scale spaces were also time-recursive in the sense that no extensive memory of the past was needed. Instead, a compact temporal buffer allowed for efficient computation of the temporal smoothing operation and temporal derivatives directly from a set of internal representations at different temporal scales. A closely related time-recursive computation of temporal derivatives has been used by Fleet and Langley (1995).
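The time-recursive updating scheme can be sketched as follows: each stage of a cascade of first-order recursive filters updates its state from its own previous output and the current output of the preceding stage only, so no extensive memory of the past is kept (the time constants and the zero initial state are illustrative choices):

```python
def recursive_filter_cascade(signal, mus):
    """Cascade of first-order recursive filters: a discrete sketch of a
    time-recursive temporal scale space.  Each stage only needs its own
    previous output and the current output of the preceding stage.  The
    time constants in mus and the zero initial state are illustrative."""
    levels = [list(signal)]
    for mu in mus:
        state = 0.0
        out = []
        for x in levels[-1]:
            state = state + (x - state) / (1.0 + mu)
            out.append(state)
        levels.append(out)
    return levels   # internal representations at increasing temporal scales

step = [0.0] * 5 + [1.0] * 45                      # a unit step in time
levels = recursive_filter_cascade(step, mus=[1.0, 2.0, 4.0])
```

Temporal derivatives can then be approximated directly from differences between these internal representations at different temporal scales, without storing the raw past signal.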
Lindeberg (1997) proposed a non-separable spatio-temporal scale-space concept comprising the notion of velocity-adapted derivatives, for a continuous model based on a Gaussian spatio-temporal scale space and for a semi-discrete time-causal model; see also Lindeberg (2001) for a more detailed description of the corresponding spatio-temporal scale-space theory. Velocity adaptation was applied to optic flow estimation by Nagel and Gehrke (1998) and was shown to improve the accuracy of optic flow estimates in a similar manner as affine shape adaptation improves the accuracy of image descriptors under perspective image deformations outside the similarity group. A closely related approach for optic flow computation with corresponding deformation of the image filters was developed by Florack et al. (1998). An extension of non-separable spatio-temporal receptive fields into time-causal velocity-adapted recursive filters was given in (Lindeberg 2002).
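The notion of velocity adaptation can be illustrated by the Galilean substitution \(x \mapsto x - v t\) in a space–time separable kernel; a minimal 1+1-D sketch (parameter values and function names are illustrative):

```python
import math

def st_gaussian(x, t, s, tau):
    """Space-time separable spatio-temporal Gaussian (1+1-D sketch);
    s and tau are spatial and temporal variances."""
    return (math.exp(-x * x / (2.0 * s)) / math.sqrt(2.0 * math.pi * s) *
            math.exp(-t * t / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau))

def velocity_adapted(x, t, s, tau, v):
    """Velocity-adapted kernel obtained by the Galilean substitution
    x -> x - v*t, so that the kernel follows image structures moving
    with velocity v (illustrative parameterization)."""
    return st_gaussian(x - v * t, t, s, tau)

# Along the motion trajectory x = v*t, the adapted kernel takes the same
# values as the separable kernel does at x = 0.
vals = [velocity_adapted(0.8 * t, float(t), 1.0, 1.0, 0.8) for t in range(4)]
ref  = [st_gaussian(0.0, float(t), 1.0, 1.0) for t in range(4)]
```

Derivatives of such velocity-adapted kernels give the velocity-adapted derivative operators referred to above.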
Laptev and Lindeberg (2004b) investigated the use of families of velocity-adapted filters for computing Galilean invariant image descriptors. Given an ensemble of spatio-temporal scale-space filters with different orientations in the space–time domain, in a manner similar to Adelson and Bergen (1985), simultaneous adaptation to spatial scales, temporal scales, and image velocities was performed by a multi-parameter scale selection mechanism over these parameters. Specifically, it was shown that the use of velocity-adapted filters improved the separability between classes of spatio-temporal actions in situations when there are unknown relative motions between the objects and the observer. Generalizations of this approach to the context of Galilean invariant interest points were then presented in Lindeberg (2004), with an integrated Galilean invariant spatio-temporal recognition scheme in (Laptev et al. 2007).
Fagerström (2005) investigated self-similar temporal scale-space concepts derived from the assumptions of a semi-group structure combined with scale invariance, with an extension to the spatio-temporal domain in Fagerström (2007) that also comprises the notion of velocity-adapted filters. Lindeberg (2011) gives a unified treatment of the scale-space axiomatics of linear, affine, and spatio-temporal scale space for continuous images based on the assumption of non-enhancement of local extrema over spatial and spatio-temporal domains, including more explicit statements of the uniqueness results regarding the Gaussian spatio-temporal scale space earlier outlined in Lindeberg (2001) and the application of non-enhancement of local extrema to a continuous time-causal and time-recursive spatio-temporal scale space.
10 Summary and conclusions
Neurophysiological recordings have shown that mammalian vision has developed receptive fields that are tuned to different sizes and orientations in the image domain as well as to different image velocities in space–time. A main message of this article has been to show that it is possible to derive such families of receptive field profiles by necessity, given a set of structural requirements on the first stages of visual processing as formalized into the notion of an idealized vision system. These structural requirements reflect structural properties of the world in terms of scale covariance, affine covariance, and Galilean covariance, which are natural to adapt to for a vision system that is to interact with the surrounding world in a successful manner. In a competition between different organisms, adaptation to these properties may constitute an evolutionary advantage.
The presented theoretical model provides a normative theory for deriving functional models of linear receptive fields based on Gaussian derivatives and closely related operators. In addition, a set of plausible mechanisms has been presented for how non-linear receptive fields can be constructed from this theory, based on a generalized energy model. Specifically, the proposed theory can explain the different shapes of receptive field profiles that are found in biological vision from a requirement that the visual system should be able to compute covariant receptive field responses under the natural types of image transformations that occur in the environment, to enable the computation of invariant representations for perception at higher levels.
The proposed receptive field model has been related to Gabor functions, and we have presented several theoretical arguments for preferring a Gaussian derivative model or equivalently a formulation in terms of diffusion equations, with the shapes of the receptive fields parameterized by a spatial covariance matrix \(\varSigma \), an image velocity \(v\) and a temporal scale parameter \(\tau \), where the spatial covariance matrix \(\varSigma \) can also encompass the spatial scale parameter \(s\) depending on the choice of parameterization.
In the most idealized version of the theory, one can see the covariance matrix \(\varSigma \) in the diffusion equation and the image velocity \(v\) as locally constant within the support region of each receptive field, corresponding to a pure feed-forward model. More generally, one can consider covariance matrices and image velocities that are locally adapted to the local image structures, leading to richer families of pseudo-linear or non-linear scale spaces, corresponding to top-down or feedback mechanisms in biological vision.
When the image data undergo natural image transformations, due to variations in viewing distance, viewing direction, relative motion between the object and the observer, or illumination variations, we can linearize the possibly non-linear image transformations locally by derivatives (Jacobians), from which transformation properties in terms of the filter parameters (scale parameters, covariance matrices, and image velocities) of the receptive fields can be derived, provided that the family of receptive fields is closed under the relevant group or subgroup of image transformations in the tangent space, leading to an algebra of transformation properties of receptive fields. In this article, we have presented a coherent and unified framework for handling such locally linearized image transformations in terms of local scaling transformations, local affine transformations, local Galilean transformations, and local multiplicative intensity transformations, such that the influence of these image transformations on the receptive field responses can be well understood. More generally, the formulation of image primitives in terms of receptive field responses that are expressed in terms of scale-space derivatives makes it possible to use tools from differential geometry for deriving relationships between image features and physical properties of objects or events in the environment, thus allowing for computationally operational and theoretically well-founded modelling of possibilities or constraints for visual perception.
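For the affine case, this closedness property amounts to the covariance matrix transforming as \(\varSigma' = A \varSigma A^{\mathsf T}\) under a local linear map \(A\); a minimal numerical sketch (with illustrative matrices) verifies that the quadratic form in the affine Gaussian kernel is then preserved:

```python
def mat2_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0],
             A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0],
             A[1][0] * B[0][1] + A[1][1] * B[1][1]]]

def mat2_transpose(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

def quad_form_inv(S, x):
    """x^T S^{-1} x for a symmetric 2x2 matrix S, inverse written out."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return (S[1][1] * x[0] * x[0] - 2.0 * S[0][1] * x[0] * x[1]
            + S[0][0] * x[1] * x[1]) / det

A = [[2.0, 1.0], [0.0, 1.0]]        # an illustrative local Jacobian
Sigma = [[3.0, 1.0], [1.0, 2.0]]    # an illustrative covariance matrix
Sigma_t = mat2_mul(A, mat2_mul(Sigma, mat2_transpose(A)))   # A Sigma A^T

x  = [0.7, -1.3]
Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
```

The equality of the two quadratic forms below is exactly the sense in which the affine Gaussian family is closed under local affine transformations.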
We have also related the proposed approach to approaches for learning receptive field profiles from natural image statistics and argued that, in such a context, the presented model provides a normative theory for the solutions that an idealized learning system may reach if exposed to a sufficiently large and representative set of natural image data. The presented theory can therefore be used for explaining why such learning approaches lead to qualitatively similar types of receptive fields.
Several of the theoretically derived receptive field profiles presented in this article have been successfully used in a large number of computer vision applications regarding feature detection, feature classification, stereo matching, motion estimation, shape analysis, and image-based recognition. Hence, these receptive field profiles can generally serve as a basis for expressing a large number of visual operations and have empirically been shown to lead to robust algorithms. In this respect, a vision system based on these receptive field families allows for sharing of early visual modules between different higher-level vision functionalities, which for a biological vision system can be motivated by efficiency of resource utilization.
The linear receptive fields obtained from this theory have been compared to receptive fields found by cell recordings in the LGN and simple cells in V1.
The proposed non-linear quasi-quadrature model has also been related to qualitatively similar properties observed for complex cells in V1.
A striking conclusion from the comparisons in Sect. 6 is that the receptive field profiles derived by the axiomatic theory in Sects. 3–5 are in very good qualitative agreement with receptive field profiles recorded in biological vision; the consequences of the theory thus match the experimental data closely.
Furthermore, this indicates that the earliest receptive fields in higher mammal vision have reached a state that is very close to ideal in view of the stated structural requirements or symmetry properties. In this sense, biological vision can be seen as having adapted very well to the transformation properties of the surrounding world and the transformations that occur when a three-dimensional world is projected onto a two-dimensional image domain.
10.1 Applications to biological vision

The Gaussian and the time-causal receptive field families, with their spatial and spatio-temporal derivative operators applied to luminance and color-opponent channels, can be used for generating wider and more general families of receptive field profiles beyond those explicitly shown in the figures in this article. The idealized model for simple cells (116) comprises receptive fields of different orders of spatial and temporal differentiation, where a subset of combinations of spatial and spatio-temporal derivative operators has been demonstrated to lead to receptive field profiles in good qualitative agreement with receptive field profiles measured by cell recordings in biological vision. An interesting question concerns whether the existence of linear receptive fields corresponding to other combinations of spatial and spatio-temporal derivatives can be demonstrated, in particular when the receptive fields are measured as functions over two spatial dimensions and one temporal dimension, and concerning the existence of receptive fields corresponding to higher orders of derivatives. Concerning spatio-chromatic and spatio-chromo-temporal receptive fields, the models for double-opponent receptive fields (110) and (117) are both based on rotationally symmetric Laplacians of Gaussians (alternatively, differences of Gaussians) concerning the spatial dependencies. Another interesting question concerns whether biological vision implements non-symmetric spatio-chromatic receptive fields corresponding to, e.g., directional or partial derivatives of color-opponent channels as shown in Fig. 9, and whether or not tighter couplings could be established between the chromatic and temporal dimensions. Answering these questions would provide cues to what types of image structure the visual system explicitly responds to and therefore to possibilities as well as limitations for perception.
Hence, this theory may be used for generating predictions about new, hitherto unnoticed or unreported receptive fields and for explaining their properties in terms of differential geometric measurements. This theory can also be used for raising questions about which animals have early receptive fields with properties compatible with general-purpose visual operations according to the notion of an idealized visual front end.
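As a concrete instance of the rotationally symmetric center-surround models mentioned above, a difference-of-Gaussians profile (a standard approximation of the Laplacian of the Gaussian; the surround-to-center scale ratio below is an illustrative choice) can be sketched as:

```python
import math

def gauss2(x, y, s):
    """Rotationally symmetric 2-D Gaussian with variance s."""
    return math.exp(-(x * x + y * y) / (2.0 * s)) / (2.0 * math.pi * s)

def dog(x, y, s, k=1.6):
    """Difference-of-Gaussians center-surround profile; the ratio k
    between surround and center standard deviations is an illustrative
    choice."""
    return gauss2(x, y, s) - gauss2(x, y, k * k * s)

center   = dog(0.0, 0.0, 1.0)      # excitatory center
surround = dog(3.0, 0.0, 1.0)      # inhibitory surround
# Both Gaussians integrate to one, so the profile has zero DC response.
total = sum(dog(0.5 * i, 0.5 * j, 1.0) * 0.25
            for i in range(-40, 41) for j in range(-40, 41))
```

Applied to a color-opponent channel, such a profile gives a simple functional sketch of a double-opponent response; its zero DC response means that it discards the locally constant component of the input.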

Concerning orientation maps and population coding over image orientations and image velocities, the notion of multi-parameter receptive field families over different spatial covariance matrices \(\varSigma \), image velocities \(v\), and temporal scales \(\tau \) raises questions of how the receptive fields in V1 are distributed over different orientations and directional tunings. Since receptive fields have been found with different degrees of spatial eccentricity, corresponding to different scale parameters in different directions, this raises the question of whether the distribution over different degrees of spatial elongation is such that it could be explained by a geometric model over spatial covariance matrices \(\varSigma _i\) corresponding to structural properties of the environment. More generally, and as we have previously discussed in Sect. 6.6, given that we have a population of non-linear receptive fields that are tuned to different spatial orientations and motion directions and that respond according to an energy model, an interesting question concerns how to combine the responses of a set of such non-linear receptive fields that respond at different spatial locations and are tuned to different orientations and motion directions. Could a sufficient amount of cell recordings be gathered to answer the question of how this information should be combined from a population of such non-linear detectors, e.g., for setting the relative weights for divisive normalization or for changing the conductivities in the diffusion equations that determine the properties of the underlying receptive fields? In connection with the foveal scale-space model in Sect. 7 and the dominance of receptive fields with a linearly increasing receptive field size as a function of eccentricity found by cell recordings of retinal ganglion cells, it would, as discussed at the end of Sect. 7, also be interesting to know whether and where the existence of coarser-scale receptive fields corresponding to the interior of the inverted cone in Fig. 36 could be established. In these and other ways, the presented mathematical framework for receptive fields could be used for expressing and raising questions about computational mechanisms.

The theoretical covariance properties of the associated scale-space concepts allow for explicit handling of invariance properties with respect to scale variations, image deformations, and relative motions. In computational models, such as neural networks, explicit incorporation of such transformation properties may be used for bypassing the need for an explicit training stage to learn the corresponding invariance properties. From a biological standpoint, it appears natural that biological organisms should develop the possibility of having these transformations hard-wired or soft-wired (the latter notion meaning that a set of initial connections is trimmed after birth), since these transformations are universal. In terms of receptive fields, these transformations will then correspond to certain parameter ranges of the scale parameters, determined by the statistics of natural images. This theory may therefore be used more generally for reducing or bypassing the need for explicitly learning the spatial, spatio-chromatic, and spatio-temporal response properties of early receptive fields in computational models of visual perception. In this respect, the presented theory could allow for a lower need for training data and a lower amount of computational resources in the training stage of computational vision models, by faster formation of receptive fields given a hard-wired or soft-wired architecture. The theory may also imply higher robustness of early receptive fields in computational models and require less variability in the training data.

With regard to a possible biological implementation of this theory, the evolution properties of the presented scale-space models are governed by diffusion equations, which can be implemented by operations over neighborhoods. Hence, the computations can naturally be implemented in terms of connections between different cells. Diffusion equations are also used in mean field theory for approximating the computations that are performed by populations of neurons (Omurtag et al. 2000; Mattia and Guidice 2002; Faugeras et al. 2009). The generalized semi-group property (8) with the corresponding cascade property (9), possibly expressed for a multi-parameter scale space, and the diffusion equations in terms of infinitesimal generators (13) and (14) describe how receptive fields corresponding to different, possibly multi-dimensional, scale parameters can be related, and hence how receptive fields at coarser scales can be computed from receptive fields at finer scales. In a neural network implementation, these relations can hence be used for setting the weights for communication between different cells. This theory also provides a framework for modelling and explaining the temporal dynamics of neural computations between cells at different levels of processing. In this respect, the theory naturally leads to a hierarchical architecture with explicit expressions for how receptive fields in the fovea can constitute the basis for receptive fields in the LGN, and how these in turn can be used for defining receptive fields in V1 and at later stages in the visual cortex.
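A minimal sketch of how a diffusion equation can be evolved by purely local neighborhood operations (an explicit finite-difference scheme in 1-D; step size and iteration count are illustrative):

```python
def diffusion_step(L, alpha=0.25):
    """One explicit step of the 1-D diffusion equation, computed from
    nearest neighbors only (reflecting boundaries); alpha <= 0.5 keeps
    the scheme stable.  Step size and iteration count are illustrative."""
    n = len(L)
    return [L[i] + alpha * (L[max(i - 1, 0)] - 2.0 * L[i] + L[min(i + 1, n - 1)])
            for i in range(n)]

signal = [0.0] * 10 + [1.0] + [0.0] * 10   # an impulse
smoothed = signal
for _ in range(20):
    smoothed = diffusion_step(smoothed)
```

Iterating this purely local update approximates Gaussian smoothing, and the amplitude of the impulse decreases monotonically, in accordance with non-enhancement of local extrema; in a neural implementation, the neighborhood weights play the role of connection strengths between cells.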
In this way, specific properties of specific organisms are suppressed (and not considered here for reasons of scope). The approach is therefore more related to approaches in theoretical physics, where symmetry properties of the world are used as fundamentals in the formulation of physical theories. In the area of scale-space theory, these structural assumptions are referred to as scale-space axioms.
Footnotes
 1.
Concerning notation, we will throughout use a notation similar to that of physics or mathematics, with scalars and vectors represented by lower-case letters, \(a \in {\mathbb R}\) and \(x \in {\mathbb R}^2\) (without explicit notational overhead for vectors), and matrices represented by upper-case letters, \(A\) or \(\varSigma \). Operators that act on functions will be represented by calligraphic symbols, \(\mathcal{T}\) and \(\mathcal{A}\), and we use either lower-case or upper-case letters for functions, \(f\) and \(L\). The overall convention is that the meaning of a symbol is defined the first time it is used.
 2.
In Eq. (1), the symbol “\(\cdot \)” at the position of the first argument of \(L\) is a placeholder to emphasize that in this relation, \(L\) is regarded as a function and is not evaluated with respect to its first argument \(x\). The following semicolon emphasizes the different natures of the image coordinates \(x\) and the filter parameters \(s\).
 3.
More precisely, we will assume that linearity should hold for some transformation \(f = z(I)\) of the original luminosity values \(I\) in units of local energy measurements. In Sect. 2.3 it will be shown that a logarithmic intensity mapping \(f \sim \log I\) is particularly attractive in this respect by allowing for invariance of receptive field responses under local multiplicative intensity transformations.
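This invariance property can be checked directly: derivatives of log-transformed intensities are unaffected by a multiplicative rescaling \(I \mapsto c\, I\). A minimal sketch (with illustrative values):

```python
import math

def log_derivative(vals):
    """First-order differences of log-transformed intensities, a discrete
    stand-in for Gaussian derivative responses on a logarithmic
    brightness scale."""
    logs = [math.log(v) for v in vals]
    return [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]

I_vals = [10.0, 12.0, 9.0, 15.0, 14.0]   # illustrative luminosities
c = 3.7                                  # multiplicative illumination change
d_original = log_derivative(I_vals)
d_rescaled = log_derivative([c * v for v in I_vals])
```

Since \(\log(c\, I) = \log c + \log I\), the constant offset cancels under differentiation, which is the mechanism behind the invariance claimed in the text.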
 4.
For us humans and other higher mammals, the retina is obviously not translationally invariant. Instead, finer-scale receptive fields are concentrated toward the fovea in such a way that the minimum receptive field size increases essentially linearly with eccentricity (see Sect. 7). With respect to such a sensor space, the assumption about translational invariance should be taken as an idealized model for the region in space where there are receptive fields above a certain size.
 5.
The symbol “\(\cdot \)” used as a placeholder for the first argument of \(T\) and the argument of \(f\) in Eq. (4) indicates that the convolution operation “\(*\)” is performed over the corresponding variable.
 6.
With \(s = (s_1, \ldots , s_N)\) representing a multi-dimensional scale parameter \(s \in {\mathbb R}_+^N\), Eq. (7) should be interpreted as \(\lim _{|s| \downarrow 0} L(\cdot ;\; s) = \lim _{|s| \downarrow 0} \mathcal{T}_s f = f\) with \(|s| = \sqrt{s_1^2 + \cdots + s_N^2}\).
 7.
With \(s_1 = (s_{1,1}, \dots , s_{1,N})\) and \(s_2 = (s_{2,1}, \dots , s_{2,N})\) denoting two \(N\)-dimensional scale parameters, the inequality \(s_2 \ge s_1\) should be interpreted as a requirement that the scale levels \(s_1\) and \(s_2\) have to be ordered in the sense that the increment \(u = s_2 - s_1\) should correspond to a positive direction in parameter space that can be interpreted as increasing levels of scale. For example, for the affine spatial scale-space concept \(L(x;\; \varSigma )\) to be considered later in Sect. 3, which for two-dimensional images \(f\) can be parameterized by positive semi-definite \(2 \times 2\) covariance matrices \(\varSigma \), the requirement of an ordered and positive scale direction \(u\) between the scale-space representations computed for two different covariance matrices \(\varSigma _1\) and \(\varSigma _2\) means that the difference between these covariance matrices \(\varSigma _u = \varSigma _2 - \varSigma _1\) must be positive semi-definite. With the corresponding multi-dimensional scale parameters \(s_1\) and \(s_2\) expressed as vectors \(s_1 = (\varSigma _{1,11}, \varSigma _{1,12}, \varSigma _{1,22})\) and \(s_2 = (\varSigma _{2,11}, \varSigma _{2,12}, \varSigma _{2,22})\), where \(\varSigma _{k,ij}\) denote the elements of \(\varSigma _k\) for \(k = 1\) and \(2\), the condition for \(u = (u_1, u_2, u_3) = s_2 - s_1\) to correspond to a positive direction in parameter space can therefore be expressed as \(u_1 u_3 - u_2^2 \ge 0\) and \(u_1 + u_3 \ge 0\).
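This ordering condition can be checked mechanically; a minimal sketch (function name ours):

```python
def is_scale_increment(sigma1, sigma2, eps=1e-12):
    """Check that Sigma_2 >= Sigma_1 in the positive semi-definite
    ordering: the increment u = Sigma_2 - Sigma_1 = [[u1, u2], [u2, u3]]
    must satisfy u1*u3 - u2**2 >= 0 and u1 + u3 >= 0."""
    u1 = sigma2[0][0] - sigma1[0][0]
    u2 = sigma2[0][1] - sigma1[0][1]
    u3 = sigma2[1][1] - sigma1[1][1]
    return u1 * u3 - u2 * u2 >= -eps and u1 + u3 >= -eps

ordered     = is_scale_increment([[1.0, 0.0], [0.0, 1.0]],
                                 [[2.0, 0.5], [0.5, 2.0]])
not_ordered = is_scale_increment([[2.0, 0.0], [0.0, 1.0]],
                                 [[1.0, 0.0], [0.0, 2.0]])
```

In the second example, one diagonal element decreases while the other increases, so the increment is indefinite and the two scale levels are not comparable in the ordering.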
 8.
This constant brightness assumption is guaranteed to hold for a Lambertian reflectance model extended with a spatially varying albedo, if the surface pattern is subject to illumination that is constant over time for corresponding surface points; see Sect. 2.3 for a more detailed model of receptive field responses under illumination variations. If the illumination intensity or the orientation of the surface normal in relation to the light source varies over time, however, the constant brightness assumption may be violated; the same holds if the reflectance model comprises non-Lambertian, e.g., specular, components. In such situations, a motion field computed from the optical flow obtained from the constant brightness assumption may therefore be different from the projected motion field of physical particles in the world. This situation can, on the other hand, be improved by instead applying a constancy assumption to spatial derivatives of the image intensity instead of the original zero-order image intensity. As explained in Sect. 2.3, such an assumption will in the specific case of a logarithmic brightness scale cancel the influence of local multiplicative illumination variations. By furthermore applying the constancy assumption to the output from several derivative operators simultaneously, and additionally combining this assumption with an assumption of local coherence of the motion, e.g., in terms of a low-parameter motion model over local regions in image space, one may additionally address the ambiguity of the aperture problem, provided that the local region of image space to which the low-parameter motion model is applied contains a sufficiently rich distribution of image structures of different orientations. Otherwise, the aperture problem states that, under the assumption of constant brightness of corresponding physical points over time, only the motion component that is parallel to the local image gradient can be computed.
The notion of a Reichardt detector (Reichardt 1961; Reichardt and Schögl 1988) also addresses this issue, by delay-coupled receptive fields in the retina. For the purpose of describing the motion-selective and motion-adapted properties of receptive fields, we shall, however, here for simplicity of presentation model temporal motions in terms of local Galilean transformations applied to image intensities, bearing in mind that this model can in a straightforward manner be transferred to the assumption of constancy of spatial derivative responses over time. Indeed, the spatio-temporal biological receptive fields that we shall describe in more detail in Sect. 6.3.2 all support such a view, by comprising non-zero first, second, or third orders of spatial differentiation.
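The aperture problem can be illustrated by the second-moment matrix of local gradient directions used in least-squares motion estimation: the matrix is singular when only one orientation is present. A minimal sketch (names and values illustrative):

```python
def second_moment_matrix(gradients):
    """Sum of grad * grad^T over a local region, as used in local
    least-squares motion estimation (sketch; names are ours)."""
    a = sum(gx * gx for gx, gy in gradients)
    b = sum(gx * gy for gx, gy in gradients)
    c = sum(gy * gy for gx, gy in gradients)
    return a, b, c

def full_velocity_recoverable(gradients, eps=1e-9):
    """The full image velocity is recoverable only if the local gradient
    distribution spans more than one orientation; otherwise the
    second-moment matrix is singular (the aperture problem)."""
    a, b, c = second_moment_matrix(gradients)
    return a * c - b * b > eps

one_orientation  = [(1.0, 0.0), (2.0, 0.0), (0.5, 0.0)]
two_orientations = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

With a single gradient orientation, the determinant of the second-moment matrix vanishes and only the normal flow component is determined, which is precisely the ambiguity described above.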
 9.
Note that the form of the vignetting effect may be different for lens systems composed of several lenses, and that lens systems are usually constructed to reduce the vignetting effect over some central part of the field of view. Notably, this natural vignetting effect will not be present with a spherical camera geometry, which is of high relevance with regard to biological vision.
 10.
To ensure sufficient differentiability properties such that an infinitesimal generator exists and the resulting multi-scale representation obtained by convolution with the semi-group of convolution kernels can be differentiated with respect to both space and scale, such that the requirement of non-enhancement of local extrema can be applied, we do formally for an \(N\)-dimensional spatial domain require the semi-group \(\mathcal{T}_s\) to be \(C_1\)-continuous such that \(\lim _{h \downarrow 0} \left\| \frac{1}{h} \int _{s = 0}^{h} \mathcal{T}(s) f \, \hbox{d}s - f \right\| _{H^k({\mathbb R}^N)} = 0\) should hold for some \(k > N/2\) and for all smooth functions \(f \in L^1({\mathbb R}^N) \cap C^{\infty }({\mathbb R}^N)\), with \(\Vert \cdot \Vert _{H^k({\mathbb R}^N)}\) denoting the \(L^2\)-based Sobolev norm \( \Vert u \Vert _{H^k({\mathbb R}^N)} = \left( \int _{\omega \in {\mathbb R}^N} \left( 1 + |\omega |^2 \right) ^k \, |\hat{u}(\omega )|^2 \, \hbox {d}\omega \right) ^{1/2}\) and \(\hat{u}\) denoting the Fourier transform of \(u\) over \({\mathbb R}^N\); see (Lindeberg (2011), Sect. 3.2 and “Appendix A”) regarding details.
 11.
With “rotational invariance at the group level” meaning that although a set of receptive fields may not be rotationally symmetric as individuals, a collection or a group of such receptive fields may nevertheless make it possible to generate rotationally invariant responses, for example if all orientations are explicitly represented or if the receptive fields of different orientations can be related by linear combinations.
 12.
To ensure sufficient differentiability properties such that an infinitesimal generator exists and the resulting multi-scale representation obtained by convolution with the semi-group of convolution kernels can be differentiated with respect to both space–time and spatio-temporal scales, such that the requirement of non-enhancement of local extrema can be applied, we do formally for an \(N+1\)-dimensional space–time require the semi-group \(\mathcal{T}_s\) to be \(C_1\)-continuous in the sense that \(\lim _{h \downarrow 0} \left\| \frac{1}{h} \int _{s = 0}^{h} \mathcal{T}(s) f \, \hbox{d}s - f \right\| _{H^k({\mathbb R}^N \times {\mathbb R})} = 0\) should hold for some \(k > (N+1)/2\) and for all smooth functions \(f \in L^1({\mathbb R}^N \times {\mathbb R}) \cap C^{\infty }({\mathbb R}^N \times {\mathbb R})\), with \(\Vert \cdot \Vert _{H^k({\mathbb R}^N \times {\mathbb R})}\) denoting the \(L^2\)-based Sobolev norm \( \Vert u \Vert _{H^k({\mathbb R}^N \times {\mathbb R})} = \left( \int _{\omega \in {\mathbb R}^N \times {\mathbb R}} \left( 1 + |\omega |^2 \right) ^k \, |\hat{u}(\omega )|^2 \, \hbox {d}\omega \right) ^{1/2}\) and \(\hat{u}\) denoting the Fourier transform of \(u\) over \({\mathbb R}^N \times {\mathbb R}\); see (Lindeberg (2011), Sect. 3.2 and “Appendix A”) regarding details.
 13.
It can be shown that this definition is compatible with spatio-temporal scale invariance for scale selection based on local extrema over temporal scales of scale-normalized derivatives (manuscript in preparation). Specifically, the value \(\kappa = 1/2\) can be motivated both from theoretical considerations and from agreement with biological receptive fields.
 14.
By the use of locally adapted feedback, the resulting evolution equation does not obey the original linearity and shift invariance (homogeneity) requirements used for deriving the idealized affine Gaussian receptive field model, if the covariance matrices \(\varSigma _0\) are determined from properties of the image data in a non-linear way. For a fixed set of covariance matrices \(\varSigma _0\) at any image point, the evolution equation will still be linear and will specifically obey non-enhancement of local extrema. In this respect, the resulting model could be regarded as the simplest form of non-linear extension of the linear receptive field model.
Acknowledgments
I would like to thank Benjamin Auffarth, Oskar Linde, and Prof. Per Roland for valuable discussions and comments.
The support from the Swedish Research Council, Vetenskapsrådet (contracts 2004-4680, 2010-4766), and from the Royal Swedish Academy of Sciences as well as the Knut and Alice Wallenberg Foundation is gratefully acknowledged.
References
 Adelson E, Bergen J (1985) Spatiotemporal energy models for the perception of motion. J Opt Soc Am A 2:284–299
 Almansa A, Lindeberg T (2000) Fingerprint enhancement by shape adaptation of scale-space operators with automatic scale selection. IEEE Trans Image Process 9(12):2027–2042
 Babaud J, Witkin AP, Baudin M, Duda RO (1986) Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Trans Pattern Anal Mach Intell 8(1):26–33
 Bardy C, Huang JY, Wang C, Fitzgibbon T, Dreher B (2006) ‘Simplification’ of responses of complex cells in cat striate cortex: suppressive surrounds and ‘feedback’ inactivation. J Physiol 574(3):731–750
 Baumberg A (2000) Reliable feature matching across widely separated views. In: Proceedings of CVPR, Hilton Head, SC, vol I, pp 1774–1781
 Bay H, Ess A, Tuytelaars T, van Gool L (2008) Speeded up robust features (SURF). Comput Vis Image Underst 110(3):346–359
 Blasdel GG (1992) Orientation selectivity, preference and continuity in monkey striate cortex. J Neurosci 12(8):3139–3161
 Bonhoeffer T, Grinvald A (1991) Iso-orientation domains in cat visual cortex are arranged in pinwheel-like patterns. Nature 353:429–431
 Bonin V, Mante V, Carandini M (2005) The suppressive field of neurons in the lateral geniculate nucleus. J Neurosci 25(47):10844–10856
 Burghouts GJ, Geusebroek JM (2009) Performance evaluation of local colour invariants. Comput Vis Image Underst 113(1):48–62
 Burt PJ (1981) Fast filter transforms for image processing. Comput Vis Graph Image Process 16:20–51
 Burt PJ, Adelson EH (1983) The Laplacian pyramid as a compact image code. IEEE Trans Commun 31(4):532–540
 Carandini M, Demb JB, Mante V, Tolhurst DJ, Dan Y, Olshausen BA, Gallant JL, Rust NC (2005) Do we know what the early visual system does? J Neurosci 25(46):10577–10597
 Carslaw HS, Jaeger JC (1959) Conduction of heat in solids. Clarendon Press, Oxford
 Cavanaugh JR, Bair W, Movshon A (2001a) Nature and interaction of signals from the receptive field center and surround in Macaque V1 neurons. J Neurophysiol 88:2530–2546
 Cavanaugh JR, Bair W, Movshon A (2001b) Selectivity and spatial distribution of signals from the receptive field surround in Macaque V1 neurons. J Neurophysiol 88:2547–2556
 Chen Y, Anand S, Martinez-Conde S, Macknik SL, Bereshpolova Y, Swadlow HA, Alonso JM (2002) The linearity and selectivity of neuronal responses in awake visual cortex. J Vis 9(9):1–17
 Chomat O, de Verdiere V, Hall D, Crowley J (2000) Local scale selection for Gaussian based description techniques. In: Proceedings of ECCV’00, Lecture Notes in Computer Science, vol 1842. Springer, Dublin, Ireland, pp I:117–133
 Conway BR (2006) Spatial and temporal properties of cone signals in alert macaque primary visual cortex. J Neurosci 26(42):10826–10846
 Crowley JL (1981) A representation for visual information. Ph.D. thesis, Carnegie-Mellon University, Robotics Institute, Pittsburgh, Pennsylvania
 Crowley JL, Christensen HI (1994) Vision as process. Springer, Heidelberg
 Crowley JL, Parker AC (1984) A representation for shape based on peaks and ridges in the difference of low-pass transform. IEEE Trans Pattern Anal Mach Intell 6(2):156–170
 Crowley JL, Sanderson AC (1987) Multiple resolution representation and probabilistic matching of 2-D gray-scale shape. IEEE Trans Pattern Anal Mach Intell 9(1):113–121
 Crowley JL, Stern RM (1984) Fast computation of the difference of low-pass transform. IEEE Trans Pattern Anal Mach Intell 6:212–222
 DeAngelis GC, Anzai A (2004) A modern view of the classical receptive field: Linear and nonlinear spatiotemporal processing by V1 neurons. In: Chalupa LM, Werner JS (eds) The visual neurosciences, vol 1. MIT Press, Cambridge, pp 704–719
 DeAngelis GC, Ohzawa I, Freeman RD (1995) Receptive field dynamics in the central visual pathways. Trends Neurosci 18(10):451–457PubMedGoogle Scholar
 Doi E, Lewicki MS (2005) Relations between the statistical regularities of natural images and the response properties of the early visual system. In: Japanese cognitive science society: Sig P & P. Kyoto University, pp 1–8Google Scholar
 Duits R, Florack L, de Graaf J (2004) On the axioms of scale space theory. J Math Imaging Vis 22:267–298Google Scholar
 Einhäuser W, König P (2010) Getting real—sensory processing of natural stimuli. Curr Opinn Neurobiol 20(3):389–395Google Scholar
 Fagerström D (2005) Temporal scalespaces. Int J Comput Vis 2–3:97–106Google Scholar
 Fagerström D (2007) Spatiotemporal scalespaces. In: Gallari F, Murli A, Paragios N (eds) Proceedings of the 1st international conference on scalespace theories and variational methods in computer vision, Lecture Notes in Computer Science, vol. 4485. Springer, pp 326–337Google Scholar
 Faugeras O, Toubol J, Cessac B (2009) A constructive meanfield analysis of multipopulation neural networks with random synaptic weights and stochastic inputs. Frontiers in Computational Neuroscience 3(1). doi: 10.3389/neuro.10.001.2009
 Felsberg M, Sommer G (2004) The monogenic scalespace: a unifying approach to phasebased image processing in scalespace. J Math Imaging Vis 21:5–26Google Scholar
 Felsen G, Touryan J, Han F, Dan Y (2005) Cortical sensitivity to visual features in natural scenes. PLoS Biol 3(10):e342PubMedGoogle Scholar
 Field DJ (1987) Relations between the statistics of natural images and the response properties of cortical cells. J Opt Soc Am 4:2379–2394Google Scholar
 Fleet DJ, Langley K (1995) Recursive filters for optical flow. IEEE Trans Pattern Anal Mach Intell 17(1):61–67Google Scholar
 Florack L, Niessen W, Nielsen M (1998) The intrinsic structure of optic flow incorporating measurement duality. Int J Comput Vis 27(3):263–286Google Scholar
 Florack LMJ (1997) Image structure. Series in Mathematical Imaging and Vision. Springer, BerlinGoogle Scholar
 Florack LMJ, ter Haar Romeny BM, Koenderink JJ, Viergever MA (1992) Images: regular tempered distributions. In: Ying Y, Toet A, Heijmanns H (eds) Proceedings NATO workshop ’Shape in Picture, NATO ASI Series F. Springer, New York, Driebergen, Netherlands, pp 651–659Google Scholar
 Florack LMJ, ter Haar Romeny BM (1992) Scale and the differential structure of images. Image Vis Comput 10(6):376–388Google Scholar
 Freeman WT, Adelson EH (1991) The design and use of steerable filters. IEEE Trans Pattern Anal Mach Intell 13(9):891–906Google Scholar
 Geisler WS (2008) Visual perception and the statistical properties of natural scenes. Annu Rev Psychol 59:10.1–10.26Google Scholar
 Geusebroek JM, van den Boomgaard R, Smeulders AWM, Geerts H (2001) Color invariance. IEEE Trans Pattern Anal Mach Intell 23(12):1338–1350Google Scholar
 Hall D, de Verdiere V, Crowley J (2000) Object recognition using coloured receptive fields. In: Proceedings of the ECCV’00, Lecture Notes in Computer Science, vol 1842. Springer, Dublin, Ireland I:164–177Google Scholar
 Hartline HK (1938) The response of single optic nerve fibers of the vertebrate eye to illumination of the retina. Am J Physiol 121:400–415Google Scholar
 Heeger DJ (1992) Normalization of cell responses in cat striate cortex. Vis Neurosci 9:181–197PubMedGoogle Scholar
 Hille E, Phillips RS (1957) Functional analysis and semigroups, vol XXXI. American Mathematical Society Colloquium Publications, USAGoogle Scholar
 Hirschmann II, Widder DV (1955) The convolution transform. Princeton University Press, PrincetonGoogle Scholar
 Horn BKP (1986) Robot vision. MIT Press, CambridgeGoogle Scholar
 Hubel DH, Wiesel TN (1959) Receptive fields of single neurones in the cat’s striate cortex. J Physiol 147:226–238PubMedGoogle Scholar
 Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160:106–154PubMedGoogle Scholar
 Hubel DH, Wiesel TN (2005) Brain and visual perception: the story of a 25year collaboration. Oxford University Press, OxfordGoogle Scholar
 Hyvärinen A, Hurri J, Hoyer PO (2009) Natural image statistics: a probabilistic approach to early computational vision. Computational imaging and vision. Springer, BerlinGoogle Scholar
 Ibbitson MR, Price NSC, Crowder NA (2005) On the division of cortical cells into simple and complex types: a comparative viewpoint. J Neurophysiol 93:3699–3702Google Scholar
 Iijima T (1962) Observation theory of twodimensional visual patterns. Technical report. Papers of technical group on automata and automatic control, IECE, JapanGoogle Scholar
 Jaynes ET (1968) Prior probabilities. Trans Syst Sci Cybern 4(3):227–241Google Scholar
 Jones J, Palmer L (1987) An evaluation of the twodimensional Gabor filter model of simple receptive fields in cat striate cortex. J Neurophysiol 58:1233–1258PubMedGoogle Scholar
 Jones J, Palmer L (1987) The twodimensional spatial structure of simple receptive fields in cat striate cortex. J Neurophysiol 58:1187–1211PubMedGoogle Scholar
 Kagan I, Gur M, Snodderly DM (2002) Spatial organization of receptive fields of V1 neurons of alert monkeys: comparison with responses to gratings. J Neurophysiol 88:2557–2574PubMedGoogle Scholar
 Koch C (1999) Biophysics of computation: information processing in single neurons. Oxford University Press, OxfordGoogle Scholar
 Koenderink JJ (1984) The structure of images. Biol Cybern 50:363–370PubMedGoogle Scholar
 Koenderink JJ (1988) Scaletime. Biol Cybern 58:159–162Google Scholar
 Koenderink JJ, Kaeppers A, van Doorn AJ (1992) Local operations: the embodiment of geometry. In: Orban G, Nagel HH (eds) Artificial and biological vision systems, pp 1–23Google Scholar
 Koenderink JJ, van Doorn AJ (1978) Visual detection of spatial contrast; influence of location in the visual field, target extent and illuminance level. Biol Cybern 30:157–167PubMedGoogle Scholar
 Koenderink JJ, van Doorn AJ (1987) Representation of local geometry in the visual system. Biol Cybern 55:367–375PubMedGoogle Scholar
 Koenderink JJ, van Doorn AJ (1990) Receptive field families. Biol Cybern 63:291–298Google Scholar
 Koenderink JJ, van Doorn AJ (1992) Generic neighborhood operators. IEEE Trans Pattern Anal Mach Intell 14(6):597–605Google Scholar
 Kokkinos I, Yuille A (2008) Scale invariance without scale selection. In: Proceedings of the CVPR, pp 1–8Google Scholar
 Kuffler SW (1953) Discharge patterns and functional organization of mammalian retina. J Neurophysiol 16(1):37–68PubMedGoogle Scholar
 Land EH (1974) The retinex theory of colour vision. Proc R Inst Great Britain 57:23–58Google Scholar
 Land EH (1986) Recent advances in retinex theory. Vis Res 26(1):7–21PubMedGoogle Scholar
 Laptev I, Caputo B, Schuldt C, Lindeberg T (2007) Local velocityadapted motion events for spatiotemporal recognition. Comput Vis Image Underst 108:207–229Google Scholar
 Laptev I, Lindeberg T (2003) Space–time interest points. In: Proceedings of the 9th international conference on computer vision, Nice, France, pp 432–439Google Scholar
 Laptev I, Lindeberg, T (2004a) Local descriptors for spatiotemporal recognition. In: Proceedings of the ECCV’04 workshop on spatial coherence for visual motion analysis, Lecture Notes in Computer Science, vol 3667. Springer, Prague, Czech Republic, pp 91–103Google Scholar
 Laptev I, Lindeberg T (2004) Velocityadapted spatiotemporal receptive fields for direct recognition of activities. Image Vis Comput 22(2):105–116Google Scholar
 Lazebnik S, Schmid C, Ponce J (2005) A sparse texture representation using local affine regions. IEEE Trans Pattern Anal Mach Intell 27(8):1265–1278PubMedGoogle Scholar
 Lifshitz L, Pizer S (1990) A multiresolution hierarchical approach to image segmentation based on intensity extrema. IEEE Trans Pattern Anal Mach Intell 12:529–541Google Scholar
 Linde O, Lindeberg T (2004) Object recognition using composed receptive field histograms of higher dimensionality. In: International conference on pattern recognition, vol. 2, Cambridge, pp 1–6Google Scholar
 Linde O, Lindeberg T (2012) Composed complexcue histograms: an investigation of the information content in receptive field based image descriptors for object recognition. Comput Vis Image Underst 116:538–560Google Scholar
 Lindeberg T (1990) Scalespace for discrete signals. IEEE Trans Pattern Anal Mach Intell 12(3):234–254Google Scholar
 Lindeberg T (1994a) Scalespace theory: a basic tool for analysing structures at different scales. J Appl Stat 21(2):225–270. Also available from http://www.csc.kth.se/tony/abstracts/Lin94SIabstract.html Google Scholar
 Lindeberg T (1994) ScaleSpace Theory in Computer Vision. Springer, The Springer International Series in Engineering and Computer ScienceGoogle Scholar
 Lindeberg T (1996) On the axiomatic foundations of linear scalespace. In: Sporring J, Nielsen M, Florack L, Johansen P (eds) Gaussian scalespace theory: proceedings of the PhD School on scalespace theory. Springer, Copenhagen, DenmarkGoogle Scholar
 Lindeberg T (1997) Linear spatiotemporal scalespace. In: ter Haar Romeny BM, Florack LMJ, Koenderink JJ, Viergever MA (eds) Scalespace theory in computer vision: proceedings of the first international conference ScaleSpace’97, Lecture Notes in Computer Science, vol 1252. Springer, Utrecht, The Netherlands, pp 113–127. Extended version available as technical report ISRN KTH NA/P01/22SE from KTH.Google Scholar
 Lindeberg T (1997) On automatic selection of temporal scales in timecasual scalespace. In: Sommer G, Koenderink JJ (eds) Proceedings of the AFPAC’97: algebraic frames for the perceptionaction cycle, Lecture Notes in Computer Science vol 1315. Springer, Kiel, Germany, pp 94–113Google Scholar
 Lindeberg T (1998) Edge detection and ridge detection with automatic scale selection. Int J Comput Vis 30(2):117–154Google Scholar
 Lindeberg T (1998) Feature detection with automatic scale selection. Int J Comput Vis 30(2):77–116Google Scholar
 Lindeberg T (1999) Principles for automatic scale selection. In: Handbook on computer vision and applications. Academic Press, Boston, USA, pp 239–274. Also available from http://www.csc.kth.se/cvap/abstracts/cvap222.html
 Lindeberg T (2001) Linear spatiotemporal scalespace. report, ISRN KTH/NA/P01/22SE, Department of Numerical Analysis and Computing Science, KTHGoogle Scholar
 Lindeberg T (2002) Timerecursive velocityadapted spatiotemporal scalespace filters. In: Johansen P (ed) Proceedings of the ECCV’02, Lecture Notes in Computer Science, vol 2350. Springer, Copenhagen, Denmark, pp 52–67Google Scholar
 Lindeberg T (2008) Scalespace. In: Wah B (ed) Encyclopedia of computer science and engineering. Wiley, Hoboken, pp 2495–2504Google Scholar
 Lindeberg T (2011) Generalized Gaussian scalespace axiomatics comprising linear scalespace, affine scalespace and spatiotemporal scalespace. J Math Imaging Vis 40(1):36–81Google Scholar
 Lindeberg T (2013) Scale selection. In: Encyclopedia of computer vision. Springer (in press)Google Scholar
 Lindeberg T, Akbarzadeh A, Laptev I (2004) Galileancorrected spatiotemporal interest operators. In: International conference on pattern recognition, Cambridge, I:57–62Google Scholar
 Lindeberg T, Fagerström D (1996) Scalespace with causal time direction. In: Proceedings of the ECCV’96, vol 1064. Springer, Cambridge, UK, pp 229–240Google Scholar
 Lindeberg T, Florack L (1992) On the decrease of resolution as a function of eccentricity for a foveal vision system. report, ISRN KTH/NA/P92/29SE, Department of Numerical Analysis and Computing Science, KTHGoogle Scholar
 Lindeberg T, Florack L (1994) Foveal scalespace and linear increase of receptive field size as a function of eccentricity. report, ISRN KTH/NA/P94/27SE, Department of Numerical Analysis and Computing Science, KTH. Available from http://www.csc.kth.se/tony/abstracts/CVAP166.html
 Lindeberg T, Gårding J (1997) Shapeadapted smoothing in estimation of 3D depth cues from affine distortions of local 2D structure. Image Vis Comput 15:415–434Google Scholar
 Lörincz A, Palotal Z, Szirtes G (2012) Efficient sparse coding in early sensory processing: lessons from signal recovery. PLoS Comput Biol 8(3)(e1002372) doi: 10.1371/journal.pcbi.1002372
 Lowe D (1999) Object recognition from local scaleinvariant features. In: Proceedings of the 7th international conference on computer vision, Corfu, Greece, pp 1150–1157Google Scholar
 Lowe D (2004) Distinctive image features from scaleinvariant keypoints. Int J Comput Vis 60(2):91–110Google Scholar
 Marcelja S (1980) Mathematical description of the responses of simple cortical cells. J Opt Soc Am 70(11):1297–1300PubMedGoogle Scholar
 Martin PR, Grünert U (2004) Ganglion cells in mammalian retinae. In: Chalupa LM, Werner JS (eds) The visual neurosciences, vol 1. MIT Press, Cambridge, pp 410–421Google Scholar
 Martinez LM, Alonso JM (2003) Complex receptive fields in primary visual cortex. Neuroscientist 9(5):317–331PubMedGoogle Scholar
 Mata ML, Ringach DL (2005) Spatial overlap of ON and OFF subregions and its relation to response modulation ratio in Macaque primary visual cortex. J Neurophysiol 93:919–928PubMedGoogle Scholar
 Mattia M, Guidice PD (2002) Population dynamics of interacting spiking neurons. Phys Rev E 65(5):051917Google Scholar
 Mechler F, Ringach DL (2002) On the classification of simple and complex cells. Vis Res 22:1017–1033Google Scholar
 Mikolajczyk K, Schmid C (2004) Scale and affine invariant interest point detectors. Int J Comput Vis 60(1):63–86Google Scholar
 Nagel H, Gehrke A (1998) Spatiotemporal adaptive filtering for estimation and segmentation of optical flow fields. In: Proceedings of the ECCV’98. Springer, Freiburg, Germany, pp 86–102Google Scholar
 Olshausen BA, Field DJ (1996) Emergence of simplecell receptive field properties by learning a sparse code for natural images. J Optl Soc Am 381:607–609Google Scholar
 Olshausen BA, Field DJ (2004) What is the other 85 % of V1 doing. In: Sejnowski TJ, van Hemmen L (eds) Problems in systems neuroscience. Oxford University Press, OxfordGoogle Scholar
 Omurtag A, Knight BW, Sirovich L (2000) On the simulation of large populations of neurons. J Comput Neurosci 8:51–63PubMedGoogle Scholar
 Palmer SE (1999) Vision science: photons to phenomenology first edition. MIT Press, CambridgeGoogle Scholar
 Pauwels EJ, Fiddelaers P, Moons T, van Gool LJ (1995) An extended class of scaleinvariant and recursive scalespace filters. IEEE Trans Pattern Anal Mach Intell 17(7):691–701Google Scholar
 Pazy A (1983) Semigroups of linear operators and applications to partial differential equations. Applied Mathematical Sciences. Springer, BerlinGoogle Scholar
 Perona P (1992) Steerablescalable kernels for edge detection and junction analysis. Image Vis Comput 10:663–672Google Scholar
 Priebe NJ, Mechler F, Carandini M, Ferster D (2004) The contribution of spike threshold to the dichotomy of cortical simple and complex cells. Nat Neurosci 7(10):1113–1122PubMedGoogle Scholar
 Rao RPN, Ballard DH (1998) Development of localized oriented receptive fields by learning a translationinvariant code for natural images. Comput Neural Syst 9(2):219–234Google Scholar
 Reichardt WE (1961) Autocorrelation: a principle for the evaluation of sensory information by the central nervous system. In: Rosenblith WA (ed) Sensory communication. MIT Press, Cambridge, pp 303–317Google Scholar
 Reichardt WE, Schögl RW (1988) A two dimensional field theory for motion computation. Biol Cybern 60:23–35PubMedGoogle Scholar
 Ringach DL (2002) Spatial structure and symmetry of simplecell receptive fields in macaque primary visual cortex. J Neurophysiol 88:455–463PubMedGoogle Scholar
 Ringach DL (2004) Mapping receptive fields in primary visual cortex. J Physiol 558(3):717–728PubMedGoogle Scholar
 Ringach DL, Bredfeldt CE, Shapley RM, Hawken MJ (2002) Suppression of neural responses to nonoptimal stimuli correlates with tuning selectivity in Macaque V1. J Neurophysiol 87: 1018–1027Google Scholar
 Ringach DL, Hawken MJ, Shapley R (2002) Receptive field structure of neurons in monkey primary visual cortex revealed by stimulation with natural image sequences. J Vis 2(1):12–24PubMedGoogle Scholar
 Rodieck RW (1965) Quantitative analysis of cat retinal ganglion cell response to visual stimuli. Vis Res 5(11):583–601PubMedGoogle Scholar
 Rothganger F, Lazebnik S, Schmid C, Ponce J (2006) 3D object modeling and recognition using local affineinvariant image descriptors and multiview spatial constraints. Int J Comput Vis 66(3):231–259Google Scholar
 Rust NC, Schwartz O, Movshon JA, Simoncelli EP (2005) Spatiotemporal elements of V1 receptive fields. Neuron 46(6):945–956PubMedGoogle Scholar
 Schaffalitzky F, Zisserman A (2001) Viewpoint invariant texture matching and wide baseline stereo. In: Proceedings of the 8th international conference on computer vision, Vancouver, Canada, II:636–643Google Scholar
 Schiele B, Crowley J (1996) Object recognition using multidimensional receptive field histograms. In: Proceedings of the ECCV’96, Lecture Notes in Computer Science, vol 1064. Springer, Cambridge, UK, pp 610–619Google Scholar
 Schiele B, Crowley J (2000) Recognition without correspondence using multidimensional receptive field histograms. Int J Comput Vis 36(1):31–50Google Scholar
 Schwartz O, Chichilnsky EJ, Simoncelli EP (2002) Characterizing neural gain control using spiketriggered covariance. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in neural information processing systems, vol 14. MIT Press, Cambridge, pp 269–276Google Scholar
 Schwartz O, Simoncelli EP (2001) Natural signal statistics and sensory gain control. Nat Neurosci 4:819–825PubMedGoogle Scholar
 Sherrington CS (1906) The integrative action of the nervous system. C Scribner and Sons, New YorkGoogle Scholar
 Simoncelli EP, Freeman WT, Adelson EH, Heeger DJ (1992) Shiftable multiscale transforms. IEEE Trans Inf Theory 38(2)Google Scholar
 Simoncelli EP, Olshausen BA (2001) Natural image statistics and neural representations. Annu Rev Neurosci 24:1193–1216PubMedGoogle Scholar
 Somers DC, Nelson SB, Sur M (1995) An emergent model of orientation selectivity in cat visual cortical simple cells. J Neurosci 15(8):5448–5465PubMedGoogle Scholar
 Sompolinsky H, Shapley R (1997) New perspectives on the mechanisms for orientation selectivity. Curr Opin Neurobiol 7:514–522PubMedGoogle Scholar
 Sporring J, Nielsen M, Florack L, Johansen P (eds) (1996) Gaussian ScaleSpace Theory: Proc. PhD School on ScaleSpace Theory. Series in Mathematical Imaging and Vision. Springer, Copenhagen, DenmarkGoogle Scholar
 Stork DG, Wilson HR (1990) Do Gabor functions provide appropriate descriptions of visual cortical receptive fields. J Opt Soc Am 7(8):1362–1373 Google Scholar
 ter Haar Romeny B, Florack L, Nielsen, M (2001) Scaletime kernels and models. In: Scalespace and morphology: proceedings of the scalespace’01, Lecture Notes in Computer Science. Springer, Vancouver, CanadaGoogle Scholar
 ter Haar Romeny B (2003) Frontend vision and multiscale image analysis. Springer, BerlinGoogle Scholar
 Touryan J, Lau B, Dan Y (2002) Isolation of relevant visual features from random stimuli for cortical complex cells. J Neurosci 22(24):10811–10818PubMedGoogle Scholar
 Tsotsos J (1995) Modeling visual attention via selective tuning. Artif Intell 78(1–2):507–545Google Scholar
 Tuytelaars T, van Gool L (2004) Matching widely separated views based on affine invariant regions. Int J Comput Vis 59(1):61–85Google Scholar
 Valois RLD, Cottaris NP, Mahon LE, Elfer SD, Wilson JA (2000) Spatial and temporal receptive fields of geniculate and cortical cells and directional selectivity. Vis Res 40(2):3685–3702PubMedGoogle Scholar
 van der Schaaf, van Hateren JH (1996) Modelling the power spectra of natural images: statistics and information. Vis Res 36(17):2759–2770PubMedGoogle Scholar
 van de Sande KEA, Gevers T, Snoek CGM (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans Pattern Anal Mach Intell 32(9):1582–1596PubMedGoogle Scholar
 Wässle H (2004) Parallel processing in the mammalian retina. Nat Rev Neurosci 5:747–757PubMedGoogle Scholar
 Watanabe M, Rodieck RW (1989) Parasol and midget ganglion cells in the primate retina. J Comput Neurol 289:434–454Google Scholar
 Weickert J (1998) Anisotropic diffusion in image processing. TeubnerVerlag, StuttgartGoogle Scholar
 Weickert J, Ishikawa S, Imiya A (1999) Linear scalespace has first been proposed in Japan. J Math Imaging and Vis 10(3):237–252Google Scholar
 Willems G, Tuytelaars T, van Gool L (2008) An efficient dense and scaleinvariant spatiotemporal interest point detector. In: Proceedings of the ECCV’08, Lecture Notes in Computer Science, vol 5303. Springer, Marseille, France, pp 650–663Google Scholar
 Williams PE, Shapley RM (2007) A dynamic nonlinearity and spatial phase specificity in macaque V1 neurons. J Neurosci 27:5706–5718Google Scholar
 Witkin AP (1983) Scalespace filtering. In: Proceedings of the 8th international joint conference on artificial intelligence, Karlsruhe, Germany, pp 1019–1022Google Scholar
 Young RA (1987) The Gaussian derivative model for spatial vision: I. Retinal mechanisms. Spatial Vis 2:273–293Google Scholar
 Young RA, Lesperance RM (2001) The Gaussian derivative model for spatiotemporal vision: II. Cortical data. Spatial Vis 14(3,4):321–389Google Scholar
 Young RA, Lesperance RM, Meyer WW (2001) The Gaussian derivative model for spatiotemporal vision: I. Cortical model. Spatial Vis 14(3,4):261–319Google Scholar
 Yuille AL, Poggio TA (1986) Scaling theorems for zerocrossings. IEEE Trans Pattern Anal Mach Intell 8:15–25PubMedGoogle Scholar
 ZelnikManor L, Irani M (2001) Eventbased analysis of video. In: Proceedings of the CVPR, Kauai Marriott, Hawaii, II:123–130Google Scholar
Copyright information

Open Access. This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.