Time-Causal and Time-Recursive Spatio-Temporal Receptive Fields
Abstract
We present an improved model and theory for time-causal and time-recursive spatio-temporal receptive fields, obtained by a combination of Gaussian receptive fields over the spatial domain and first-order integrators or equivalently truncated exponential filters coupled in cascade over the temporal domain. Compared to previous spatio-temporal scale-space formulations in terms of non-enhancement of local extrema or scale invariance, these receptive fields are based on different scale-space axiomatics over time by ensuring non-creation of new local extrema or zero-crossings with increasing temporal scale. Specifically, extensions are presented about (i) parameterizing the intermediate temporal scale levels, (ii) analysing the resulting temporal dynamics, (iii) transferring the theory to a discrete implementation in terms of recursive filters over time, (iv) computing scale-normalized spatio-temporal derivative expressions for spatio-temporal feature detection and (v) computational modelling of receptive fields in the lateral geniculate nucleus (LGN) and the primary visual cortex (V1) in biological vision. We show that by distributing the intermediate temporal scale levels according to a logarithmic distribution, we obtain a new family of temporal scale-space kernels with better temporal characteristics compared to a more traditional approach of using a uniform distribution of the intermediate temporal scale levels. Specifically, the new family of time-causal kernels has much faster temporal response properties (shorter temporal delays) compared to the kernels obtained from a uniform distribution. When increasing the number of temporal scale levels, the temporal scale-space kernels in the new family do also converge very rapidly to a limit kernel possessing true self-similar scale-invariant properties over temporal scales.
Thereby, the new representation allows for true scale invariance over variations in the temporal scale, although the underlying temporal scale-space representation is based on a discretized temporal scale parameter. We show how scale-normalized temporal derivatives can be defined for these time-causal scale-space kernels and how the composed theory can be used for computing basic types of scale-normalized spatio-temporal derivative expressions in a computationally efficient manner.
Keywords
Scale space · Receptive field · Scale · Spatial · Temporal · Spatio-temporal · Scale-normalized derivative · Scale invariance · Differential invariant · Natural image transformations · Feature detection · Computer vision · Computational modelling · Biological vision

1 Introduction
Spatio-temporal receptive fields constitute an essential concept for describing neural functions in biological vision [11, 12, 31, 32, 33] and for expressing computer vision methods on video data [1, 35, 43, 88, 99].
For offline processing of pre-recorded video, non-causal Gaussian or Gabor-based spatio-temporal receptive fields may in some cases be sufficient. When operating on video data in a real-time setting or when modelling biological vision computationally, one does however need to take into explicit account the fact that the future cannot be accessed and that the underlying spatio-temporal receptive fields must therefore be time-causal, i.e. the image operations should only require access to image data from the present moment and what has occurred in the past. For computational efficiency and for keeping down memory requirements, it is also desirable that the computations should be time-recursive, so that it is sufficient to keep a limited memory of the past that can be recursively updated over time.
The subject of this article is to present an improved temporal scale-space model for spatio-temporal receptive fields based on time-causal temporal scale-space kernels in terms of first-order integrators or equivalently truncated exponential filters coupled in cascade, which can be transferred to a discrete implementation in terms of recursive filters over discretized time. This temporal scale-space model will then be combined with a Gaussian scale-space concept over continuous image space or a genuinely discrete scale-space concept over discrete image space, resulting in both continuous and discrete spatio-temporal scale-space concepts for modelling time-causal and time-recursive spatio-temporal receptive fields over both continuous and discrete spatio-temporal domains. The model builds on previous work by Fleet and Langley [20], Lindeberg and Fagerström [66], Lindeberg [56, 57, 58, 59] and is here complemented by (i) a better design for the degrees of freedom in the choice of time constants for the intermediate temporal scale levels from the original signal to any higher temporal scale level in a cascade structure of temporal scale-space representations over multiple temporal scales, (ii) an analysis of the resulting temporal response dynamics, (iii) details for discrete implementation in a spatio-temporal visual front-end, (iv) details for computing spatio-temporal image features in terms of scale-normalized spatio-temporal differential expressions at different spatio-temporal scales and (v) computational modelling of receptive fields in the lateral geniculate nucleus (LGN) and the primary visual cortex (V1) in biological vision.
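As a concrete illustration of the time-recursive aspect, temporal smoothing by first-order integrators coupled in cascade can be sketched as follows (a minimal sketch in Python; the function names are ours, and the update rule is the standard discrete first-order integrator with time constant mu, so that each temporal scale level only needs to store one state value per signal sample):

```python
import numpy as np

def first_order_integrator(signal, mu):
    """Discrete first-order integrator (recursive filter) with time constant mu.

    Time-recursive update: each output sample depends only on the current
    input and the previous output, so a single state variable suffices.
    """
    out = np.empty(len(signal), dtype=float)
    state = 0.0
    for t, f in enumerate(signal):
        state += (f - state) / (1.0 + mu)  # L[t] = L[t-1] + (f[t] - L[t-1]) / (1 + mu)
        out[t] = state
    return out

def cascade_smoothing(signal, mus):
    """Temporal smoothing by first-order integrators coupled in cascade."""
    for mu in mus:
        signal = first_order_integrator(signal, mu)
    return signal
```

Each filter in the cascade has unit \(L_1\)-norm and adds mu to the temporal mean (delay) of the composed kernel, which is why the choice of the time constants determines the distribution of the temporal scale levels.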
In previous use of the temporal scale-space model by Lindeberg and Fagerström [66], a uniform distribution of the intermediate scale levels has mostly been chosen when coupling first-order integrators or equivalently truncated exponential kernels in cascade. By instead using a logarithmic distribution of the intermediate scale levels, we will here show that a new family of temporal scale-space kernels can be obtained with much better properties in terms of (i) faster temporal response dynamics and (ii) fast convergence towards a limit kernel that possesses true scale-invariant properties (self-similarity) under variations in the temporal scale in the input data. Thereby, the new family of kernels enables (i) significantly shorter temporal delays (as always arise for truly time-causal operations), (ii) much better computational approximation to true temporal scale invariance and (iii) computationally much more efficient numerical implementation. Conceptually, our approach is also related to the time-causal scale-time model by Koenderink [39], which is here complemented by a truly time-recursive formulation of time-causal receptive fields more suitable for real-time operations over a compact temporal buffer of what has occurred in the past, including a theoretically well-founded and computationally efficient method for discrete implementation.
Specifically, the rapid convergence of the new family of temporal scale-space kernels to a limit kernel when the number of intermediate temporal scale levels tends to infinity is theoretically very attractive, since it provides a way to define truly scale-invariant operations over temporal variations at different temporal scales, and to measure the deviation from true scale invariance when approximating the limit kernel by a finite number of temporal scale levels. Thereby, the proposed model allows for truly self-similar temporal operations over temporal scales while using a discretized temporal scale parameter, which is a theoretically new type of construction for temporal scale spaces.
Based on a previously established analogy between scale-normalized derivatives for spatial derivative expressions and the interpretation of scale normalization of the corresponding Gaussian derivative kernels to constant \(L_p\)-norms over scale [53], we will show how scale-invariant temporal derivative operators can be defined for the proposed new families of temporal scale-space kernels. Then, we will apply the resulting theory for computing basic spatio-temporal derivative expressions of different types and describe classes of such spatio-temporal derivative expressions that are invariant or covariant to basic types of natural image transformations, including independent rescaling of the spatial and temporal coordinates, illumination variations and variabilities in exposure control mechanisms.
In these ways, the proposed theory will present previously missing components for applying scale-space theory to spatio-temporal input data (video) based on truly time-causal and time-recursive image operations.
A conceptual difference between the time-causal temporal scale-space model that is developed in this paper and Koenderink’s fully continuous scale-time model [39] or the fully continuous time-causal semigroup derived by Fagerström [16] and Lindeberg [56] is that the presented time-causal scale-space model will be semi-discrete, with a continuous time axis and a discretized temporal scale parameter. This semi-discrete theory can then be further discretized over time (and for spatio-temporal image data also over space) into a fully discrete theory for digital implementation. The reason why the temporal scale parameter has to be discrete in this theory is that, according to theoretical results about variation-diminishing linear transformations by Schoenberg [81, 82, 83, 84, 85, 86, 87] and Karlin [36] that we will build upon, there is no continuous-parameter semigroup structure or continuous-parameter cascade structure that guarantees non-creation of new structures with increasing temporal scale in terms of non-creation of new local extrema or new zero-crossings over a continuum of increasing temporal scales.
When discretizing the temporal scale parameter into a discrete set of temporal scale levels, we do however show that there exists such a discrete-parameter semigroup structure in the case of a uniform distribution of the temporal scale levels and a discrete-parameter cascade structure in the case of a logarithmic distribution of the temporal scale levels, which both guarantee non-creation of new local extrema or zero-crossings with increasing temporal scale. In addition, the presented semi-discrete theory allows for an efficient time-recursive formulation for real-time implementation based on a compact temporal buffer, which Koenderink’s scale-time model [39] does not, and much better temporal dynamics than the time-causal semigroup previously derived by Fagerström [16] and Lindeberg [56].
Specifically, we argue that if the goal is to construct a vision system that analyses continuous video streams in real time, as is the main scope of this work, a restriction of the theory to a discrete set of temporal scale levels with the temporal scale levels determined in advance before the image data are sampled over time is less of a practical constraint, since the vision system anyway has to be based on a finite amount of sensors and hardware/wetware for sampling and processing the continuous stream of image data.
1.1 Structure of this Article
To give a contextual overview of this work, Sect. 2 starts by presenting a previously established computational model for spatio-temporal receptive fields in terms of spatial and temporal scale-space kernels, whose temporal smoothing step we will then replace.
Section 3 starts by reviewing previous theoretical results for temporal scale-space models based on the assumption of non-creation of new local extrema with increasing scale, showing that the canonical temporal operators in such a model are first-order integrators or equivalently truncated exponential kernels coupled in cascade. Relative to previous applications of this idea based on a uniform distribution of the intermediate temporal scale levels, we present a conceptual extension based on a logarithmic distribution of the intermediate temporal scale levels, and show that this leads to a new family of kernels that have faster temporal response properties and correspond to more skewed distributions, with the degree of skewness determined by a distribution parameter c.
Section 4 analyses the temporal characteristics of these kernels and shows that they lead to faster temporal characteristics in terms of shorter temporal delays, including how the choice of distribution parameter c affects these characteristics. In Sect. 5, we present a more detailed analysis of these kernels, with emphasis on the limit case when the number of intermediate scale levels K tends to infinity, and making constructions that lead to true self-similarity and scale invariance over a discrete set of temporal scaling factors.
Section 6 shows how these spatial and temporal kernels can be transferred to a discrete implementation while preserving scale-space properties also in the discrete implementation and allowing for efficient computations of spatio-temporal derivative approximations. Section 7 develops a model for defining scale-normalized derivatives for the proposed temporal scale-space kernels, which also leads to a way of measuring how far from the scale-invariant time-causal limit kernel a particular temporal scale-space kernel is when using a finite number K of temporal scale levels.
In Sect. 8, we combine these components for computing spatio-temporal features defined from different types of spatio-temporal differential invariants, including an analysis of their invariance or covariance properties under natural image transformations, with specific emphasis on independent scalings of the spatial and temporal dimensions, illumination variations and variations in exposure control mechanisms. Finally, Sect. 9 concludes with a summary and discussion, including a description of relations and differences to other temporal scale-space models.
To simplify the presentation, we have put some of the theoretical analysis in the appendix. Appendix 1 presents a frequency analysis of the proposed time-causal scale-space kernels, including a detailed characterization of the limit case when the number of temporal scale levels K tends to infinity and explicit expressions for their moment (cumulant) descriptors up to order four. Appendix 2 presents a comparison with the temporal kernels in Koenderink’s scale-time model, including a minor modification of Koenderink’s model to make the temporal kernels normalized to unit \(L_1\)-norm, and a mapping between the parameters in his model (a temporal offset \(\delta \) and a dimensionless amount of smoothing \(\sigma \) relative to a logarithmic time scale) and the parameters in our model (the temporal variance \(\tau \), a distribution parameter c and the number of temporal scale levels K), including graphs of similarities vs. differences between these models. Appendix 3 shows that for the temporal scale-space representation given by convolution with the scale-invariant time-causal limit kernel, the corresponding scale-normalized derivatives become fully scale covariant/invariant for temporal scaling transformations that correspond to exact mappings between the discrete temporal scale levels.
Specifically, the main novel contributions of this article comprise:

the theory that implies that the temporal scale levels have to be discrete (Sects. 3.1–3.2),

more detailed modelling of biological receptive fields (Sect. 3.6),

the construction of a truly self-similar and scale-invariant time-causal limit kernel (Sect. 5),

theory for implementation in terms of discrete time-causal scale-space kernels (Sect. 6.1),

details concerning a more rotationally symmetric implementation over the spatial domain (Sect. 6.3),

definition of scale-normalized temporal derivatives for the resulting time-causal scale space (Sect. 7),

a framework for spatio-temporal feature detection based on a time-causal and time-recursive spatio-temporal scale space, including scale normalization as well as covariance and invariance properties under natural image transformations and experimental results (Sect. 8),

a frequency analysis of the time-causal and time-recursive scale-space kernels (Appendix 1),

a comparison between the presented semi-discrete model and Koenderink’s fully continuous model, including comparisons between the temporal kernels in the two models and a mapping between the parameters in our model and Koenderink’s model (Appendix 2) and

a theoretical analysis of the evolution properties over scales of temporal derivatives obtained from the time-causal limit kernel, including the scaling properties of the scale normalization factors under \(L_p\)-normalization and a proof that the resulting scale-normalized derivatives become scale invariant/covariant (Appendix 3).
2 Spatio-Temporal Receptive Fields

\(x = (x_1, x_2)^T\) denotes the image coordinates,

t denotes time,

s denotes the spatial scale,

\(\tau \) denotes the temporal scale,

\(v = (v_1, v_2)^T\) denotes a local image velocity,

\({\varSigma }\) denotes a spatial covariance matrix determining the spatial shape of an affine Gaussian kernel \(g(x;\; s, {\varSigma }) = \frac{1}{2 \pi s \sqrt{\det {\varSigma }}} \, \mathrm{e}^{-x^T {\varSigma }^{-1} x/(2s)}\),

\(g(x_1 - v_1 t, x_2 - v_2 t;\; s, {\varSigma })\) denotes a spatial affine Gaussian kernel that moves with image velocity \(v = (v_1, v_2)\) in space-time and

\(h(t;\; \tau )\) is a temporal smoothing kernel over time.
For simplicity, we shall here restrict the above family of affine Gaussian kernels over the spatial domain to rotationally symmetric Gaussians of different size s, by setting the covariance matrix \({\varSigma }\) to a unit matrix. We shall also mainly restrict ourselves to space-time separable receptive fields by setting the image velocity v to zero.
A conceptual difference that we shall pursue is to relax the requirement of a semigroup structure over a continuous temporal scale parameter in the above axiomatic derivations to a weaker Markov property over a discrete temporal scale parameter. We shall also replace the previous axiom about non-creation of new image structures with increasing scale in terms of non-enhancement of local extrema (which requires a continuous scale parameter) by the requirement that the temporal smoothing process, when seen as an operation along a one-dimensional temporal axis only, must not increase the number of local extrema or zero-crossings in the signal. Then, another family of time-causal scale-space kernels becomes permissible and uniquely determined, in terms of first-order integrators or truncated exponential filters coupled in cascade.
The main topics of this paper are to handle the remaining degrees of freedom resulting from this construction regarding (i) choosing and parameterizing the distribution of temporal scale levels, (ii) analysing the resulting temporal dynamics, (iii) describing how this model can be transferred to a discrete implementation over discretized time, space or both while retaining discrete scale-space properties, (iv) using the resulting theory for computing scale-normalized spatio-temporal derivative expressions for purposes in computer vision and (v) computational modelling of biological vision.
3 Time-Causal Temporal Scale-Space
When constructing a system for real-time processing of sensor data, a fundamental constraint on the temporal smoothing kernels is that they have to be time-causal. The ad hoc solution of using a truncated symmetric filter of finite temporal extent in combination with a temporal delay is not appropriate in a time-critical context. For reasons of computational and memory efficiency, the computations should furthermore be based on a compact temporal buffer that contains sufficient information for representing the sensor information at multiple temporal scales and computing features therefrom. Corresponding requirements are necessary in computational modelling of biological perception.
3.1 Time-Causal Scale-Space Kernels for Pure Temporal Domain
Following Lindeberg [45], let us further define a scale-space kernel as a kernel that guarantees that the number of local extrema in the convolved signal can never exceed the number of local extrema in the input signal. Equivalently, this condition can be expressed in terms of the number of zero-crossings in the signal. Following Lindeberg and Fagerström [66], let us additionally define a temporal scale-space kernel as a kernel that both satisfies the temporal causality requirement \(h(t;\; \tau ) = 0\) if \(t < 0\) and guarantees that the number of local extrema does not increase under convolution. If both the raw transformation kernels \(h(u;\; \tau )\) and the cascade kernels \((\Delta h)(t;\; \tau _1 \mapsto \tau _2)\) are scale-space kernels, we do hence guarantee that the number of local extrema in \(L(t;\; \tau _2)\) can never exceed the number of local extrema in \(L(t;\; \tau _1)\). If the kernels \(h(u;\; \tau )\) and additionally the cascade kernels \((\Delta h)(t;\; \tau _1 \mapsto \tau _2)\) are temporal scale-space kernels, these kernels do hence constitute natural kernels for defining a temporal scale-space representation.
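The defining property — that convolution with a scale-space kernel must not increase the number of local extrema — can be checked numerically. The sketch below (our own illustration, not code from the article) smooths a noisy signal with a discretized truncated exponential kernel, implemented recursively, and counts local extrema as sign changes of the first difference:

```python
import numpy as np

def count_local_extrema(signal):
    """Count interior local extrema as sign changes of the first difference."""
    d = np.sign(np.diff(signal))
    d = d[d != 0]                 # ignore exactly flat segments
    return int(np.sum(d[1:] != d[:-1]))

def truncated_exponential_smoothing(signal, mu):
    """Discretized truncated exponential filter (first-order recursive filter)."""
    out = np.empty(len(signal), dtype=float)
    state = 0.0
    for t, f in enumerate(signal):
        state += (f - state) / (1.0 + mu)
        out[t] = state
    return out

rng = np.random.default_rng(0)
noisy = rng.standard_normal(500)
smoothed = truncated_exponential_smoothing(noisy, mu=4.0)

# Non-creation of new local extrema: smoothing may only reduce the count.
assert count_local_extrema(smoothed) <= count_local_extrema(noisy)
```

Since each truncated exponential filter is itself a scale-space kernel, the extremum count also decreases monotonically along a cascade of such filters.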
3.2 Classification of ScaleSpace Kernels for Continuous Signals
Interestingly, the classes of scale-space kernels and temporal scale-space kernels can be completely classified based on classical results by Schoenberg and Karlin regarding the theory of variation-diminishing linear transformations. Schoenberg studied this topic in a series of papers over about 20 years [81, 82, 83, 84, 85, 86, 87], and Karlin [36] then wrote an excellent monograph on the topic of total positivity.
3.3 Temporal Scale-Space Kernels Over Continuous Temporal Domain
In the above expressions, the first class of scale-space kernels (8) corresponds to using a non-causal Gaussian scale-space concept over time, which may constitute a straightforward model for analysing pre-recorded temporal data in an offline setting, where temporal causality is not critical and can be disregarded by the possibility of accessing the virtual future in relation to any pre-recorded time moment.
3.4 Distributions of the Temporal Scale Levels
When implementing this temporal scale-space concept, a set of intermediate scale levels \(\tau _k\) has to be distributed between some minimum and maximum scale levels \(\tau _\mathrm{min} = \tau _1\) and \(\tau _\mathrm{max} = \tau _K\). Next, we will present three ways of discretizing the temporal scale parameter over K temporal scale levels.
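For concreteness, two of these distributions — the uniform one, \(\tau _k = k \, \tau _\mathrm{max}/K\), and the logarithmic one anchored at the maximum scale, \(\tau _k = c^{2(k-K)} \tau _\mathrm{max}\) — can be sketched as follows, together with the time constants of the individual truncated exponential filters, \(\mu _k = \sqrt{\tau _k - \tau _{k-1}}\) with \(\tau _0 = 0\), which follow from the additivity of variances under convolution (the function names are ours):

```python
import numpy as np

def uniform_scale_levels(tau_max, K):
    """Uniform distribution: tau_k = k * tau_max / K for k = 1..K."""
    return tau_max * np.arange(1, K + 1) / K

def logarithmic_scale_levels(tau_max, K, c):
    """Logarithmic distribution: tau_k = c^{2(k-K)} * tau_max for k = 1..K, c > 1."""
    return tau_max * c ** (2.0 * (np.arange(1, K + 1) - K))

def time_constants(tau_levels):
    """Time constants mu_k = sqrt(tau_k - tau_{k-1}) of the cascaded
    truncated exponential filters (variances are additive under convolution)."""
    prev = np.concatenate(([0.0], tau_levels[:-1]))
    return np.sqrt(tau_levels - prev)
```

With the logarithmic distribution, the ratio between successive scale levels is constant (\(\tau _{k+1}/\tau _k = c^2\)), so the time constants grow geometrically towards the coarser scales.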
Logarithmic Memory of the Past. When using a logarithmic distribution of the temporal scale levels according to either of the last two methods, the different levels in the temporal scale-space representation at increasing temporal scales will serve as a logarithmic memory of the past, with qualitative similarity to the mapping of the past onto a logarithmic time axis in the scale-time model by Koenderink [39]. Such a logarithmic memory of the past can also be extended to later stages in the visual hierarchy.
3.5 Temporal Receptive Fields
In general, these kernels are all highly asymmetric for small values of K, whereas the kernels based on a uniform distribution of the intermediate temporal scale levels become gradually more symmetric around the temporal maximum as K increases. The degree of continuity at the origin and the smoothness of transition phenomena increase with K, such that coupling \(K \ge 2\) kernels in cascade implies \(C^{K-2}\)-continuity of the temporal scale-space kernel. To guarantee at least \(C^1\)-continuity of the temporal derivative computation kernel at the origin, the order n of differentiation of a temporal scale-space kernel should therefore not exceed \(K - 2\). Specifically, the kernels based on a logarithmic distribution of the intermediate scale levels (i) have a higher degree of temporal asymmetry, which increases with the distribution parameter c, and (ii) allow for faster temporal dynamics compared to the kernels based on a uniform distribution.
In the case of a logarithmic distribution of the intermediate temporal scale levels, the choice of the distribution parameter c leads to a trade-off, in that smaller values of c allow for a denser sampling of the temporal scale levels, whereas larger values of c lead to faster temporal dynamics and a more skewed shape of the temporal receptive fields, with larger deviations from the shape of Gaussian derivatives of the same order (Fig. 2).
3.6 Computational Modelling of Biological Receptive Fields

\(\partial _{\varphi } = \cos \varphi \, \partial _{x_1} + \sin \varphi \, \partial _{x_2}\) and \(\partial _{\bot \varphi } = \sin \varphi \, \partial _{x_1} - \cos \varphi \, \partial _{x_2}\) denote spatial directional derivative operators in two orthogonal directions \(\varphi \) and \(\bot \varphi \),

\(m_1 \ge 0\) and \(m_2 \ge 0\) denote the orders of differentiation in the two orthogonal directions in the spatial domain with the overall spatial order of differentiation \(m = m_1 + m_2\),

\(v_1 \, \partial _{x_1} + v_2 \, \partial _{x_2} + \partial _t\) denotes a velocity-adapted temporal derivative operator.
Figure 4 shows the result of modelling the spatio-temporal receptive fields of simple cells in V1 in this way, using the general idealized model of spatio-temporal receptive fields in Eq. (1) in combination with a temporal smoothing kernel obtained by coupling a set of first-order integrators or truncated exponential kernels in cascade. As can be seen from the figures, the proposed idealized receptive field models reproduce the qualitative shape of the neurophysiologically recorded biological receptive fields well.
These results complement the general theoretical model for visual receptive fields in Lindeberg [57] by (i) temporal kernels that have better temporal dynamics than the time-causal semigroup derived in Lindeberg [56], by decreasing faster with time (exponentially instead of polynomially), and (ii) explicit modelling results and a theory (developed in more detail in following sections)^{1} for choosing and parameterizing the intermediate discrete temporal scale levels in the time-causal model.
With regard to a possible biological implementation of this theory, the evolution properties of the presented scale-space models over scale and time are governed by diffusion and difference equations [see Eqs. (23–24) in the next section], which can be implemented by operations over neighbourhoods in combination with first-order integration over time. Hence, the computations can naturally be implemented in terms of connections between different cells. Diffusion equations are also used in mean-field theory for approximating the computations that are performed by populations of neurons, see e.g. Omurtag et al. [76], Mattia and Del Giudice [73], Faugeras et al. [18].
By combining the theoretical properties of these kernels regarding scale-space relations between receptive field responses at different spatial and temporal scales with their covariance properties under natural image transformations (described in more detail in the next section), the proposed theory can be seen as both a theoretically well-founded and a biologically plausible model for time-causal and time-recursive spatio-temporal receptive fields.
3.7 Theoretical Properties of Time-Causal Spatio-Temporal Scale-Space
If the family of receptive fields in Eq. (1) is defined over the full group of positive definite spatial covariance matrices \({\varSigma }\) in the spatial affine Gaussian scale-space [48, 56, 69], then the receptive field family also obeys (vi) closedness and covariance under time-independent affine transformations of the spatial image domain, \((x', t')^T = (A x, t)^T\) implying \(L'(x', t';\; s, \tau _k;\; {\varSigma }', v') = L(x, t;\; s, \tau _k;\; {\varSigma }, v)\) with \({\varSigma }' = A {\varSigma } A^T\) and \(v' = A v\), as resulting from, e.g., local linearizations of the perspective mapping (with locality defined over the support region of the receptive field). When using rotationally symmetric Gaussian kernels for smoothing, the corresponding spatio-temporal scale-space representation does instead obey (vii) rotational invariance.
Over the temporal domain, convolution with these kernels obeys (viii) linearity over the temporal domain, (ix) shift invariance over the temporal domain, (x) temporal causality, (xi) a cascade property over temporal scales and (xii) non-creation of local extrema for any purely temporal signal. If using a uniform distribution of the intermediate temporal scale levels, the spatio-temporal scale-space representation obeys a (xiii) semigroup property over discrete temporal scales. Due to the finite number of discrete temporal scale levels, the corresponding spatio-temporal scale-space representation cannot, however, for general values of the time constants \(\mu _k\) obey full self-similarity and scale covariance over temporal scales. Using a logarithmic distribution of the temporal scale levels and an additional limit construction as the number of temporal scale levels tends to infinity, we will however show in Sect. 5 that it is possible to achieve (xiv) self-similarity (41) and scale covariance (49) over the discrete set of temporal scaling transformations \((x', t')^T = (x, c^j t)^T\) that precisely corresponds to mappings between any pair of discretized temporal scale levels as implied by the logarithmically distributed temporal scale parameter with distribution parameter c.
Over the composed spatio-temporal domain, these kernels obey (xv) positivity and (xvi) unit normalization in \(L_1\)-norm. The spatio-temporal scale-space representation also obeys (xvii) closedness and covariance under local Galilean transformations in space-time, in the sense that for any Galilean transformation \((x', t')^T = (x - ut, t)^T\) with two video sequences related by \(f'(x', t') = f(x, t)\), their corresponding spatio-temporal scale-space representations will be equal for corresponding parameter values \(L'(x', t';\; s, \tau _k;\; {\varSigma }, v') = L(x, t;\; s, \tau _k;\; {\varSigma }, v)\) with \(v' = v - u\).
If additionally the velocity value v and/or the spatial covariance matrix \({\varSigma }\) can be adapted to the local image structures in terms of Galilean and/or affine invariant fixed point properties [48, 56, 64, 69], then the spatio-temporal receptive field responses can additionally be made (xviii) Galilean invariant and/or (xix) affine invariant.
4 Temporal Dynamics of the Time-Causal Kernels
By comparing Eqs. (26) and (27), we can specifically note that with increasing number of intermediate temporal scale levels, a logarithmic distribution of the intermediate scales implies shorter temporal delays than a uniform distribution of the intermediate scales.
Numerical values of the temporal delay in terms of the temporal mean \(m = \sum _{k=1}^K \mu _k\) in units of \(\sigma = \sqrt{\tau }\) for time-causal kernels obtained by coupling K truncated exponential kernels in cascade, in the cases of a uniform distribution of the intermediate temporal scale levels \(\tau _k = k \tau /K\) or a logarithmic distribution \(\tau _k = c^{2(k-K)} \tau \)
Temporal mean values m of time-causal kernels

K  \(m_\mathrm{uni}\)  \(m_\mathrm{log}\) (\(c = \sqrt{2}\))  \(m_\mathrm{log}\) (\(c = 2^{3/4}\))  \(m_\mathrm{log}\) (\(c = 2\)) 
2  1.414  1.414  1.399  1.366 
3  1.732  1.707  1.636  1.549 
4  2.000  1.914  1.777  1.641 
5  2.236  2.061  1.860  1.686 
6  2.449  2.164  1.910  1.709 
7  2.646  2.237  1.940  1.721 
8  2.828  2.289  1.957  1.726 
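The values of \(m_\mathrm{uni}\) and \(m_\mathrm{log}\) in the table follow directly from the additivity of the temporal means of the cascaded truncated exponential filters, \(m = \sum _{k=1}^{K} \mu _k\) with \(\mu _k = \sqrt{\tau _k - \tau _{k-1}}\) and \(\tau _0 = 0\). A short sketch that reproduces them, setting \(\tau = 1\) so that \(\sigma = 1\) (the function names are ours):

```python
import numpy as np

def temporal_mean(tau_levels):
    """Temporal mean (delay) m = sum_k mu_k of the composed kernel,
    with mu_k = sqrt(tau_k - tau_{k-1}) and tau_0 = 0."""
    prev = np.concatenate(([0.0], tau_levels[:-1]))
    return np.sqrt(tau_levels - prev).sum()

def m_uniform(K):
    """Uniform distribution tau_k = k / K (tau = 1); evaluates to sqrt(K)."""
    return temporal_mean(np.arange(1, K + 1) / K)

def m_logarithmic(K, c):
    """Logarithmic distribution tau_k = c^{2(k-K)} (tau = 1)."""
    return temporal_mean(c ** (2.0 * (np.arange(1, K + 1) - K)))
```

For example, m_uniform(8) gives 2.828 while m_logarithmic(8, 2.0) gives 1.726, quantifying how much shorter the temporal delays become with a logarithmic distribution.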
Numerical values for the temporal delay of the local maximum in units of \(\sigma = \sqrt{\tau }\) for time-causal kernels obtained by coupling K truncated exponential kernels in cascade, in the cases of a uniform distribution of the intermediate temporal scale levels \(\tau _k = k \tau /K\) or a logarithmic distribution \(\tau _k = c^{2(k-K)} \tau \) with \(c > 1\)
Temporal delays \(t_\mathrm{max}\) from the maxima of time-causal kernels

K  \(t_\mathrm{uni}\)  \(t_\mathrm{log}\) (\(c = \sqrt{2}\))  \(t_\mathrm{log}\) (\(c = 2^{3/4}\))  \(t_\mathrm{log}\) (\(c = 2\)) 
2  0.707  0.707  0.688  0.640 
3  1.154  1.122  1.027  0.909 
4  1.500  1.385  1.199  1.014 
5  1.789  1.556  1.289  1.060 
6  2.041  1.669  1.340  1.083 
7  2.268  1.745  1.370  1.095 
8  2.475  1.797  1.388  1.100 
If we consider a temporal event that occurs as a step function over time (e.g. a new object appearing in the field of view), and if the time of this event is estimated from the local maximum over time in the first-order temporal derivative response, then the temporal variation in the response over time will be given by the shape of the temporal smoothing kernel. The local maximum over time will occur at a time delay equal to the time at which the temporal kernel has its maximum over time. Thus, the position of the maximum over time of the temporal smoothing kernel is highly relevant for quantifying the temporal response dynamics.
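This relationship can be verified numerically: smoothing a unit step with a cascade of first-order integrators and differencing the response recovers the (shifted) smoothing kernel, so the first-derivative response peaks at the temporal position of the kernel maximum. A small sketch, with illustrative time constants of our own choosing:

```python
import numpy as np

def cascade_kernel(mus, n):
    """Impulse response (length n) of first-order integrators in cascade."""
    h = np.zeros(n)
    h[0] = 1.0
    for mu in mus:
        out = np.empty(n)
        state = 0.0
        for t in range(n):
            state += (h[t] - state) / (1.0 + mu)
            out[t] = state
        h = out
    return h

mus = [4.0, 2.8, 2.0]           # illustrative time constants
n = 400
h = cascade_kernel(mus, n)

step = np.ones(n)               # a step event: a new object appearing
response = np.convolve(step, h)[:n]
derivative = np.diff(response)  # first-order temporal difference of the response

# The derivative response equals the kernel (up to a one-sample shift), so the
# estimated event time coincides with the temporal position of the kernel maximum.
assert np.allclose(derivative, h[1:])
assert np.argmax(derivative) == np.argmax(h) - 1
```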
5 The Scale-Invariant Time-Causal Limit Kernel
In this section, we will show that in the case of a logarithmic distribution of the intermediate temporal scale levels, it is possible to extend the previous temporal scale-space concept into a limit case that allows for covariance under temporal scaling transformations, corresponding to closedness of the temporal scale-space representation under compression or stretching of the temporal scale axis by any integer power of the distribution parameter c.
Concerning the need for temporal scale invariance of a temporal scale-space representation, one could at first argue that this need is different from the need for spatial scale invariance in a spatial scale-space representation. Spatial scaling transformations always occur because of perspective scaling effects caused by variations in the distances between objects in the world and the observer, and do therefore always need to be handled by a vision system, whereas the temporal scale remains unaffected by the perspective mapping from the scene to the image.
Temporal scaling transformations are, however, nevertheless important, because physical phenomena or spatiotemporal events may occur faster or slower. This is analogous to another source of scale variability over the spatial domain, caused by objects in the world having different physical size. To handle such scale variabilities over the temporal domain, it is therefore desirable to develop temporal scale-space concepts that allow for temporal scale invariance.
In Fig. 9 in Appendix 1(a), we show graphs of the corresponding skewness and kurtosis measures as functions of the distribution parameter c, showing that both measures increase with c. In Fig. 12 in Appendix 2, we provide a comparison between the behaviour of this limit kernel and the temporal kernel in Koenderink’s scale-time model, showing that although the temporal kernels in these two models to a first approximation share qualitatively coarsely similar properties in terms of their overall shape (see Fig. 11 in Appendix 2), they differ significantly in terms of their skewness and kurtosis measures.
Self-Similarity and Scale Invariance of the Limit Kernel Combining the recurrence relations of the limit kernel with its transformation property under scaling transformations, it follows that the limit kernel can be regarded as truly self-similar over scale in the sense that (i) the scale-space representation at a coarser temporal scale (here \(\tau \)) can be recursively computed from the scale-space representation at a finer temporal scale (here \(\tau /c^2\)) according to (41), (ii) the representation at the coarser temporal scale is derived from the input in a functionally similar way as the representation at the finer temporal scale and (iii) the limit kernel and its Fourier transform are transformed in a self-similar way (44) and (43) under scaling transformations.
In these respects, the temporal receptive fields arising from temporal derivatives of the limit kernel share structurally similar mathematical properties with continuous wavelets [10, 30, 71, 75] and fractals [5, 6, 72], with the conceptually novel extension that the scaling behaviour and the self-similarity over scale are here achieved over a time-causal and time-recursive temporal domain.
6 Computational Implementation
The computational model for spatiotemporal receptive fields presented here is based on spatiotemporal image data that are assumed to be continuous over time. When implementing this model on sampled video data, the continuous theory must be transferred to discrete space and discrete time.
In this section, we describe how the temporal and spatiotemporal receptive fields can be implemented in terms of corresponding discrete scale-space kernels that possess scale-space properties over discrete spatiotemporal domains.
6.1 Classification of Scale-Space Kernels for Discrete Signals
In Sect. 3.2, we described how the class of continuous scale-space kernels over a one-dimensional domain can be classified based on classical results by Schoenberg regarding the theory of variation-diminishing transformations, as applied to the construction of discrete scale-space theory in Lindeberg [45] [48, Sect. 3.3]. To later map the temporal smoothing operation onto theoretically well-founded discrete scale-space kernels, we shall in this section describe corresponding classification results for scale-space kernels over a discrete temporal domain.
 two-point weighted average or generalized binomial smoothing$$\begin{aligned} \begin{aligned} f_\mathrm{out}(x)&= f_\mathrm{in}(x) + \alpha _i \, f_\mathrm{in}(x - 1) \quad (\alpha _i \ge 0),\\ f_\mathrm{out}(x)&= f_\mathrm{in}(x) + \delta _i \, f_\mathrm{in}(x + 1) \quad (\delta _i \ge 0), \end{aligned} \end{aligned}$$(53)
 moving average or first-order recursive filtering$$\begin{aligned} \begin{aligned} f_\mathrm{out}(x)&= f_\mathrm{in}(x) + \beta _i \, f_\mathrm{out}(x - 1) \quad (0 \le \beta _i < 1), \\ f_\mathrm{out}(x)&= f_\mathrm{in}(x) + \gamma _i \, f_\mathrm{out}(x + 1) \quad (0 \le \gamma _i < 1), \end{aligned}\nonumber \\ \end{aligned}$$(54)

infinitesimal smoothing or diffusion as arising from the continuous semigroups made possible by the factor \(\mathrm{e}^{(q_{-1}z^{-1} + q_{1}z)}\).
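The two discrete smoothing primitives (53) and (54) are variation-diminishing: the number of sign changes in the output never exceeds that in the input. A small numerical sketch of this property (the helper names are ours):

```python
import numpy as np

def sign_changes(x):
    # Count sign alternations, ignoring (near-)zero samples.
    x = np.asarray(x, dtype=float)
    s = np.sign(x[np.abs(x) > 1e-12])
    return int(np.sum(s[1:] != s[:-1]))

def binomial_step(f, alpha):
    # Generalized binomial smoothing: f_out(x) = f_in(x) + alpha * f_in(x - 1).
    f = np.asarray(f, dtype=float)
    out = f.copy()
    out[1:] += alpha * f[:-1]
    return out

def recursive_step(f, beta):
    # First-order recursive filtering: f_out(x) = f_in(x) + beta * f_out(x - 1).
    f = np.asarray(f, dtype=float)
    out = np.empty_like(f)
    acc = 0.0
    for i, v in enumerate(f):
        acc = v + beta * acc
        out[i] = acc
    return out
```

Applying either primitive to a rapidly oscillating sequence reduces (and never increases) the number of sign alternations, which is the discrete analogue of non-creation of new local extrema.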
6.2 Discrete Temporal Scale-Space Kernels Based on Recursive Filters
By the time-recursive formulation of this temporal scale-space concept, the computations can be performed based on a compact temporal buffer over time, which contains the temporal scale-space representations at the temporal scales \(\tau _k\), with no need for storing any additional temporal buffer of what has occurred in the past to perform the corresponding temporal operations.
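As a concrete illustration of this time-recursive computation, the following sketch (class and method names are ours) keeps exactly one state value per temporal channel and updates all K channels from each incoming sample, so that no history buffer is needed:

```python
import numpy as np

class TemporalScaleSpaceCascade:
    """Streaming cascade of K first-order integrators with time constants mu_k.

    Channel k stores one value: the smoothed signal at intermediate temporal
    scale tau_k. Per incoming sample: L_k += (L_{k-1} - L_k) / (1 + mu_k),
    where L_0 denotes the input sample itself.
    """

    def __init__(self, mus):
        self.mus = list(mus)
        self.state = np.zeros(len(self.mus))

    def update(self, sample):
        prev = float(sample)
        for k, mu in enumerate(self.mus):
            self.state[k] += (prev - self.state[k]) / (1.0 + mu)
            prev = self.state[k]
        return prev  # representation at the coarsest temporal scale
```

Each stage has unit DC gain and contributes a temporal delay of \(\mu_k\), so the impulse response of the cascade sums to one and has temporal mean \(\sum_k \mu_k\), consistent with the delay analysis above.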
6.3 Discrete Implementation of Spatial Gaussian Smoothing
6.4 Discrete Implementation of Spatio-Temporal Receptive Fields
For non-separable spatiotemporal receptive fields corresponding to a nonzero image velocity \(v = (v_1, v_2)^T\), we implement the spatiotemporal smoothing operation by first warping the video data \((x_1', x_2')^T = (x_1 - v_1 t, x_2 - v_2 t)^T\) using spline interpolation. Then, we apply separable spatiotemporal smoothing in the transformed domain and unwarp the result back to the original domain. Over a continuous domain, such an operation is equivalent to convolution with corresponding velocity-adapted spatiotemporal receptive fields, while being significantly faster in a discrete implementation than explicit convolution with non-separable receptive fields over three dimensions.
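A minimal sketch of this warp-smooth-unwarp scheme, simplified to integer-pixel shifts via np.roll where the text uses subpixel spline interpolation, and with the spatial and temporal smoothing passed in as caller-supplied callables (all names here are our own):

```python
import numpy as np

def velocity_adapted_smoothing(video, v, smooth_frame, smooth_time):
    # video: array of shape (T, H, W); v = (v1, v2) in pixels per frame.
    # 1. Warp each frame against the motion so the moving pattern becomes static.
    T = video.shape[0]
    warped = np.stack([
        np.roll(video[t], (-round(v[0] * t), -round(v[1] * t)), axis=(0, 1))
        for t in range(T)])
    # 2. Separable smoothing in the transformed domain.
    smoothed = smooth_time(np.stack([smooth_frame(frame) for frame in warped]))
    # 3. Unwarp back to the original coordinates.
    return np.stack([
        np.roll(smoothed[t], (round(v[0] * t), round(v[1] * t)), axis=(0, 1))
        for t in range(T)])
```

With identity callables the warp and unwarp cancel exactly, which gives a simple correctness check; in practice, smooth_frame would be a discrete Gaussian and smooth_time the recursive-filter cascade over time.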
7 Scale Normalization for Spatio-Temporal Derivatives
When computing spatiotemporal derivatives at different scales, some mechanism is needed for normalizing the derivatives with respect to the spatial and temporal scales, to make derivatives at different spatial and temporal scales comparable and to enable spatial and temporal scale selection.
7.1 Scale Normalization of Spatial Derivatives
7.2 Scale Normalization of Temporal Derivatives
If using a non-causal Gaussian temporal scale-space concept, scale-normalized temporal derivatives can be defined in a way analogous to the scale-normalized spatial derivatives described in the previous section.
For the discrete temporal scale-space concept over discrete time, scale normalization factors for discrete \(l_p\)-normalization are defined in an analogous way, with the only difference that the continuous \(L_p\)-norm is replaced by a discrete \(l_p\)-norm.
Numerical values of scale normalization factors for discrete temporal derivative approximations, using either variance-based normalization \(\tau ^{n/2}\) or \(l_p\)-normalization \(\alpha _{n,\gamma _{\tau }}(\tau )\), for temporal derivatives of order \(n = 1\) and at temporal scales \(\tau = 1\), \(\tau = 16\) and \(\tau = 256\) relative to a unit temporal sampling rate with \(\Delta t = 1\) and with \(\gamma _{\tau } = 1\), for time-causal kernels obtained by coupling K first-order recursive filters in cascade with either a uniform distribution of the intermediate scale levels or a logarithmic distribution for \(c = \sqrt{2}\), \(c = 2^{3/4}\) and \(c = 2\)
K  \(\tau ^{n/2}\)  \(\alpha _{n,\gamma _{\tau }}(\tau )\) (uni)  \(\alpha _{n,\gamma _{\tau }}(\tau )\) (\(c = \sqrt{2}\))  \(\alpha _{n,\gamma _{\tau }}(\tau )\) (\(c = 2^{3/4}\))  \(\alpha _{n,\gamma _{\tau }}(\tau )\) (\(c = 2\)) 

Temporal scale normalization factors for \(n = 1\) at \(\tau = 1\)  
2  1.000  0.744  0.744  0.737  0.723 
4  1.000  0.847  0.814  0.771  0.737 
8  1.000  0.935  0.823  0.772  0.738 
16  1.000  0.998  0.823  0.772  0.738 
Temporal scale normalization factors for \(n = 1\) at \(\tau = 16\)  
2  4.000  3.056  3.056  3.016  2.938 
4  4.000  3.553  3.432  3.223  3.068 
8  4.000  3.809  3.459  3.228  3.071 
16  4.000  3.891  3.460  3.338  3.071 
Temporal scale normalization factors for \(n = 1\) at \(\tau = 256\)  
2  16.000  12.270  12.270  12.084  11.711 
4  16.000  14.242  13.732  12.932  12.162 
8  16.000  15.145  13.817  12.922  12.151 
16  16.000  15.583  13.816  12.922  12.151 
Numerical values of scale normalization factors for discrete temporal derivative approximations, using either variance-based normalization \(\tau ^{n/2}\) or \(l_p\)-normalization \(\alpha _{n,\gamma _{\tau }}(\tau )\), for temporal derivatives of order \(n = 2\) and at temporal scales \(\tau = 1\), \(\tau = 16\) and \(\tau = 256\) relative to a unit temporal sampling rate with \(\Delta t = 1\) and with \(\gamma _{\tau } = 1\), for time-causal kernels obtained by coupling K first-order recursive filters in cascade with either a uniform distribution of the intermediate scale levels or a logarithmic distribution for \(c = \sqrt{2}\), \(c = 2^{3/4}\) and \(c = 2\)
K  \(\tau ^{n/2}\)  \(\alpha _{n,\gamma _{\tau }}(\tau )\) (uni)  \(\alpha _{n,\gamma _{\tau }}(\tau )\) (\(c = \sqrt{2}\))  \(\alpha _{n,\gamma _{\tau }}(\tau )\) (\(c = 2^{3/4}\))  \(\alpha _{n,\gamma _{\tau }}(\tau )\) (\(c = 2\)) 

Temporal scale normalization factors for \(n = 2\) at \(\tau = 1\)  
2  1.000  0.617  0.617  0.606  0.586 
4  1.000  0.738  0.718  0.659  0.609 
8  1.000  0.787  0.722  0.660  0.609 
16  1.000  0.824  0.722  0.660  0.609 
Temporal scale normalization factors for \(n = 2\) at \(\tau = 16\)  
2  16.000  4.622  4.622  4.472  4.172 
4  16.000  10.184  9.160  7.885  6.208 
8  16.000  13.106  10.068  7.862  6.305 
16  16.000  14.575  10.058  7.862  6.305 
Temporal scale normalization factors for \(n = 2\) at \(\tau = 256\)  
2  256.00  58.95  58.95  56.63  51.84 
4  256.00  165.14  148.96  124.04  101.16 
8  256.00  211.10  159.23  126.55  101.12 
16  256.00  233.78  159.28  126.55  101.12 
7.3 Computation of Temporal Scale Normalization Factors
Numerical estimates of the relative deviation from the limit case when using different numbers K of temporal scale levels for a uniform vs. a logarithmic distribution of the intermediate scale levels
K  \(\varepsilon _n\) (uni)  \(\varepsilon _n\) (\(c = \sqrt{2}\))  \(\varepsilon _n\) (\(c = 2^{3/4}\))  \(\varepsilon _n\) (\(c = 2\)) 

Relative deviation from limit of scale normalization factors for \(n = 1\) at \(\tau = 256\)  
2  0.233  \(1.1 \times 10^{-1}\)  \(6.5 \times 10^{-2}\)  \(3.6 \times 10^{-2}\) 
4  0.110  \(6.1 \times 10^{-3}\)  \(8.5 \times 10^{-4}\)  \(8.6 \times 10^{-4}\) 
8  0.053  \(4.9 \times 10^{-4}\)  \(1.1 \times 10^{-5}\)  \(2.0 \times 10^{-7}\) 
16  0.026  \(1.2 \times 10^{-7}\)  \(9.0 \times 10^{-13}\)  \(1.5 \times 10^{-15}\) 
32  0.013  \(3.1 \times 10^{-14}\)  \(2.9 \times 10^{-14}\)  \(3.4 \times 10^{-14}\) 
Relative deviation from limit of scale normalization factors for \(n = 2\) at \(\tau = 256\)  
2  0.770  \(6.3 \times 10^{-1}\)  \(5.5 \times 10^{-1}\)  \(4.9 \times 10^{-1}\) 
4  0.354  \(6.5 \times 10^{-2}\)  \(2.0 \times 10^{-2}\)  \(4.1 \times 10^{-2}\) 
8  0.174  \(3.2 \times 10^{-4}\)  \(1.3 \times 10^{-5}\)  \(1.6 \times 10^{-8}\) 
16  0.085  \(1.8 \times 10^{-7}\)  \(1.0 \times 10^{-12}\)  \(9.6 \times 10^{-15}\) 
32  0.042  \(1.2 \times 10^{-13}\)  \(6.2 \times 10^{-14}\)  \(4.0 \times 10^{-14}\) 
Notably, the numerical values of the resulting scale normalization factors may differ substantially depending on the type of scale normalization method and on the number of first-order recursive filters that are coupled in cascade. Therefore, the choice of temporal scale normalization method warrants specific attention in applications where the relations between the numerical values of temporal derivatives at different temporal scales have a critical influence.
Specifically, we can note that the temporal scale normalization factors based on \(L_p\)-normalization differ more from the scale normalization factors obtained from variance-based normalization (i) in the case of a logarithmic distribution of the intermediate temporal scale levels compared to a uniform distribution, (ii) when the distribution parameter c increases within the family of temporal receptive fields based on a logarithmic distribution of the intermediate scale levels or (iii) when a very low number of recursive filters is coupled in cascade. In all three cases, the resulting temporal smoothing kernels become more asymmetric and hence differ more from the symmetric Gaussian model.
On the other hand, with increasing values of K, the numerical values of the scale normalization factors converge much faster to their limit values when using a logarithmic distribution of the intermediate scale levels than when using a uniform distribution. Depending on the value of the distribution parameter c, the scale normalization factors approach their limit values reasonably well after \(K = 4\) to \(K = 8\) scale levels, whereas much larger values of K would be needed for a uniform distribution. The convergence rate is faster for larger values of c.
7.4 Measuring the Deviation from the Scale-Invariant Time-Causal Limit Kernel
Not even \(K = 32\) scale levels are sufficient to drive the relative deviation measure below \(1~\%\) for a uniform distribution, whereas the corresponding deviation measures are down at machine precision when using \(K = 32\) levels for a logarithmic distribution. When using \(K = 4\) scale levels, the relative deviation measure is down to \(10^{-2}\) to \(10^{-4}\) for a logarithmic distribution. If using \(K = 8\) scale levels, the relative deviation measure is down to \(10^{-4}\) to \(10^{-8}\), depending on the value of the distribution parameter c and the order n of differentiation.
From these results, we can conclude that one should not use too low a number of recursive filters coupled in cascade when computing temporal derivatives. Our recommendation is to use a logarithmic distribution with a minimum of four recursive filters for derivatives up to order two at finer scales and a larger number of recursive filters at coarser scales. When performing computations at a single temporal scale, we often use \(K = 7\) or \(K = 8\) as default.
8 Spatio-Temporal Feature Detection
In the following, we apply the above theoretical framework for separable time-causal spatiotemporal receptive fields to computing different types of spatiotemporal features, defined from spatiotemporal derivatives of different spatial and temporal orders, which may additionally be combined into composed (linear or nonlinear) differential expressions.
8.1 Partial Derivatives
8.2 Directional Derivatives
Note that as long as the spatiotemporal smoothing operations are performed based on rotationally symmetric Gaussians over the spatial domain and using space-time separable kernels over space-time, the responses to these directional derivative operators can be directly related to corresponding partial derivative operators by mere linear combinations. If the rotationally symmetric Gaussian scale-space concept is extended to an anisotropic affine Gaussian scale space and/or if we make use of non-separable velocity-adapted receptive fields over space-time in a spatiotemporal scale space, to enable true affine and/or Galilean invariances, such linear relationships will, however, no longer hold in a similar form.
For the image orientations \(\varphi \) and \(\bot \varphi \), it is for purely spatial derivative operations, in the case of rotationally symmetric smoothing over the spatial domain, in principle sufficient to sample the image orientation according to a uniform distribution on the semicircle using at least \(m+1\) directional derivative filters for derivatives of order m.
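This orientation-sampling property corresponds to the steerability of Gaussian directional derivatives in the sense of Freeman and Adelson. As an illustrative sketch for order \(m = 2\) (the function names are ours), the second directional derivative at any orientation can be recovered from \(m + 1 = 3\) samples at orientations 0, 60 and 120 degrees, using the interpolation weights \((1 + 2 \cos 2(\varphi - \varphi_j))/3\):

```python
import numpy as np

def second_directional(Lxx, Lxy, Lyy, phi):
    # L_{phi phi} = cos^2(phi) Lxx + 2 cos(phi) sin(phi) Lxy + sin^2(phi) Lyy
    c, s = np.cos(phi), np.sin(phi)
    return c * c * Lxx + 2.0 * c * s * Lxy + s * s * Lyy

def steer_second_order(Lxx, Lxy, Lyy, phi):
    # Reconstruct L_{phi phi} from m + 1 = 3 orientations sampled uniformly
    # on the semicircle.
    phis = (0.0, np.pi / 3.0, 2.0 * np.pi / 3.0)
    responses = [second_directional(Lxx, Lxy, Lyy, p) for p in phis]
    weights = [(1.0 + 2.0 * np.cos(2.0 * (phi - p))) / 3.0 for p in phis]
    return sum(w * r for w, r in zip(weights, responses))
```

The reconstruction is exact, because \(L_{\varphi\varphi}\) is a trigonometric polynomial of degree two in \(\varphi\), so three uniformly spaced samples determine it completely.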
8.3 Differential Invariants Over Spatial Derivative Operators
8.4 Space-Time-Coupled Spatio-Temporal Derivative Expressions
A more general approach to spatiotemporal feature detection than partial derivatives or directional derivatives consists of defining spatiotemporal derivative operators that combine spatial and temporal derivative operators in an integrated manner.
By combining these two entities into a quasi quadrature measure \(\mathcal{Q}_t(\nabla _{(x,y)}^2 L)\) over time, we obtain a differential entity that can be expected to give strong responses when the intensity varies strongly over both image space and over time, while giving no response if there are no intensity variations over space or time. Hence, these three differential operators could be regarded as primitive spatiotemporal interest operators that can be seen as compatible with existing knowledge about neural processes in the LGN.
If aiming at defining a spatiotemporal analogue of the Laplacian operator, one does, however, need to consider that the most straightforward way of defining such an operator \(\nabla _{(x, y, t)}^2 L = L_{xx} + L_{yy} + L_{tt}\) is not covariant under independent scaling of the spatial and temporal coordinates as occurs if observing the same scene with cameras having independently different spatial and temporal sampling rates. Therefore, the choice of the relative weighting factor \(\varkappa ^2\) between temporal vs. spatial derivatives introduced in Eq. (103) is in principle arbitrary. By the homogeneity of the determinant of the Hessian (101) and the spatiotemporal Gaussian curvature (102) in terms of the orders of spatial vs. temporal differentiation that are multiplied in each term, these expressions are on the other hand truly covariant under independent rescalings of the spatial and temporal coordinates and therefore better candidates for being used as spatiotemporal interest operators, unless the relative scaling and weighting of temporal vs. spatial coordinates can be handled by some complementary mechanism.
The formulation of these quasi quadrature entities is inspired by the existence of nonlinear complex cells in the primary visual cortex that (i) do not obey the superposition principle, (ii) have response properties independent of the polarity of the stimuli and (iii) are rather insensitive to the phase of the visual stimuli, as discovered by Hubel and Wiesel [31, 32]. Specifically, De Valois et al. [92] show that first- and second-order receptive fields typically occur in pairs that can be modelled as approximate Hilbert pairs.
Within the framework of the presented spatiotemporal scale-space concept, it is interesting to note that nonlinear receptive fields with qualitatively similar properties can be constructed by squaring first- and second-order derivative responses and summing up these components, as proposed by Koenderink and van Doorn [40]. The use of the quasi quadrature model can therefore be interpreted as a Gaussian-derivative-based analogue of the energy models proposed by Adelson and Bergen [1] and Heeger [29]. To obtain local phase independence over both space and time simultaneously, we do here additionally extend the notion of quasi quadrature to composed space-time, by simultaneously summing up squares of odd and even filter responses over both space and time, leading to quadruples or octuples of filter responses, complemented by additional terms to achieve rotational invariance over the spatial domain.
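A one-dimensional toy version of this idea, using plain finite differences instead of the paper's scale-normalized temporal derivatives and a free weighting constant C (both simplifications are ours): squaring and summing the odd (first-order) and even (second-order) responses yields a measure that is nearly independent of the phase of a sinusoidal input.

```python
import numpy as np

def quasi_quadrature_1d(signal, C):
    # Q = (dL/dt)^2 + C * (d^2L/dt^2)^2, with finite-difference derivatives;
    # C is a free weighting parameter in this sketch.
    Lt = np.gradient(signal)
    Ltt = np.gradient(Lt)
    return Lt ** 2 + C * Ltt ** 2

omega = 0.2
t = np.arange(400)
Q = quasi_quadrature_1d(np.sin(omega * t), C=1.0 / omega ** 2)
# Away from the boundaries, Q is nearly constant over t, i.e. phase independent,
# since cos^2 + sin^2 = 1 for matched first- and second-order weights.
```

The individual odd and even responses oscillate through zero with the stimulus phase, whereas their weighted square sum does not, which is the qualitative property attributed to complex cells above.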
8.5 Scale-Normalized Spatio-Temporal Derivative Expressions
8.6 Experimental Results
Figure 7 shows the result of computing the above differential expressions for a video sequence of a paddler in a kayak.
Comparing the spatiotemporal scale-space representation L in the top middle figure to the original video f in the top left, we can first note that a substantial amount of fine-scale spatiotemporal texture, e.g. waves on the water surface, is suppressed by the spatiotemporal smoothing operation. The illustrations of the spatiotemporal scale-space representation L in the top middle figure and its first- and second-order temporal derivatives \(L_{t,\mathrm{norm}}\) and \(L_{tt,\mathrm{norm}}\) in the left and middle figures in the second row also show the spatiotemporal traces that are left by a moving object; see in particular the image structures below the raised paddle that respond to spatial points in the image domain where the paddle has been in the past.
The slight jagginess in the bright response that can be seen below the paddle in the response to the second-order temporal derivative \(L_{tt,\mathrm{norm}}\) is a temporal sampling artefact caused by the sparse temporal sampling in the original video. With 25 frames per second, there are 40 ms between adjacent frames, during which a lot may happen in the spatial image domain for rapidly moving objects. This situation can be compared to mammalian vision, where many receptive fields operate continuously over time scales in the range of 20-100 ms. With 40 ms between adjacent frames, it is not possible to simulate such continuous receptive fields smoothly over time, since such a frame rate corresponds to either zero, one or at best two images within the effective time span of the receptive field. To simulate rapid continuous-time receptive fields more accurately in a digital implementation, one should therefore preferably aim at acquiring the input video with a higher temporal frame rate. Such higher frame rates are indeed becoming available, even in consumer cameras. Despite this limitation in the input data, we can observe that the proposed model is able to compute geometrically meaningful spatiotemporal image features from the raw video.
The illustrations of \(\partial _t (\nabla _{(x,y),\mathrm{norm}}^2 L)\) and \(\partial _{tt} (\nabla _{(x,y),\mathrm{norm}}^2 L)\) in the left and middle of the third row show the responses of our idealized model of non-lagged and lagged LGN cells, complemented by a quasi quadrature energy measure of these responses in the right column. These entities correspond to applying a spatial Laplacian operator to the first- and second-order temporal derivatives in the second row, and it can be seen how this operation enhances spatial variations. These spatiotemporal entities can also be compared to the purely spatial interest operators, the Laplacian \(\nabla _{(x, y),\mathrm{norm}}^2 L\) and the determinant of the Hessian \(\det \mathcal{H}_{(x, y),\mathrm{norm}} L\), in the first and second rows of the third column. Note how the genuine spatiotemporal receptive fields enhance spatiotemporal structures compared to purely spatial operators and how static structures, such as the label in the lower right corner, disappear altogether under genuine spatiotemporal operators. The fourth row shows how three other genuine spatiotemporal operators, the determinant of the spatiotemporal Hessian \(\det \mathcal{H}_{(x, y, t),\mathrm{norm}} L\), the rescaled Gaussian curvature \(\mathcal{G}_{(x,y,t),\mathrm{norm}} L\) and the quasi quadrature measure \(\mathcal{Q}_t(\det \mathcal{H}_{(x, y),\mathrm{norm}} L)\), also respond to points where there are simultaneously both strong spatial and strong temporal variations.
The bottom row shows three idealized models defined to mimic qualitatively known properties of complex cells, expressed in terms of quasi quadrature measures of spatiotemporal scale-space derivatives. For the first quasi quadrature entity \(\mathcal{Q}_{1,(x,y,t),\mathrm{norm}} L\), in which time is treated in a largely qualitatively similar manner as space, it is sufficient for a response that there are strong variations over either space or time. It can be seen that this measure is therefore not highly selective. For the second and third entities \(\mathcal{Q}_{2,(x,y,t),\mathrm{norm}} L\) and \(\mathcal{Q}_{3,(x,y,t),\mathrm{norm}} L\), it is necessary that there are simultaneous variations over both space and time, and it can be seen how these entities are as a consequence more selective. For the third entity \(\mathcal{Q}_{3,(x,y,t),\mathrm{norm}} L\), simultaneous selectivity over both space and time is additionally enforced on each primitive linear receptive field before these are combined into the nonlinear quasi quadrature measure. We can see how this quasi quadrature entity also responds more strongly to the moving paddle than the two other quasi quadrature measures.
8.7 Geometric Covariance and Invariance Properties
Rotations in Image Space The spatial differential expressions \(\nabla _{(x, y)} L\), \(\nabla _{(x, y)}^2 L\), \(\det \mathcal{H} _{(x, y)}\), \(\tilde{\kappa }(L)\) and \(\mathcal{Q}_{(x, y)} L\) are all invariant under rotations in the image domain and so are the spatiotemporal derivative expressions \(\partial _t (\nabla _{(x,y)}^2 L)\), \(\partial _{tt} (\nabla _{(x,y)}^2L)\), \(\mathcal{Q}_t(\nabla _{(x,y)}^2 L)\), \(\partial _t (\det \mathcal{H}_{(x,y)} L)\), \(\partial _{tt} (\det \mathcal{H}_{(x,y)} L)\), \(\mathcal{Q}_t (\det \mathcal{H}_{(x,y)} L)\), \(\det \mathcal{H}_{(x, y, t)} L\), \(\mathcal{G}_{(x, y, t)} L\), \(\nabla _{(x, y, t)}^2 L\), \(\mathcal{Q} _{1,(x, y, t)} L\), \(\mathcal{Q}_{2,(x, y, t)} L\) and \(\mathcal{Q} _{3,(x, y, t)} L\) as well as their corresponding scalenormalized expressions.
Uniform Rescaling of the Spatial Domain Under a uniform scaling transformation of image space, the spatial differential invariants \(\nabla _{(x, y)} L\), \(\nabla _{(x, y)}^2 L\), \(\det \mathcal{H} _{(x, y)}\) and \(\tilde{\kappa }(L)\) are covariant in the sense that their magnitude values are multiplied by a power of the scaling factor, and so are their corresponding scale-normalized expressions. The spatiotemporal differential invariants \(\partial _t (\nabla _{(x,y)}^2 L)\), \(\partial _{tt} (\nabla _{(x,y)}^2L)\), \(\partial _t (\det \mathcal{H}_{(x,y)} L)\), \(\partial _{tt} (\det \mathcal{H}_{(x,y)} L)\), \(\det \mathcal{H}_{(x, y, t)} L\) and \(\mathcal{G}_{(x, y, t)} L\) and their corresponding scale-normalized expressions are also covariant under spatial scaling transformations in the same sense.
The quasi quadrature entity \(\mathcal{Q}_{(x, y),\mathrm{norm}} L\) is, however, not covariant under spatial scaling transformations, and neither are the spatiotemporal differential invariants \(\mathcal{Q}_{t,\mathrm{norm}}(\nabla _{(x,y)}^2 L)\), \(\mathcal{Q}_{t,\mathrm{norm}} (\det \mathcal{H}_{(x,y)} L)\), \(\mathcal{Q} _{1,(x, y, t),\mathrm{norm}} L\), \(\mathcal{Q} _{2,(x, y, t),\mathrm{norm}} L\) and \(\mathcal{Q} _{3,(x, y, t),\mathrm{norm}} L\). Because \(\mathcal{Q}_{(x, y),\mathrm{norm}} L\), \(\mathcal{Q}_{t,\mathrm{norm}}(\nabla _{(x,y)}^2 L)\), \(\mathcal{Q}_{t,\mathrm{norm}} (\det \mathcal{H}_{(x,y)} L)\), \(\mathcal{Q} _{2,(x, y, t),\mathrm{norm}} L\) and \(\mathcal{Q} _{3,(x, y, t),\mathrm{norm}} L\) are composed of sums of scale-normalized derivative expressions for \(\gamma = 1\), these derivative expressions can, however, anyway be made scale invariant when combined with a spatial scale selection mechanism.
Uniform Rescaling of the Temporal Domain Independent of the Spatial Domain Under an independent rescaling of the temporal dimension while keeping the spatial dimension fixed, the partial derivatives \(L_{x_1^{m_1} x_2^{m_2} t^n}(x_1, x_2, t;\; s, \tau )\) are covariant under such temporal rescaling transformations, and so are the directional derivatives \(L_{\varphi ^{m_1} \bot \varphi ^{m_2} t^n}\) for image velocity \(v = 0\). For nonzero image velocities, the image velocity parameters of the receptive field would on the other hand need to be adapted to the local motion direction of the objects/spatiotemporal events of interest to enable matching between corresponding spatiotemporal directional derivative operators.
Under an independent rescaling of the temporal dimension while keeping the spatial dimension fixed, the spatiotemporal differential invariants \(\partial _t (\nabla _{(x,y)}^2 L)\), \(\partial _{tt} (\nabla _{(x,y)}^2L)\), \(\partial _t (\det \mathcal{H}_{(x,y)} L)\), \(\partial _{tt} (\det \mathcal{H}_{(x,y)} L)\), \(\det \mathcal{H}_{(x, y, t)} L\) and \(\mathcal{G}_{(x, y, t)} L\) are also covariant under independent rescalings of the temporal vs. spatial dimensions. The same applies to their corresponding scale-normalized expressions.
8.8 Invariance to Illumination Variations and Exposure Control Mechanisms
Because of all these expressions being composed of spatial, temporal and spatiotemporal derivatives of nonzero order, it follows that all these differential expressions are invariant under additive illumination transformations of the form \(L \mapsto L + C\).
This means that if we take the image values f as representing the logarithm of the incoming energy, \(f \sim \log I\) or \(f \sim \log I^{\gamma } = \gamma \log I\), then all these differential expressions will be invariant under local multiplicative illumination transformations of the form \(I \mapsto C \, I\), implying \(L \sim \log I + \log C\) or \(L \sim \log I^{\gamma } = \gamma (\log I + \log C)\). Thus, these differential expressions will be invariant to local multiplicative variabilities in the external illumination (with locality defined over the support region of the spatiotemporal receptive field) or to multiplicative exposure control parameters, such as the aperture of the lens, the integration time or the sensitivity of the sensor.
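This invariance is easy to verify numerically: under \(f \sim \log I\), a multiplicative illumination change \(I \mapsto C \, I\) only adds a constant, which vanishes under spatial differentiation. A small sketch with finite-difference spatial derivatives (the helper name is ours):

```python
import numpy as np

def log_spatial_gradient(I):
    # Spatial derivatives of f = log I via finite differences.
    Lx, Ly = np.gradient(np.log(I))
    return Lx, Ly

rng = np.random.default_rng(0)
I = rng.uniform(0.5, 2.0, size=(32, 32))     # synthetic image intensities
Lx1, Ly1 = log_spatial_gradient(I)
Lx2, Ly2 = log_spatial_gradient(3.7 * I)     # multiplicative illumination change
# Lx1 == Lx2 and Ly1 == Ly2 up to floating-point rounding
```

The same cancellation carries over to any differential expression containing a nonzero order of spatial differentiation, which is the property exploited in the text.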
From the structure of Eq. (112), we can note that for any nonzero order of spatial differentiation \(m_1 + m_2 > 0\), the influence of the internal camera parameters in \(C_{cam}(\tilde{f}(t))\) will disappear because of the spatial differentiation with respect to \(x_1\) or \(x_2\), and so will the effects of any other multiplicative exposure control mechanism. Furthermore, for any multiplicative illumination variation \(i'(x, y) = C \, i(x, y)\), where C is a scalar constant, the logarithmic luminosity will be transformed as \(\log i'(x, y) = \log C + \log i(x, y)\), which implies that the dependency on C will disappear after spatial differentiation. For purely temporal derivative operators that do not involve any order of spatial differentiation, such as the first- and second-order derivative operators \(L_t\) and \(L_{tt}\), strong responses may on the other hand be obtained due to illumination compensation mechanisms that vary over time as the result of rapid variations in the illumination. If one wants to design spatiotemporal feature detectors that are robust to illumination variations and to the variations in exposure compensation mechanisms caused by these, it is therefore essential to include nonzero orders of spatial differentiation. The use of Laplacian-like filtering in the first stages of visual processing in the retina and the LGN can therefore be interpreted as a highly suitable design to achieve robustness to illumination variations and to the adaptive variations in the diameter of the pupil caused by these, while still being expressed in terms of rotationally symmetric linear receptive fields over the spatial domain.
The quasi quadrature entities \(\mathcal{Q} _{1,(x, y, t)} L\) and \(\mathcal{Q}_{2,(x, y, t)} L\) are, however, not invariant to such position- and time-dependent illumination variations. This property can in particular be noted for the quasi quadrature entity \(\mathcal{Q} _{1,(x, y, t)} L\), for which what appears to be time-varying exposure compensation mechanisms in the camera leads to large responses in the initial part of the video sequence (see Fig. 8, left). Out of the three quasi quadrature entities \(\mathcal{Q} _{1,(x, y, t)} L\), \(\mathcal{Q}_{2,(x, y, t)} L\) and \(\mathcal{Q} _{3,(x, y, t)} L\), the third one therefore possesses the best robustness properties to illumination variations (see Fig. 8, right).
9 Summary and Discussion
We have presented an improved computational model for spatio-temporal receptive fields, based on a time-causal and time-recursive spatio-temporal scale-space representation defined from a set of first-order integrators or equivalently truncated exponential filters coupled in cascade over the temporal domain, in combination with a Gaussian scale-space concept over the spatial domain. This model can be efficiently implemented in terms of recursive filters over time, and we have shown how the continuous model can be transferred to a discrete implementation while retaining discrete scale-space properties. Specifically, we have analysed how the remaining design parameters within the theory, in terms of the number of first-order integrators coupled in cascade and the distribution parameter of a logarithmic distribution of the intermediate temporal scale levels, affect the temporal response dynamics in terms of temporal delays.
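As a sketch of the time-recursive implementation, each first-order integrator can be realized as a recursive filter that keeps a single state value per level; the particular update rule \(f_k[t] = f_k[t-1] + \tfrac{1}{1+\mu_k}\,(f_{k-1}[t] - f_k[t-1])\) is one standard discretization of a first-order integrator with time constant \(\mu_k\), assumed here for illustration rather than quoted from the article:

```python
import numpy as np

# Cascade of K first-order recursive integrators, each with unit DC gain,
# approximating truncated exponential filters coupled in cascade.
def integrator_cascade(signal, mus):
    """Run `signal` through first-order recursive integrators with time
    constants `mus`, returning the output of the final level."""
    out = np.asarray(signal, dtype=float)
    for mu in mus:
        state = 0.0
        filtered = np.empty_like(out)
        for t, value in enumerate(out):
            # One state value per level suffices: time-recursive update.
            state += (value - state) / (1.0 + mu)
            filtered[t] = state
        out = filtered
    return out

impulse = np.zeros(64)
impulse[0] = 1.0
kernel = integrator_cascade(impulse, mus=[1.0, 2.0, 4.0])

# Each stage has unit DC gain, so the composed kernel mass is (almost) one,
# up to the truncation of the tail at 64 samples.
assert np.isclose(kernel.sum(), 1.0, atol=1e-3)
```

Feeding a video stream frame by frame through such a cascade gives the temporal smoothing at all intermediate temporal scale levels simultaneously, with no other memory of the past than the per-level states.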
Compared to other spatial and temporal scale-space representations based on continuous scale parameters, a conceptual difference of the temporal scale-space representation underlying the proposed spatio-temporal receptive fields is that the temporal scale levels have to be discrete. Thereby, we sacrifice a continuous scale parameter and full scale invariance as resulting from the Gaussian scale-space concepts based on causality or non-enhancement of local extrema proposed by Koenderink [38] and Lindeberg [56], or used as a scale-space axiom in the scale-space formulations by Iijima [34], Florack et al. [23], Pauwels et al. [77], Weickert et al. [93, 94, 95], Duits et al. [14, 15] and Fagerström [16, 17]. See also the approaches by Witkin [97], Babaud et al. [3], Yuille and Poggio [98], Koenderink and van Doorn [40, 41], Lindeberg [45, 48, 49, 50, 51, 58], Florack et al. [21, 22, 23], Alvarez et al. [2], Guichard [26], ter Haar Romeny et al. [27, 28], Felsberg and Sommer [19] and Tschirsich and Kuijper [90] for other scale-space formulations closely related to this work; Fleet and Langley [20], Freeman and Adelson [25], Simoncelli et al. [89] and Perona [78] for more filter-oriented approaches; Miao and Rao [74], Duits and Burgeth [13], Cocci et al. [9], Barbieri et al. [4] and Sharma and Duits [91] for Lie group approaches to receptive fields; and Lindeberg and Friberg [67, 68] for the application of closely related principles to deriving idealized computational models of auditory receptive fields.
When using a logarithmic distribution of the intermediate scale levels, we have, however, shown that by a limit construction, when the number of intermediate temporal scale levels tends to infinity, we can achieve true self-similarity and scale invariance over a discrete set of temporal scaling factors. For a vision system intended to operate in real time, using no other explicit storage of visual data from the past than a compact time-recursive buffer of spatio-temporal scale-space at different temporal scales, the loss of a continuous temporal scale parameter may, however, be less of a practical constraint, since the temporal scale levels would anyway have to be discretized in advance to be able to perform any computations at all.
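A small sketch of the construction (the symbols \(\tau_k\), \(c\), \(K\) and the specific form \(\tau_k = c^{2(k-K)} \tau_{\max}\) for the logarithmic distribution, with stage time constants \(\mu_k = \sqrt{\tau_k - \tau_{k-1}}\), are assumed here for illustration; the article's own equations should be consulted for the exact parameterization):

```python
import math

# Logarithmically distributed intermediate temporal scale levels (variances)
# tau_k = c**(2 * (k - K)) * tau_max for k = 1..K, distribution parameter c > 1.
def temporal_scale_levels(tau_max, K, c):
    return [c ** (2 * (k - K)) * tau_max for k in range(1, K + 1)]

# Each cascade stage contributes the variance difference tau_k - tau_{k-1}
# via a truncated exponential filter with time constant mu_k.
def time_constants(taus):
    prev = 0.0
    mus = []
    for tau in taus:
        mus.append(math.sqrt(tau - prev))
        prev = tau
    return mus

taus = temporal_scale_levels(tau_max=64.0, K=6, c=2.0)
mus = time_constants(taus)

# The stage variances telescope to the composed variance tau_max.
assert math.isclose(sum(mu * mu for mu in mus), 64.0)
```

Because successive levels are related by the fixed factor \(c^2\), rescaling time by \(c\) maps the set of scale levels onto itself, which is what enables scale covariance over the discrete set of scaling factors \(c^j\).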
In the special case when all the time constants of the first-order integrators are equal, the resulting temporal smoothing kernels in the continuous model (29) correspond to Laguerre functions (Laguerre polynomials multiplied by a truncated exponential kernel), which have previously been used for modelling the temporal response properties of neurons in the visual system by den Brinker and Roufs [8] and for computing spatio-temporal image features in computer vision by van der Berg et al. [7] and Rivero-Moreno and Bres [79]. In the corresponding discrete model with all time constants equal, the discrete temporal smoothing kernels approach Poisson kernels when the number of temporal smoothing steps increases while the variance of the composed kernel is held fixed [66]. Such Poisson kernels have also been used for modelling biological vision by Fourtes and Hodgkin [24]. Compared to the special case with all time constants equal, a logarithmic distribution of the intermediate temporal scale levels (18) does on the other hand allow for larger flexibility in the trade-off between temporal smoothing and temporal response characteristics, specifically enabling faster temporal responses (shorter temporal delays) and higher computational efficiency when computing multiple temporal or spatio-temporal receptive field responses at coarser temporal scales.
From the detailed analysis in Sect. 5 and Appendix 1, we can conclude that when the number of first-order integrators coupled in cascade increases while the variance of the composed kernel is held fixed, the time-causal kernels obtained by composing truncated exponential kernels with equal time constants tend to a limit kernel with skewness and kurtosis measures zero, or equivalently third- and fourth-order cumulants equal to zero, whereas the time-causal kernels obtained by composing truncated exponential kernels having a logarithmic distribution of the intermediate scale levels tend to a limit kernel with non-zero skewness and non-zero kurtosis. This property reveals a fundamental difference between the two classes of time-causal scale-space kernels based on either a logarithmic or a uniform distribution of the intermediate temporal scale levels.
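This difference can be checked numerically: a truncated exponential filter with time constant \(\mu\) has cumulants \(\kappa_n = (n-1)!\,\mu^n\), and cumulants are additive under convolution, so the skewness \(\kappa_3/\kappa_2^{3/2}\) and excess kurtosis \(\kappa_4/\kappa_2^2\) of a cascade follow directly (the parameter values and the form of the logarithmic distribution below are illustrative assumptions, not taken verbatim from the article):

```python
import math

# Skewness and excess kurtosis of a cascade of truncated exponential filters,
# from the additivity of cumulants kappa_n = (n-1)! * mu**n under convolution.
def skew_kurt(mus):
    k2 = sum(mu ** 2 for mu in mus)
    k3 = sum(2 * mu ** 3 for mu in mus)
    k4 = sum(6 * mu ** 4 for mu in mus)
    return k3 / k2 ** 1.5, k4 / k2 ** 2

tau, K, c = 1.0, 32, 2.0

# Uniform distribution: K equal time constants mu = sqrt(tau / K).
skew_uni, kurt_uni = skew_kurt([math.sqrt(tau / K)] * K)

# Logarithmic distribution: tau_k = c**(2 * (k - K)) * tau, k = 1..K.
taus = [c ** (2 * (k - K)) * tau for k in range(1, K + 1)]
mus = [math.sqrt(t - s) for s, t in zip([0.0] + taus[:-1], taus)]
skew_log, kurt_log = skew_kurt(mus)

# Equal time constants drive the skewness towards zero (here 2 / sqrt(K)),
# whereas the logarithmic distribution keeps it bounded away from zero.
assert skew_uni < 0.5 < skew_log
```

Increasing \(K\) further makes \(skew\_uni \to 0\) while \(skew\_log\) converges to a non-zero limit, consistent with the qualitative conclusion above.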
In a complementary analysis in Appendix 2, we have also shown how our time-causal kernels can be related to the temporal kernels in Koenderink’s scale-time model [39]. By identifying the first- and second-order temporal moments of the two classes of kernels, we have derived closed-form expressions relating the parameters of the two models, and shown that, although the two classes of kernels share qualitatively similar properties to a large extent, they differ significantly in terms of their third- and fourth-order skewness and kurtosis measures.
The closed-form expressions for Koenderink’s scale-time kernels are analytically simpler than the explicit expressions for our kernels, which are sums of truncated exponential kernels for all the time constants, with the coefficients determined from a partial fraction expansion. In this respect, the derived mapping between the parameters of our model and Koenderink’s can be used, e.g., for estimating the time of the temporal maximum of our kernels, which would otherwise have to be determined numerically. Our kernels do on the other hand have a clear computational advantage in that they are truly time-recursive, meaning that the primitive first-order integrators in the model contain sufficient information for updating the model to new states over time, whereas the kernels in Koenderink’s scale-time model appear to require a complete memory of the past, since they do not have any known time-recursive formulation.
Regarding the purely temporal scale-space concept used in our spatio-temporal model, we have notably replaced the assumption of a semi-group structure over temporal scales by a weaker Markov property, which nevertheless guarantees the necessary cascade property over temporal scales and thus a gradual simplification of the temporal scale-space representation from any finer to any coarser temporal scale. By this relaxation of the requirement of a semi-group over temporal scales, we have specifically been able to define a temporal scale-space concept with much better temporal dynamics than the time-causal semi-groups derived by Fagerström [16] and Lindeberg [56]. Since this new time-causal temporal scale-space concept with a logarithmic distribution of the intermediate temporal scale levels would not be found if one were to start from the assumption of a semi-group over temporal scales as a necessary requirement, we propose that, in the area of scale-space axiomatics, the assumption of a semi-group over temporal scales should not be regarded as a necessary requirement for a time-causal temporal scale-space representation.
Recently, and during the development of this article, Mahmoudi [70] has presented a closely related, although more neurophysiologically motivated, model for visual receptive fields, based on an electrical circuit model with spatial smoothing determined by local spatial connections over a spatial grid and temporal smoothing by first-order temporal integration. The spatial component of that model is very closely related to our earlier discrete scale-space models over spatial and spatio-temporal grids [45, 51, 54], as can be modelled by Z-transforms of the discrete convolution kernels and an algebra of spatial or spatio-temporal covariance matrices that describes the transformation properties of the receptive fields under locally linearized geometric image transformations. The temporal component of that model is in turn similar to our temporal smoothing model based on first-order integrators coupled in cascade, as initially proposed in [45, 66], suggested as one of three models for temporal smoothing in spatio-temporal visual receptive fields in [57, 58, 59], and then refined and further developed in [62, 63] and this article. Our model can also be implemented by electric circuits, by combining the temporal electric model in Fig. 1 with the spatial discretization in Sect. 6.3, or with more general connectivities between adjacent layers to implement velocity-adapted receptive fields, which can then be described by their resulting spatio-temporal covariance matrices. Mahmoudi compares such electrically modelled receptive fields to results of neurophysiological recordings in the LGN and the primary visual cortex, in a similar way as we compared our theoretically derived receptive fields to biological receptive fields in [51, 56, 57, 62] and in this article.
Mahmoudi shows that the resulting transfer function in the layered electric circuit model approaches a Gaussian when the number of layers tends to infinity. This result agrees with our earlier results that the discrete scale-space kernels over a discrete spatial grid approach the continuous Gaussian when the spatial scale increment tends to zero while the spatial scale level is held constant [45], and that the temporal smoothing function corresponding to a set of first-order integrators with equal time constants coupled in cascade tends to the Poisson kernel (which in turn approaches the Gaussian kernel) when the temporal scale increment tends to zero while the temporal scale level is held constant [66].
In his article, Mahmoudi [70] makes a distinction between our scale-space approach, which is motivated by the mathematical structure of the environment in combination with a set of assumptions about the internal structure of a vision system to guarantee internal consistency between image representations at different spatial and temporal scales, and his model, which is motivated by assumptions about neurophysiology. One way to reconcile these views is to follow the evolutionary arguments proposed in Lindeberg [57, 59]. If there is a strong evolutionary pressure on a living organism that uses vision as a key source of information about its environment (as there should be for many higher mammals), then in the competition between two species, or between two individuals of the same species, there should be a strong evolutionary advantage for an organism that as far as possible adapts the structure of its vision system to be consistent with the structural and transformation properties of its environment. Hence, there could be an evolutionary pressure for the vision system of such an organism to develop similar types of receptive fields as can be derived from an idealized mathematical theory, and specifically to develop neurophysiological wetware that permits the computation of sufficiently good approximations to idealized receptive fields as derived from mathematical and physical principles. From such a viewpoint, it is highly interesting to see that the neurophysiological cell recordings in the LGN and the primary visual cortex presented by DeAngelis et al. [11, 12] are in very good qualitative agreement with the predictions generated by our mathematically and physically motivated normative theory (see Figs. 3 and 4).
Given the derived time-causal and time-recursive formulation of our basic linear spatio-temporal receptive fields, we have described how this theory can be used for computing different types of both linear and non-linear scale-normalized spatio-temporal features. Specifically, we have emphasized how scale normalization by \(L_p\)-normalization leads to fundamentally different results compared to more traditional variance-based normalization. By formulating the corresponding scale normalization factors for the discrete temporal scale-space, we have also shown how they permit an operational criterion for estimating how many intermediate temporal scale levels are needed to approximate true scale invariance up to a given tolerance.
Finally, we have shown how different types of spatio-temporal features can be defined in terms of spatio-temporal differential invariants built from spatio-temporal receptive field responses, including their transformation properties under natural image transformations, with emphasis on independent scaling transformations over space vs. time, rotational invariance over the spatial domain, and illumination and exposure control variations. We propose that the presented theory can be used for computing features for generic purposes in computer vision and for computational modelling of biological vision for image data over a time-causal spatio-temporal domain, in an analogous way as the Gaussian scale-space concept constitutes a canonical model for processing image data over a purely spatial domain.
Footnotes
1. The theoretical results in Sect. 5 state that temporal scale covariance becomes possible using a logarithmic distribution of the temporal scale levels. Section 4 states that the temporal response properties are faster for a logarithmic distribution of the intermediate temporal scale levels than for a uniform distribution. If one has requirements on how fine the temporal scale sampling needs to be, or on the maximally allowed temporal delays, then Table 2 in Sect. 4 provides constraints on the permissible values of the distribution parameter c. Finally, the quantitative criterion in Sect. 7.4 (see Table 5) states how many intermediate temporal scale levels are needed to approximate temporal scale invariance up to a given accuracy.
2. These kernels correspond to infinitely divisible distributions as can be described with the theory of Lévy processes [80], where specifically the case \(q_{1} = q_{2}\) corresponds to convolution with the non-causal discrete analogue of the Gaussian kernel [45] and the case \(q_{1} = 0\) to convolution with the time-causal Poisson kernel [66].
 3.
4. When using the Laplacian operator in this paper, the notation \(\nabla_{(x, y)}^2\) should be understood as the covariant expression \(\nabla_{(x, y)}^2 = \nabla_{(x, y)}^T \nabla_{(x, y)}\) with \(\nabla_{(x, y)} = (\partial_x, \partial_y)^T\), etc.
5. To make the differential entities in Eqs. (104), (105) and (106) fully consistent and meaningful, they additionally have to be transformed into scale-normalized derivatives, as later done in Eqs. (109), (110) and (111). With scale-normalized derivatives for \(\gamma = 1\), the resulting scale-normalized derivatives become dimensionless, which makes it possible to add first- and second-order derivatives of the same variable (over either space or time) in a scale-invariant manner. Then, similar arguments as used for deriving the blending parameter \(C\) between first- and second-order temporal derivatives in [52] can be used for deriving a corresponding blending parameter between first- and second-order spatial derivatives.
Acknowledgments
The support from the Swedish Research Council (contract 2014-4083) is gratefully acknowledged. An earlier version of this manuscript containing some additional details has been deposited at arXiv [63].
References
1. Adelson, E., Bergen, J.: Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2, 284–299 (1985)
2. Alvarez, L., Guichard, F., Lions, P.-L., Morel, J.-M.: Axioms and fundamental equations of image processing. Arch. Ration. Mech. 123(3), 199–257 (1993)
3. Babaud, J., Witkin, A.P., Baudin, M., Duda, R.O.: Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Trans. Pattern Anal. Mach. Intell. 8(1), 26–33 (1986)
4. Barbieri, D., Citti, G., Cocci, G., Sarti, A.: A cortical-inspired geometry for contour perception and motion integration. J. Math. Imaging Vis. 49(3), 511–529 (2014)
5. Barnsley, M.F., Devaney, R.L., Mandelbrot, B.B., Peitgen, H.-O., Saupe, D., Voss, R.F.: The Science of Fractals. Springer, New York (1988)
6. Barnsley, M.F., Rising, H.: Fractals Everywhere. Academic Press, Boston (1993)
7. van der Berg, E.S., Reyneke, P.V., de Ridder, C.: Rotational image correlation in the Gauss-Laguerre domain. In: Third SPIE Conference on Sensors, MEMS and Electro-Optic Systems: Proc. of SPIE, vol. 9257, pp. 92570F-1–92570F-17 (2014)
8. den Brinker, A.C., Roufs, J.A.J.: Evidence for a generalized Laguerre transform of temporal events by the visual system. Biol. Cybern. 67(5), 395–402 (1992)
9. Cocci, C., Barbieri, D., Sarti, A.: Spatio-temporal receptive fields in V1 are optimally shaped for stimulus velocity estimation. J. Opt. Soc. Am. A 29(1), 130–138 (2012)
10. Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
11. DeAngelis, G.C., Anzai, A.: A modern view of the classical receptive field: linear and non-linear spatio-temporal processing by V1 neurons. In: Chalupa, L.M., Werner, J.S. (eds.) The Visual Neurosciences, vol. 1, pp. 704–719. MIT Press, Cambridge (2004)
12. DeAngelis, G.C., Ohzawa, I., Freeman, R.D.: Receptive field dynamics in the central visual pathways. Trends Neurosci. 18(10), 451–457 (1995)
13. Duits, R., Burgeth, B.: Scale spaces on Lie groups. In: Gallari, F., Murli, A., Paragios, N. (eds.) Proceedings of International Conference on Scale-Space Theories and Variational Methods in Computer Vision (SSVM 2007). Lecture Notes in Computer Science, vol. 4485, pp. 300–312. Springer, Berlin (2007)
14. Duits, R., Felsberg, M., Florack, L., Platel, B.: \(\alpha\)-scale-spaces on a bounded domain. In: Griffin, L., Lillholm, M. (eds.) Proceedings of Scale-Space Methods in Computer Vision (Scale-Space'03). Lecture Notes in Computer Science, vol. 2695, pp. 494–510. Springer, Isle of Skye (2003)
15. Duits, R., Florack, L., de Graaf, J., ter Haar Romeny, B.: On the axioms of scale space theory. J. Math. Imaging Vis. 22, 267–298 (2004)
16. Fagerström, D.: Temporal scale-spaces. Int. J. Comput. Vis. 2–3, 97–106 (2005)
17. Fagerström, D.: Spatio-temporal scale-spaces. In: Gallari, F., Murli, A., Paragios, N. (eds.) Proceedings of International Conference on Scale-Space Theories and Variational Methods in Computer Vision (SSVM 2007). Lecture Notes in Computer Science, vol. 4485, pp. 326–337. Springer, Berlin (2007)
18. Faugeras, O., Toubol, J., Cessac, B.: A constructive mean-field analysis of multi-population neural networks with random synaptic weights and stochastic inputs. Front. Comput. Neurosci. 3, 1 (2009). doi:10.3389/neuro.10.001.2009
19. Felsberg, M., Sommer, G.: The monogenic scale-space: a unifying approach to phase-based image processing in scale-space. J. Math. Imaging Vis. 21, 5–26 (2004)
20. Fleet, D.J., Langley, K.: Recursive filters for optical flow. IEEE Trans. Pattern Anal. Mach. Intell. 17(1), 61–67 (1995)
21. Florack, L.M.J.: Image Structure. Series in Mathematical Imaging and Vision. Springer, Berlin (1997)
22. Florack, L.M.J., ter Haar Romeny, B.M., Koenderink, J.J., Viergever, M.A.: Families of tuned scale-space kernels. In: Sandini, G. (ed.) Proceedings of European Conference on Computer Vision (ECCV'92). Lecture Notes in Computer Science, vol. 588, pp. 19–23. Springer, Santa Margherita Ligure (1992)
23. Florack, L.M.J., ter Haar Romeny, B.M., Koenderink, J.J., Viergever, M.A.: Scale and the differential structure of images. Image Vis. Comput. 10(6), 376–388 (1992)
24. Fourtes, M.G.F., Hodgkin, A.L.: Changes in the time scale and sensitivity in the ommatidia of Limulus. J. Physiol. 172, 239–263 (1964)
25. Freeman, W.T., Adelson, E.H.: The design and use of steerable filters. IEEE Trans. Pattern Anal. Mach. Intell. 13(9), 891–906 (1991)
26. Guichard, F.: A morphological, affine, and Galilean invariant scale-space for movies. IEEE Trans. Image Process. 7(3), 444–456 (1998)
27. ter Haar Romeny, B. (ed.): Geometry-Driven Diffusion in Computer Vision. Series in Mathematical Imaging and Vision. Springer, Berlin (1994)
28. ter Haar Romeny, B., Florack, L., Nielsen, M.: Scale-time kernels and models. In: Proceedings of International Conference on Scale-Space and Morphology in Computer Vision (Scale-Space'01). Lecture Notes in Computer Science. Springer, Vancouver (2001)
29. Heeger, D.J.: Normalization of cell responses in cat striate cortex. Vis. Neurosci. 9, 181–197 (1992)
30. Heil, C.E., Walnut, D.F.: Continuous and discrete wavelet transforms. SIAM Rev. 31(4), 628–666 (1989)
31. Hubel, D.H., Wiesel, T.N.: Receptive fields of single neurones in the cat's striate cortex. J. Physiol. 147, 226–238 (1959)
32. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106–154 (1962)
33. Hubel, D.H., Wiesel, T.N.: Brain and Visual Perception: The Story of a 25-Year Collaboration. Oxford University Press, Oxford (2005)
34. Iijima, T.: Observation theory of two-dimensional visual patterns. Technical Report, Papers of Technical Group on Automata and Automatic Control, IECE, Japan (1962)
35. Jhuang, H., Serre, T., Wolf, L., Poggio, T.: A biologically inspired system for action recognition. In: International Conference on Computer Vision (ICCV'07), pp. 1–8 (2007)
36. Karlin, S.: Total Positivity. Stanford University Press, Stanford (1968)
37. Koch, C.: Biophysics of Computation: Information Processing in Single Neurons. Oxford University Press, Oxford (1999)
38. Koenderink, J.J.: The structure of images. Biol. Cybern. 50, 363–370 (1984)
39. Koenderink, J.J.: Scale-time. Biol. Cybern. 58, 159–162 (1988)
40. Koenderink, J.J., van Doorn, A.J.: Receptive field families. Biol. Cybern. 63, 291–298 (1990)
41. Koenderink, J.J., van Doorn, A.J.: Generic neighborhood operators. IEEE Trans. Pattern Anal. Mach. Intell. 14(6), 597–605 (1992)
42. Laptev, I., Caputo, B., Schuldt, C., Lindeberg, T.: Local velocity-adapted motion events for spatio-temporal recognition. Comput. Vis. Image Underst. 108, 207–229 (2007)
43. Laptev, I., Lindeberg, T.: Local descriptors for spatio-temporal recognition. In: Proceedings of ECCV'04 Workshop on Spatial Coherence for Visual Motion Analysis. Lecture Notes in Computer Science, pp. 91–103. Springer, Prague (2004)
44. Laptev, I., Lindeberg, T.: Velocity-adapted spatio-temporal receptive fields for direct recognition of activities. Image Vis. Comput. 22(2), 105–116 (2004)
45. Lindeberg, T.: Scale-space for discrete signals. IEEE Trans. Pattern Anal. Mach. Intell. 12(3), 234–254 (1990)
46. Lindeberg, T.: Discrete derivative approximations with scale-space properties: a basis for low-level feature extraction. J. Math. Imaging Vis. 3(4), 349–376 (1993)
47. Lindeberg, T.: Effective scale: a natural unit for measuring scale-space lifetime. IEEE Trans. Pattern Anal. Mach. Intell. 15(10), 1068–1074 (1993)
48. Lindeberg, T.: Scale-Space Theory in Computer Vision. Springer, Berlin (1993)
49. Lindeberg, T.: Scale-space theory: a basic tool for analysing structures at different scales. J. Appl. Stat. 21(2), 225–270 (1994). http://www.csc.kth.se/~tony/abstracts/Lin94SIabstract.html
50. Lindeberg, T.: On the axiomatic foundations of linear scale-space. In: Sporring, J., Nielsen, M., Florack, L., Johansen, P. (eds.) Gaussian Scale-Space Theory: Proceedings of PhD School on Scale-Space Theory, pp. 75–97. Springer, Copenhagen (1996)
51. Lindeberg, T.: Linear spatio-temporal scale-space. In: ter Haar Romeny, B.M., Florack, L.M.J., Koenderink, J.J., Viergever, M.A. (eds.) Scale-Space Theory in Computer Vision: Proceedings of First International Conference Scale-Space'97. Lecture Notes in Computer Science, vol. 1252, pp. 113–127. Springer, Utrecht (1997)
52. Lindeberg, T.: On automatic selection of temporal scales in time-causal scale-space. In: Sommer, G., Koenderink, J.J. (eds.) Proceedings of AFPAC'97: Algebraic Frames for the Perception-Action Cycle. Lecture Notes in Computer Science, vol. 1315, pp. 94–113. Springer, Kiel (1997)
53. Lindeberg, T.: Feature detection with automatic scale selection. Int. J. Comput. Vis. 30(2), 77–116 (1998)
54. Lindeberg, T.: Time-recursive velocity-adapted spatio-temporal scale-space filters. In: Johansen, P. (ed.) Proceedings of European Conference on Computer Vision (ECCV 2002). Lecture Notes in Computer Science, vol. 2350, pp. 52–67. Springer, Copenhagen (2002)
55. Lindeberg, T.: Scale-space. In: Wah, B. (ed.) Encyclopedia of Computer Science and Engineering, pp. 2495–2504. Wiley, Hoboken (2008)
56. Lindeberg, T.: Generalized Gaussian scale-space axiomatics comprising linear scale-space, affine scale-space and spatio-temporal scale-space. J. Math. Imaging Vis. 40(1), 36–81 (2011)
57. Lindeberg, T.: A computational theory of visual receptive fields. Biol. Cybern. 107(6), 589–635 (2013)
58. Lindeberg, T.: Generalized axiomatic scale-space theory. In: Hawkes, P. (ed.) Advances in Imaging and Electron Physics, vol. 178, pp. 1–96. Elsevier, Amsterdam (2013)
59. Lindeberg, T.: Invariance of visual operations at the level of receptive fields. PLOS One 8(7), e66990 (2013)
60. Lindeberg, T.: Scale selection properties of generalized scale-space interest point detectors. J. Math. Imaging Vis. 46(2), 177–210 (2013)
61. Lindeberg, T.: Image matching using generalized scale-space interest points. J. Math. Imaging Vis. 52(1), 3–36 (2015)
62. Lindeberg, T.: Separable time-causal and time-recursive spatio-temporal receptive fields. In: Proceedings of Scale-Space and Variational Methods for Computer Vision (SSVM 2015). Lecture Notes in Computer Science, vol. 9087, pp. 90–102. Springer, Berlin (2015)
63. Lindeberg, T.: Time-causal and time-recursive spatio-temporal receptive fields. Tech. Rep. (2015). Preprint arXiv:1504.02648
64. Lindeberg, T., Akbarzadeh, A., Laptev, I.: Galilean-corrected spatio-temporal interest operators. In: International Conference on Pattern Recognition, Cambridge, pp. I:57–62 (2004)
65. Lindeberg, T., Bretzner, L.: Real-time scale selection in hybrid multi-scale representations. In: Griffin, L., Lillholm, M. (eds.) Proceedings of Scale-Space Methods in Computer Vision (Scale-Space'03). Lecture Notes in Computer Science, vol. 2695, pp. 148–163. Springer, Isle of Skye (2003)
66. Lindeberg, T., Fagerström, D.: Scale-space with causal time direction. In: Proceedings of European Conference on Computer Vision (ECCV'96). Lecture Notes in Computer Science, vol. 1064, pp. 229–240. Springer, Cambridge (1996)
67. Lindeberg, T., Friberg, A.: Idealized computational models of auditory receptive fields. PLOS One 10(3), e0119032:1–e0119032:58 (2015)
68. Lindeberg, T., Friberg, A.: Scale-space theory for auditory signals. In: Proceedings of Scale-Space and Variational Methods for Computer Vision (SSVM 2015). Lecture Notes in Computer Science, vol. 9087, pp. 3–15. Springer, Berlin (2015)
69. Lindeberg, T., Gårding, J.: Shape-adapted smoothing in estimation of 3-D depth cues from affine distortions of local 2-D structure. Image Vis. Comput. 15, 415–434 (1997)
70. Mahmoudi, S.: Linear neural circuitry model for visual receptive fields. Tech. Rep. (2015). Preprint at http://eprints.soton.ac.uk/375838/
71. Mallat, S.G.: A Wavelet Tour of Signal Processing. Academic Press, London (1999)
72. Mandelbrot, B.B.: The Fractal Geometry of Nature. W. H. Freeman and Co, San Francisco (1982)
73. Mattia, M., Guidice, P.D.: Population dynamics of interacting spiking neurons. Phys. Rev. E 66(5), 051917 (2002)
74. Miao, X., Rao, R.P.N.: Learning the Lie group of visual invariance. Neural Comput. 19, 2665–2693 (2007)
75. Misiti, M., Misiti, Y., Oppenheim, G., Poggi, J.-M. (eds.): Wavelets and Their Applications. ISTE Ltd., London (2007)
76. Omurtag, A., Knight, B.W., Sirovich, L.: On the simulation of large populations of neurons. J. Comput. Neurosci. 8, 51–63 (2000)
77. Pauwels, E.J., Fiddelaers, P., Moons, T., van Gool, L.J.: An extended class of scale-invariant and recursive scale-space filters. IEEE Trans. Pattern Anal. Mach. Intell. 17(7), 691–701 (1995)
78. Perona, P.: Steerable-scalable kernels for edge detection and junction analysis. Image Vis. Comput. 10, 663–672 (1992)
79. Rivero-Moreno, C.J., Bres, S.: Spatio-temporal primitive extraction using Hermite and Laguerre filters for early vision video indexing. In: Image Analysis and Recognition. Lecture Notes in Computer Science, vol. 3211, pp. 825–832. Springer (2004)
80. Sato, K.-I.: Lévy Processes and Infinitely Divisible Distributions. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (1999)
81. Schoenberg, I.J.: Über variationsvermindernde lineare Transformationen. Mathematische Zeitschrift 32, 321–328 (1930)
82. Schoenberg, I.J.: Contributions to the problem of approximation of equidistant data by analytic functions. Q. Appl. Math. 4, 45–99 (1946)
83. Schoenberg, I.J.: On totally positive functions, Laplace integrals and entire functions of the Laguerre-Pólya-Schur type. Proc. Natl. Acad. Sci. 33, 11–17 (1947)
84. Schoenberg, I.J.: Some analytical aspects of the problem of smoothing. In: Courant Anniversary Volume, Studies and Essays, pp. 351–370. Interscience, New York (1948)
85. Schoenberg, I.J.: On Pólya frequency functions. II. Variation-diminishing integral operators of the convolution type. Acta Sci. Math. 12, 97–106 (1950)
86. Schoenberg, I.J.: On smoothing operations and their generating functions. Bull. Am. Math. Soc. 59, 199–230 (1953)
87. Schoenberg, I.J.: I. J. Schoenberg Selected Papers, vol. 2. Springer, Berlin (1988). Edited by C. de Boor
88. Shabani, A.H., Clausi, D.A., Zelek, J.S.: Improved spatio-temporal salient feature detection for action recognition. In: British Machine Vision Conference (BMVC'11), pp. 1–12. Dundee, UK (2011)
89. Simoncelli, E.P., Freeman, W.T., Adelson, E.H., Heeger, D.J.: Shiftable multi-scale transforms. IEEE Trans. Inf. Theory 38(2), 587–607 (1992)
90. Tschirsich, M., Kuijper, A.: Notes on discrete Gaussian scale space. J. Math. Imaging Vis. 51, 106–123 (2015)
91. Sharma, U., Duits, R.: Left-invariant evolutions of wavelet transforms on the similitude group. Appl. Comput. Harmon. Anal. 39(1), 110–137 (2014)
92. Valois, R.L.D., Cottaris, N.P., Mahon, L.E., Elfer, S.D., Wilson, J.A.: Spatial and temporal receptive fields of geniculate and cortical cells and directional selectivity. Vis. Res. 40(2), 3685–3702 (2000)
93. Weickert, J.: Anisotropic Diffusion in Image Processing. Teubner-Verlag, Stuttgart (1998)
94. Weickert, J., Ishikawa, S., Imiya, A.: On the history of Gaussian scale-space axiomatics. In: Sporring, J., Nielsen, M., Florack, L., Johansen, P. (eds.) Gaussian Scale-Space Theory: Proceedings of PhD School on Scale-Space Theory, pp. 45–59. Springer, Copenhagen (1997)
95. Weickert, J., Ishikawa, S., Imiya, A.: Linear scale-space has first been proposed in Japan. J. Math. Imaging Vis. 10(3), 237–252 (1999)
96. Willems, G., Tuytelaars, T., van Gool, L.: An efficient dense and scale-invariant spatio-temporal interest point detector. In: Proceedings of European Conference on Computer Vision (ECCV 2008). Lecture Notes in Computer Science, vol. 5303, pp. 650–663. Springer, Marseille (2008)
97. Witkin, A.P.: Scale-space filtering. In: Proceedings of 8th International Joint Conference on Artificial Intelligence, pp. 1019–1022. Karlsruhe, Germany (1983)
 98.Yuille, A.L., Poggio, T.A.: Scaling theorems for zerocrossings. IEEE Trans. Pattern Anal. Mach. Intell. 8, 15–25 (1986)CrossRefzbMATHGoogle Scholar
 99.ZelnikManor, L., Irani, M.: Eventbased analysis of video. In: Proceedings of Computer Vision and Pattern Recognition, pp. II:123–130. Kauai Marriott, Hawaii (2001)Google Scholar
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.