Discrete approximations of Gaussian smoothing and Gaussian derivatives

This paper develops an in-depth treatment concerning the problem of approximating the Gaussian smoothing and Gaussian derivative computations in scale-space theory for application on discrete data. With close connections to previous axiomatic treatments of continuous and discrete scale-space theory, we consider three main ways discretizing these scale-space operations in terms of explicit discrete convolutions, based on either (i) sampling the Gaussian kernels and the Gaussian derivative kernels, (ii) locally integrating the Gaussian kernels and the Gaussian derivative kernels over each pixel support region and (iii) basing the scale-space analysis on the discrete analogue of the Gaussian kernel, and then computing derivative approximations by applying small-support central difference operators to the spatially smoothed image data. We study the properties of these three main discretization methods both theoretically and experimentally, and characterize their performance by quantitative measures, including the results they give rise to with respect to the task of scale selection, investigated for four different use cases, and with emphasis on the behaviour at fine scales. The results show that the sampled Gaussian kernels and derivatives as well as the integrated Gaussian kernels and derivatives perform very poorly at very fine scales. At very fine scales, the discrete analogue of the Gaussian kernel with its corresponding discrete derivative approximations performs substantially better. The sampled Gaussian kernel and the sampled Gaussian derivatives do, on the other hand, lead to numerically very good approximations of the corresponding continuous results, when the scale parameter is sufficiently large, in the experiments presented in the paper, when the scale parameter is greater than a value of about 1, in units of the grid spacing.


Introduction
When operating on image data, the earliest layers of image operations are usually expressed in terms of receptive fields, which means that the image information is integrated over local support regions in image space.For modelling such operations, the notion of scale-space theory (Iijima 1962;Witkin 1983;Koenderink 1984;Koenderink andvan Doorn 1987, 1992;Lindeberg 1993aLindeberg , 1994Lindeberg , 2011;;Florack 1997;Sporring et al. 1997;Weickert et al. 1999;ter Haar Romeny 2003) stands out as a principled theory, by which the shapes of the receptive fields can be determined from axiomatic derivations, that reflect desirable theoretical properties of the first stages of visual operations.
The theory for the notion of scale-space representation does, however, mainly concern continuous image data, while implementations of this theory on digital computers requires a discretization over image space.The subject of this article is to describe and compare a number of basic approaches for discretizing the Gaussian convolution operation, as well as convolutions with Gaussian derivatives.
While one could possibly argue that at sufficiently coarse scales, where sampling effects ought to be small, the influence of choosing one form of discrete implementation compared to some other ought to be negligible, or at least of minor effect, there are situations where it is desirable to apply scale-space operations at rather fine scales, and then also to be reasonably sure that one would obtain desirable response properties of the receptive fields.
One such domain, and which motivates the present deeper study of discretization effects for Gaussian smoothing operations and Gaussian derivative computations at fine scales, is when applying Gaussian derivative operations in deep networks, as done in a recently developed subdomain of deep learning (Jacobsen et al. 2016, Lindeberg 2021c, 2022, Pintea et al. 2021, Sangalli et al. 2022, Penaud-Polge et al. 2022, Gavilima-Pilataxi and Ibarra-Fiallo 2023).
A practical observation, that one may make, when working with deep learning, is that deep networks may tend to have a preference to computing image representations at very fine scale levels.For example, empirical results indicate that deep networks often tend to perform image classification based on very fine-scale image information, corresponding to the local image texture on the surfaces of objects in the world.Indirect support for such a view may also be taken from the now well-established fact that deep networks may be very sensitive to adversarial perturbations, based on adding deliberately designed noise patterns of very low amplitude to the image data (Szegedy et al. 2013, Moosavi-Dezfooli et al. 2017, Athalye et al. 2018, Baker et al. 2018, Geirhos et al. 2018, Hermann et ak. 2020, Hendrycks et ak. 2021, Veerabadran et al. 2023).That observation demonstrates that deep networks may be very strongly influenced by fine-scale structures in the input image.Another observation may be taken from working with deep networks based on using Gaussian derivative kernels as the filter weights.If one designs such a network with complementary training of the scale levels for the Gaussian derivatives, then a common result is that the network will prefer to base its decisions based on receptive fields at rather fine scale levels.
When to implement such Gaussian derivative networks in practice, one hence faces the need for being able to go below the rule of thumb for classical computer vision, of not attempting to operate below a certain scale threshold, where the standard deviation of the Gaussian derivative kernel should then not be below a value of say 1/ √ 2 or 1, in units of the grid spacing.
From a viewpoint of theoretical signal processing, one may possibly take the view to argue that one should use the sampling theorem to express a lower bound on the scale level, which one should then never go below.For regular images, as obtained from digital cameras, or already acquired data sets as compiled by the computer vision community, such an approach based on the sampling theorem, is, however, not fully possible in practice.First of all, we almost never, or at least very rarely, have explicit information about the sensor characteristics of the image sensor.Secondly, it would hardly be possible to model the imaging process in terms of an ideal bandlimited filter with frequency characteristics near the spatial sampling density of the image sensor.Applying an ideal bandpass filter to an already given digital image may lead to ringing phenomena near the discontinuities in the image data, which will lead to far worse artefacts for spatial image data than for e.g.signal transmission over information carriers in terms of sine waves.
Thus, the practical problem that one faces when designing and applying a Gaussian derivative network to image data, is to in a practically feasible manner express a spatial smoothing process, that can smooth a given digital input image for any fine scale of the discrete approximation to a Gaussian derivative filter.A theoretical problem, that then arises, concerns how to design such a process, so that it can operate from very fine scale levels, possibly starting even at scale level zero corresponding to the original input data, without leading to severe discretization artefacts.
A further technical problem that arises is that, even if one would take the a priori view of basing the implementation on the purely discrete theory for scale-space smoothing and scale-space derivatives developed in (Lindeberg 1990(Lindeberg , 1993b)), and as we have taken in our previous work on Gaussian derivative networks (Lindeberg 2021c(Lindeberg , 2022)), one then faces the problem of handling the special mathematical functions used as smoothing primitives in this theory (the modified Bessel functions of integer order) when propagating gradients for training deep networks backwards by automatic differentiation, when performing learning of the scale levels in the network.These necessary mathematical primitives do not exist as built-in functions in e.g.PyTorch (Paszke et al. 2017), which implies that the user then would have to im-plement a PyTorch interface for these functions himself, or choose some other type of discretization method, if aiming to learn the scale levels in the Gaussian derivative networks by back propagation.There are a few related studies of discretizations of scale-space operations (Wang 1999, Lim and Stiehl 2003, Tschirsich and Kuijper 2015, Slavík and Stehlík 2015, Rey-Otero and Delbracio 2016).These do not, however, answer the questions that need to be addressed for the intended use cases for our developments.Wang (1999) proposed pyramid-like algorithms for computing multi-scale differential operators using a spline technique, however, then taking rather coarse steps in the scale direction.Lim and Stiehl (2003) studied properties of discrete scale-space representations under discrete iterations in the scale direction, based on Euler's forward method.For our purpose, we do, however, need to consider the scale direction as a continuum.Tschirsich and Kuijper (2015) investigated the compatibility of topological image descriptors with a discrete scale-space representations, and did also derive an eigenvalue decomposition in relation to the semidiscrete diffusion equation, that determines the evolution properties over scale, to enable efficient computations of discrete scale-space representations of the same image at multiple scales.With respect to our target application area, we are, however, more interested in computing image features based on Gaussian derivative responses, and then mostly also computing discrete scale-space representation at a single scale only, for each input image.Slavík and Stehlík (2015) developed a theory for more general evolution equations over semi-discrete domains, which incorporates the 1-D discrete scale-space evolution family, that we consider here, as corresponding to convolutions with the discrete analogue of the Gaussian kernel, as a special case.For our purposes, we are, however, more interested in performing an in-depth study of different discrete approximations of the axiomatically determined class of Gaussian smoothing operations and Gaussian derivative operators, than expanding the treatment to other possible evolution equations over discrete spatial domains.
Rey-Otero and Delbracio (2016) assumed that the image data can be regarded as bandlimited, and did then use a Fourier-based approach for performing closed-form Gaussian convolution over a reconstructed Fourier basis, which in that way constitutes a way to eliminate the discretization errors, provided that a correct reconstruction of an underlying continuous image, notably before the image acquisition step, can be performed.1A very closely related approach, for computing Gaussian convolutions based on a reconstruction of an assumed-to-be bandlimited signal, has also been previously outlined by Åström and Heyden (1997).
As argued earlier in this introduction, the image data that is processed in computer vision are, however, not generally accompanied with characteristic information regarding the image acquisition process, specifically not with regard to what extent the image data could be regarded as bandlimited.Furthermore, one could question if the image data obtained from a modern camera sensor could at all be modelled as bandlimited, for a cutoff frequency very near the resolution of the image.Additionally, with regard to our target application domain of deep learning, one could also question if it would be manageable to invoke a Fourier-based image reconstruction step for each convolution operation in a deep network.In the work to be developed here, we are, on the other hand, more interested in developing a theory for discretizing the Gaussian smoothing and the Gaussian derivative operations at very fine levels of scale, in terms of explicit convolution operations, and based on as minimal as possible assumptions regarding the nature of the image data.
The purpose of this article is thus to perform a detailed theoretical analysis of the properties of different discretizations of the Gaussian smoothing operation and Gaussian derivative computations at any scale, and with emphasis on reaching as near as possible to the desirable theoretical properties of the underlying scale-space representation, to hold also at very fine scales for the discrete implementation.
For performing such analysis, we will consider basic approaches for discretizing the Gaussian kernel in terms of either pure spatial sampling (the sampled Gaussian kernel) or local integration over each pixel support region (the integrated Gaussian kernel) and compare to the results of a genuinely discrete scale-space theory (the discrete analogue of the Gaussian kernel), see Figure 1 for graphs of such kernels.dAfter analysing and numerically quantifying the properties of these basic types of discretizations, we will then extend the analysis to discretizations of Gaussian derivatives in terms of either sampled Gaussian derivatives, integrated Gaussian derivatives, and compare to the results of a genuinely discrete theory based on convolutions with the discrete analogue of the Gaussian kernel followed by discrete derivative approximations computed by applying small-support central difference operators to the discrete scale-space representation.We will also extend the analysis to the computation of local directional derivatives, as a basis for filter-bank approaches for receptive fields, based on either the scalespace representation generated by convolution with rotationally symmetric Gaussian kernels, or the affine Gaussian scale space.
It will be shown that, with regard to the topic of raw Gaussian smoothing, the discrete analogue of the Gaussian kernel has the best theoretical properties, out of the discretization methods considered.For scale values when the standard deviation of the continuous Gaussian kernel is above 0.75 or 1, the sampled Gaussian kernel does also have very good properties, and leads to very good approximations of the corresponding fully continuous results.The integrated Gaussian kernel is better at handling fine scale levels than the sampled Gaussian kernel, but does, however, comprise a scale offset that hampers its accuracy in approximating the underlying continuous theory.
Concerning the topic of approximating the computation of Gaussian derivative responses, it will be shown that the approach based on convolution with the discrete analogue of the Gaussian kernel followed by central difference operations has the clearly best properties at fine scales, out of the studied three main approaches.In fact, when the standard deviation of the underlying continuous Gaussian kernel is a bit below about 0.75, the sampled Gaussian derivative kernels and the integrated Gaussian derivative kernels do not lead to accurate numerical estimates of derivatives, when applied to monomials of the same order as the order of spatial differentiation, or lower.Over an intermediate scale range in the upper part of this scale interval, the integrated Gaussian derivative kernels do, however, have somewhat better properties than the sampled Gaussian derivative kernels.For the discrete approximations of Gaussian derivatives defined from convolutions with the discrete analogue of the Gaussian kernel followed by central differences, the numerical estimates of derivatives obtained by applying this approach to monomials of the same order as the order of spatial differentiation do, on the other hand, lead to derivative estimates exactly equal to their continuous counterparts, and also over the entire scale range.
For larger scale values, for standard deviations greater than about 1, relative to the grid spacing, in the experiments to be reported in the paper, the discrete approximations of Gaussian derivatives obtained from convolutions with sampled Gaussian derivatives do on the other hand lead to numerically very accurate approximations of the corresponding results obtained from the purely continuous scale-space theory.For the discrete derivative approximations obtained by convolutions with the integrated Gaussian derivatives, the box integration introduces a scale offset, that hampers the accuracy of the approximation of the corresponding expressions obtained from the fully continuous scale-space theory.The integrated Gaussian derivative kernels do, however, degenerate less seriously than the sampled Gaussian derivative kernels within a certain range of very fine scales.Therefore, they may constitute an interesting alternative, if the mathematical primitives needed for the discrete analogues of the Gaussian derivative are not fully available within a given system for programming deep networks.
For simplicity, we do in this treatment restrict ourselves to image operations that operate in terms of discrete convolutions only.In this respect, we do not consider implementations in terms of Fourier transforms, which are also possible, while less straightforward in the context of deep learning.
We do furthermore not consider extensions to spatial interpolation operations, which operate between the positions of the image pixels, and which can be highly useful, for example, for locating the positions of image features with subpixel accuracy (Unser et al. 1991, 1993, Wang and Lee 1998, Bouma et al. 2007, Bekkers 2020, Zheng et al. 2022).We do additionally not consider representations that perform subsamplings at coarser scales, which can be useful for reducing the amount of computational work (Burt and Adelson 1983, Crowley 1984, Simoncelli et al. 1992, Simoncelli and Freeman 1995, Lindeberg and Bretzner 2003, Crowley and Riff 2003, Lowe 2004), or representations that aim at speeding up the spatial convolutions on serial computers based on performing the computations in terms of spatial recursive filters (Deriche 1992, Young and van Vliet 1995, van Vliet et al. 1998, Geusebroek et al. 2003, Farnebäck and Westin 2006, Charalampidis 2016).For simplicity, we develop the theory for the special cases of 1-D signals or 2-D images, while extensions to higher-dimensional volumetric images is straightforward, as implied by separable convolutions for the scale-space concept based on convolutions with rotationally symmetric Gaussian kernels.
Concerning experimental evaluations, we do in this paper deliberately focus on and restrict ourselves to the theoretical properties of different discretization methods, and only report performance measures based on such theoretical properties.One motivation for this approach is that the integration with different types of visual modules may call for different relative properties of the discretization methods.We therefore want this treatment to be timeless, and not biased to the integration with particular computer vision methods or algorithms that operate on the output from Gaussian smoothing operations or Gaussian derivatives.Experimental evaluations with regard to Gaussian derivative networks will be reported in follow-up work.The results from this theoretical analysis should therefore be more generally applicable to larger variety of approaches in classical computer vision, as well as to other deep learning approaches that involve Gaussian derivative operators.

Discrete approximations of Gaussian smoothing
The Gaussian scale-space representation L(x, y; s) of a 2-D spatial image f (x, y) is defined by convolution with 2-D Gaussian kernels of different sizes (1) according to (Iijima 1962, Witkin 1983, Koenderink 1984, Lindeberg 1993a, 2011, Florack 1997, Weickert et al. 1999, ter Haar Romeny et al. 2003) (2) Equivalently, this scale-space representation can be seen as defined by the solution of the 2-D diffusion equation with initial condition L(x, y; 0) = f (x, y).

Theoretical properties of Gaussian scale-space representation
2.1.1Non-creation of new structure with increasing scale The Gaussian scale space, generated by convolving an image with Gaussian kernels, obeys a number of special properties, that ensure that the transformation from any finer scale level to any coarser scale level is guaranteed to always correspond to a simplification of the image information: -Non-creation of local extrema: For any one-dimensional signal f , it can be shown that the number of local extrema in the 1-D Gaussian scale-space representation at any coarser scale s 2 is guaranteed to not be higher than the number of local extrema at any finer scale s 1 < s 2 .-Non-enhancement of local extrema: For any N -dimensional signal, it can be shown that the derivative of the scale-space representation with respect to the scale parameter ∂ s L is guaranteed to obey ∂ s L ≤ 0 at any local spatial maximum point and ∂ s L ≥ 0 at any local spatial minimum point.In this respect, the Gaussian convolution operation has a strong smoothing effect.
In fact, the Gaussian kernel can be singled out as the unique choice of smoothing kernel as having these properties, from axiomatic derivations, if combined with the requirement of a semi-group property over scales and certain regularity assumptions, see Theorem 5 in (Lindeberg 1990), Theorem 3.25 in (Lindeberg 1993a) and Theorem 5 in (Lindeberg 2011) for more specific statements.

Cascade smoothing property
Due to the semi-group property, it follows that the scalespace representation at any coarser scale L(x, y; s 2 ) can be obtained by convolving the scale-space representation at any finer scale L(x, y; s 1 ) with a Gaussian kernel parameterized by the scale difference s 2 − s 1 : (5) This form of cascade smoothing property is an essential property of a scale-space representation, since it implies that the transformation from any finer scale level s 1 to any coarser scale level s 2 will always be a simplifying transformation, provided that the convolution kernel used for the cascade smoothing operation corresponds to a simplifying transformation.

Spatial averaging
The Gaussian kernel is non-negative and normalized to unit L 1 -norm In these respects, Gaussian smoothing corresponds to a spatial averaging process, which constitutes one of the desirable attributes of a smoothing process intended to reflect different spatial scales in image data.

Separable Gaussian convolution
Due to the separability of the 2-D Gaussian kernel g 2D (x, y; s) = g(x; s) g(y; s), where the 1-D Gaussian kernel is of the form the 2-D Gaussian convolution operation (2) can also be written as two separable 1-D convolutions of the form Methods that implement Gaussian convolution in terms of explicit discrete convolutions usually exploit this separability property, since if the Gaussian kernel is truncated 2 at the tails for x = ±N , the computational work for separable convolution will be of the order per image pixel, whereas it would be of order for non-separable 2-D convolution.

Modelling situation for theoretical analysis of different approaches for implementing Gaussian smoothing discretely
From now on, we will, for simplicity, only consider the case with 1-D Gaussian convolutions of the form which are to be implemented in terms of discrete convolutions of the form for some family of discrete filter kernels T (n; s).

Measures of the spatial extent of smoothing kernels
The spatial extent of these 1-D kernels can be described by the scale parameter s, which represents the spatial variance of the convolution kernel and which can also be parameterized in terms of the standard deviation For the discrete kernels, the spatial variance is correspondingly measured as

The sampled Gaussian kernel
The presumably simplest approach for discretizing the 1-D Gaussian convolution integral (13) in terms of a discrete convolution of the form ( 14), is by choosing the discrete kernel T (n; s) as the sampled Gaussian kernel While this choice is easy to implement in practice, there are, however, three major conceptual problems with using such a discretization at very fine scales: the filter coefficients may not be limited to the interval [0, 1], the sum of the filter coefficients may become substantially greater than 1, and the resulting filter kernel may have too narrow shape, in the sense that the spatial variance of the discrete kernel V (T sampl (•; s)) is substantially smaller than the spatial variance V (g(•; s)) of the continuous Gaussian kernel.
The first two problems imply that the resulting discrete spatial smoothing kernel is no longer a spatial weighted averaging kernel in the sense of Section 2.1.3,which implies problems, if attempting to interpret the result of convolutions with the sampled Gaussian kernels as reflecting different spatial scales.The third problem implies that there will not be a direct match between the value of the scale parameter provided as argument to the sampled Gaussian kernel and the scales that the discrete kernel would reflect in the image data.
Figures 2 and 3 show numerical characterizations of these entities for a range of small values of the scale parameter.
More fundamentally, it can be shown (see Section VII.A in Lindeberg 1990) that convolution with the sampled Gaussian kernel is guaranteed to not increase the number of local extrema (or zero-crossings) in the signal from the input signal to any coarser level of scale.The transformation from an arbitrary scale level to some other arbitrary coarser scale level is, however, not guaranteed to obey such a simplification property between any pair of scale levels.In this sense, convolutions with sampled Gaussian kernels do not truly obey non-creation of local extrema from finer to coarser levels of scale, in the sense described in Section 2.1.1.

The normalized sampled Gaussian kernel
A straightforward, but ad hoc, way of avoiding the problems that the discrete filter coefficients may, for small values of the scale parameter have their sum exceed 1, is by normalizing the sampled Gaussian kernel with its discrete l 1 -norm: Continuous Gauss Sampled Gauss Integrated Gauss Discrete Gauss By definition, we in this way avoid this problems that the regular sampled Gaussian kernel is not spatial weighted averaging kernel in the sense of Section 2.1.3.
The problem that the spatial variance of the discrete kernel V (T normsampl (•; s)) is substantially smaller that the spatial variance V (g(•; s)) of the continuous Gaussian kernel, will, however, persist, since the variance of a kernel is not affected by a uniform scaling of its amplitude values.In this sense, the resulting discrete kernels will not for small scale values accurately reflect the spatial scale corresponding to the scale argument, as specified by the scale parameter s.

The integrated Gaussian kernel
A possibly better way of enforcing the weights of the filter kernels to sum up to 1, is by instead letting the discrete kernel be determined by the integral of the continuous Gaussian kernel over each pixel support region (Lindeberg 1993a Equation (3.89)) which in terms of the scaled error function erg(x; s) can be expressed as where erf(x) denotes the regular error function according to A conceptual argument for defining the integrated Gaussian kernel model is that, we may, given a discrete signal f (n), define a continuous signal f (x), by letting the values of the signal in each pixel support region be equal to the value of the corresponding discrete signal, see Appendix A.2 for an explicit derivation.In this sense, there is a possible physical motivation for using this form of scale-space discretization.By the continuous Gaussian kernel having its integral equal to 1, it follows that the sum of the discrete filter coefficients will over an infinite spatial domain also be exactly equal to 1. Furthermore, the discrete filter coefficients are also guaranteed to be in the interval [0, 1].In these respects, the resulting discrete kernels will represent a true spatial weighting process, in the sense of Section 2.1.3.
Concerning the spatial variances V (T int (•; s)) of the resulting discrete kernels, they will also for smaller scale values be closer to the spatial variances V (g(•; s)) of the continuous Gaussian kernel, than for the sampled Gaussian kernel or the normalized sampled Gaussian kernel, as shown in Figures 3 and 4. For larger scale values, the box integration over each pixel support region, will, on the other hand, however, introduce a scale offset, which for larger values of the scale parameter s approaches which, in turn, corresponds to the spatial variance of a continuous box filter over each pixel support region, defined by and which is used for defining the integrated Gaussian kernel from the continuous Gaussian kernel in (20).Figure 3 shows a numerical characterization of the difference in scale values between the variance V (T int (n; s)) of the discrete integrated Gaussian kernel and the scale parameter s provided as argument to this function.
In terms of theoretical scale-space properties, it can be shown that the transformation from the input signal to any coarse scale always implies a simplification, in the sense that the number of local extrema (or zero-crossings) at any coarser level of scale is guaranteed to not exceed the number of local extrema (or zero-crossings) in the input signal (see Section 3.6.3 in Lindeberg 1993a).The transformation from any finer scale level to any coarser scale level will, however, not be guaranteed to obey such a simplification property.In this respect, the integrated Gaussian kernel does not fully represent a discrete scale-space transformation, in the sense of Section 2.1.1.

The discrete analogue of the Gaussian kernel
According to a genuinely discrete theory for spatial scalespace representation in Lindeberg (1990), the discrete scale space is defined from discrete kernels of the form where I n (s) denote the modified Bessel functions of integer order (see Abramowitz and Stegun 1964), which are related to the regular Bessel functions J n (z) of the first kind according to and which for integer values of n, as we will restrict ourselves to here, can be expressed as The discrete analogue of the Gaussian kernel T disc (n; s) does specifically have the practically useful properties that: the filter coefficients are guaranteed to be in the interval [0, 1], the filter coefficients sum up to 1 (see Equation (3.43) the spatial variance of the discrete kernel is exactly equal to the scale parameter (see Equation (3.53) in Lindeberg 1993a) These kernels do also exactly obey a semi-group property over spatial scales (see Equation (3.41) in Lindeberg 1993a) which implies that the resulting discrete scale-space representation also obeys an exact cascade smoothing property More fundamentally, these discrete kernels do furthermore preserve scale-space properties to the discrete domain, in the sense that: the number of local extrema (or zero-crossings) at a coarser scale is guaranteed to not exceed the number of local extrema (or zero-crossings) at any finer scale, the resulting discrete scale-space representation is guaranteed to obey non-enhancement of local extrema, in the sense that the value at any local maximum is guaranteed to not increase with increasing scale, and that the value at any local minimum is guaranteed to not decrease with increasing scale.
In these respects, the discrete analogue of the Gaussian kernel obeys all the desirable theoretical properties of a discrete scale-space representation, corresponding to discrete analogues of the theoretical properties of the Gaussian scalespace representation stated in Section 2.1.Specifically, the theoretical properties of the discrete analogue of the Gaussian kernel are better than the theoretical properties of the sampled Gaussian kernel, the normalized sampled Gaussian kernel or the integrated Gaussian kernel.

Diffusion equation interpretation of the genuinely discrete scale-space representation concept
In terms of diffusion equations, the discrete scale-space representation generated by convolving a 1-D discrete signal f by the discrete analogue of the Gaussian kernel according to (26) satisfies the semi-discrete 1-D diffusion equation (Lindeberg 1993a Theorem 3.28) with initial condition L(x; 0) = f (x), where δ xx denotes the second-order discrete difference operator Over a 2-D discrete spatial domain, the discrete scale-space representation of an image f (x, y), generated by separable convolution with the discrete analogue of the Gaussian kernel satisfies the semi-discrete 2-D diffusion equation (Lindeberg 1993a Proposition 4.14) with initial condition L(x, y; 0) = f (x, y), where ∇ 2 5 denotes the following discrete approximation of the Laplacian operator In this respect, the discrete scale-space representation generated by convolution with the discrete analogue of the Gaussian kernel can be seen as a purely spatial discretization of the continuous diffusion equation ( 3), which can serve as an equivalent way of defining the continuous scale-space representation.

Performance measures for quantifying deviations from theoretical properties of discretizations of Gaussian kernels
To quantify the deviations between properties of the discrete kernels, and desirable properties of the discrete kernels that are to transfer the desirable properties of a continuous scalespace representation to a corresponding discrete implementation, we will in this section quantity such deviations in terms of the following error measures: -Normalization error: The difference between the l 1norm of the discrete kernels and the desirable unit l 1norm normalization will be measured by 3 -Absolute scale difference: The difference between the variance of the discrete kernel and the argument of the scale parameter will be measured by This error measure is expressed in absolute units of the scale parameter.The reason, why we express this measure in units of the variance of the discretizations of the Gaussian kernel, is that variances are additive under convolutions of non-negative kernels.
3 When implementing this operation in practice, the infinite sum is replaced by a finite sum Enorm(T ( for a small ϵ chosen as 10 −8 , according to Footnote 2.
-Relative scale difference: The relative scale difference, between the actual standard deviation of the discrete kernel and the argument of the scale parameter, will be measured by This error measure is expressed in relative units of the scale parameter. 4The reason, why we express this entity in units of the standard deviations of the discretizations of the Gaussian kernels, is that these standard deviations correspond to interpretations of the scale parameter in units of [length], in a way that is thus proportional to the scale level.
-Cascade smoothing error: The deviation from the cascade smoothing property of a scale-space kernel according to (5) and the actual result of convolving a discrete approximation of the scale-space representation at a given scale s, with its corresponding discretization of the Gaussian kernel, will be measured by While this measure of cascade smoothing error could in principle instead be formulated for arbitrary relations between the scale level of the discrete approximation of the scale-space representation and the amount of additive spatial smoothing, we fix these scale levels to be equal for the purpose of conceptual simplicity.5 In the ideal theoretical case, all of these error measures should be equal to zero (up to numerical errors in the discrete computations).Any deviations from zero of these error measures do therefore represent a quantification of deviations from desirable theoretical properties in a discrete approximation of the Gaussian smoothing operation.

Numerical quantifications of performance measures
In the following, we will show results of computing the above measures concerning desirable properties of discretizations  40) and in units of the spatial variance V (T (•; s)), for the discrete analogue of the Gaussian kernel, the sampled Gaussian kernel and the integrated Gaussian kernel.This scale difference is exactly equal to zero for the discrete analogue of the Gaussian kernel.For scale values σ < 0.75, the absolute scale difference is substantial for the sampled Gaussian kernel, and then rapidly tends to zero for larger scales.For the integrated Gaussian kernel, the absolute scale difference does, however, not approach zero with increasing scale.Instead, it approaches the numerical value ∆s ≈ 0.0833, close to the spatial variance 1/12 of a box filter over each pixel support region.The spatial variance-based absolute scale difference for the normalized sampled Gaussian kernel is equal to the spatial variance-based absolute scale difference for the regular sampled Gaussian kernel.(Horizontal axis: Scale parameter in delimiting the scale parameter to the lower bound of σ ≥ 0.1 is to avoid the singularity at σ = 0.)

Normalization error
Figure 2 shows graphs of the l 1 -norm-based normalization error E norm (T (•; s)) according to (39) for the main classes of discretizations of Gaussian kernels.For the integrated Gaussian kernel, the discrete analogue of the Gaussian kernel and the normalized sampled Gaussian kernel, the normalization error is identically equal to zero.For σ ≤ 0.5, the normalization error is, however, substantial for the regular sampled Gaussian kernel.

Standard deviations of the discrete kernels
Figure 3 shows graphs of the standard deviations V (T (•; s)) for the different main types of discretizations of the Gaussian kernels, which constitutes a natural measure of their spatial extent.For the discrete analogue of the Gaussian kernel, the standard deviation of the discrete kernel is exactly equal to the value of the scale parameter in units of σ = √ s.For the sampled Gaussian kernel, the standard deviation is substantially lower than the value of the scale parameter in  41) and in units of the spatial standard deviation of the discrete kernels, for the discrete analogue of the Gaussian kernel, the sampled Gaussian kernel and the integrated Gaussian kernel.This relative scale error is exactly equal to zero for the discrete analogue of the Gaussian kernel.For scale values σ < 0.75, the relative scale difference is substantial for sampled Gaussian kernel, and then rapidly tends to zero for larger scales.For the integrated Gaussian kernel, the relative scale difference is significantly larger, while approaching zero with increasing scale.The relative scale difference for the normalized sampled Gaussian kernel is equal to the relative scale difference for the regular sampled Gaussian kernel.(Horizontal axis: Scale parameter in units of σ = √ s for σ ≤ 0.5.For the integrated Gaussian kernel, the standard deviation is for smaller values of the scale parameter closer to a the desirable linear trend.For larger values of the scale parameter, the standard deviation of the discrete kernel is, however, notably higher than σ.

Spatial variance offset of the discrete kernels
To quantify in a more detailed manner how the scale offset of the discrete approximations of Gaussian kernels depends upon the scale parameter, Figure 4 shows graphs of the spatial variance-based scale difference measure E ∆s (T (•; s)) according to (40) for the different discretization methods.For the discrete analogue of the Gaussian kernel, the scale difference is exactly equal to zero.For the sampled Gaussian kernel, the scale difference measure differs significantly from zero for σ < 0.75, while then rapidly approaching zero for larger scales.For the integrated Gaussian kernel, the variance-based scale difference measure does, however, not approach zero for larger scales.Instead, it approaches the numerical value ∆s ≈ 0.0833, close to the spatial variance 1/12 of a box filter over each pixel support region.The spatial variance-based scale difference for the normalized sampled Gaussian kernel is equal to the spatial variance-based scale difference for the regular sampled Gaussian kernel.42), for the discrete analogue of the Gaussian kernel, the sampled Gaussian kernel, the integrated Gaussian kernel, as well as the normalized sampled Gaussian kernel.For exact numerical computations, this cascade smoothing error would be identically equal to zero for the discrete analogue of the Gaussian kernel.In the numerical implementation underlying these computations, there are, however, numerical errors of a low amplitude.For the sampled Gaussian kernel, the cascade smoothing error is very large for σ ≤ 0.5, notable for σ < 0.75, and then rapidly decreases with increasing scale.For the normalized sampled Gaussian kernel, the cascade smoothing error is for σ ≤ 0.5 significantly lower than for the regular sampled Gaussian kernel.For the integrated Gaussian kernel, the cascade smoothing error is lower than for the sampled Gaussian kernel for σ ≤ 0.5, while then decreasing substantially slower to zero than for the sampled Gaussian kernel.(Horizontal axis: Scale parameter in units of

Spatial standard-deviation-based relative scale difference
Figure 5 shows the spatial standard-deviation-based relative scale difference E relscale (T (•; s)) according to (41) for the main classes of discretizations of Gaussian kernels.This relative scale difference is exactly equal to zero for the discrete analogue of the Gaussian kernel.For scale values σ < 0.75, the relative scale difference is substantial for sampled Gaussian kernel, and then rapidly tends to zero for larger scales.
For the integrated Gaussian kernel, the relative scale difference is significantly larger, while approaching zero with increasing scale.The relative scale difference for the normalized sampled Gaussian kernel is equal to the relative scale difference for the regular sampled Gaussian kernel.

Cascade smoothing error
Figure 6 shows the cascade smoothing error E cascade (T (•; s)) according to (42) for the main classes of discretizations of Gaussian kernels, while here complemented also with results for the normalized sampled Gaussian kernel, since the results for the latter kernel are different than for the regular sampled Gaussian kernel.
For exact numerical computations, this cascade smoothing error should be identically equal to zero for the discrete analogue of the Gaussian kernel.In the numerical implementation underlying these computations, there are, however, numerical errors of a low amplitude.For the sampled Gaussian kernel, the cascade smoothing error is very large for σ ≤ 0.5, notable for σ < 0.75, and then rapidly decreases with increasing scale.For the normalized sampled Gaussian kernel, the cascade smoothing error is for σ ≤ 0.5 significantly lower than for the regular sampled Gaussian kernel.For the integrated Gaussian kernel, the cascade smoothing error is lower than for the sampled Gaussian kernel for σ ≤ 0.5, while then decreasing much slower than for the sampled Gaussian kernel.

Summary of the characterization results from the theoretical analysis and the quantitative performance measures
To summarize the theoretical and the experimental results presented in this section, the discrete analogue of the Gaussian kernel stands out as having the best theoretical properties in the stated respects, out of the set of treated discretization methods for the Gaussian smoothing operation.
The choice, concerning which method is preferable out of the choice between either the sampled Gaussian kernel or the integrated kernel, depends on whether one would prioritize the behaviour at either very fine scales or at coarse scales.The integrated Gaussian kernel has significantly better approximation of theoretical properties at fine scales, whereas its variance-based scale offset at coarser scales implies significantly larger deviations from the desirable theoretical properties at coarser scales, compared to either the sampled Gaussian kernel or the normalized sampled Gaussian kernel.The normalized sampled Gaussian kernel has properties closer to the desirable properties than the regular sampled Gaussian kernel.If one would introduce complementary mechanisms to compensate for the scale offset of the integrated Gaussian kernel, that kernel could, however, also constitute a viable solution at coarser scales.

Discrete approximations of Gaussian derivative operators
According to the theory by Koenderink andvan Doorn (1987, 1992), Gaussian derivatives constitute a canonical family of operators to derive from a Gaussian scale-space representation.Such Gaussian derivative operators can be equivalently defined by, either differentiating the Gaussian scale-space representation or by convolving the input image by Gaussian derivative kernels where and α and β are non-negative integers.

Theoretical properties of Gaussian derivatives
Due to the cascade smoothing property of the Gaussian smoothing operation, in combination with the commutative property of differentiation under convolution operations, it follows that the Gaussian derivative operators also satisfy a cascade smoothing property over scales: Combined with the simplification property of the Gaussian kernel under increasing values of the scale parameter, it follows that the Gaussian derivative responses also obey such a simplifying property from finer to coarser levels of scale, in terms of (i) non-creation of new local extrema from finer to coarser levels of scale for 1-D signals, or (ii) non-enhancement of local extrema for image data over any number of spatial dimensions.

Separable Gaussian derivative operators
By the separability of the Gaussian derivative kernels the 2-D Gaussian derivative response can also be written as a separable convolution of the form In analogy with the previous treatment of purely Gaussian convolution operations, we will henceforth, for simplicity, consider the case with 1-D Gaussian derivative convolutions of the form which are to be implemented in terms of discrete convolutions of the form for some family of discrete filter kernels T x α (n; s).

Measures of the spatial extent of Gaussian derivative or derivative approximation kernels
The spatial extent (spread) of a Gaussian derivative operator g x α (ξ; s) of the form (49) will be measured by the variance of its absolute value Explicit expressions for these spread measures computed for continuous Gaussian derivative kernels up to order 4 are given in Appendix A.4. Correspondingly, the spatial extent of a discrete kernel T x α (n; s) designed to approximate a Gaussian derivative operator will be measured by the entity (52)

Sampled Gaussian derivative kernels
In analogy with the previous treatment for the sampled Gaussian kernel in Section 2.3, the presumably simplest way to discretize the Gaussian derivative convolution integral (49), is by letting the discrete filter coefficients in the discrete convolution operation (50) be determined as sampled Gaussian derivatives Appendix A.1 describes how the Gaussian derivative kernels are related to the probabilistic Hermite polynomials, and does also give explicit expressions for the 1-D Gaussian derivative kernels up to order 4. For small values of the scale parameter, the resulting discrete kernels may, however, suffer from the following problems: the l 1 -norms of the discrete kernels may deviate substantially from the L 1 -norms of the corresponding continuous Gaussian derivative kernels (with explicit expressions for the L 1 -norms of the continuous Gaussian derivative kernels up to order 4 given in Appendix A.3), the resulting filters may have too narrow shape, in the sense that the spatial variance of the absolute value of the discrete kernel V (|T sampl,x α (•; s)|) may differ substantially from the spatial variance of the absolute value of the corresponding continuous Gaussian derivative kernel V (|g x α (•; s)|) (see Appendix A.4 for explicit expressions for these spatial spread measures for the continuous Gaussian derivatives up to order 4).

Integrated Gaussian derivative kernels
In analogy with the treatment of the integrated Gaussian kernel in Section 2.5, a possible way of making the l 1 -norm of the discrete approximation of a Gaussian derivative kernel closer to the L 1 -norm of its continuous counterpart, is by defining the discrete kernel as the integral of the continuous Gaussian derivative kernel over each pixel support region again with a physical motivation of extending the discrete input signal f (n) to a continuous input signal f c (x), defined to be equal to the discrete value within each pixel support region, and then integrating that continuous input signal with a continuous Gaussian kernel, which does then correspond to convolving the discrete input signal with the corresponding integrated Gaussian derivative kernel (see Appendix A.2 for an explicit derivation).
Given that g x α−1 (x; s) is a primitive function of g x α (x; s), we can furthermore for α ≥ 1, write the relationship (54) as 55) With this definition, it follows immediately that the contributions to the l 1 -norm of the discrete kernel T int,x α (n; s) will be equal to the contributions to the L 1 -norm of g x α (n; s) over those pixels where the continuous kernel has the same sign over the entire pixel support region.For those pixels where the continuous kernel changes its sign within the support region of the pixel, however, the contributions will be different, thus implying that the contributions to the l 1 -norm of the discrete kernel may be lower than the contributions to the L 1 -norm of the corresponding continuous Gaussian derivative kernel (see Figure 1 for an illustration of such graphs of integrated Gaussian derivative kernels).
Similarly to the previously treated case with the integrated Gaussian kernel, the integrated Gaussian derivative kernels will also imply a certain scale offset, as shown in Figures 10(a

Discrete analogues of Gaussian derivative kernels
A common characteristics of the approximation methods for computing discrete Gaussian derivative responses considered so far, is that the computation of each Gaussian derivative operator of a given order will imply a spatial convolution with a large-support kernel.Thus, the amount of necessary computational work will increase by the number of Gaussian derivative responses, that are to be used when constructing visual operations that base their processing steps on using Gaussian derivative responses as input.
A characteristic property of the theory for discrete derivative approximations with scale-space properties in Lindeberg (1993bLindeberg ( , 1993a)), however, is that discrete derivative approximations can instead be computed by applying smallsupport central difference operators to the discrete scalespace representation, and with preserved scale-space properties in terms of either (i) non-creation of local extrema with increasing scale for 1-D signals, or (ii) non-enhancement of local extrema towards increasing scales in arbitrary dimensions.With regard to the amount of computational work, this property specifically means that the amount of additive computational work needed, to add more Gaussian derivative responses as input to a visual module, will be substantially lower than for the previously treated discrete approximations, based on computing each Gaussian derivative response using convolutions with large-support spatial filters.
According to the genuinely discrete theory for defining discrete analogues of Gaussian derivative operators, discrete derivative approximations are from the discrete scale-space representation, generated by convolution with the discrete analogue of the Gaussian kernel according to (26) computed as where δ x α are small-support difference operators of the following forms in the special cases when α = 1 or α = 2 to ensure that the estimates of the first-and second-order derivatives are located at the pixel values, and not in between, and of the following forms for higher values of α: for integer i, where the special cases α = 3 and α = 4 then correspond to the difference operators For 2-D images, corresponding discrete derivative approximations are then computed as straightforward extensions of the 1-D discrete derivative approximation operators where L(x, y; s) here denotes the discrete scale-space representation (36) computed using separable convolution with the discrete analogue of the Gaussian kernel (26) along each dimension.
In terms of explicit convolution kernels, computation of these types of discrete derivative approximations correspond to applying discrete derivative approximation kernels of the form to the input data.In practice, such explicit derivative approximation kernels should not, however, never be applied for actual computations of discrete Gaussian derivative responses, since those operations can be carried out much more efficiently by computations of the forms ( 57) or ( 63), provided that the computations are carried out with sufficiently high numerical accuracy, so that the numerical errors do not grow too much because of cancellation of digits.

Cascade smoothing property
A theoretically attractive property of these types of discrete approximations of Gaussian derivative operators, is that they exactly obey a cascade smoothing property over scales, in 1-D of the form and in 2-D of the form where T disc (•, •; s) here denotes the 2-D extension of the 1-D discrete analogue of the Gaussian kernel by separable convolution In practice, this cascade smoothing property implies that the transformation from any finer level of scale to any coarser level of scale is always a simplifying transformation, implying that this transformation always ensures: (i) non-creation of new local extrema (or zero-crossings) from finer to coarser levels of scale for 1-D signals, and (ii) non-enhancement of local extrema, in the sense that the derivative of the scalespace representation with respect to the scale parameter, always satisfies ∂ s L ≤ 0 at any local spatial maximum point and ∂ s L ≥ 0 at any local spatial minimum point.

Numerical correctness of the derivative estimates
To measure how well a discrete approximation of a Gaussian derivative operator reflects a differentiation operator, one can study the response properties to polynomials.6Specifically, in the 1-D case, the M :th-order derivative of an Morder monomial should be: Additionally, the derivative of any lower-order polynomial should be zero: With respect to Gaussian derivative responses to monomials of the form the commutative property between continuous Gaussian smoothing and the computation of continuous derivatives then specifically implies that and If these relationships are not sufficiently well satisfied for the corresponding result of replacing a continuous Gaussian derivative operator by a numerical approximations of a Gaussian derivative, then the corresponding discrete approximation cannot be regarded as a valid approximation of the Gaussian derivative operator, that in turn is intended to reflect the differential structures in the image data.
It is therefore of interest to consider entities of the following type to characterize how well a discrete approximation T x α (n; s) of a Gaussian derivative operator of order α serves as a differentiation operator on a monomial of order k.Fig. 7: The responses to M :th-order monomials f (x) = x M for different discrete approximations of M :th-order Gaussian derivative kernels, for orders up to M = 4, for either discrete analogues of Gaussian derivative kernels T disc,x α (n; s) according to (64), sampled Gaussian derivative kernels T sampl,x α (n; s) according to (53) or integrated Gaussian derivative kernels T int,x α (n; s) according to (54).In the ideal continuous case, the resulting value should be equal to M !. (Horizontal axis: Scale parameter in units of to 1. Figure 7(b) shows the entity P 2,2 (s), which in the continuous case should be equal to 2. Figure 7(c) shows the entity P 3,3 (s), which in the continuous case should be equal to 6. Figure 7(d) shows the entity P 4,4 (s), which in the continuous case should be equal to 24. Figure 8(a) and Figure 8(b) show the entities P 3,1 (s) and P 4,2 (s), respectively, which in the continuous case should be equal to zero.
As can be seen from the graphs, the responses of the derivative approximation kernels to monomials of the same order as the order of differentiation do for the sampled Gaussian derivative kernels deviate notably from the corresponding ideal results obtained for continuous Gaussian derivatives, when the scale parameter is a bit below 0.75.For the integrated Gaussian derivative kernels, the responses of the derivative approximation kernels do also deviate when the scale parameter is a bit below 0.75.Within a narrow range of scale values in intervals of the order of [0.5, 0.75], the integrated Gaussian derivative kernels do, however, lead to somewhat lower deviations in the derivative estimates than for the sampled Gaussian derivative kernels.Also the responses of the third-order sampled Gaussian and integrated Gaussian derivative approximation kernels to a first-order monomial as well as the response of the fourth-order sampled Gaussian and integrated Gaussian derivative approximation kernels to a second-order monomial differ substan- Fig. 8: The responses to different N :th-order monomials f (x) = x N for different discrete approximations of M :th-order Gaussian derivative kernels, for M > N , for either discrete analogues of Gaussian derivative kernels T disc,x α (n; s) according to (64), sampled Gaussian derivative kernels T sampl,x α (n; s) according to (53) or integrated Gaussian derivative kernels T int,x α (n; s) according to (54).In the ideal continuous case, the resulting value should be equal to 0, whenever the order M of differentiation is higher than the order N of the monomial.(Horizontal axis: Scale parameter in units of σ = tially from the ideal continuous values when the scale parameter is a bit below 0.75.
For the discrete analogues of the Gaussian derivative kernels, the results are, on the other hand, equal to the corresponding continuous counterparts, in fact, in the case of exact computations, exactly equal.This property can be shown by studying the responses of the central difference operators to the monomials, which are given by and Since the central difference operators commute with the spatial smoothing step with the discrete analogue of the Gaussian kernel, the responses of the discrete analogues of the Gaussian derivatives to the monomials are then obtained as implying that and In this respect, there is a fundamental difference between the discrete approximations of Gaussian derivatives obtained from the discrete analogues of Gaussian derivatives, compared to the sampled or the integrated Gaussian derivatives.At very fine scales, the discrete analogues of Gaussian derivatives produce much better estimates of differentiation operators, than the sampled or the integrated Gaussian derivatives.
The requirement that the Gaussian derivative operators and their discrete approximations should lead to numerically accurate derivative estimates for monomials of the same order as the order of differentiation is a natural consistency requirement for non-infinitesimal derivative approximation operators.The use of monomials as test functions, as used here, is particularly suitable in a multi-scale context, since the monomials are essentially scale-free, and are not associated with any particular intrinsic scales.

Additional performance measures for quantifying deviations from theoretical properties of discretizations of Gaussian derivative kernels
To additionally quantify the deviations between the properties of the discrete kernels, designed to approximate Gaussian derivative operators, and desirable properties of discrete kernels, that are to transfer the desirable properties of the continuous Gaussian derivatives to a corresponding discrete implementation, we will in this section quantify these deviations in terms of the following complementary error measures: -Normalization error: The difference between the l 1norm of the discrete kernel and the desirable l 1 -normal-ization to a similar L 1 -norm as for the continuous Gaussian derivative kernel will be measured by -Spatial spread measure: The spatial extent of the discrete derivative approximation kernel will be measured by the entity and will be graphically compared to the spread measure -Cascade smoothing error: The deviation, from the cascade smoothing property of continuous Gaussian derivatives according to (46) and the actual result of convolving a discrete approximation of a Gaussian derivative response at a given scale with its corresponding discretization of the Gaussian kernel, will be measured by For simplicity, we here restrict ourselves to the special case, when the scale parameter for the amount of incremental smoothing with a discrete approximation of the Gaussian kernel is equal to the scale parameter for the finer scale approximation of the Gaussian derivative response. 7 Similarly to the previous treatment about error measures in Section 2.7, the normalization error and the cascade smoothing error should also be equal to zero in the ideal theoretical case.Any deviations from zero of these error measures do therefore represent a quantification of deviations from desirable theoretical properties in a discrete approximation of Gaussian derivative computations.
7 Notably, it could, however, also be of interest to study these effects deeper for the case when the scale of the incremental smoothing is significantly lower than the scale of the discrete approximation of the Gaussian derivative response, which we, however, leave to future work.
3.8 Numerical quantification of deviations from theoretical properties of discretizations of Gaussian derivative kernels 3.8.1 l 1 -norms of discrete approximations of Gaussian derivative approximation kernels Figures 9(a)-9(d) show the l 1 -norms ∥T x α (•; s)∥ 1 for the different methods for approximating Gaussian derivative kernels with corresponding discrete approximations for differentiation orders up to 4, together with graphs of the L 1norms ∥g x α (•; s)∥ 1 of the corresponding continuous Gaussian derivative kernels.
From these graphs, we can first observe that the behaviour between the different methods differ significantly for values of the scale parameter σ up to about 0.75, 1.25 or 1.5, depending on the order of differentiation.
For the sampled Gaussian derivatives, the l 1 -norms tend to zero as the scale parameter σ approaches zero for the kernels of odd order, whereas the l 1 -norms tend to infinity for the kernels of even order.For the kernels of even order, the behaviour of the sampled Gaussian derivative kernels has the closest similarity to the behaviour of the corresponding continuous Gaussian derivatives.For the kernels of odd order, the behaviour is, on the other hand, worst.
For the integrated Gaussian derivatives, the behaviour is for the kernels of odd order markedly less singular as the scale parameter σ tends to zero, than for the sampled Gaussian derivatives.For the kernels of even order, the behaviour does, on the other hand differ more.There is also some jaggedness behaviour at fine scales for the third-and fourthorder derivatives, caused by positive and negative values of the kernels cancelling their contribution within the support regions of single pixels.
For the discrete analogues of Gaussian derivatives, the behaviour is, qualitatively different for finer scales, in that the discrete analogues of the Gaussian derivatives tend to the basic central difference operators, as the scale parameter σ tends to zero, and do therefore show a much more smooth behaviour as σ → 0. As can be seen from these graphs, the spatial spread measures differ significantly from the corresponding continuous measures for smaller values of the the scale parameters; for σ less than about 1 or 1.5, depending on the order of differentiation, and caused by the fact that too fine scales

Spatial spread measures
in the data cannot be appropriately resolved after a spatial discretization.For the sampled and the integrated Gaussian kernels, there is a certain jaggedness in some of the curves at fine scales, caused by interactions between the grid and the lobes of the continuous Gaussian kernels, that the discrete kernels are defined from.For the discrete analogues of the Gaussian kernels, these spatial spread measures are notably bounded from below by the corresponding measures for the central difference operators, that they approach with decreasing scale parameter.
Figures 11(a)-11(d) show more detailed visualizations of the deviations between these spatial spread measures and their corresponding ideal values for continuous Gaussian deriva-tive kernels in terms of the spatial spread measure offset O α (s) according to (81), for the different orders of spatial differentiation.The jaggedness of these curves for orders of differentiation greater than one, is due to interactions between the lobes in the derivative approximation kernels and the grid.As can be seen from these graphs, the relative properties of the spatial spread measure offsets for the different discrete approximations to the Gaussian derivative operators differ somewhat, depending on the order of spatial differentiation.We can, however, note that the spatial spread measure offset for the integrated Gaussian derivative kernels is mostly somewhat higher than the spatial spread measure offset for the sampled Gaussian derivative kernels, consistent Case: α = 4.Note that the spatial spread measure of the discrete analogue of the fourth-order Gaussian derivative kernel is delimited from below by the spatial variance of the absolute value of the fourth-order central difference operator |δxxxx|, which is with the previous observation that the spatial box integration used for defining the integrated Gaussian derivative kernel introduces an additional amount of spatial smoothing in the spatial discretization.

Cascade smoothing errors
Figures 12(a)-12(d) show graphs of the cascade smoothing error E cascade (T x α (•; s)) according to (82) for the main classes of methods for discretizing Gaussian derivative operators.
For the sampled Gaussian kernels, the cascade smoothing error is substantial for σ < 0.75 or σ < 1.0, depending Spread measure offsets for 1st-order derivative kernels Fig. 11: Graphs of the spatial spread measure offset Oα(s), relative to the spatial spread of a continuous Gaussian kernel, according to (81), for different discrete approximations of Gaussian derivative kernels of order α, for either discrete analogues of Gaussian derivative kernels T disc,x α (n; s) according to (64), sampled Gaussian derivative kernels T sampl,x α (n; s) according to (53) or integrated Gaussian derivative kernels T int,x α (n; s) according to (54).(Horizontal axis: Scale parameter in units of on the order of differentiation.Then, for larger scale values, this error measure decreases rapidly.
For the integrated Gaussian kernels, the cascade smoothing error is lower than the cascade smoothing error for the sampled Gaussian kernels for σ < 0.5, σ < 0.75 or σ < 1.0, depending on the order of differentiation.For larger scale values, the cascade smoothing error for the integrated Gaussian kernels do, on the other hand, decrease much less rapidly with increasing scale than for the sampled Gaussian kernels, due to the additional spatial variance in the filters caused by the box integration, underlying the definition of the integrated Gaussian derivative kernels.
For the discrete analogues of the Gaussian derivatives, the cascade smoothing error should in the ideal case of ex-act computations lead to a zero error.In the graphs of these errors, we do, however, see a jaggedness at a very low level, caused by numerical errors.

Summary of the characterization results from the theoretical analysis and the quantitative performance measures
To summarize the theoretical and the experimental results presented in this section, there is a substantial difference in the quality of the discrete approximations of Gaussian derivative kernels at fine scales: For values of the scale parameter σ less than about a bit below 0.75, the sampled Gaussian kernels and the inte- grated Gaussian kernels do not produce numerically accurate or consistent estimates of the derivatives of monomials.
In this respects, these discrete approximations of Gaussian derivatives do not serve as good approximations of derivative operations at very fine scales.Within a narrow scale interval below about 0.75, the integrated Gaussian derivative kernels do, however, degenerate in a somewhat less serious manner than the sampled Gaussian derivative kernels.
For the discrete analogues of Gaussian derivatives, obtained by convolution with the discrete analogue of the Gaussian kernel followed by central difference operators, the corresponding derivative approximation are, on the other hand, exactly equal to their continuous counterparts.This property does, furthermore, hold over the entire scale range.
For larger values of the scale parameter, the sampled Gaussian kernel and the integrated Gaussian kernel do, on the other hand, lead to successively better numerical approximations of the corresponding continuous counterparts.In fact, when the value of the scale parameter is above about 1, the sampled Gaussian kernel leads to the numerically most accurate approximations of the corresponding continuous results, out of the studied three methods.

Hence, the choice between what discrete approximation to use for approximating the Gaussian derivatives, depends
upon what scale ranges are important for the analysis, in which the Gaussian derivatives should be used.
In next section, we will build upon these results, and extend them further, by studying the effects of different discretization methods for the purpose of performing automatic scale selection.The motivation for studying that problem, as a benchmark proxy task for evaluating the quality of different discrete approximations of Gaussian derivatives, is that it involves explicit comparisons of feature responses at different scales.

Application to scale selection from local extrema over scale of scale-normalized derivatives
When performing scale-space operations at multiple scales jointly, a critical problem concerns how to compare the responses of an image operator between different scales.Due to the scale-space smoothing operation, the amplitude of both Gaussian smoothed image data and of Gaussian derivative responses can be expected to decrease with scale.A practical problem then concerns how to compare a response of the same image operator at some coarser scale to a corresponding response at a finer scale.This problem is particularly important regarding the topic of scale selection (Lindeberg 2021a), where the goal is to determine locally appropriate scale levels, to process and analyze particular image structures in a given image.

Scale-normalized derivative operators
A theoretically well-founded way of performing scale normalization, to enable comparison between the responses of scale-space operations at different scales, is by defining scalenormalized derivative operators according to (Lindeberg 1998a(Lindeberg , 1998b) where γ > 0 is a scale normalization power, to be chosen for the feature detection task at hand, and then basically replacing the regular Gaussian derivative responses by corresponding scale-normalized Gaussian derivative responses in the modules that implement visual operations on image data.

Scale covariance property of scale-normalized derivative responses
It can be shown that, given two images f (x, y) and f ′ (x ′ , y ′ ) that are related according to a uniform scaling transformation for some spatial scaling factor S > 0, and with corresponding Gaussian derivative responses defined over the two respective image domains according to these Gaussian derivative responses in the two domains will then be related according to (Lindeberg 1998a, Equation (25)) provided that the values of the scale parameters are matched according to (Lindeberg 1998a, Equation ( 15)) Specifically, in the special case when γ = 1, the corresponding scale-normalized Gaussian derivative responses will then be equal

Scale selection from local extrema over scales of scale-normalized derivative responses
A both theoretically well-founded and experimentally extensively verified methodology to perform automatic scale selection, is by choosing hypotheses for locally appropriate scale levels from local extrema over scales of scale-normalized derivative responses (Lindeberg 1998a(Lindeberg , 1998b).In the following, we will apply this methodology to four basic tasks in feature detection.

Interest point detection
With regard to the topic of interest point detection, consider the scale-normalized Laplacian operator (Lindeberg 1998a Equation ( 30)) or the scale-normalized determinant of the Hessian (Lindeberg 1998a Equation ( 31)) where we have here chosen γ = 1 for simplicity.It can then be shown that, if we consider the responses of these operators to a Gaussian blob of size s 0 for which the scale-space representation by the semi-group property of the Gaussian kernel (4) will be of the form then the scale-normalized Laplacian response according to (90) and the scale-normalized determinant of the Hessian response according to (91) assume their global extrema over space and scale at (Lindeberg 1998a, Equations (36) and ( 37)) (x, ŷ, ŝ) = argmax (x,y; s) (det HL blob,s0 )(x, y; s) In this way, both a linear feature detector (the Laplacian) and a non-linear feature detector (the determinant of the Hessian) can be designed to respond in a scale-selective manner, with their maximum response over scale at a scale that correspond to the inherent scale in the input data.

Edge detection
Consider next the following idealized model of a diffuse edge (Lindeberg 1998b Equation ( 18)) where erg(x; s 0 ) denotes the primitive function of a 1-D Gaussian kernel Following a differential definition of edges, let us measure local scale-normalized edge strength by the scale-normalized gradient magnitude (Lindeberg 1998b Equation (15)) which for the scale-space representation of the idealized edge model ( 96) leads to a response of the form Then, it can be shown that this scale-normalized edge response will, at the spatial location of the edge at x = 0, assume its maximum over scale at the scale provided that we choose the value of the scale normalization power γ as (Lindeberg 1998b, Equation ( 23))

Ridge detection
Let us next consider the following idealized model of a ridge (Lindeberg 1998b Equation ( 52)) For a differential definition of ridges, consider a local coordinate system (p, q) aligned with the eigendirections of the Hessian matrix, such that the mixed second-order derivative L pq = 0. Let us measure local scale-normalized ridge strength by the scale-normalized second-order derivative in the direction p (Lindeberg 1998b, Equations ( 42) and ( 47)): which for the idealized ridge model ( 102) reduces to the form Then, it can be shown that, at the spatial location of the ridge at x = 0, this scale-normalized ridge response will assume its maximum over scale at the scale provided that we choose the value of the scale normalization power γ as (Lindeberg 1998b, Equation ( 56))

Measures of scale selection performance
In the following, we will compare the results of using different ways of discretizing the Gaussian derivative operators, when applied task of performing scale selection for -Gaussian blobs of the form (92), idealized diffuse step edges of the form (96), and idealized Gaussian ridges of the form (102).
To quantify the performance of the different scale normalization, we will measure deviations from the ideal results in terms of: -Relative scale estimation error: The difference, between a computed scale estimate ŝ and the ideal scale estimate ŝref = s 0 , will be measured by the entity A motivation for measuring this entity in units of σ = √ s is to have the measurements in dimension of [length].
In the ideal continuous case, with the scale-space derivatives computed from continuous Gaussian derivatives, this error measure should be zero.Any deviations from zero, when computed from a discrete implementation based on discrete approximations of Gaussian derivative kernels, do therefore characterize the properties of the discretization.
4.5 Numerical quantification of deviations from theoretical properties resulting from different discretizations of scale-normalized derivatives Our experimental investigation will focus on computing the relative scale estimation error for: scale selection based on the scale-normalized Laplacian operator (90) for scale normalization power γ = 1, applied to an ideal Gaussian blob of the form (92), scale selection based on the scale-normalized determinant of the Hessian operator (91) for scale normalization power γ = 1, applied to an ideal Gaussian blob of the form (92), scale selection based on the scale-normalized gradient magnitude (98) for scale normalization power γ = 1/2, applied to an ideal diffuse edge of the form (96), and scale selection based on the scale-normalized ridge strength measure (103) for scale normalization power γ = 3/4, applied to an ideal Gaussian ridge of the form (102).
With the given calibration of the scale normalization powers γ to the specific feature detection tasks, the estimated scale level ŝ will in the ideal continuous case correspond to the scale estimate reflecting the inherent scale of the feature model for all of the cases of ideal Gaussian blobs, ideal diffuse edges or ideal Gaussian ridges.This bears relationships to the matched filter theorem (Woodward 1953, Turin 1960), in that the scale selection mechanism will choose filters for detecting the different types of image structures image data, that as well as possible match their size.

Experimental methodology
The experimental procedure, that we will follow in the following experiments, consists of: 1.For a dense set of 50 logarithmically distributed scale levels σ ref,i = A 1 r i 1 within the range σ ∈ [0.1, 4.0], where r 1 > 1, generate an ideal model signal (a Gaussian blob, a diffuse step edge or a diffuse ridge) with scale parameter σ ref,i , that represents the size in dimension [length].ŝ, when applying scale selection from local extrema over scale of the scale-normalized gradient magnitude response according to (100) to a set of diffuse step edges of different width σ ref = σ 0 , for different discrete approximations of the Gaussian derivative kernels, for either discrete analogues of Gaussian derivative kernels T disc,x α (n; s) according to (64), sampled Gaussian derivative kernels T sampl,x α (n; s) according to (53), or integrated Gaussian derivative kernels T int,x α (n; s) according to (54).ŝ, when applying scale selection from local extrema over scale of the scale-normalized principal curvature response according to (105) to a set of diffuse ridges of different width σ ref = σ 0 , for different discrete approximations of the Gaussian derivative kernels, for either discrete analogues of Gaussian derivative kernels T disc,x α (n; s) according to (64), sampled Gaussian derivative kernels T sampl,x α (n; s) according to (53), or integrated Gaussian derivative kernels T int,x α (n; s) according to (54).For comparison, the reference scale σ ref = √ s ref = σ 0 obtained in the continuous case for continuous Gaussian derivatives is also shown.(Horizontal axis: Reference scale Relative scale error for principal curvature scale selection Fig. 20: Graphs of the relative scale estimation error E scaleest,rel (σ), according to (107), when applying scale selection from local extrema over scale of the scale-normalized principal curvature response according to (105) to a set of diffuse ridges of different width σ ref = σ 0 , for different discrete approximations of the Gaussian derivative kernels, for either discrete analogues of Gaussian derivative kernels T disc,x α (n; s) according to (64), sampled Gaussian derivative kernels T sampl,x α (n; s) according to (53), or integrated Gaussian derivative kernels T int,x α (n; s) according to (54).(Horizontal axis: Reference scale 2. For a dense8 set of 80 logarithmically distributed scale levels σ acc,j = A 2 r j 2 , within the range σ ∈ [0.1, 6.0], where r 2 > 1, compute the scale-space signature, that is compute the scale-normalized response of the differential entity D norm L at all scales σ acc,j .3. Detect the local extrema over scale of the appropriate polarity (minima for Laplacian and principal curvature scale selection, and maxima for determinant of the Hessian and gradient magnitude scale selection) and select the local extremum that is closest to σ ref,i .9 If there is no local extremum of the right polarity, include the boundary extrema into the analysis, and do then select the global extremum out of these.4. Interpolate the scale value of the extremum to higher accuracy than grid spacing, by for each interior extremum fitting a second-order polynomial to the values at the central point and the values of the two adjacent neighbours.Find the extremum of the continuous polynomial, and let the scale value of the extremum of that interpolation polynomial be the scale estimate.10 Figures 13-20 show graphs of scale estimates with associated relative scale errors obtained in this way.

Scale selection with the scale-normalized Laplacian applied to Gaussian blobs
From Figure 13, which shows the scale estimates obtained by detecting local extrema over scale of the scale-normalized Laplacian operator, when applied to Gaussian blobs of different size, we see that for all the three approximation methods for Gaussian derivatives; discrete analogues of Gaussian derivatives, sampled Gaussian derivatives and integrated Gaussian derivatives, the scale estimates approach the ideal values of the fully continuous model with increasing size of the Gaussian blob used as input for the analysis.
For smaller scale values, there are, however, substantial deviations between the different methods.When the scale parameter σ is less than about 1, the results obtained from sampled Gaussian derivatives fail to generate interior local extrema over scale.Then, the extremum detection method instead resorts to returning the minimum scale of the scale interval, implying qualitatively substantially erroneous scale estimates.For the integrated Gaussian derivatives, there is also a discontinuity in the scale selection curve, while at a lower scale level, and not leading to as low scale values as for the sampled Gaussian derivatives.
For the discrete analogue of Gaussian derivatives, the behaviour is, on the other hand, qualitatively different.Since these derivative approximation kernels tend to regular central difference operators, as the scale parameter tends to zero, their magnitude is bounded from above in a completely different way than for the sampled or integrated Gaussian derivatives.When this bounded derivative response is multiplied by the scale parameter raised to the given power, the scalenormalized feature strength measure cannot assume as high values at very finest scales, as for the sampled or integrated Gaussian derivatives.This means that the extremum over scale will be assumed at a relatively coarser scale, when the reference scale is small, compared to the cases for the sampled or the integrated Gaussian kernels.
From Figure 14, which shows the relative scale estimation error E scaleest,rel (σ) according to (107), we can see that when the reference scale becomes larger, the scale estimates obtained with the discrete analogues of Gaussian derivatives do, on the other hand, lead to underestimates of the scale levels, whereas the scale estimates obtained with integrated Gaussian kernels lead to overestimates of the scale levels.For σ ref a bit greater than 1, the sampled Gaussian derivatives lead to the most accurate scale estimates for Laplacian blob detection applied to Gaussian blobs.

Scale selection with the scale-normalized determinant of the Hessian applied to Gaussian blobs
For Figures 15-16, which show corresponding results for determinant of the Hessian scale selection applied to Gaussian blobs, the results are similar to the results for Laplacian scale selection.These results are, however, nevertheless reported here, to emphasize that the scale selection method does not only apply to feature detectors that are linear in the dependency of the Gaussian derivatives, but also to feature detectors that correspond to genuinely non-linear combinations of Gaussian derivative responses.

Scale selection with the scale-normalized gradient magnitude applied to diffuse step edges
From Figure 17, which shows the selected scales obtained by detecting local extrema over scale of the scale-normalized gradient magnitude applied to diffuse step edges of different width, we can note that all the three discretization methods for Gaussian derivatives are here bounded from below by an inner scale.The reason why the behaviour is qualitatively different, in this case based on first-order derivatives, compared to the previous case with second-order derivatives, is that the magnitudes of the first-order derivative responses are in all these cases bounded from above.The lower bound on the scale estimates is, however, slightly higher for the discrete analogues of the Gaussian derivatives compared to the sampled or integrated Gaussian derivatives.
From Figure 18, we can also see that the sampled Gaussian derivatives lead to slightly more accurate scale estimates than the integrated Gaussian derivatives or the discrete analogues of Gaussian derivatives, over the entire scale range.4.5.5 Scale selection with the second-order principal curvature measure applied to diffuse ridges From Figure 19, which shows the selected scales obtained by detecting local extrema over scale of the scale-normalized principal curvature response according to (105), when applied to a set of diffuse ridges of different width, we can note that the behaviour is qualitatively very similar to the previously treated second-order methods for scale selection, based on extrema over scale of either the scale-normalized Laplacian or the scale-normalized determinant of the Hessian.
There are clear discontinuities in the scale estimates obtained from sampled or integrated Gaussian derivatives, when the reference scale σ ref goes down towards σ = 1, at slightly lower scale values for integrated Gaussian derivatives compared to the sampled Gaussian derivatives.For the discrete analogues of Gaussian derivatives, the scale estimates are bounded from below at the finest scales, whereas there are underestimates in the scale values near above σ = 1.Again, for scale values above about 1, the sampled Gaussian derivatives lead to results that are closest to those obtained in the ideal continuous case.

Summary of the evaluation on scale selection experiments
To summarize the results from this investigation, the sampled Gaussian derivatives lead to the most accurate scale estimates, when the reference scale σ ref of the image features is somewhat above 1.For lower values of the reference scale, the behaviour is, on the other hand, qualitatively different for the scale selection methods that are based on secondorder derivatives, or what could be expected more generally, derivatives of even order.When the scale parameter tends to zero, the strong influence from the fact that the derivatives of even order of the continuous Gaussian kernel tend to infinity at the origin, when the scale parameter tends to zero, implies that the scale selection methods based on either sampled or integrated Gaussian derivatives lead to singularities when the reference scale is sufficiently low (below 1 for the second-order derivatives in the above experiment).
If aiming at handling image data with discrete approximations of Gaussian derivatives of even order for very low scale values, it seems natural to then consider alternative discretization approaches, such as the discrete analogues of Gaussian derivatives.The specific lower bound on the scale values may, however, be strongly dependent upon what tasks the Gaussian derivative responses are to be used for, and also upon the order differentiation.

Discrete approximations of directional derivatives
When operating on a 2-D spatial scale-space representation, generated by convolutions with either rotationally symmetric Gaussian kernels according to (2) or affine Gaussian kernels, 11 it is often desirable to compute image features in terms of local directional derivatives.
Given an image orientation φ and its ortogonal direction ⊥φ = φ + π/2, we can express directional derivatives along these directions in terms of partial derivative operators ∂ x and ∂ y along the xand y-directions, respectively, according to Higher-order directional derivatives of the scale-space representation can then be defined according to where L here denotes either a scale-space representation based on convolution with a rotationally symmetric Gaussian kernel according to (2), or convolution with an affine Gaussian kernel.Image representations of this form are useful for modelling filter bank approaches, for either purposes in classical computer vision or in deep learning.It has also been demonstrated that the spatial components of the receptive fields of simple cells in the primary visual cortex of higher mammals can be modelled qualitatively reasonably well in terms of such directional derivatives combined with spatial smoothing using affine Gaussian kernels, then with the orientations φ and ⊥φ of the directional derivatives parallel to the orientations corresponding to the eigendirections of the affine spatial covariance matrix Σ, that underlies the definition of affine Gaussian kernels (see Equation ( 23) in Lindeberg (2021b)).
11 See Appendix A.6 for a description of the affine Gaussian scalespace concept that underlies the definition of affine Gaussian kernels.

Small-support directional derivative approximation masks
If attempting to compute filter bank responses in terms of directional derivatives in different image directions, and of different orders of spatial differentiation, the amount of computational work will, however, grow basically linearly with the number of different combinations of the orders m 1 and m 2 of spatial differentiation and the different image orientations φ, if we use an underlying filter basis in terms of Gaussian derivative responses along the coordinate axes based on the regular Gaussian scale-space concept formed from convolutions with the rotationally symmetric Gaussian kernel.If we instead base the filter banks on elongated affine Gaussian kernels, the amount of computational work will grow even more, since a non-separable convolution would then have to be performed for each image orientation, each combination of the scale parameters σ 1 and σ 2 , and for each order of spatial differentiation, as determined by the parameters m 1 and m 2 .
If we, on the other hand, base the analysis on the discrete scale-space concept, by which derivative approximations can be computed from the raw discrete scale-space representation, by applying small-support central difference masks, then the amount of computational work can be decreased substantially, since then for each new combination of the orders m 1 and m 2 of differentiation, we only need to apply a new small-support discrete filter mask.In the case, when the underlying scale-space representation is based on convolutions with the rotationally symmetric Gaussian kernel, we can use the same underlying, once and for all spatially smoothed image as the input for computing filter bank responses for all the possible orientations.In the case, when the underlying scale-space representation is instead based on convolutions with affine Gaussian kernels, we do, of course, have to redo the underlying spatial smoothing operation for each combination of the parameters σ 1 , σ 2 and φ.We can, however, nevertheless reuse the same underlying spatially smoothed image for all the combinations of the orders m 1 and m 2 of spatial differentiation.

Method for defining discrete directional derivative approximation masks
To define a discrete derivative approximation mask δ φ m 1 ⊥φ m 2 , for computing an approximation of the directional derivative L φ m 1 ⊥φ m 2 from an already smoothed scale-space representation L according to for a given image orientation φ and two orders m 1 and m 2 of spatial differentiation along the directions φ and ⊥φ, respectively, we can proceed as follows: The affine Gaussian kernel with its directional derivatives up to order 4 2. Expand the operator (113) by formal operator calculus over ( 109) and ( 110) to an expanded representation in terms of a linear combination of partial derivative operators ∂ x α y β along the Cartesian coordinate directions of the form: where the directional weight functions w (m1,m2) k (φ) are polynomials in terms of cos φ and sin φ. 3. Transfer the partial directional derivative operator ∂ φ m 1 ⊥φ m 2 to a corresponding directional derivative approximation mask δ φ m 1 ⊥φ m 2 , while simultaneously transferring all the Cartesian partial derivative operators ∂ x α y β to corresponding discrete derivative approximation masks δ x α y β , which leads to: In this way, we obtain explicit expressions for compact discrete directional derivative approximation masks, as depending on the orders m 1 and m 2 of spatial differentiation and the image direction φ.
Figure 21 shows corresponding equivalent affine Gaussian derivative approximation kernels, computed according to this scheme, by applying small-support directional derivative approximation masks of these forms to a sampled affine Gaussian kernel, as parameterized according to the form in Appendix A.7.
Please note, however, that the resulting kernels obtained in this way, are not in any way intended to be applied to actual image data.Instead, their purpose is just to illustrate the equivalent effect of first convolving the input image with a discrete approximation of the Gaussian kernel, and then applying a set of small-support directional derivative approximation masks, for different combinations of the spatial orders of differentiation, to the spatially smoothed image data.In situations when combinations of multiple orders of spatial differentiation are to be used in a computer vision system, for example, in applications involving filter banks, this form of discrete implementation will be computationally much more efficient, compared to applying a set of large-support filter kernels to the same image data.
By the central difference operators δ x α y β constituting numerical discrete approximations of the corresponding partial derivative operators ∂ x α y β , it follows that the directional derivative approximation mask δ φ m 1 ⊥φ m 2 will be a numerical approximation of the continuous directional derivative operator ∂ φ m 1 ⊥φ m 2 .Thereby, the discrete analogue of the directional derivative operator according to (112), from a discrete approximation L of the scale-space representation of an input image f , will constitute a numerical approximation of the corresponding continuous directional derivative of the underlying continuous image, provided that the input image has been sufficiently well sampled, and provided that the discrete approximation of scale-space smoothing is a sufficiently good approximation of the corresponding continuous Gaussian smoothing operation.
In practice, the resulting directional derivative masks will be of size 3 × 3 for first-and second-order derivatives and of size 5 × 5 for third-and fourth-order derivatives.Thus, once the underlying code for expressing these relationships has been written, these directional derivative approximation masks are extremely easy and efficient to apply in practice.
Appendix A.8 gives explicit expressions for the resulting discrete directional derivative approximation masks for spatial differentiation orders up to 4, whereas Appendix A.9 gives explicit expressions for the underlying Cartesian discrete derivative approximation masks up to order 4 of spatial differentiation.
The conceptual construction of compact directional derivative approximation masks performed in this way generalizes the notion of steerable filters (Freeman and Adelson 1991, Perona 1992, 1995, Beil 1994, Simoncelli and Farid 1996, Hel-Or and Teo 1998) to a wide class of filter banks, that can be computed in a very efficient manner, once an initial smoothing stage by scale-space filtering, or some approximation thereof, has been computed.5.3 Scale-space properties of directional derivative approximations computed by applying small-support directional derivative approximation masks to smoothed image data Note, in particular, that if we compute discrete approximations of directional derivatives based on a discrete scalespace representation computed using the discrete analogue of the Gaussian kernel according to Section 2.6, then discrete scale-space properties will hold also for the discrete approximations of directional derivatives, in the sense that: (i) cascade smoothing properties will hold between directional derivative approximations at different scales, and (ii) the discrete directional derivative approximations will obey nonenhancement of local extrema with increasing scale.

Summary and conclusions
We have presented an in-depth treatment of different ways of discretizing the Gaussian smoothing operation and the computation of Gaussian derivatives, for purposes in scalespace analysis and deep learning.Specifically, we have considered the following three main ways of discretizing the basic scale-space operations, in terms of either: sampling the Gaussian kernel and the Gaussian derivative kernels, integrating the Gaussian kernel and the Gaussian derivative kernels over the support regions of the pixels, or using a genuinely discrete scale-space theory, based on convolutions with the discrete analogue of the Gaussian kernel, complemented with derivative approximations computed by applying small-support central difference operators to the spatially smoothed image data.
To analyze the properties of these different ways of discretizing the Gaussian smoothing and Gaussian derivative computation operations, we have in Section 2 defined a set of quantifying performance measures, for which we have studied their behaviour as function of the scale parameter from very low to moderate levels of scale.
Regarding the purely spatial smoothing operation, the discrete analogue of the Gaussian kernel stands out as having the best theoretical properties over the entire scale range, from scale levels approaching zero to coarser scales.The results obtained from the sampled Gaussian kernel may deviate substantially from their continuous counterparts, when the scale parameter σ is less than about 0.5 or 0.75.For σ greater than about 1, the sampled Gaussian kernel does, on the other hand, lead to numerically very good approximations of results obtained from the corresponding continuous theory.
Regarding the computation of Gaussian derivative responses, we do also in Sections 3 and 4 find that, when applied to polynomial input to reveal the accuracy of the numerical approximations, the sampled Gaussian derivative kernels and the integrated Gaussian derivative kernels do not lead to numerically accurate or consistent derivative estimates, when the scale parameter σ is less than about 0.5 or 0.75.The integrated Gaussian kernels degenerate in somewhat less strong ways, for very fine scale levels than the sampled Gaussian derivative kernels, implying that the integrated Gaussian derivative kernels may have better ability to handle very fine scales than the sampled Gaussian derivative kernels.At coarser scales, the integrated Gaussian kernels do, on the other hand, lead to numerically less accurate estimates of the corresponding continuous counterparts, than the sampled Gaussian derivative kernels.
At very fine levels of scale, the discrete analogues of the Gaussian kernels stand out as giving the numerically far best estimates of derivative computations for polynomial input.When the scale parameter σ exceeds about 1, the sampled Gaussian derivative kernels do, on the other hand, lead to the numerically closest estimates to those obtained from the fully continuous theory.
The fact that the sampled Gaussian derivative kernels for sufficiently coarse scales lead to the closest approximations of the corresponding fully continuous theory should, however, not preclude from basing the analysis on the discrete analogues of Gaussian derivatives at coarser scales.If necessary, deviations between the results obtained from the discrete analogues of Gaussian derivatives and the corresponding fully continuous theory can, in principle, be compensated for by complementary calibration procedures, or by deriving corresponding genuinely discrete analogues of the relevant entities in the analysis.Additionally, in situations when a larger number of Gaussian derivative responses are to be computed simultaneously, this can be accomplished with substantially higher computational efficiency, if basing the scale-space analysis on the discrete analogue of the Gaussian kernel, which only involves a single spatial smoothing stage of large spatial support, from which each derivative approximation can then be computed using a small-support central difference operator.
As a complement to the presented methodologies of discretizing Gaussian smoothing and Gaussian derivative computations, we have also in Section 5 presented a computationally very efficient ways of computing directional derivatives of different orders and of different orientations, which is highly useful for computing filter bank type responses for different purposes in computer vision.When using the discrete analogue of the Gaussian kernel for smoothing, the presented discrete directional derivative approximation masks can be applied at any scale.If using either sampled Gaussian kernels or integrated Gaussian kernels for spatial smoothing, including extensions of rotationally symmetric kernels to anisotropic affine Gaussian kernels, the discrete derivative approximation masks can be used, provided that the scale parameter is sufficiently large in relation to the desired accuracy of the resulting numerical approximation.
Concerning the orders of spatial differentiation, we have in this treatment, for the purpose of presenting explicit expressions and quantitative experimental results, limited ourselves to spatial derivatives up to order 4. A motivation for this choice is the observation by Young (1985Young ( , 1987) ) that receptive fields up to order 4 have been observed in the primary visual cortex of higher mammals, why this choice should then cover a majority of the intended use cases.
It should be noted, however, that an earlier version of the theory for discrete derivative approximations, based on convolutions with the discrete analogue of the Gaussian kernel followed by central difference operators, has, however, been demonstrated to give useful results with regard to the sign of differential invariants that depend upon derivatives up to order 5 or 6, for purposes of performing automatic scale selection, when detecting edges or ridges from spatial image data (Lindeberg 1998b).Hence, provided that appropriate care is taken in the design of the visual operations that operate on image data, this theory could also be applied for higher orders of spatial differentiation.

Extensions of the approach
Concerning extensions of the approach, with regard to applications in deep learning, for which the modified Bessel functions I n (s), underlying the definition of the discrete analogue of the Gaussian kernel T disc (n; s) according to (26), are currently generally not available in standard frameworks for deep learning, a possible alternative approach consists of instead replacing the previously treated sampled Gaussian derivative kernels T sampl,x α (n; s) according to (53) or the integrated Gaussian derivative kernels T int,x α (n; s) according to (54) by the families of hybrid discretization approaches obtained by: (i) first smoothing the image with either the normalized sampled Gaussian kernel T normsampl (n; s) according to (19) or the integrated Gaussian kernel T int (n; s) according to (20), and then applying central difference operators δ x α of the form (60) to the spatially smoothed data.
When multiple Gaussian derivative responses of different orders are to be computed at the same scale level, such an approach would, in analogy to the previously treated discretization approach, based on first smoothing the image with the discrete analogue of the Gaussian kernel T disc (n; s) according to ( 26) and then applying central difference operators δ x α of the form (60) to the spatially smoothed data, resulting in equivalent discrete derivative approximation kernels T disc,x α (n; s) according to (64), to also be computationally much more efficient, compared to explicit smoothing with a set of either sampled Gaussian derivative kernels T sampl,x α (n; s) according to (53) or integrated Gaussian derivative kernels T int,x α (n; s) according to (54).
In terms of equivalent convolution kernels for the resulting hybrid discretization approaches, the corresponding discrete derivative approximation kernels for these classes of kernels will then be given by with T normsampl (n; s) and T int (n; s) according to ( 19) and (20), respectively. 12Such an approach is in a straightforward way compatible with learning of the scale parameters by back propagation, based on automatic differentiation in deep learning environments.It would be conceptually very straightforward to extend the theoretical framework and the experimental evaluations presented in this paper to incorporating also a detailed analysis of these two additional classes of discrete derivative approximations for Gaussian derivative operators of hybrid type.For reasons of space constraints, we have, however, not been able to include corresponding in-depth analyses of those additional discretization methods here. 1312 In practice, these explicit forms for the derivative approximation kernels would, however, never be used, since it is computationally much more efficient to instead first perform the spatial smoothing operation in an initial processing step, and then applying different combinations of discrete derivative approximations, in situations when multiple spatial derivatives of different orders are to be computed at the same scale level.With regard to theoretical analysis of the properties of these hybrid discretization approaches, the corresponding equivalent convolution kernels are, however, important when characterizing the properties of these methods. 13In particular, regarding the theoretical properties of these hybrid discretization approaches, it should be mentioned that, due to the approximation of the spatial derivative operators ∂ x α by the central difference operators δ x α , in combination with the unit l 1 -norm normalization of the corresponding spatial smoothing operations, the hybrid Concerning the formulation of discrete approximations of affine Gaussian derivative operators, it would also be straightforward to extend the framework in Section 5 to replacing the initial spatial smoothing step, based on convolution with the sampled affine Gaussian derivative kernel, by instead using an integrated affine Gaussian derivative kernel, with its filter coefficients of the form (118) derivative approximation kernels T hybr-sampl,x α and T hybr-int,x α (n; s) according to ( 116) and (117) will obey similar response properties ( 77) and ( 78) to monomial input, as the previously treated approach, based on the combination of spatial smoothing using the discrete analogue of the Gaussian kernel with central difference operators T disc,x α (n; s) according to (64).In other words, the hybrid kernels T hybr-sampl,x α (n; s) and T hybr-int,x α (n; s) according to ( 116) and (117) will with p k (x) = x k obey T hybr-sampl,x M (•; s) * p M (•) = M ! and T hybr-int,x M (•; s) * p M (•) = M !, as well as T hybr-sampl,x M (•; s) * p N (•) = 0 and T hybr-int,x M (•; s) * p N (•) = 0 for M > N .In these respects, these hybrid discretization approaches T hybr-sampl,x α (n; s) and T hybr-int,x α (n; s) could be expected to constitute better approximations of Gaussian derivative operators at very fine scales, than their corresponding non-hybrid counterparts, T normsampl,x α (n; s) and T int,x α (n; s) according to ( 19) and ( 20).Simultaneously, these hybrid approaches will also be computationally much more efficient than their non-hybrid counterparts, in situations where Gaussian derivatives of multiple orders are to be computed at the same scale.For very small values of the scale parameter, the spatial smoothing with the normalized sampled Gaussian kernel T normsampl,x α (n; s) can, however, again be expected to lead to systematically too small amounts of spatial smoothing (see Figure 3).For larger values of the scale parameter, the spatial smoothing with the integrated Gaussian kernel T int,x α (n; s) would, however, be expected to lead to a scale offset (see Figure 3), influenced by the variance of a box filter over each pixel support region.Furthermore, these hybrid approaches will not be guaranteed to obey information reducing properties from finer to coarser levels of scale, in terms of either non-creation of new local extrema, or non-enhancement of local extrema, as the discrete analogues of Gaussian derivatives T disc,x α (n; s) according to (64) obey.In situations, where the modified Bessel functions of integer order are immediately available, the approach, based on the combination of spatial smoothing with the discrete analogue of the Gaussian kernel with central differences, should therefore be preferable in relation to these hybrid approaches.In situations, where the modified Bessel functions of integer order are, however, not available as full-fledged primitive in a deep learning framework, these hybrid approaches could, on the other hand, be considered as interesting alternatives to the regular sampled or integrated Gaussian derivative kernels T sampl,x α (n; s) and T int,x α (n; s), because of their substantially better computational efficiency, in situations when spatial derivatives of multiple orders are to be computed at the same scale.What remains to explore, is how these hybrid discretization approaches compare to the previously treated three main classes of discretization methods, with respect to the set of quantitative performance measures defined and then evaluated experimentally in Sections 3.7-3.8and Sections 4.4-4.5 (see (Lindeberg 2024) for a treatment about this topic) as well as with respect to integration in different types of computer vision and/or deep learning algorithms.which shows that construction of applying the continuous Gaussian derivative convolution to a stepwise constant signal, defined as being equal to the discrete signal over each pixel support region, at the discrete grid points x = n corresponds to discrete convolution with the integrated Gaussian derivative kernel.
A.3 L 1 -norms of Gaussian derivative kernels This appendix gives explicit expressions for the L 1 -norms of 1-D Gaussian derivative kernels for differentiation orders up to 4, based on Equations ( 74)-( 77) in Lindeberg (1998a), while with the scale normalization underlying those equations, to constant L 1 -norms over scale, removed: for differentiation orders up to 4: of orders up to k = 4: q 2 (x; s) = x 2 + s, (151) q 4 (x; s) = x 4 + 6 x 2 s + 3 s 2 . (153) These diffusion polynomials do, in this respect, describe how a monomial input function f (x) = x k is affected by convolution with the continuous Gaussian kernel for standard deviation σ = √ s.

A.6 Affine Gaussian scale space
As a more general spatial scale-space representation for 2-D images, consider the affine Gaussian scale-space representation A general rationale for studying and making use of this affine scale-space representation is that it is closed under affine transformations, thus leading to affine covariance or affine equivariance (Lindeberg 1993a;Lindeberg and Gårding 1997).
Of order 1 embedded in masks of size 3 × 3: With regard to an evaluation of a method, referred to as "Lindeberg's smoothing method' in (Rey-Otero and Delbracio 2016), some clarifications would be needed concerning the experimental comparison they perform in that work, since that comparison is not made in relation to any of the best methods that arise from the discrete scale-space theory introduced in (Lindeberg 1990) and then extended in (Lindeberg 1993a Chapters 3 and 4).Rey-Otero and Delbracio (2016) compare to an Euler-forward discretization of the semi-discrete diffusion equation (Equation (4.30) in Lindeberg 1993a) 194) with initial condition L(x, y; 0) = f (x, y), that determines the evolution of a 2-D discrete scale-space representation over scale.The corresponding discrete scale-space representation, according to Lindeberg's discrete scale-space theory, can, however, be computed more accurately, using the explicit expression for the Fourier transform of the underlying discrete family of scale-space kernels, according to Equation (4.24) in (Lindeberg 1993a), and thus without using any a priori restriction to discrete levels in the scale direction, as used by Rey-Otero and Delbracio (2016).Additionally, concerning the choice of the parameter value γ, that determines the relative weighting between the contributions from the five-point ∇ 2 5 and cross-point ∇ 2 × discretizations of the Laplacian operator, Rey-Otero and Delbracio (2016) use a non-optimal value for this relative weighting parameter (γ = 1/2), instead of using either γ = 1/3, which gives the best numerical approximation to rotational symmetry (Proposition 4.16 in Lindeberg 1993a), or γ = 0, which leads to separable convolution operators on a Cartesian grid (Proposition 4.14 in Lindeberg 1993a), which then also implies that the discrete scale-space representation can be computed using separable convolution with the 1-D discrete analogue of the Gaussian kernel (26) that is used in this work.

Fig. 1 :
Fig.1: Graphs of the main types of Gaussian smoothing kernels and Gaussian derivative kernels considered in this paper, here at the scale σ = 1, with the raw smoothing kernels in the top row and the order of spatial differentiation increasing downwards up to order 4: (left column) continuous Gaussian kernels and continuous Gaussian derivatives, (middle left column) sampled Gaussian kernels and sampled Gaussian derivatives, (middle right column) integrated Gaussian kernels and integrated Gaussian derivatives, and (right column) discrete Gaussian kernels and discrete analogues of Gaussian derivatives.Note that the scaling of the vertical axis may vary between the different subfigures.(Horizontal axis: the 1-D spatial coordinate x ∈ [−5, 5].)

Fig. 2 :Fig. 3 :Fig. 4 :
Fig.2: Graphs of the l 1 -norm-based normalization error Enorm(T (•; s)), according to (39), for the discrete analogue of the Gaussian kernel, the sampled Gaussian kernel and the integrated Gaussian kernel.Note that this error measure is equal to zero for both the discrete analogue of the Gaussian kernel, the normalized sampled Gaussian kernel and the integrated Gaussian kernel.(Horizontal axis: Scale parameter in units of σ = √ s ∈ [0.1, 2].)

Fig. 5 :
Fig.5: Graphs of the relative scale difference E relscale (T (•; s)), according to (41) and in units of the spatial standard deviation of the discrete kernels, for the discrete analogue of the Gaussian kernel, the sampled Gaussian kernel and the integrated Gaussian kernel.This relative scale error is exactly equal to zero for the discrete analogue of the Gaussian kernel.For scale values σ < 0.75, the relative scale difference is substantial for sampled Gaussian kernel, and then rapidly tends to zero for larger scales.For the integrated Gaussian kernel, the relative scale difference is significantly larger, while approaching zero with increasing scale.The relative scale difference for the normalized sampled Gaussian kernel is equal to the relative scale difference for the regular sampled Gaussian kernel.(Horizontal axis: Scale parameter in units of σ =

Figures 9
Figures 9(a)-9(d) and Figures 10(a)-10(d) show how the l 1norms as well as the spatial spread measures vary as function of the scale parameter, with comparisons to the scale dependencies for the corresponding fully continuous measures.

Figures
Figures 10(a)-10(d) show graphs of the standard-deviationbased spatial spread measure V (|T x α (•; s)|) according to (80), for the main classes of discretizations of Gaussian derivative kernels, together with graphs of the corresponding spatial spread measures computed for continuous Gaussian derivative kernels.As can be seen from these graphs, the spatial spread measures differ significantly from the corresponding continuous measures for smaller values of the the scale parameters; for σ less than about 1 or 1.5, depending on the order of differentiation, and caused by the fact that too fine scales Case: α = 1.Note that the spatial spread measure of the discrete analogue of the first-order Gaussian derivative kernel is delimited from below by the spatial variance of the absolute value of the first-order central difference operator |δx|, which isV (|δx|) = 1.Spatial spread measures for 2nd-order derivative kernels Case: α = 2.Note that the spatial spread measure of the discrete analogue of the second-order Gaussian derivative kernel is delimited from below by the spatial variance of the absolute value of the second-order central difference operator |δxx|, which is V (|δxx|) = 1/ Case: α = 3.Note that the spatial spread measure of the discrete analogue of the third-order Gaussian derivative kernel is delimited from below by the spatial variance of the absolute value of the third-order central difference operator |δxxx|, which is V (|δxxx|) =

Fig. 21 :
Fig. 21: The affine Gaussian kernel for the spatial scale parameters σ 1 = 8 and σ 2 = 4 and image orientation φ = π/6, with its directional derivatives up to order 4, computed by applying small support directional derivative approximation masks of the form (115) to a sampled affine Gaussian kernel according to (165), based the explicit parameterization of affine Gaussian kernels according to Appendix A.7. (Horizontal axes: x-coordinate ∈ [−32, 32].Vertical axes: y-coordinate ∈ [−32, 32].Colour coding: positive values in red, negative values in blue.) spread measures for Gaussian derivative kernels This appendix gives explicit expressions for spread measures in terms of the standard deviations of the absolute values of the 1-D Gaussian derivative kernels 14 Sα = x∈R x 2 |g x α (x; σ)| dx x∈R |g x α (x; σ)| dx (141) exact expression for S 4 (σ) is given in Figure22.The calculations have been performed in Mathematica.A.5 Diffusion polynomials in the 1-D continuous caseThis appendix lists diffusion polynomials, that satisfy the 1L(x; 0) = f (x), for f (x) being monomials f (x) = x k (148)
Of order 4 embedded in masks of size 5 × 5: in relation to an evaluation of what is referred to as "Lindeberg's smoothing method" byRey-Otero and Delbracio (2016) for a corresponding continuous Gaussian derivative kernel.Explicit expressions for the latter spread measures S α (s) computed from continuous Gaussian derivative kernels are given in Appendix A.4. -Spatial spread measure offset: To quantify the absolute deviation between the above measured spatial spread measure V (|T x α (•; s)|) with the corresponding ideal value V (|g x α (•; s)|) for a continuous Gaussian derivative kernel, we will measure this offset in terms of the entity