Discrete Approximations of Gaussian Smoothing and Gaussian Derivatives

Lindeberg, Tony

doi:10.1007/s10851-024-01196-9

Discrete Approximations of Gaussian Smoothing and Gaussian Derivatives

Open access
Published: 17 June 2024

(2024)
Cite this article

Download PDF

You have full access to this open access article

Journal of Mathematical Imaging and Vision Aims and scope Submit manuscript

Discrete Approximations of Gaussian Smoothing and Gaussian Derivatives

Download PDF

Tony Lindeberg¹

159 Accesses
Explore all metrics

Abstract

This paper develops an in-depth treatment concerning the problem of approximating the Gaussian smoothing and the Gaussian derivative computations in scale-space theory for application on discrete data. With close connections to previous axiomatic treatments of continuous and discrete scale-space theory, we consider three main ways of discretizing these scale-space operations in terms of explicit discrete convolutions, based on either (i) sampling the Gaussian kernels and the Gaussian derivative kernels, (ii) locally integrating the Gaussian kernels and the Gaussian derivative kernels over each pixel support region, to aim at suppressing some of the severe artefacts of sampled Gaussian kernels and sampled Gaussian derivatives at very fine scales, or (iii) basing the scale-space analysis on the discrete analogue of the Gaussian kernel, and then computing derivative approximations by applying small-support central difference operators to the spatially smoothed image data.

We study the properties of these three main discretization methods both theoretically and experimentally and characterize their performance by quantitative measures, including the results they give rise to with respect to the task of scale selection, investigated for four different use cases, and with emphasis on the behaviour at fine scales. The results show that the sampled Gaussian kernels and the sampled Gaussian derivatives as well as the integrated Gaussian kernels and the integrated Gaussian derivatives perform very poorly at very fine scales. At very fine scales, the discrete analogue of the Gaussian kernel with its corresponding discrete derivative approximations performs substantially better. The sampled Gaussian kernel and the sampled Gaussian derivatives do, on the other hand, lead to numerically very good approximations of the corresponding continuous results, when the scale parameter is sufficiently large, in most of the experiments presented in the paper, when the scale parameter is greater than a value of about 1, in units of the grid spacing. Below a standard deviation of about 0.75, the derivative estimates obtained from convolutions with the sampled Gaussian derivative kernels are, however, not numerically accurate or consistent, while the results obtained from the discrete analogue of the Gaussian kernel, with its associated central difference operators applied to the spatially smoothed image data, are then a much better choice.

Edge Detection Using Topological Gradients: A Scale-Space Approach

Article 23 January 2015

Scale and Edge Detection with Topological Derivatives

Detection of Singularities by Discrete Multiscale Directional Representations

Article 11 July 2017

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

When operating on image data, the earliest layers of image operations are usually expressed in terms of receptive fields, which means that the image information is integrated over local support regions in image space. For modelling such operations, the notion of scale-space theory [24, 31, 34,35,36, 44, 46, 50, 84, 86, 96, 97] stands out as a principled theory, by which the shapes of the receptive fields can be determined from axiomatic derivations, that reflect desirable theoretical properties of the first stages of visual operations.

In summary, this theory states that convolutions with Gaussian kernels and Gaussian derivatives constitutes a canonical class of image operations as a first layer of visual processing. Such spatial receptive fields, or approximations thereof, can, in turn, be used as basis for expressing a large variety of image operations, both in classical computer vision [8, 12, 15, 42, 48, 49, 51, 53, 62, 63, 79, 89] and more recently in deep learning [26, 32, 57, 58, 69, 72, 78].

The theory for the notion of scale-space representation does, however, mainly concern continuous image data, while implementations of this theory on digital computers require a discretization over image space. The subject of this article is to describe and compare a number of basic approaches for discretizing the Gaussian convolution operation, as well as convolutions with Gaussian derivatives.

While one could possibly argue that at sufficiently coarse scales, where sampling effects ought to be small, the influence of choosing one form of discrete implementation compared to some other ought to be negligible, or at least of minor effect, there are situations where it is desirable to apply scale-space operations at rather fine scales, and then also to be reasonably sure that one would obtain desirable response properties of the receptive fields.

One such domain, and which motivates the present deeper study of discretization effects for Gaussian smoothing operations and Gaussian derivative computations at fine scales, is when applying Gaussian derivative operations in deep networks, as done in a recently developed subdomain of deep learning [26, 32, 57, 58, 69, 72, 78].

A practical observation, that one may make, when working with deep learning, is that deep networks may tend to have a preference to computing image representations at very fine scale levels. For example, empirical results indicate that deep networks often tend to perform image classification based on very fine-scale image information, corresponding to the local image texture on the surfaces of objects in the world. Indirect support for such a view may also be taken from the now well-established fact that deep networks may be very sensitive to adversarial perturbations, based on adding deliberately designed noise patterns of very low amplitude to the image data [3, 5, 30, 65, 85]. That observation demonstrates that deep networks may be very strongly influenced by fine-scale structures in the input image. Another observation may be taken from working with deep networks based on using Gaussian derivative kernels as the filter weights. If one designs such a network with complementary training of the scale levels for the Gaussian derivatives, then a common result is that the network will prefer to base its decisions based on receptive fields at rather fine scale levels.

When to implement such Gaussian derivative networks in practice, one hence faces the need for being able to go below the rule of thumb for classical computer vision, of not attempting to operate below a certain scale threshold, where the standard deviation of the Gaussian derivative kernel should then not be below a value of say $1/\sqrt{2}$ or 1, in units of the grid spacing.

From a viewpoint of theoretical signal processing, one may possibly take the view to argue that one should use the sampling theorem to express a lower bound on the scale level, which one should then never go below. For regular images, as obtained from digital cameras, or already acquired data sets as compiled by the computer vision community, such an approach based on the sampling theorem, is, however, not fully possible in practice. First of all, we almost never, or at least very rarely, have explicit information about the sensor characteristics of the image sensor. Secondly, it would hardly be possible to model the imaging process in terms of an ideal bandlimited filter with frequency characteristics near the spatial sampling density of the image sensor. Applying an ideal bandpass filter to an already given digital image may lead to ringing phenomena near the discontinuities in the image data, which will lead to far worse artefacts for spatial image data than for e.g. signal transmission over information carriers in terms of sine waves.

Thus, the practical problem, that one faces when designing and applying a Gaussian derivative network to image data, is to in a practically feasible manner express a spatial smoothing process, that can smooth a given digital input image for any fine scale of the discrete approximation to a Gaussian derivative filter. A theoretical problem, that then arises, concerns how to design such a process, so that it can operate from very fine scale levels, possibly starting even at scale level zero corresponding to the original input data, without leading to severe discretization artefacts.

A further technical problem that arises is that even if one would take the a priori view of basing the implementation on the purely discrete theory for scale-space smoothing and scale-space derivatives developed in [43, 45], and as we have taken in our previous work on Gaussian derivative networks [57, 58], one then faces the problem of handling the special mathematical functions used as smoothing primitives in this theory (the modified Bessel functions of integer order) when propagating gradients for training deep networks backwards by automatic differentiation, when performing learning of the scale levels in the network. These necessary mathematical primitives do not exist as built-in functions in e.g. PyTorch [67], which implies that the user then would have to implement a PyTorch interface for these functions himself, or choose some other type of discretization method, if aiming to learn the scale levels in the Gaussian derivative networks by back propagation. There are a few related studies of discretizations of scale-space operations [41, 73, 83, 87, 94]. These do not, however, answer the questions that need to be addressed for the intended use cases for our developments.

Wang [94] proposed pyramid-like algorithms for computing multi-scale differential operators using a spline technique, however, then taking rather coarse steps in the scale direction. Lim and Stiehl [41] studied properties of discrete scale-space representations under discrete iterations in the scale direction, based on Euler’s forward method. For our purpose, we do, however, need to consider the scale direction as a continuum. Tschirsich and Kuijper [87] investigated the compatibility of topological image descriptors with a discrete scale-space representations and did also derive an eigenvalue decomposition in relation to the semi-discrete diffusion equation, that determines the evolution properties over scale, to enable efficient computations of discrete scale-space representations of the same image at multiple scales. With respect to our target application area, we are, however, more interested in computing image features based on Gaussian derivative responses, and then mostly also computing discrete scale-space representation at a single scale only, for each input image. Slavík and Stehlík [83] developed a theory for more general evolution equations over semi-discrete domains, which incorporates the 1-D discrete scale-space evolution family, that we consider here, as corresponding to convolutions with the discrete analogue of the Gaussian kernel, as a special case. For our purposes, we are, however, more interested in performing an in-depth study of different discrete approximations of the axiomatically determined class of Gaussian smoothing operations and Gaussian derivative operators, than expanding the treatment to other possible evolution equations over discrete spatial domains.

Rey-Otero and Delbracio [73] assumed that the image data can be regarded as bandlimited, and did then use a Fourier-based approach for performing closed-form Gaussian convolution over a reconstructed Fourier basis, which in that way constitutes a way to eliminate the discretization errors, provided that a correct reconstruction an underlying continuous image, notably before the image acquisition step, can be performed.^{Footnote 1} A very closely related approach, for computing Gaussian convolutions, based on a reconstruction of an assumed-to-be bandlimited signal, has also been previously outlined by Åström and Heyden [2].

As argued earlier in this introduction, the image data that is processed in computer vision are, however, not generally accompanied with characteristic information regarding the image acquisition process, specifically not with regard to what extent the image data could be regarded as bandlimited. Furthermore, one could question if the image data obtained from a modern camera sensor could at all be modelled as bandlimited, for a cutoff frequency very near the resolution of the image. Additionally, with regard to our target application domain of deep learning, one could also question if it would be manageable to invoke a Fourier-based image reconstruction step for each convolution operation in a deep network. In the work to be developed here, we are, on the other hand, more interested in developing a theory for discretizing the Gaussian smoothing and the Gaussian derivative operations at very fine levels of scale, in terms of explicit convolution operations, and based on as minimal as possible assumptions regarding the nature of the image data.

The purpose of this article is thus to perform a detailed theoretical analysis of the properties of different discretizations of the Gaussian smoothing operation and the Gaussian derivative computations at any scale, and with emphasis on reaching as near as possible to the desirable theoretical properties of the underlying scale-space representation, to hold also at very fine scales for the discrete implementation.

For performing such analysis, we will consider basic approaches for discretizing the Gaussian kernel in terms of either pure spatial sampling (the sampled Gaussian kernel) or local integration over each pixel support region (the integrated Gaussian kernel) and compare to the results of a genuinely discrete scale-space theory (the discrete analogue of the Gaussian kernel), see Fig. 1 for graphs of such kernels. After analysing and numerically quantifying the properties of these basic types of discretizations, we will then extend the analysis to discretizations of Gaussian derivatives in terms of either sampled Gaussian derivatives, integrated Gaussian derivatives, and compare to the results of a genuinely discrete theory based on convolutions with the discrete analogue of the Gaussian kernel followed by discrete derivative approximations computed by applying small-support central difference operators to the discrete scale-space representation. We will also extend the analysis to the computation of local directional derivatives, as a basis for filter-bank approaches for receptive fields, based on either the scale-space representation generated by convolution with rotationally symmetric Gaussian kernels, or the affine Gaussian scale space.

It will be shown that, with regard to the topic of raw Gaussian smoothing, the discrete analogue of the Gaussian kernel has the best theoretical properties, out of the discretization methods considered. For scale values when the standard deviation of the continuous Gaussian kernel is above 0.75 or 1, the sampled Gaussian kernel does also have very good properties, and leads to very good approximations of the corresponding fully continuous results. The integrated Gaussian kernel is better at handling fine scale levels than the sampled Gaussian kernel, but does, however, comprise a scale offset that hampers its accuracy in approximating the underlying continuous theory.

Concerning the topic of approximating the computation of Gaussian derivative responses, it will be shown that the approach based on convolution with the discrete analogue of the Gaussian kernel followed by central difference operations has the clearly best properties at fine scales, out of the studied three main approaches. In fact, when the standard deviation of the underlying continuous Gaussian kernel is a bit below about 0.75, the sampled Gaussian derivative kernels and the integrated Gaussian derivative kernels do not lead to accurate numerical estimates of derivatives, when applied to monomials of the same order as the order of spatial differentiation, or lower. Over an intermediate scale range in the upper part of this scale interval, the integrated Gaussian derivative kernels do, however, have somewhat better properties than the sampled Gaussian derivative kernels. For the discrete approximations of Gaussian derivatives defined from convolutions with the discrete analogue of the Gaussian kernel followed by central differences, the numerical estimates of derivatives obtained by applying this approach to monomials of the same order as the order of spatial differentiation do, on the other hand, lead to derivative estimates exactly equal to their continuous counterparts, and also over the entire scale range.

For larger scale values, for standard deviations greater than about 1, relative to the grid spacing, in the experiments to be reported in the paper, the discrete approximations of Gaussian derivatives obtained from convolutions with sampled Gaussian derivatives do on the other hand lead to numerically very accurate approximations of the corresponding results obtained from the purely continuous scale-space theory. For the discrete derivative approximations obtained by convolutions with the integrated Gaussian derivatives, the box integration introduces a scale offset, that hampers the accuracy of the approximation of the corresponding expressions obtained from the fully continuous scale-space theory. The integrated Gaussian derivative kernels do, however, degenerate less seriously than the sampled Gaussian derivative kernels within a certain range of very fine scales. Therefore, they may constitute an interesting alternative, if the mathematical primitives needed for the discrete analogues of the Gaussian derivative are not fully available within a given system for programming deep networks.

For simplicity, we do in this treatment restrict ourselves to image operations that operate in terms of discrete convolutions only. In this respect, we do not consider implementations in terms of Fourier transforms, which are also possible, while less straightforward in the context of deep learning. We do furthermore not consider extensions to spatial interpolation operations, which operate between the positions of the image pixels, and which can be highly useful, for example, for locating the positions of image features with subpixel accuracy [10, 11, 91, 92, 95, 104]. We do additionally not consider representations that perform subsamplings at coarser scales, which can be useful for reducing the amount of computational work [13, 16, 17, 59, 62, 81, 82], or representations that aim at speeding up the spatial convolutions on serial computers based on performing the computations in terms of spatial recursive filters [14, 19, 22, 27, 93, 99]. For simplicity, we develop the theory for the special cases of 1-D signals or 2-D images, while extensions to higher-dimensional volumetric images is straightforward, as implied by separable convolutions for the scale-space concept based on convolutions with rotationally symmetric Gaussian kernels.

Concerning experimental evaluations, we do in this paper deliberately focus on and restrict ourselves to the theoretical properties of different discretization methods, and only report performance measures based on such theoretical properties. One motivation for this approach is that the integration with different types of visual modules may call for different relative properties of the discretization methods. We therefore want this treatment to be timeless, and not biased to the integration with particular computer vision methods or algorithms that operate on the output from Gaussian smoothing operations or Gaussian derivatives. Experimental evaluations with regard to Gaussian derivative networks will be reported in follow-up work. The results from this theoretical analysis should therefore be more generally applicable to a larger variety of approaches in classical computer vision, as well as to other deep learning approaches that involve Gaussian derivative operators.

2 Discrete Approximations of Gaussian Smoothing

The Gaussian scale-space representation $L(x, y;\; s)$ of a 2-D spatial image f(x, y) is defined by convolution with 2-D Gaussian kernels of different sizes

$$\begin{aligned} g_{2\text {D}}(x, y;\; s) = \frac{1}{2 \pi s} \, e^{-(x^2 + y^2)/2s} \end{aligned}$$

(1)

according to [24, 31, 34, 44, 50, 86, 96, 97]

$$\begin{aligned} L(x, y;\; s) = \int \limits _{\xi \in {{\mathbb {R}}}} \int \limits _{\eta \in {{\mathbb {R}}}} g_{2\text {D}}(\xi , \eta ;\; s) \, f(x - \xi , y - \eta ) \, \hbox {d}\xi \, \hbox {d}\eta . \end{aligned}$$

(2)

Equivalently, this scale-space representation can be seen as defined by the solution of the 2-D diffusion equation

$$\begin{aligned} \partial _s L = \frac{1}{2} \, ( \partial _{xx} L + \partial _{yy} L) \end{aligned}$$

(3)

with initial condition $L(x, y;\; 0) = f(x, y)$.

2.1 Theoretical Properties of Gaussian Scale-Space Representation

2.1.1 Non-Creation of New Structure with Increasing Scale

The Gaussian scale space, generated by convolving an image with Gaussian kernels, obeys a number of special properties, that ensure that the transformation from any finer scale level to any coarser scale level is guaranteed to always correspond to a simplification of the image information:

Non-creation of local extrema For any one-dimensional signal f, it can be shown that the number of local extrema in the 1-D Gaussian scale-space representation at any coarser scale $s_2$ is guaranteed to not be higher than the number of local extrema at any finer scale $s_1 < s_2$.
Non-enhancement of local extrema For any N-dimensional signal, it can be shown that the derivative of the scale-space representation with respect to the scale parameter $\partial _s L$ is guaranteed to obey $\partial _s L \le 0$ at any local spatial maximum point and $\partial _s L \ge 0$ at any local spatial minimum point. In this respect, the Gaussian convolution operation has a strong smoothing effect.

In fact, the Gaussian kernel can be singled out as the unique choice of smoothing kernel as having these properties, from axiomatic derivations, if combined with the requirement of a semi-group property over scales

$$\begin{aligned} g_{2\text {D}}(\cdot , \cdot ;\; s_1) * g_{2\text {D}}(\cdot , \cdot ;\; s_2) = g_{2\text {D}}(\cdot , \cdot ;\; s_1 + s_2) \end{aligned}$$

(4)

and certain regularity assumptions, see Theorem 5 in [43], Theorem 3.25 in [44] and Theorem 5 in [50] for more specific statements.

For related treatments about theoretically principled scale-space axiomatics, see also Koenderink [34], Babaud et al. [4], Yuille and Poggio [103], Koenderink and van Doorn [36], Pauwels et al. [68], Lindeberg [47], Weickert et al. [96] and Duits et al. [20].

2.1.2 Cascade Smoothing Property

Due to the semi-group property, it follows that the scale-space representation at any coarser scale $L(x, y;\; s_2)$ can be obtained by convolving the scale-space representation at any finer scale $L(x, y;\; s_1)$ with a Gaussian kernel parameterized by the scale difference $s_2 - s_1$:

$$\begin{aligned} L(\cdot , \cdot ;\; s_2) = g_{2\text {D}}(\cdot , \cdot ;\; s_2 - s_1) * L(\cdot , \cdot ;\; s_1). \end{aligned}$$

(5)

This form of cascade smoothing property is an essential property of a scale-space representation, since it implies that the transformation from any finer scale level $s_1$ to any coarser scale level $s_2$ will always be a simplifying transformation, provided that the convolution kernel used for the cascade smoothing operation corresponds to a simplifying transformation.

2.1.3 Spatial Averaging

The Gaussian kernel is non-negative

$$\begin{aligned} g_{2\text {D}}(x, y;\; s) \ge 0 \end{aligned}$$

(6)

and normalized to unit $L_1$-norm

$$\begin{aligned} \int \limits _{(x, y) \in {{\mathbb {R}}}^2} g_{2\text {D}}(x, y;\; s) = 1. \end{aligned}$$

(7)

In these respects, Gaussian smoothing corresponds to a spatial averaging process, which constitutes one of the desirable attributes of a smoothing process intended to reflect different spatial scales in image data.

2.1.4 Separable Gaussian Convolution

Due to the separability of the 2-D Gaussian kernel

$$\begin{aligned} g_{2\text {D}}(x, y;\; s) = g(x;\; s) \, g(y;\; s), \end{aligned}$$

(8)

where the 1-D Gaussian kernel is of the form

$$\begin{aligned} g(x;\; s) = \frac{1}{\sqrt{2 \pi s}} \, e^{-x^2/2s}, \end{aligned}$$

(9)

the 2-D Gaussian convolution operation (2) can also be written as two separable 1-D convolutions of the form

$$\begin{aligned}&L(x, y;\; s) = \int \limits _{\xi \in {{\mathbb {R}}}} g(\xi ;\, s) \, \left( \int \limits _{\eta \in {{\mathbb {R}}}} g(\eta ;\; s) \, f(x - \xi , y - \eta ) \, \hbox {d}\eta \right) \, \hbox {d}\xi . \end{aligned}$$

(10)

Methods that implement Gaussian convolution in terms of explicit discrete convolutions usually exploit this separability property, since if the Gaussian kernel is truncated^{Footnote 2} at the tails for $x = \pm N$, the computational work for separable convolution will be of the order

$$\begin{aligned} W_{\text{ sep }} = 2 \, (2 N + 1) \end{aligned}$$

(11)

per image pixel, whereas it would be of order

$$\begin{aligned} W_{\text{ non-sep }} = (2 N + 1)^2 \end{aligned}$$

(12)

for non-separable 2-D convolution.

2.2 Modelling Situation for Theoretical Analysis of Different Approaches for Implementing Gaussian Smoothing Discretely

From now on, we will, for simplicity, only consider the case with 1-D Gaussian convolutions of the form

$$\begin{aligned} L(x;\; s) = \int \limits _{\xi \in {{\mathbb {R}}}} g(\xi ;\, s) \, f(x - \xi ) \, \hbox {d}\xi , \end{aligned}$$

(13)

which are to be implemented in terms of discrete convolutions of the form

$$\begin{aligned} L(x;\; s) = \sum _{n \in {{\mathbb {Z}}}} \, T(n;\, s) \, f(x - n), \end{aligned}$$

(14)

for some family of discrete filter kernels $T(n;\; s)$.

2.2.1 Measures of the Spatial Extent of Smoothing Kernels

The spatial extent of these 1-D kernels can be described by the scale parameter s, which represents the spatial variance of the convolution kernel

$$\begin{aligned}{} & {} V(g(\cdot ;\; s)) = \nonumber \\{} & {} \quad = \frac{\int \limits _{x \in {{\mathbb {R}}}} x^2 \, g(x;\; s) \, \hbox {d}x}{\int \limits _{x \in {{\mathbb {R}}}} g(x;\; s) \, \hbox {d}x} - \left( \frac{\int \limits _{x \in {{\mathbb {R}}}} x \, g(x;\; s) \, \hbox {d}x}{\int \limits _{x \in {{\mathbb {R}}}} g(x;\; s) \, \hbox {d}x} \right) ^2 = s, \end{aligned}$$

(15)

and which can also be parameterized in terms of the standard deviation

$$\begin{aligned} \sigma = \sqrt{s}. \end{aligned}$$

(16)

For the discrete kernels, the spatial variance is correspondingly measured as

$$\begin{aligned}{} & {} V(T(\cdot ;\; s)) = \nonumber \\{} & {} \quad = \frac{\sum _{n \in {{\mathbb {Z}}}} n^2 \, T(n;\; s)}{\sum _{n \in {{\mathbb {Z}}}} T(n;\; s)} - \left( \frac{\sum _{n \in {{\mathbb {Z}}}} n \, T(n;\; s)}{\sum _{n \in {{\mathbb {Z}}}} T(n;\; s)} \right) ^2. \end{aligned}$$

(17)

2.3 The Sampled Gaussian Kernel

The presumably simplest approach for discretizing the 1-D Gaussian convolution integral (13) in terms of a discrete convolution of the form (14) is by choosing the discrete kernel $T(n;\; s)$ as the sampled Gaussian kernel

$$\begin{aligned} T_{\text{ sampl }}(n;\; s) = g(n;\; s). \end{aligned}$$

(18)

While this choice is easy to implement in practice, there are, however, three major conceptual problems with using such a discretization at very fine scales:

the filter coefficients may not be limited to the interval [0, 1],
the sum of the filter coefficients may become substantially greater than 1, and
the resulting filter kernel may have too narrow shape, in the sense that the spatial variance of the discrete kernel $V(T_{\text{ sampl }}(\cdot ;\; s))$ is substantially smaller than the spatial variance $V(g(\cdot ;\; s))$ of the continuous Gaussian kernel.

The first two problems imply that the resulting discrete spatial smoothing kernel is no longer a spatial weighted averaging kernel in the sense of Sect. 2.1.3, which implies problems, if attempting to interpret the result of convolutions with the sampled Gaussian kernels as reflecting different spatial scales. The third problem implies that there will not be a direct match between the value of the scale parameter provided as argument to the sampled Gaussian kernel and the scales that the discrete kernel would reflect in the image data.

Figures 2 and 3 show numerical characterizations of these entities for a range of small values of the scale parameter.

More fundamentally, it can be shown (see Section VII.A in Lindeberg [43]) that convolution with the sampled Gaussian kernel is guaranteed to not increase the number of local extrema (or zero-crossings) in the signal from the input signal to any coarser level of scale. The transformation from an arbitrary scale level to some other arbitrary coarser scale level is, however, not guaranteed to obey such a simplification property between any pair of scale levels. In this sense, convolutions with sampled Gaussian kernels do not truly obey non-creation of local extrema from finer to coarser levels of scale, in the sense described in Sect. 2.1.1.

2.4 The Normalized Sampled Gaussian Kernel

A straightforward, but ad hoc, way of avoiding the problems that the discrete filter coefficients may, for small values of the scale parameter have their sum exceed 1, is by normalizing the sampled Gaussian kernel with its discrete $l_1$-norm:

$$\begin{aligned} T_{\text{ normsampl }}(n;\; s) = \frac{g(n;\; s)}{\sum _{m \in {{\mathbb {Z}}}} g(m;\; s)}. \end{aligned}$$

(19)

By definition, we in this way avoid this problems that the regular sampled Gaussian kernel is not spatial weighted averaging kernel in the sense of Sect. 2.1.3.

The problem that the spatial variance of the discrete kernel $V(T_{\text{ normsampl }}(\cdot ;\; s))$ is substantially smaller that the spatial variance $V(g(\cdot ;\; s))$ of the continuous Gaussian kernel, will, however, persist, since the variance of a kernel is not affected by a uniform scaling of its amplitude values. In this sense, the resulting discrete kernels will not for small scale values accurately reflect the spatial scale corresponding to the scale argument, as specified by the scale parameter s.

2.5 The Integrated Gaussian Kernel

A possibly better way of enforcing the weights of the filter kernels to sum up to 1, is by instead letting the discrete kernel be determined by the integral of the continuous Gaussian kernel over each pixel support region [44, Equation (3.89)]

$$\begin{aligned} T_{\text{ int }}(n;\; s) = \int \limits _{x = n - 1/2}^{n + 1/2} g(x;\; s) \, \hbox {d}x, \end{aligned}$$

(20)

which in terms of the scaled error function ${\text {erg}}(x;\; s)$ can be expressed as

$$\begin{aligned} T_{\text{ int }}(n;\; s) = {\text {erg}}\left( n + \tfrac{1}{2};\; s\right) - {\text {erg}}\left( n - \tfrac{1}{2};\; s\right) \end{aligned}$$

(21)

with

$$\begin{aligned} {\text {erg}}(x;\; s) = \frac{1}{2} \left( 1 + {\text {erf}} \left( \frac{x}{\sqrt{2 s}} \right) \right) , \end{aligned}$$

(22)

where ${\text {erf}}(x)$ denotes the regular error function according to

$$\begin{aligned} {\text {erf}}(x) = \frac{2}{\sqrt{\pi }} \int \limits _{t = 0}^x e^{-t^2} dt. \end{aligned}$$

(23)

A conceptual argument for defining the integrated Gaussian kernel model is that, we may, given a discrete signal f(n), define a continuous signal ${\tilde{f}}(x)$, by letting the values of the signal in each pixel support region be equal to the value of the corresponding discrete signal, see Appendix A.2 for an explicit derivation. In this sense, there is a possible physical motivation for using this form of scale-space discretization.

By the continuous Gaussian kernel having its integral equal to 1, it follows that the sum of the discrete filter coefficients will over an infinite spatial domain also be exactly equal to 1. Furthermore, the discrete filter coefficients are also guaranteed to be in the interval [0, 1]. In these respects, the resulting discrete kernels will represent a true spatial weighting process, in the sense of Sect. 2.1.3.

Concerning the spatial variances $V(T_{\text{ int }}(\cdot ;\; s))$ of the resulting discrete kernels, they will also for smaller scale values be closer to the spatial variances $V(g(\cdot ;\; s))$ of the continuous Gaussian kernel, than for the sampled Gaussian kernel or the normalized sampled Gaussian kernel, as shown in Figs. 3 and 4. For larger scale values, the box integration over each pixel support region, will, on the other hand, however, introduce a scale offset, which for larger values of the scale parameter s approaches

$$\begin{aligned} \varDelta s_{\text{ int }} = \frac{1}{12} \approx 0.0833, \end{aligned}$$

(24)

which, in turn, corresponds to the spatial variance of a continuous box filter over each pixel support region, defined by

$$\begin{aligned} w_{\text{ box }} = \left\{ \begin{array}{ll} 1 &{} \text{ if } |x| \le \frac{1}{2}, \\ 0 &{} \text{ otherwise, } \end{array} \right. \end{aligned}$$

(25)

and which is used for defining the integrated Gaussian kernel from the continuous Gaussian kernel in (20).

Figure 3 shows a numerical characterization of the difference in scale values between the variance $V(T_{\text{ int }}(n;\; s))$ of the discrete integrated Gaussian kernel and the scale parameter s provided as argument to this function.

In terms of theoretical scale-space properties, it can be shown that the transformation from the input signal to any coarse scale always implies a simplification, in the sense that the number of local extrema (or zero-crossings) at any coarser level of scale is guaranteed to not exceed the number of local extrema (or zero-crossings) in the input signal (see Section 3.6.3 in Lindeberg [44]). The transformation from any finer scale level to any coarser scale level will, however, not be guaranteed to obey such a simplification property. In this respect, the integrated Gaussian kernel does not fully represent a discrete scale-space transformation, in the sense of Sect. 2.1.1.

2.6 The Discrete Analogue of the Gaussian Kernel

According to a genuinely discrete theory for spatial scale-space representation in Lindeberg [43], the discrete scale space is defined from discrete kernels of the form

$$\begin{aligned} T_{\text{ disc }}(n;\; s) = e^{-s} I_n(s), \end{aligned}$$

(26)

where $I_n(s)$ denote the modified Bessel functions of integer order (see [1]), which are related to the regular Bessel functions $J_n(z)$ of the first kind according to

$$\begin{aligned} I_n(x) = i^{-n} \, J_n(i \, x) = e^{-\frac{n \pi i}{2}} \, J_n\left( e^{\frac{i \pi }{2}}x\right) , \end{aligned}$$

(27)

and which for integer values of n, as we will restrict ourselves to here, can be expressed as

$$\begin{aligned} I_n(x) = \frac{1}{\pi } \int \limits _{\theta = 0}^{\pi } e^{x \cos \theta } \cos (n \, \theta ) \, \hbox {d}\theta . \end{aligned}$$

(28)

The discrete analogue of the Gaussian kernel $T_{\text{ disc }}(n;\; s)$ does specifically have the practically useful properties that:

the filter coefficients are guaranteed to be in the interval [0, 1],
the filter coefficients sum up to 1 (see Equation (3.43) in [44])
$$\begin{aligned} \sum _{n \in {{\mathbb {Z}}}} T_{\text{ disc }}(n;\; s) = 1, \end{aligned}$$
(29)
the spatial variance of the discrete kernel is exactly equal to the scale parameter (see Equation (3.53) in [44])
$$\begin{aligned} V(T_{\text{ disc }}(\cdot ;\; s)) = s. \end{aligned}$$
(30)

These kernels do also exactly obey a semi-group property over spatial scales (see Equation (3.41) in [44])

$$\begin{aligned} T_{\text{ disc }}(\cdot ;\; s_1) * T_{\text{ disc }}(\cdot ;\; s_2) = T_{\text{ disc }}(\cdot ;\; s_1 + s_2), \end{aligned}$$

(31)

which implies that the resulting discrete scale-space representation also obeys an exact cascade smoothing property

$$\begin{aligned} L_{\text{ disc }}(\cdot ;\; s_2) = T_{\text{ disc }}(\cdot ;\; s_2 - s_1) * L_{\text{ disc }}(\cdot ;\; s_1). \end{aligned}$$

(32)

More fundamentally, these discrete kernels do furthermore preserve scale-space properties to the discrete domain, in the sense that:

the number of local extrema (or zero-crossings) at a coarser scale is guaranteed to not exceed the number of local extrema (or zero-crossings) at any finer scale,
the resulting discrete scale-space representation is guaranteed to obey non-enhancement of local extrema, in the sense that the value at any local maximum is guaranteed to not increase with increasing scale, and that the value at any local minimum is guaranteed to not decrease with increasing scale.

In these respects, the discrete analogue of the Gaussian kernel obeys all the desirable theoretical properties of a discrete scale-space representation, corresponding to discrete analogues of the theoretical properties of the Gaussian scale-space representation stated in Sect. 2.1.

Specifically, the theoretical properties of the discrete analogue of the Gaussian kernel are better than the theoretical properties of the sampled Gaussian kernel, the normalized sampled Gaussian kernel or the integrated Gaussian kernel.

2.6.1 Diffusion Equation Interpretation of the Genuinely Discrete Scale-Space Representation Concept

In terms of diffusion equations, the discrete scale-space representation generated by convolving a 1-D discrete signal f by the discrete analogue of the Gaussian kernel according to (26)

$$\begin{aligned} L(x;\; s) = \sum _{n \in {{\mathbb {Z}}}} T_{\text{ disc }}(n;\, s) \, f(x - n) \end{aligned}$$

(33)

satisfies the semi-discrete 1-D diffusion equation [44] Theorem 3.28

$$\begin{aligned} \partial _s L = \frac{1}{2} \, \delta _{xx} L \end{aligned}$$

(34)

with initial condition $L(x;\; 0) = f(x)$, where $\delta _{xx}$ denotes the second-order discrete difference operator

$$\begin{aligned} \delta _{xx} = (+1, -2, +1). \end{aligned}$$

(35)

Over a 2-D discrete spatial domain, the discrete scale-space representation of an image f(x, y), generated by separable convolution with the discrete analogue of the Gaussian kernel

$$\begin{aligned} L(x, y;\; s) = \sum _{m \in {{\mathbb {Z}}}} T(m;\; s) \sum _{n \in {{\mathbb {Z}}}} \, T(n;\; s) \, f(x-m, y-n),\nonumber \\ \end{aligned}$$

(36)

satisfies the semi-discrete 2-D diffusion equation [44] Proposition 4.14

$$\begin{aligned} \partial _s L = \frac{1}{2} \, \nabla _5^2 L \end{aligned}$$

(37)

with initial condition $L(x, y;\; 0) = f(x, y)$, where $\nabla _5^2$ denotes the following discrete approximation of the Laplacian operator

$$\begin{aligned} \nabla _5^2 = \left( \begin{array}{ccc} 0 &{} +1 &{} 0 \\ +1 &{} -4 &{} +1 \\ 0 &{} -1 &{} 0 \end{array} \right) . \end{aligned}$$

(38)

In this respect, the discrete scale-space representation generated by convolution with the discrete analogue of the Gaussian kernel can be seen as a purely spatial discretization of the continuous diffusion equation (3), which can serve as an equivalent way of defining the continuous scale-space representation.

2.7 Performance Measures for Quantifying Deviations from Theoretical Properties of Discretizations of Gaussian Kernels

To quantify the deviations between properties of the discrete kernels, and desirable properties of the discrete kernels that are to transfer the desirable properties of a continuous scale-space representation to a corresponding discrete implementation, we will in this section quantity such deviations in terms of the following error measures:

Normalization error The difference between the $l_1$-norm of the discrete kernels and the desirable unit $l_1$-norm normalization will be measured by^{Footnote 3}
$$\begin{aligned} E_{\text{ norm }}(T(\cdot ;\; s)) = \sum _{n \in {{\mathbb {Z}}}} T(n;\; s) - 1. \end{aligned}$$
(39)
Absolute scale difference The difference between the variance of the discrete kernel and the argument of the scale parameter will be measured by
$$\begin{aligned} E_{\varDelta s}(T(\cdot ;\; s)) = V(T(\cdot ;\; s)) - s. \end{aligned}$$
(40)
This error measure is expressed in absolute units of the scale parameter. The reason, why we express this measure in units of the variance of the discretizations of the Gaussian kernel, is that variances are additive under convolutions of non-negative kernels.
Relative scale difference The relative scale difference, between the actual standard deviation of the discrete kernel and the argument of the scale parameter, will be measured by
$$\begin{aligned} E_{\text{ relscale }}(T(\cdot ;\; s)) = \sqrt{\frac{V(T(\cdot ;\; s))}{s}} - 1. \end{aligned}$$
(41)
This error measure is expressed in relative units of the scale parameter.^{Footnote 4} The reason, why we express this entity in units of the standard deviations of the discretizations of the Gaussian kernels, is that these standard deviations correspond to interpretations of the scale parameter in units of $[\text{ length}]$, in a way that is thus proportional to the scale level.
Cascade smoothing error The deviation from the cascade smoothing property of a scale-space kernel according to (5) and the actual result of convolving a discrete approximation of the scale-space representation at a given scale s, with its corresponding discretization of the Gaussian kernel, will be measured by
$$\begin{aligned} E_{\text{ cascade }}(T(\cdot ;\; s)) = \frac{\Vert T(\cdot ;\; s) * T(\cdot ;\; s) - T(\cdot ;\; 2s) \Vert _1}{\Vert T(\cdot ;\; 2s) \Vert _1}. \nonumber \\ \end{aligned}$$
(42)
While this measure of the cascade smoothing error could in principle instead be formulated for arbitrary relations between the scale level of the discrete approximation of the scale-space representation and the amount of additive spatial smoothing, we fix these scale levels to be equal for the purpose of conceptual simplicity.^{Footnote 5}

In the ideal theoretical case, all of these error measures should be equal to zero (up to numerical errors in the discrete computations). Any deviations from zero of these error measures do therefore represent a quantification of deviations from desirable theoretical properties in a discrete approximation of the Gaussian smoothing operation.

2.8 Numerical Quantifications of Performance Measures

In the following, we will show results of computing the above measures concerning desirable properties of discretizations of scale-space kernels for the cases of (i) the sampled Gaussian kernel, (ii) the integrated Gaussian kernel and (iii) the discrete analogue of the Gaussian kernel. Since the discretization effects are largest for small scale values, we will focus on the scale interval $\sigma \in [0.1, 2.0]$, however, in a few cases extended to the scale interval $\sigma \in [0.1, 4.0]$. (The reason for delimiting the scale parameter to the lower bound of $\sigma \ge 0.1$ is to avoid the singularity at $\sigma = 0$.)

2.8.1 Normalization Error

Figure 2 shows graphs of the $l_1$-norm-based normalization error $E_{\text{ norm }}(T(\cdot ;\; s))$ according to (39) for the main classes of discretizations of Gaussian kernels. For the integrated Gaussian kernel, the discrete analogue of the Gaussian kernel and the normalized sampled Gaussian kernel, the normalization error is identically equal to zero. For $\sigma \le 0.5$, the normalization error is, however, substantial for the regular sampled Gaussian kernel.

2.8.2 Standard Deviations of the Discrete Kernels

Figure 3 shows graphs of the standard deviations $\sqrt{V(T(\cdot ;\; s))}$ for the different main types of discretizations of the Gaussian kernels, which constitutes a natural measure of their spatial extent. For the discrete analogue of the Gaussian kernel, the standard deviation of the discrete kernel is exactly equal to the value of the scale parameter in units of $\sigma = \sqrt{s}$. For the sampled Gaussian kernel, the standard deviation is substantially lower than the value of the scale parameter in units of $\sigma = \sqrt{s}$ for $\sigma \le 0.5$. For the integrated Gaussian kernel, the standard deviation is for smaller values of the scale parameter closer to a the desirable linear trend. For larger values of the scale parameter, the standard deviation of the discrete kernel is, however, notably higher than $\sigma $.

2.8.3 Spatial Variance Offset of the Discrete Kernels

To quantify in a more detailed manner how the scale offset of the discrete approximations of Gaussian kernels depends upon the scale parameter, Fig. 4 shows graphs of the spatial variance-based scale difference measure $E_{\varDelta s}(T(\cdot ;\; s))$ according to (40) for the different discretization methods. For the discrete analogue of the Gaussian kernel, the scale difference is exactly equal to zero. For the sampled Gaussian kernel, the scale difference measure differs significantly from zero for $\sigma < 0.75$, while then rapidly approaching zero for larger scales. For the integrated Gaussian kernel, the variance-based scale difference measure does, however, not approach zero for larger scales. Instead, it approaches the numerical value $\varDelta s \approx 0.0833$, close to the spatial variance 1/12 of a box filter over each pixel support region. The spatial variance-based scale difference for the normalized sampled Gaussian kernel is equal to the spatial variance-based scale difference for the regular sampled Gaussian kernel.

2.8.4 Spatial Standard-Deviation-Based Relative Scale Difference

Figure 5 shows the spatial standard-deviation-based relative scale difference $E_{\text{ relscale }}(T(\cdot ;\; s))$ according to (41) for the main classes of discretizations of Gaussian kernels. This relative scale difference is exactly equal to zero for the discrete analogue of the Gaussian kernel. For scale values $\sigma < 0.75$, the relative scale difference is substantial for sampled Gaussian kernel, and then rapidly tends to zero for larger scales. For the integrated Gaussian kernel, the relative scale difference is significantly larger, while approaching zero with increasing scale. The relative scale difference for the normalized sampled Gaussian kernel is equal to the relative scale difference for the regular sampled Gaussian kernel.

2.8.5 Cascade Smoothing Error

Figure 6 shows the cascade smoothing error $E_{\text{ cascade }}(T(\cdot ;\; s))$ according to (42) for the main classes of discretizations of Gaussian kernels, while here complemented also with results for the normalized sampled Gaussian kernel, since the results for the latter kernel are different than for the regular sampled Gaussian kernel.

For exact numerical computations, this cascade smoothing error should be identically equal to zero for the discrete analogue of the Gaussian kernel. In the numerical implementation underlying these computations, there are, however, numerical errors of a low amplitude. For the sampled Gaussian kernel, the cascade smoothing error is very large for $\sigma \le 0.5$, notable for $\sigma < 0.75$, and then rapidly decreases with increasing scale. For the normalized sampled Gaussian kernel, the cascade smoothing error is for $\sigma \le 0.5$ significantly lower than for the regular sampled Gaussian kernel. For the integrated Gaussian kernel, the cascade smoothing error is lower than for the sampled Gaussian kernel for $\sigma \le 0.5$, while then decreasing much slower than for the sampled Gaussian kernel.

2.9 Summary of the Characterization Results from the Theoretical Analysis and the Quantitative Performance Measures

To summarize the theoretical and the experimental results presented in this section, the discrete analogue of the Gaussian kernel stands out as having the best theoretical properties in the stated respects, out of the set of treated discretization methods for the Gaussian smoothing operation.

The choice, concerning which method is preferable out of the choice between either the sampled Gaussian kernel or the integrated kernel, depends on whether one would prioritize the behaviour at either very fine scales or at coarse scales. The integrated Gaussian kernel has significantly better approximation of theoretical properties at fine scales, whereas its variance-based scale offset at coarser scales implies significantly larger deviations from the desirable theoretical properties at coarser scales, compared to either the sampled Gaussian kernel or the normalized sampled Gaussian kernel. The normalized sampled Gaussian kernel has properties closer to the desirable properties than the regular sampled Gaussian kernel. If one would introduce complementary mechanisms to compensate for the scale offset of the integrated Gaussian kernel, that kernel could, however, also constitute a viable solution at coarser scales.

3 Discrete Approximations of Gaussian Derivative Operators

According to the theory by Koenderink and van Doorn [35, 36], Gaussian derivatives constitute a canonical family of operators to derive from a Gaussian scale-space representation. Such Gaussian derivative operators can be equivalently defined by, either differentiating the Gaussian scale-space representation

$$\begin{aligned} L_{x^{\alpha } y^{\beta }}(x, y;\; s) = \partial _{x^{\alpha } y^{\beta }} L(x, y;\; s), \end{aligned}$$

(43)

or by convolving the input image by Gaussian derivative kernels

$$\begin{aligned}{} & {} L_{x^{\alpha } y^{\beta }}(x, y;\; s) = \nonumber \\{} & {} \quad = \int \limits _{\xi \in {{\mathbb {R}}}} \int \limits _{\eta \in {{\mathbb {R}}}} g_{2\textrm{D,}x^{\alpha } y^{\beta }}(\xi , \eta ;\; s) \, f(x - \xi , y - \eta ) \, \hbox {d}\xi \, \hbox {d}\eta , \nonumber \\ \end{aligned}$$

(44)

where

$$\begin{aligned} g_{2\textrm{D},x^{\alpha } y^{\beta }}(x, y;\; s) = \partial _{x^{\alpha } y^{\beta }} g_{2\text {D}}(x, y;\; s) \end{aligned}$$

(45)

and $\alpha $ and $\beta $ are non-negative integers.

3.1 Theoretical Properties of Gaussian Derivatives

Due to the cascade smoothing property of the Gaussian smoothing operation, in combination with the commutative property of differentiation under convolution operations, it follows that the Gaussian derivative operators also satisfy a cascade smoothing property over scales:

$$\begin{aligned} L_{x^{\alpha } y^{\beta }}(\cdot , \cdot ;\; s_2) = g_{2\textrm{D}}(\cdot , \cdot ;\; s_2 - s_1) * L_{x^{\alpha } y^{\beta }}(\cdot , \cdot ;\; s_1). \end{aligned}$$

(46)

Combined with the simplification property of the Gaussian kernel under increasing values of the scale parameter, it follows that the Gaussian derivative responses also obey such a simplifying property from finer to coarser levels of scale, in terms of (i) non-creation of new local extrema from finer to coarser levels of scale for 1-D signals, or (ii) non-enhancement of local extrema for image data over any number of spatial dimensions.

3.2 Separable Gaussian Derivative Operators

By the separability of the Gaussian derivative kernels

$$\begin{aligned} g_{2\textrm{D}, x^{\alpha } y^{\beta }}(x, y;\; s) = g_{x^{\alpha }}(x;\; s) \, g_{y^{\beta }}(y;\; s), \end{aligned}$$

(47)

the 2-D Gaussian derivative response can also be written as a separable convolution of the form

$$\begin{aligned}&L_{x^{\alpha } y^{\beta }}(x, y;\; s) = \nonumber \\&\quad = \int \limits _{\xi \in {{\mathbb {R}}}} g_{x^{\alpha }}(\xi ;\, s) \nonumber \\&\qquad \times \left( \int \limits _{\eta \in {{\mathbb {R}}}} g_{y^{\beta }}(\eta ;\; s) \, f(x - \xi , y - \eta ) \, \hbox {d}\eta \right) \, \hbox {d}\xi . \end{aligned}$$

(48)

In analogy with the previous treatment of purely Gaussian convolution operations, we will henceforth, for simplicity, consider the case with 1-D Gaussian derivative convolutions of the form

$$\begin{aligned} L_{x^{\alpha }}(x;\; s) = \int \limits _{\xi \in {{\mathbb {R}}}} g_{x^{\alpha }}(\xi ;\, s) \, f(x - \xi ) \, \hbox {d}\xi , \end{aligned}$$

(49)

which are to be implemented in terms of discrete convolutions of the form

$$\begin{aligned} L_{x^{\alpha }}(x;\; s) = \sum _{n \in {{\mathbb {Z}}}} \, T_{x^{\alpha }}(n;\, s) \, f(x - n) \end{aligned}$$

(50)

for some family of discrete filter kernels $T_{x^{\alpha }}(n;\; s)$.

3.2.1 Measures of the Spatial Extent of Gaussian Derivative or Derivative Approximation Kernels

The spatial extent (spread) of a Gaussian derivative operator $g_{x^{\alpha }}(\xi ;\, s)$ of the form (49) will be measured by the variance of its absolute value

$$\begin{aligned} S_{\alpha } = {{{\mathcal {S}}}}(g_{x^{\alpha }}(\cdot ;\, s)) = \sqrt{V(|g_{x^{\alpha }}(\cdot ;\, s)|)}. \end{aligned}$$

(51)

Explicit expressions for these spread measures computed for continuous Gaussian derivative kernels up to order 4 are given in Appendix A.4.

Correspondingly, the spatial extent of a discrete kernel $T_{x^{\alpha }}(n;\; s)$ designed to approximate a Gaussian derivative operator will be measured by the entity

$$\begin{aligned} {{{\mathcal {S}}}}(T_{x^{\alpha }}(\cdot ;\, s)) = \sqrt{V(|T_{x^{\alpha }}(\cdot ;\, s)|)}. \end{aligned}$$

(52)

3.3 Sampled Gaussian Derivative Kernels

In analogy with the previous treatment for the sampled Gaussian kernel in Sect. 2.3, the presumably simplest way to discretize the Gaussian derivative convolution integral (49), is by letting the discrete filter coefficients in the discrete convolution operation (50) be determined as sampled Gaussian derivatives

$$\begin{aligned} T_{\text{ sampl },x^{\alpha }}(n;\, s) = g_{x^{\alpha }}(n;\, s). \end{aligned}$$

(53)

Appendix A.1 describes how the Gaussian derivative kernels are related to the probabilistic Hermite polynomials and does also give explicit expressions for the 1-D Gaussian derivative kernels up to order 4.

For small values of the scale parameter, the resulting discrete kernels may, however, suffer from the following problems:

the $l_1$-norms of the discrete kernels may deviate substantially from the $L_1$-norms of the corresponding continuous Gaussian derivative kernels (with explicit expressions for the $L_1$-norms of the continuous Gaussian derivative kernels up to order 4 given in Appendix A.3),
the resulting filters may have too narrow shape, in the sense that the spatial variance of the absolute value of the discrete kernel $V(|T_{\text{ sampl },x^{\alpha }}(\cdot ;\, s)|)$ may differ substantially from the spatial variance of the absolute value of the corresponding continuous Gaussian derivative kernel $V(|g_{x^{\alpha }}(\cdot ;\, s)|)$ (see Appendix A.4 for explicit expressions for these spatial spread measures for the continuous Gaussian derivatives up to order 4).

Figures 9 and 10 show how the $l_1$-norms as well as the spatial spread measures vary as function of the scale parameter, with comparisons to the scale dependencies for the corresponding fully continuous measures.

3.4 Integrated Gaussian Derivative Kernels

In analogy with the treatment of the integrated Gaussian kernel in Sect. 2.5, a possible way of making the $l_1$-norm of the discrete approximation of a Gaussian derivative kernel closer to the $L_1$-norm of its continuous counterpart, is by defining the discrete kernel as the integral of the continuous Gaussian derivative kernel over each pixel support region

$$\begin{aligned} T_{\text{ int },x^{\alpha }}(n;\, s) = \int \limits _{x = n - 1/2}^{n + 1/2} g_{x^{\alpha }}(x;\; s) \, \hbox {d}x, \end{aligned}$$

(54)

again with a physical motivation of extending the discrete input signal f(n) to a continuous input signal $f_c(x)$, defined to be equal to the discrete value within each pixel support region, and then integrating that continuous input signal with a continuous Gaussian kernel, which does then correspond to convolving the discrete input signal with the corresponding integrated Gaussian derivative kernel (see Appendix A.2 for an explicit derivation).

Given that $g_{x^{\alpha - 1}}(x;\; s)$ is a primitive function of $g_{x^{\alpha }}(x;\; s)$, we can furthermore for $\alpha \ge 1$, write the relationship (54) as

$$\begin{aligned} T_{\text{ int },x^{\alpha }}(n;\; s) = g_{x^{\alpha -1}}\left( n + \tfrac{1}{2};\; s\right) - g_{x^{\alpha -1}}\left( n - \tfrac{1}{2};\; s\right) . \end{aligned}$$

(55)

With this definition, it follows immediately that the contributions to the $l_1$-norm of the discrete kernel $T_{\text{ int },x^{\alpha }}(n;\; s)$ will be equal to the contributions to the $L_1$-norm of $g_{x^{\alpha }}(n;\; s)$ over those pixels where the continuous kernel has the same sign over the entire pixel support region. For those pixels where the continuous kernel changes its sign within the support region of the pixel, however, the contributions will be different, thus implying that the contributions to the $l_1$-norm of the discrete kernel may be lower than the contributions to the $L_1$-norm of the corresponding continuous Gaussian derivative kernel, see Fig. 1 for an illustration of such graphs of integrated Gaussian derivative kernels.

Similarly to the previously treated case with the integrated Gaussian kernel, the integrated Gaussian derivative kernels will also imply a certain scale offset, as shown in Figs. 10 and 11.

3.5 Discrete Analogues of Gaussian Derivative Kernels

Common characteristics of the approximation methods for computing discrete Gaussian derivative responses considered so far are that the computation of each Gaussian derivative operator of a given order will imply a spatial convolution with a large-support kernel. Thus, the amount of necessary computational work will increase with the number of Gaussian derivative responses, that are to be used when constructing visual operations that base their processing steps on using Gaussian derivative responses as input.

A characteristic property of the theory for discrete derivative approximations with scale-space properties in Lindeberg [44, 45], however, is that discrete derivative approximations can instead be computed by applying small-support central difference operators to the discrete scale-space representation, and with preserved scale-space properties in terms of either (i) non-creation of local extrema with increasing scale for 1-D signals, or (ii) non-enhancement of local extrema towards increasing scales in arbitrary dimensions. With regard to the amount of computational work, this property specifically means that the amount of additive computational work needed, to add more Gaussian derivative responses as input to a visual module, will be substantially lower than for the previously treated discrete approximations, based on computing each Gaussian derivative response using convolutions with large-support spatial filters.

According to the genuinely discrete theory for defining discrete analogues of Gaussian derivative operators, discrete derivative approximations are, from the discrete scale-space representation, generated by convolution with the discrete analogue of the Gaussian kernel according to (26)

$$\begin{aligned} L(\cdot ;\; s) = T_{\text{ disc }}(\cdot ;\; s) * f(\cdot ), \end{aligned}$$

(56)

computed as

$$\begin{aligned} L_{x^{\alpha }}(x;\; s) = (\delta _{x^{\alpha }} L)(x;\; s), \end{aligned}$$

(57)

where $\delta _{x^{\alpha }}$ are small-support difference operators of the following forms in the special cases when $\alpha = 1$ or $\alpha = 2$

$$\begin{aligned}&\begin{aligned} \delta _x&= \left( -\tfrac{1}{2}, 0, +\tfrac{1}{2}\right) , \end{aligned} \end{aligned}$$

(58)

$$\begin{aligned}&\begin{aligned} \delta _{xx}&= (+1, -2, +1), \end{aligned} \end{aligned}$$

(59)

to ensure that the estimates of the first- and second-order derivatives are located at the pixel values, and not in between, and of the following forms for higher values of $\alpha $:

$$\begin{aligned} \delta _{x^{\alpha }} = \left\{ \begin{array}{ll} \delta _x (\delta _{xx})^i &{} \text{ if } \alpha = 1 + 2 i, \\ (\delta _{xx})^i &{} \text{ if } \alpha = 2 i, \end{array} \right. \end{aligned}$$

(60)

for integer i, where the special cases $\alpha = 3$ and $\alpha = 4$ then correspond to the difference operators

$$\begin{aligned}&\begin{aligned} \delta _{xxx}&= \left( -\tfrac{1}{2}, +1, 0, -1, +\tfrac{1}{2}\right) , \end{aligned}\end{aligned}$$

(61)

$$\begin{aligned}&\begin{aligned} \delta _{xxxx}&= (+1, -4, +6, -4, +1). \end{aligned} \end{aligned}$$

(62)

For 2-D images, corresponding discrete derivative approximations are then computed as straightforward extensions of the 1-D discrete derivative approximation operators

$$\begin{aligned} L_{x^{\alpha } y^{\beta }}(x, y;\; s) = (\delta _{x^{\alpha } y^{\beta }} L)(x, y;\; s) = (\delta _{x^{\alpha }} \delta _{y^{\beta }} L)(x, y;\; s), \nonumber \\ \end{aligned}$$

(63)

where $L(x, y;\; s)$ here denotes the discrete scale-space representation (36) computed using separable convolution with the discrete analogue of the Gaussian kernel (26) along each dimension.

In terms of explicit convolution kernels, computation of these types of discrete derivative approximations correspond to applying discrete derivative approximation kernels of the form

$$\begin{aligned} T_{\text{ disc },x^{\alpha }}(n;\; s) = (\delta _{x^{\alpha }} T_{\text{ disc }})(n;\; s) \end{aligned}$$

(64)

to the input data. In practice, such explicit derivative approximation kernels should not, however, never be applied for actual computations of discrete Gaussian derivative responses, since those operations can be carried out much more efficiently by computations of the forms (57) or (63), provided that the computations are carried out with sufficiently high numerical accuracy, so that the numerical errors do not grow too much because of cancellation of digits.

3.5.1 Cascade Smoothing Property

A theoretically attractive property of these types of discrete approximations of Gaussian derivative operators, is that they exactly obey a cascade smoothing property over scales, in 1-D of the form

$$\begin{aligned} L_{x^{\alpha }}(x;\; s_2) = T_{\text{ disc }}(\cdot ;\; s_2 - s_1) * L_{x^{\alpha }}(\cdot ;\; s_1), \end{aligned}$$

(65)

and in 2-D of the form

$$\begin{aligned} L_{x^{\alpha } y^{\beta }}(\cdot , \cdot ;\; s_2) = T_{\text{ disc }}(\cdot , \cdot ;\; s_2 - s_1) * L_{x^{\alpha } y^{\beta }}(\cdot , \cdot ;\; s_1), \end{aligned}$$

(66)

where $T_{\text{ disc }}(\cdot , \cdot ;\; s)$ here denotes the 2-D extension of the 1-D discrete analogue of the Gaussian kernel by separable convolution

$$\begin{aligned} T_{\text{ disc }}(m, n;\; s) = T_{\text{ disc }}(m;\; s) \, T_{\text{ disc }}(n;\; s). \end{aligned}$$

(67)

In practice, this cascade smoothing property implies that the transformation from any finer level of scale to any coarser level of scale is always a simplifying transformation, implying that this transformation always ensures: (i) non-creation of new local extrema (or zero-crossings) from finer to coarser levels of scale for 1-D signals, and (ii) non-enhancement of local extrema, in the sense that the derivative of the scale-space representation with respect to the scale parameter, always satisfies $\partial _{s} L \le 0$ at any local spatial maximum point and $\partial _{s} L \ge 0$ at any local spatial minimum point.

3.6 Numerical Correctness of the Derivative Estimates

To measure how well a discrete approximation of a Gaussian derivative operator reflects a differentiation operator, one can study the response properties to polynomials.^{Footnote 6} Specifically, in the 1-D case, the M:th-order derivative of an M-order monomial should be:

$$\begin{aligned} \partial _{x^M} ( x^M ) = M!. \end{aligned}$$

(68)

Additionally, the derivative of any lower-order polynomial should be zero:

$$\begin{aligned} \partial _{x^M} ( x^N ) = 0 \quad \quad \hbox { if}\ M > N. \end{aligned}$$

(69)

With respect to Gaussian derivative responses to monomials of the form

$$\begin{aligned} p_k(x) = x^k, \end{aligned}$$

(70)

the commutative property between continuous Gaussian smoothing and the computation of continuous derivatives then specifically implies that

$$\begin{aligned} g_{x^M}(\cdot ;\; s) * p_M(\cdot ) = M! \end{aligned}$$

(71)

and

$$\begin{aligned} g_{x^M}(\cdot ;\; s) * p_N(\cdot ) = 0 \quad \quad \hbox { if}\ M > N. \end{aligned}$$

(72)

If these relationships are not sufficiently well satisfied for the corresponding result of replacing a continuous Gaussian derivative operator by a numerical approximations of a Gaussian derivative, then the corresponding discrete approximation cannot be regarded as a valid approximation of the Gaussian derivative operator, that in turn is intended to reflect the differential structures in the image data.

It is therefore of interest to consider entities of the following type

$$\begin{aligned} P_{\alpha ,k}(s) = \left. (T_{x^{\alpha }}(\cdot ;\; s) * p_k(\cdot ))(x;\; s) \, \right| _{x = 0}, \end{aligned}$$

(73)

to characterize how well a discrete approximation $T_{x^{\alpha }}(n;\; s)$ of a Gaussian derivative operator of order $\alpha $ serves as a differentiation operator on a monomial of order k.

Figures 7 and 8 show the results of computing the responses of the discrete approximations of Gaussian derivative operators to different monomials in this way, up to order 4. Specifically, Fig. 7a shows the entity $P_{1,1}(s)$, which in the continuous case should be equal to 1. Figure 7b shows the entity $P_{2,2}(s)$, which in the continuous case should be equal to 2. Figure 7c shows the entity $P_{3,3}(s)$, which in the continuous case should be equal to 6. Figure 7d shows the entity $P_{4,4}(s)$, which in the continuous case should be equal to 24. Figure 8a and b show the entities $P_{3,1}(s)$ and $P_{4,2}(s)$, respectively, which in the continuous case should be equal to zero.

As can be seen from the graphs, the responses of the derivative approximation kernels to monomials of the same order as the order of differentiation do for the sampled Gaussian derivative kernels deviate notably from the corresponding ideal results obtained for continuous Gaussian derivatives, when the scale parameter is a bit below 0.75. For the integrated Gaussian derivative kernels, the responses of the derivative approximation kernels do also deviate when the scale parameter is a bit below 0.75. Within a narrow range of scale values in intervals of the order of [0.5, 0.75], the integrated Gaussian derivative kernels do, however, lead to somewhat lower deviations in the derivative estimates than for the sampled Gaussian derivative kernels. Also the responses of the third-order sampled Gaussian and integrated Gaussian derivative approximation kernels to a first-order monomial as well as the response of the fourth-order sampled Gaussian and integrated Gaussian derivative approximation kernels to a second-order monomial differ substantially from the ideal continuous values when the scale parameter is a bit below 0.75.

For the discrete analogues of the Gaussian derivative kernels, the results are, on the other hand, equal to the corresponding continuous counterparts, in fact, in the case of exact computations, exactly equal. This property can be shown by studying the responses of the central difference operators to the monomials, which are given by

$$\begin{aligned} \delta _{x^M} ( x^M ) = M! \end{aligned}$$

(74)

and

$$\begin{aligned} \delta _{x^M} ( x^N ) = 0 \quad \quad \hbox { if}\ M > N. \end{aligned}$$

(75)

Since the central difference operators commute with the spatial smoothing step with the discrete analogue of the Gaussian kernel, the responses of the discrete analogues of the Gaussian derivatives to the monomials are then obtained as

$$\begin{aligned} P_{\alpha ,k}(s)= & {} \left. T_{\text{ disc },x^{\alpha }}(\cdot ; s) * p_k(\cdot ) \, \right| _{x = 0} = \nonumber \\= & {} \left. T_{\text{ disc }} (\cdot ; s) * (\delta _{x^{\alpha }} \, p_k(\cdot )) \, \right| _{x = 0}, \end{aligned}$$

(76)

implying that

$$\begin{aligned} T_{\text{ disc },x^M}(\cdot ;\; s) * p_M(\cdot ) = M! \end{aligned}$$

(77)

and

$$\begin{aligned} T_{\text{ disc },x^M}(\cdot ;\; s) * p_N(\cdot ) = 0 \quad \quad \text{ if } M > N. \end{aligned}$$

(78)

In this respect, there is a fundamental difference between the discrete approximations of Gaussian derivatives obtained from the discrete analogues of Gaussian derivatives, compared to the sampled or the integrated Gaussian derivatives. At very fine scales, the discrete analogues of Gaussian derivatives produce much better estimates of differentiation operators, than the sampled or the integrated Gaussian derivatives.

The requirement that the Gaussian derivative operators and their discrete approximations should lead to numerically accurate derivative estimates for monomials of the same order as the order of differentiation is a natural consistency requirement for non-infinitesimal derivative approximation operators. The use of monomials as test functions, as used here, is particularly suitable in a multi-scale context, since the monomials are essentially scale-free and are not associated with any particular intrinsic scales.

3.7 Additional Performance Measures for Quantifying Deviations from Theoretical Properties of Discretizations of Gaussian Derivative Kernels

To additionally quantify the deviations between the properties of the discrete kernels, designed to approximate Gaussian derivative operators, and desirable properties of discrete kernels, that are to transfer the desirable properties of the continuous Gaussian derivatives to a corresponding discrete implementation, we will in this section quantify these deviations in terms of the following complementary error measures:

Normalization error The difference between the $l_1$-norm of the discrete kernel and the desirable $l_1$-normalization to a similar $L_1$-norm as for the continuous Gaussian derivative kernel will be measured by
$$\begin{aligned} E_{\text{ norm }}(T_{x^{\alpha }}(\cdot ;\; s)) = \frac{\Vert T_{x^{\alpha }}(\cdot ;\; s) \Vert _1}{\Vert g_{x^{\alpha }}(\cdot ;\; s) \Vert _1} - 1. \end{aligned}$$
(79)
Spatial spread measure The spatial extent of the discrete derivative approximation kernel will be measured by the entity
$$\begin{aligned} \sqrt{V(|T_{x^{\alpha }}(\cdot ;\; s)|)} \end{aligned}$$
(80)
and will be graphically compared to the spread measure $S_{\alpha }(s) = V(|g_{x^{\alpha }}(\cdot ;\; s)|)$ for a corresponding continuous Gaussian derivative kernel. Explicit expressions for the latter spread measures $S_{\alpha }(s)$ computed from continuous Gaussian derivative kernels are given in Appendix A.4.
Spatial spread measure offset To quantify the absolute deviation between the above measured spatial spread measure $\sqrt{V(|T_{x^{\alpha }}(\cdot ;\; s)|)}$ with the corresponding ideal value $V(|g_{x^{\alpha }}(\cdot ;\; s)|)$ for a continuous Gaussian derivative kernel, we will measure this offset in terms of the entity
$$\begin{aligned} O_{\alpha }(s) = \sqrt{V(|T_{x^{\alpha }}(\cdot ;\; s)|)} - \sqrt{V(|g_{x^{\alpha }}(\cdot ;\; s)|)}. \end{aligned}$$
(81)
Cascade smoothing error The deviation, from the cascade smoothing property of continuous Gaussian derivatives according to (46) and the actual result of convolving a discrete approximation of a Gaussian derivative response at a given scale with its corresponding discretization of the Gaussian kernel, will be measured by
$$\begin{aligned}{} & {} E_{\text{ cascade }}(T_{x^{\alpha }}(\cdot ;\; s)) = \nonumber \\{} & {} \quad = \frac{\Vert T_{x^{\alpha }}(\cdot ;\; 2s) - T(\cdot ;\; s) * T_{x^{\alpha }}(\cdot ;\; s) \Vert _1}{\Vert T_{x^{\alpha }}(\cdot ;\; 2s) \Vert _1}. \end{aligned}$$
(82)
For simplicity, we here restrict ourselves to the special case, when the scale parameter for the amount of incremental smoothing with a discrete approximation of the Gaussian kernel is equal to the scale parameter for the finer scale approximation of the Gaussian derivative response.^{Footnote 7}

Similarly to the previous treatment about error measures in Sect. 2.7, the normalization error and the cascade smoothing error should also be equal to zero in the ideal theoretical case. Any deviations from zero of these error measures do therefore represent a quantification of deviations from desirable theoretical properties in a discrete approximation of Gaussian derivative computations.

3.8 Numerical Quantification of Deviations from Theoretical Properties of Discretizations of Gaussian Derivative Kernels

3.8.1 $l_1$-Norms of Discrete Approximations of Gaussian Derivative Approximation Kernels

Figure 9 shows the $l_1$-norms $\Vert T_{x^{\alpha }}(\cdot ;\, s) \Vert _1$ for the different methods for approximating Gaussian derivative kernels with corresponding discrete approximations for differentiation orders up to 4, together with graphs of the $L_1$-norms $\Vert g_{x^{\alpha }}(\cdot ;\, s) \Vert _1$ of the corresponding continuous Gaussian derivative kernels.

From these graphs, we can first observe that the behaviour between the different methods differ significantly for values of the scale parameter $\sigma $ up to about 0.75, 1.25 or 1.5, depending on the order of differentiation.

For the sampled Gaussian derivatives, the $l_1$-norms tend to zero as the scale parameter $\sigma $ approaches zero for the kernels of odd order, whereas the $l_1$-norms tend to infinity for the kernels of even order. For the kernels of even order, the behaviour of the sampled Gaussian derivative kernels has the closest similarity to the behaviour of the corresponding continuous Gaussian derivatives. For the kernels of odd order, the behaviour is, on the other hand, worst.

For the integrated Gaussian derivatives, the behaviour is for the kernels of odd order markedly less singular as the scale parameter $\sigma $ tends to zero, than for the sampled Gaussian derivatives. For the kernels of even order, the behaviour does, on the other hand differ more. There is also some jaggedness behaviour at fine scales for the third- and fourth-order derivatives, caused by positive and negative values of the kernels cancelling their contribution within the support regions of single pixels.

For the discrete analogues of Gaussian derivatives, the behaviour is, qualitatively different for finer scales, in that the discrete analogues of the Gaussian derivatives tend to the basic central difference operators, as the scale parameter $\sigma $ tends to zero, and do therefore show a much more smooth behaviour as $\sigma \rightarrow 0$.

3.8.2 Spatial Spread Measures

Figure 10 shows graphs of the standard-deviation-based spatial spread measure $\sqrt{V(|T_{x^{\alpha }}(\cdot ;\; s)|)}$ according to (80), for the main classes of discretizations of Gaussian derivative kernels, together with graphs of the corresponding spatial spread measures computed for continuous Gaussian derivative kernels.

As can be seen from these graphs, the spatial spread measures differ significantly from the corresponding continuous measures for smaller values of the scale parameters; for $\sigma $ less than about 1 or 1.5, depending on the order of differentiation, and caused by the fact that too fine scales in the data cannot be appropriately resolved after a spatial discretization. For the sampled and the integrated Gaussian kernels, there is a certain jaggedness in some of the curves at fine scales, caused by interactions between the grid and the lobes of the continuous Gaussian kernels, that the discrete kernels are defined from. For the discrete analogues of the Gaussian kernels, these spatial spread measures are notably bounded from below by the corresponding measures for the central difference operators, that they approach with decreasing scale parameter.

Figure 11 shows more detailed visualizations of the deviations between these spatial spread measures and their corresponding ideal values for continuous Gaussian derivative kernels in terms of the spatial spread measure offset $O_{\alpha }(s)$ according to (81), for the different orders of spatial differentiation. The jaggedness of these curves for orders of differentiation greater than one is due to interactions between the lobes in the derivative approximation kernels and the grid. As can be seen from these graphs, the relative properties of the spatial spread measure offsets for the different discrete approximations to the Gaussian derivative operators differ somewhat, depending on the order of spatial differentiation. We can, however, note that the spatial spread measure offset for the integrated Gaussian derivative kernels is mostly somewhat higher than the spatial spread measure offset for the sampled Gaussian derivative kernels, consistent with the previous observation that the spatial box integration used for defining the integrated Gaussian derivative kernel introduces an additional amount of spatial smoothing in the spatial discretization.

3.8.3 Cascade Smoothing Errors

Figure 12 shows graphs of the cascade smoothing error $E_{\text{ cascade }}(T_{x^{\alpha }}(\cdot ;\; s))$ according to (82) for the main classes of methods for discretizing Gaussian derivative operators.

For the sampled Gaussian kernels, the cascade smoothing error is substantial for $\sigma < 0.75$ or $\sigma < 1.0$, depending on the order of differentiation. Then, for larger scale values, this error measure decreases rapidly.

For the integrated Gaussian kernels, the cascade smoothing error is lower than the cascade smoothing error for the sampled Gaussian kernels for $\sigma < 0.5$, $\sigma < 0.75$ or $\sigma < 1.0$, depending on the order of differentiation. For larger scale values, the cascade smoothing error for the integrated Gaussian kernels do, on the other hand, decrease much less rapidly with increasing scale than for the sampled Gaussian kernels, due to the additional spatial variance in the filters caused by the box integration, underlying the definition of the integrated Gaussian derivative kernels.

For the discrete analogues of the Gaussian derivatives, the cascade smoothing error should in the ideal case of exact computations lead to a zero error. In the graphs of these errors, we do, however, see a jaggedness at a very low level, caused by numerical errors.

3.9 Summary of the Characterization Results from the Theoretical Analysis and the Quantitative Performance Measures

To summarize the theoretical and the experimental results presented in this section, there is a substantial difference in the quality of the discrete approximations of Gaussian derivative kernels at fine scales:

For values of the scale parameter $\sigma $ less than about a bit below 0.75, the sampled Gaussian kernels and the integrated Gaussian kernels do not produce numerically accurate or consistent estimates of the derivatives of monomials. In this respects, these discrete approximations of Gaussian derivatives do not serve as good approximations of derivative operations at very fine scales. Within a narrow scale interval below about 0.75, the integrated Gaussian derivative kernels do, however, degenerate in a somewhat less serious manner than the sampled Gaussian derivative kernels.

For the discrete analogues of Gaussian derivatives, obtained by convolution with the discrete analogue of the Gaussian kernel followed by central difference operators, the corresponding derivative approximations are, on the other hand, exactly equal to their continuous counterparts. This property does, furthermore, hold over the entire scale range.

For larger values of the scale parameter, the sampled Gaussian kernel and the integrated Gaussian kernel do, on the other hand, lead to successively better numerical approximations of the corresponding continuous counterparts. In fact, when the value of the scale parameter is above about 1, the sampled Gaussian kernel leads to the numerically most accurate approximations of the corresponding continuous results, out of the studied three methods.

Hence, the choice between what discrete approximation to use for approximating the Gaussian derivatives, depends upon what scale ranges are important for the analysis, in which the Gaussian derivatives should be used.

In next section, we will build upon these results, and extend them further, by studying the effects of different discretization methods for the purpose of performing automatic scale selection. The motivation for studying that problem, as a benchmark proxy task for evaluating the quality of different discrete approximations of Gaussian derivatives, is that it involves explicit comparisons of feature responses at different scales.

4 Application to Scale Selection from Local Extrema Over Scale of Scale-Normalized Derivatives

When performing scale-space operations at multiple scales jointly, a critical problem concerns how to compare the responses of an image operator between different scales. Due to the scale-space smoothing operation, the amplitude of both Gaussian smoothed image data and of Gaussian derivative responses can be expected to decrease with scale. A practical problem then concerns how to compare a response of the same image operator at some coarser scale to a corresponding response at a finer scale. This problem is particularly important regarding the topic of scale selection [55], where the goal is to determine locally appropriate scale levels, to process and analyse particular image structures in a given image.

4.1 Scale-Normalized Derivative Operators

A theoretically well-founded way of performing scale normalization, to enable comparison between the responses of scale-space operations at different scales, is by defining scale-normalized derivative operators according to [48, 49]

$$\begin{aligned} \partial _{\xi } = s^{\gamma /2} \, \partial _x, \quad \quad \partial _{\eta } = s^{\gamma /2} \, \partial _y, \end{aligned}$$

(83)

where $\gamma > 0$ is a scale normalization power, to be chosen for the feature detection task at hand, and then basically replacing the regular Gaussian derivative responses by corresponding scale-normalized Gaussian derivative responses in the modules that implement visual operations on image data.

4.2 Scale Covariance Property of Scale-Normalized Derivative Responses

It can be shown that, given two images f(x, y) and $f'(x', y')$ that are related according to a uniform scaling transformation

$$\begin{aligned} x' = S \, x, \quad \quad y' = S \, y, \end{aligned}$$

(84)

for some spatial scaling factor $S > 0$, and with corresponding Gaussian derivative responses defined over the two respective image domains according to

$$\begin{aligned}&\begin{aligned} L_{\xi ^{\alpha } \eta ^{\beta }}(\cdot , \cdot ;\; s)&= \partial _{\xi ^{\alpha } \eta ^{\beta }} (g_{2\text {D}}(\cdot , \cdot ;\; s) * f(\cdot , \cdot )), \end{aligned} \end{aligned}$$

(85)

$$\begin{aligned}&\begin{aligned} L'_{{\xi '}^{\alpha } {\eta '}^{\beta }}(\cdot , \cdot ;\; s')&= \partial _{{\xi '}^{\alpha } {\eta '}^{\beta }} (g_{2\text {D}}(\cdot , \cdot ;\; s') * f'(\cdot , \cdot )), \end{aligned} \end{aligned}$$

(86)

these Gaussian derivative responses in the two domains will then be related according to [48, Equation (25)]

$$\begin{aligned} L_{\xi ^{\alpha } \eta ^{\beta }}(x, y;\; s) = S^{(\alpha + \beta )(1 - \gamma )} \, L'_{{\xi '}^{\alpha } {\eta '}^{\beta }}(x', y';\; s'), \end{aligned}$$

(87)

provided that the values of the scale parameters are matched according to [48, Equation (15)]

$$\begin{aligned} s' = S^2 \, s. \end{aligned}$$

(88)

Specifically, in the special case when $\gamma = 1$, the corresponding scale-normalized Gaussian derivative responses will then be equal

$$\begin{aligned} L_{\xi ^{\alpha } \eta ^{\beta }}(x, y;\; s) = L'_{{\xi '}^{\alpha } {\eta '}^{\beta }}(x', y';\; s'). \end{aligned}$$

(89)

4.3 Scale Selection from Local Extrema Over Scales of Scale-Normalized Derivative Responses

A both theoretically well-founded and experimentally extensively verified methodology to perform automatic scale selection, is by choosing hypotheses for locally appropriate scale levels from local extrema over scales of scale-normalized derivative responses [48, 49]. In the following, we will apply this methodology to four basic tasks in feature detection.

4.3.1 Interest Point Detection

With regard to the topic of interest point detection, consider the scale-normalized Laplacian operator [48, Equation (30)]

$$\begin{aligned} \nabla _{norm}^2 L = s \, (L_{xx} + L_{yy}), \end{aligned}$$

(90)

or the scale-normalized determinant of the Hessian [48, Equation (31)]

$$\begin{aligned} \det {{{\mathcal {H}}}}_{norm} L = s^2 \, (L_{xx} \, L_{yy} - L_{xy}^2), \end{aligned}$$

(91)

where we have here chosen $\gamma = 1$ for simplicity. It can then be shown that if we consider the responses of these operators to a Gaussian blob of size $s_0$

$$\begin{aligned} f_{\text{ blob },s_0}(x, y) = g_{2\text {D}}(x, y;\; s_0), \end{aligned}$$

(92)

for which the scale-space representation by the semi-group property of the Gaussian kernel (4) will be of the form

$$\begin{aligned} L_{\text{ blob },s_0}(x, y;\; s) = g_{2\text {D}}(x, y;\; s_0 + s), \end{aligned}$$

(93)

then the scale-normalized Laplacian response according to (90) and the scale-normalized determinant of the Hessian response according to (91) assume their global extrema over space and scale at [48, Equations (36) and (37)]

$$\begin{aligned}&\begin{aligned} ({\hat{x}}, {\hat{y}}, {\hat{s}})&= {\text {argmin}}_{(x, y;\; s)}(\nabla ^2 L_{\text{ blob },s_0})(x, y;\; s) \end{aligned}\nonumber \\&\begin{aligned}&= (0, 0, s_0), \end{aligned} \end{aligned}$$

(94)

$$\begin{aligned}&\begin{aligned} ({\hat{x}}, {\hat{y}}, {\hat{s}})&= {\text {argmax}}_{(x, y;\; s)}(\det {{{\mathcal {H}}}} L_{\text{ blob },s_0})(x, y;\; s) \end{aligned}\nonumber \\&\begin{aligned}&= (0, 0, s_0). \end{aligned} \end{aligned}$$

(95)

In this way, both a linear feature detector (the Laplacian) and a nonlinear feature detector (the determinant of the Hessian) can be designed to respond in a scale-selective manner, with their maximum response over scale at a scale that correspond to the inherent scale in the input data.

4.3.2 Edge Detection

Consider next the following idealized model of a diffuse edge [49, Equation (18)]

$$\begin{aligned} f_{\text{ edge },s_0}(x, y) = {\text {erg}}(x;\; s_0), \end{aligned}$$

(96)

where ${\text {erg}}(x;\; s_0)$ denotes the primitive function of a 1-D Gaussian kernel

$$\begin{aligned} {\text {erg}}(x;\; s_0) = \int \limits _{u = - \infty }^x g(u;\; s_0) \, du. \end{aligned}$$

(97)

Following a differential definition of edges, let us measure local scale-normalized edge strength by the scale-normalized gradient magnitude [49, Equation (15)]

$$\begin{aligned} L_{v,\text{ norm }} = s^{\gamma /2} \sqrt{L_x^2 + L_y^2}, \end{aligned}$$

(98)

which for the scale-space representation of the idealized edge model (96) leads to a response of the form

$$\begin{aligned} L_{v,\text{ norm }}(x, y;\; s) = s^{\gamma /2} \, g(x;\; s_0 + s). \end{aligned}$$

(99)

Then, it can be shown that this scale-normalized edge response will, at the spatial location of the edge at $x = 0$, assume its maximum magnitude over scale at the scale

$$\begin{aligned} {\hat{s}} = {\text {argmax}}_s L_{v,\text{ norm }}(0, 0;\; s) = s_0, \end{aligned}$$

(100)

provided that we choose the value of the scale normalization power $\gamma $ as [49, Equation (23)]

$$\begin{aligned} \gamma _{\text{ edge }} = \frac{1}{2}. \end{aligned}$$

(101)

4.3.3 Ridge Detection

Let us next consider the following idealized model of a ridge [49, Equation (52)]

$$\begin{aligned} f_{\text{ ridge },s_0}(x, y) = g(x;\; s_0). \end{aligned}$$

(102)

For a differential definition of ridges, consider a local coordinate system (p, q) aligned with the eigendirections of the Hessian matrix, such that the mixed second-order derivative $L_{pq} = 0$. Let us measure local scale-normalized ridge strength by the scale-normalized second-order derivative in the direction p [49, Equations (42) and (47)]:

$$\begin{aligned} L_{pp,\text{ norm }}= & {} s^{\gamma } L_{pp} = \nonumber \\= & {} s^{\gamma }\left( L_{xx} + L_{yy} - \sqrt{(L_{xx} - L_{yy})^2 + 4 L_{xy}^2} \right) ,\nonumber \\ \end{aligned}$$

(103)

which for the idealized ridge model (102) reduces to the form

$$\begin{aligned} L_{pp,\text{ norm }}(x, y;\; s) = s^{\gamma } \, L_{xx}(x, y;\; s) = s^{\gamma } \, g_{xx}(x;\; s_0 + s).\nonumber \\ \end{aligned}$$

(104)

Then, it can be shown that, at the spatial location of the ridge at $x = 0$, this scale-normalized ridge response will assume its maximum magnitude over scale at the scale

$$\begin{aligned} {\hat{s}} = {\text {argmax}}_s L_{pp,\text{ norm }}(0, 0;\; s) = s_0, \end{aligned}$$

(105)

provided that we choose the value of the scale normalization power $\gamma $ as [49, Equation (56)]

$$\begin{aligned} \gamma _{\text{ ridge }} = \frac{3}{4}. \end{aligned}$$

(106)

4.4 Measures of Scale Selection Performance

In the following, we will compare the results of using different ways of discretizing the Gaussian derivative operators, when applied task of performing scale selection for

Gaussian blobs of the form (92),
idealized diffuse step edges of the form (96), and
idealized Gaussian ridges of the form (102).

To quantify the performance of the different scale normalization, we will measure deviations from the ideal results in terms of:

Relative scale estimation error The difference, between a computed scale estimate ${\hat{s}}$ and the ideal scale estimate ${\hat{s}}_{\text{ ref }} = s_0$, will be measured by the entity
$$\begin{aligned} E_{\text{ scaleest,rel }}(s) = \sqrt{\frac{{\hat{s}}}{{\hat{s}}_{\text{ ref }}}} - 1. \end{aligned}$$
(107)
A motivation for measuring this entity in units of $\sigma = \sqrt{s}$ is to have the measurements in dimension of $[\text{ length}]$.

In the ideal continuous case, with the scale-space derivatives computed from continuous Gaussian derivatives, this error measure should be zero. Any deviations from zero, when computed from a discrete implementation based on discrete approximations of Gaussian derivative kernels, do therefore characterize the properties of the discretization.

4.5 Numerical Quantification of Deviations from Theoretical Properties Resulting from Different Discretizations of Scale-Normalized Derivatives

Our experimental investigation will focus on computing the relative scale estimation error for:

scale selection based on the scale-normalized Laplacian operator (90) for scale normalization power $\gamma = 1$, applied to an ideal Gaussian blob of the form (92),
scale selection based on the scale-normalized determinant of the Hessian operator (91) for scale normalization power $\gamma = 1$, applied to an ideal Gaussian blob of the form (92),
scale selection based on the scale-normalized gradient magnitude (98) for scale normalization power $\gamma = 1/2$, applied to an ideal diffuse edge of the form (96), and
scale selection based on the scale-normalized ridge strength measure (103) for scale normalization power $\gamma = 3/4$, applied to an ideal Gaussian ridge of the form (102).

With the given calibration of the scale normalization powers $\gamma $ to the specific feature detection tasks, the estimated scale level ${\hat{s}}$ will in the ideal continuous case correspond to the scale estimate reflecting the inherent scale of the feature model

$$\begin{aligned} {\hat{s}}_{\text{ ref }} = s_0, \end{aligned}$$

(108)

for all of the cases of ideal Gaussian blobs, ideal diffuse edges or ideal Gaussian ridges. This bears relationships to the matched filter theorem [88, 98], in that the scale selection mechanism will choose filters for detecting the different types of image structures image data, that as well as possible match their size.

4.5.1 Experimental Methodology

The experimental procedure, that we will follow in the following experiments, consists of:

1.
For a dense set of 50 logarithmically distributed scale levels $\sigma _{\text{ ref },i} = A_1 \, r_1^i$ within the range $\sigma \in [0.1, 4.0]$, where $r_1 > 1$, generate an ideal model signal (a Gaussian blob, a diffuse step edge or a diffuse ridge) with scale parameter $\sigma _{\text{ ref },i}$, that represents the size in dimension $[\text{ length}]$.
2.
For a dense^{Footnote 8} set of 80 logarithmically distributed scale levels $\sigma _{\text{ acc },j} = A_2 \, r_2^j$, within the range $\sigma \in [0.1, 6.0]$, where $r_2 > 1$, compute the scale-space signature, that is compute the scale-normalized response of the differential entity ${{{\mathcal {D}}}}_{norm} L$ at all scales $\sigma _{\text{ acc },j}$.
3.
Detect the local extrema over scale of the appropriate polarity (minima for Laplacian and principal curvature scale selection, and maxima for determinant of the Hessian and gradient magnitude scale selection) and select the local extremum that is closest to $\sigma _{\text{ ref },i}$.^{Footnote 9} If there is no local extremum of the right polarity, include the boundary extrema into the analysis, and do then select the global extremum out of these.
4.
Interpolate the scale value of the extremum to higher accuracy than grid spacing, by for each interior extremum fitting a second-order polynomial to the values at the central point and the values of the two adjacent neighbours. Find the extremum of the continuous polynomial, and let the scale value of the extremum of that interpolation polynomial be the scale estimate.^{Footnote 10}

Figures 13–20 show graphs of scale estimates with associated relative scale errors obtained in this way.

4.5.2 Scale Selection with the Scale-Normalized Laplacian Applied to Gaussian Blobs

From Fig. 13, which shows the scale estimates obtained by detecting local extrema over scale of the scale-normalized Laplacian operator, when applied to Gaussian blobs of different size, we see that for all the three approximation methods for Gaussian derivatives; discrete analogues of Gaussian derivatives, sampled Gaussian derivatives and integrated Gaussian derivatives, the scale estimates approach the ideal values of the fully continuous model with increasing size of the Gaussian blob used as input for the analysis.

For smaller scale values, there are, however, substantial deviations between the different methods. When the scale parameter $\sigma $ is less than about 1, the results obtained from sampled Gaussian derivatives fail to generate interior local extrema over scale. Then, the extremum detection method instead resorts to returning the minimum scale of the scale interval, implying qualitatively substantially erroneous scale estimates. For the integrated Gaussian derivatives, there is also a discontinuity in the scale selection curve, while at a lower scale level, and not leading to as low scale values as for the sampled Gaussian derivatives.

For the discrete analogue of Gaussian derivatives, the behaviour is, on the other hand, qualitatively different. Since these derivative approximation kernels tend to regular central difference operators, as the scale parameter tends to zero, their magnitude is bounded from above in a completely different way than for the sampled or integrated Gaussian derivatives. When this bounded derivative response is multiplied by the scale parameter raised to the given power, the scale-normalized feature strength measure cannot assume as high values at very finest scales, as for the sampled or integrated Gaussian derivatives. This means that the extremum over scale will be assumed at a relatively coarser scale, when the reference scale is small, compared to the cases for the sampled or the integrated Gaussian kernels.

From Fig. 14, which shows the relative scale estimation error $E_{\text{ scaleest,rel }}(\sigma )$ according to (107), we can see that when the reference scale becomes larger, the scale estimates obtained with the discrete analogues of Gaussian derivatives do, on the other hand, lead to underestimates of the scale levels, whereas the scale estimates obtained with integrated Gaussian kernels lead to overestimates of the scale levels. For $\sigma _{\text{ ref }}$ a bit greater than 1, the sampled Gaussian derivatives lead to the most accurate scale estimates for Laplacian blob detection applied to Gaussian blobs.

4.5.3 Scale Selection with the Scale-Normalized Determinant of the Hessian Applied to Gaussian Blobs

For Figs. 15–16, which show corresponding results for determinant of the Hessian scale selection applied to Gaussian blobs, the results are similar to the results for Laplacian scale selection. These results are, however, nevertheless reported here, to emphasize that the scale selection method does not only apply to feature detectors that are linear in the dependency of the Gaussian derivatives, but also to feature detectors that correspond to genuinely nonlinear combinations of Gaussian derivative responses.

4.5.4 Scale Selection with the Scale-Normalized Gradient Magnitude Applied to Diffuse Step Edges

From Fig. 17, which shows the selected scales obtained by detecting local extrema over scale of the scale-normalized gradient magnitude applied to diffuse step edges of different width, we can note that all the three discretization methods for Gaussian derivatives are here bounded from below by an inner scale. The reason why the behaviour is qualitatively different, in this case based on first-order derivatives, compared to the previous case with second-order derivatives, is that the magnitudes of the first-order derivative responses are in all these cases bounded from below. The lower bound on the scale estimates is, however, slightly higher for the discrete analogues of the Gaussian derivatives compared to the sampled or integrated Gaussian derivatives.

From Fig. 18, we can also see that the sampled Gaussian derivatives lead to slightly more accurate scale estimates than the integrated Gaussian derivatives or the discrete analogues of Gaussian derivatives, over the entire scale range.

4.5.5 Scale Selection with the Second-Order Principal Curvature Measure Applied to Diffuse Ridges

From Fig. 19, which shows the selected scales obtained by detecting local extrema over scale of the scale-normalized principal curvature response according to (105), when applied to a set of diffuse ridges of different width, we can note that the behaviour is qualitatively very similar to the previously treated second-order methods for scale selection, based on extrema over scale of either the scale-normalized Laplacian or the scale-normalized determinant of the Hessian.

There are clear discontinuities in the scale estimates obtained from sampled or integrated Gaussian derivatives, when the reference scale $\sigma _{\text{ ref }}$ goes down towards $\sigma = 1$, at slightly lower scale values for integrated Gaussian derivatives compared to the sampled Gaussian derivatives. For the discrete analogues of Gaussian derivatives, the scale estimates are bounded from below at the finest scales, whereas there are underestimates in the scale values near above $\sigma = 1$. Again, for scale values above about 1, the sampled Gaussian derivatives lead to results that are closest to those obtained in the ideal continuous case.

4.6 Summary of the Evaluation on Scale Selection Experiments

To summarize the results from this investigation, the sampled Gaussian derivatives lead to the most accurate scale estimates, when the reference scale $\sigma _{\text{ ref }}$ of the image features is somewhat above 1. For lower values of the reference scale, the behaviour is, on the other hand, qualitatively different for the scale selection methods that are based on second-order derivatives, or what could be expected more generally, derivatives of even order. When the scale parameter tends to zero, the strong influence from the fact that the derivatives of even order of the continuous Gaussian kernel tend to infinity at the origin, when the scale parameter tends to zero, implies that the scale selection methods based on either sampled or integrated Gaussian derivatives lead to singularities when the reference scale is sufficiently low (below 1 for the second-order derivatives in the above experiment).

If aiming at handling image data with discrete approximations of Gaussian derivatives of even order for very low scale values, it seems natural to then consider alternative discretization approaches, such as the discrete analogues of Gaussian derivatives. The specific lower bound on the scale values may, however, be strongly dependent upon what tasks the Gaussian derivative responses are to be used for, and also upon the order differentiation.

5 Discrete Approximations of Directional Derivatives

When operating on a 2-D spatial scale-space representation, generated by convolutions with either rotationally symmetric Gaussian kernels according to (2) or affine Gaussian kernels,^{Footnote 11} it is often desirable to compute image features in terms of local directional derivatives.

Given an image orientation $\varphi $ and its orthogonal direction $\bot \varphi = \varphi + \pi /2$, we can express directional derivatives along these directions in terms of partial derivative operators $\partial _x$ and $\partial _y$ along the x- and y-directions, respectively, according to

$$\begin{aligned}&\begin{aligned} \partial _{\varphi } = \cos \varphi \, \partial _x + \sin \varphi \, \partial _y, \end{aligned} \end{aligned}$$

(109)

$$\begin{aligned}&\begin{aligned} \partial _{\bot \varphi } = - \sin \varphi \, \partial _x + \cos \varphi \, \partial _y. \end{aligned} \end{aligned}$$

(110)

Higher-order directional derivatives of the scale-space representation can then be defined according to

$$\begin{aligned} L_{\varphi ^{m_1} \bot \varphi ^{m_2}} = \partial _{\varphi }^{m_1} \, \partial _{\bot \varphi }^{m_2} \, L, \end{aligned}$$

(111)

where L here denotes either a scale-space representation based on convolution with a rotationally symmetric Gaussian kernel according to (2), or convolution with an affine Gaussian kernel.

Image representations of this form are useful for modelling filter bank approaches, for either purposes in classical computer vision or in deep learning. It has also been demonstrated that the spatial components of the receptive fields of simple cells in the primary visual cortex of higher mammals can be modelled qualitatively reasonably well in terms of such directional derivatives combined with spatial smoothing using affine Gaussian kernels, then with the orientations $\varphi $ and $\bot \varphi $ of the directional derivatives parallel to the orientations corresponding to the eigendirections of the affine spatial covariance matrix $\varSigma $, that underlies the definition of affine Gaussian kernels (see Equation (23) in Lindeberg [56]).

5.1 Small-Support Directional Derivative Approximation Masks

If attempting to compute filter bank responses in terms of directional derivatives in different image directions, and of different orders of spatial differentiation, the amount of computational work will, however, grow basically linearly with the number of different combinations of the orders $m_1$ and $m_2$ of spatial differentiation and the different image orientations $\varphi $, if we use an underlying filter basis in terms of Gaussian derivative responses along the coordinate axes based on the regular Gaussian scale-space concept formed from convolutions with the rotationally symmetric Gaussian kernel. If we instead base the filter banks on elongated affine Gaussian kernels, the amount of computational work will grow even more, since a non-separable convolution would then have to be performed for each image orientation, each combination of the scale parameters $\sigma _1$ and $\sigma _2$, and for each order of spatial differentiation, as determined by the parameters $m_1$ and $m_2$.

If we, on the other hand, base the analysis on the discrete scale-space concept, by which derivative approximations can be computed from the raw discrete scale-space representation, by applying small-support central difference masks, then the amount of computational work can be decreased substantially, since then for each new combination of the orders $m_1$ and $m_2$ of differentiation, we only need to apply a new small-support discrete filter mask. In the case, when the underlying scale-space representation is based on convolutions the rotationally symmetric Gaussian kernel, we can use the same underlying, once and for all spatially smoothed image as the input for computing filter bank responses for all the possible orientations. In the case, when the underlying scale-space representation is instead based on convolutions with affine Gaussian kernels, we do, of course, have to redo the underlying spatial smoothing operation for each combination of the parameters $\sigma _1$, $\sigma _2$ and $\varphi $. We can, however, nevertheless reuse the same underlying spatially smoothed image for all the combinations of the orders $m_1$ and $m_2$ of spatial differentiation.

5.2 Method for Defining Discrete Directional Derivative Approximation Masks

To define a discrete derivative approximation mask $\delta _{\varphi ^{m_1} \bot \varphi ^{m_2}}$, for computing an approximation of the directional derivative $L_{\varphi ^{m_1} \bot \varphi ^{m_2}}$ from an already smoothed scale-space representation L according to

$$\begin{aligned} L_{\varphi ^{m_1} \bot \varphi ^{m_2}} = \delta _{\varphi ^{m_1} \bot \varphi ^{m_2}} L, \end{aligned}$$

(112)

for a given image orientation $\varphi $ and two orders $m_1$ and $m_2$ of spatial differentiation along the directions $\varphi $ and $\bot \varphi $, respectively, we can proceed as follows:

1.
Combine the continuous directional derivative operators (109) and (110) to a joint directional derivative operator of the form:
$$\begin{aligned} \partial _{\varphi ^{m_1} \bot \varphi ^{m_2}} = \partial _{\varphi }^{m_1} \, \partial _{\bot \varphi }^{m_2}. \end{aligned}$$
(113)
2.
Expand the operator (113) by formal operator calculus over (109) and (110) to an expanded representation in terms of a linear combination of partial derivative operators $\partial _{x^{\alpha } y^{\beta }}$ along the Cartesian coordinate directions of the form:
$$\begin{aligned} \partial _{\varphi ^{m_1} \bot \varphi ^{m_2}} = \sum _{k = 0}^{m_1 + m_2} w_{k}^{(m_1, m_2)}(\varphi ) \, \partial _{x^k y^{m_1 + m_2 - k}}, \end{aligned}$$
(114)
where the directional weight functions $w_{k}^{(m_1, m_2)}(\varphi )$ are polynomials in terms of $\cos \varphi $ and $\sin \varphi $.
3.
Transfer the partial directional derivative operator $\partial _{\varphi ^{m_1} \bot \varphi ^{m_2}}$ to a corresponding directional derivative approximation mask $\delta _{\varphi ^{m_1} \bot \varphi ^{m_2}}$, while simultaneously transferring all the Cartesian partial derivative operators $\partial _{x^{\alpha } y^{\beta }}$ to corresponding discrete derivative approximation masks $\delta _{x^{\alpha } y^{\beta }}$, which leads to:
$$\begin{aligned} \delta _{\varphi ^{m_1} \bot \varphi ^{m_2}} = \sum _{k = 0}^{m_1 + m_2} w_{k}^{(m_1, m_2)}(\varphi ) \, \delta _{x^k y^{m_1 + m_2 - k}}. \end{aligned}$$
(115)

In this way, we obtain explicit expressions for compact discrete directional derivative approximation masks, as depending on the orders $m_1$ and $m_2$ of spatial differentiation and the image direction $\varphi $.

Figure 21 shows corresponding equivalent affine Gaussian derivative approximation kernels, computed according to this scheme, by applying small-support directional derivative approximation masks of these forms to a sampled affine Gaussian kernel, as parameterized according to the form in Appendix A.7.

Please note, however, that the resulting kernels obtained in this way, are not in any way intended to be applied to actual image data. Instead, their purpose is just to illustrate the equivalent effect of first convolving the input image with a discrete approximation of the Gaussian kernel, and then applying a set of small-support directional derivative approximation masks, for different combinations of the spatial orders of differentiation, to the spatially smoothed image data. In situations when combinations of multiple orders of spatial differentiation are to be used in a computer vision system, for example, in applications involving filter banks, this form of discrete implementation will be computationally much more efficient, compared to applying a set of large-support filter kernels to the same image data.

By the central difference operators $\delta _{x^{\alpha } y^{\beta }}$ constituting numerical discrete approximations of the corresponding partial derivative operators $\partial _{x^{\alpha } y^{\beta }}$, it follows that the directional derivative approximation mask $\delta _{\varphi ^{m_1} \bot \varphi ^{m_2}}$ will be a numerical approximation of the continuous directional derivative operator $\partial _{\varphi ^{m_1} \bot \varphi ^{m_2}}$. Thereby, the discrete analogue of the directional derivative operator according to (112), from a discrete approximation L of the scale-space representation of an input image f, will constitute a numerical approximation of the corresponding continuous directional derivative of the underlying continuous image, provided that the input image has been sufficiently well sampled, and provided that the discrete approximation of scale-space smoothing is a sufficiently good approximation of the corresponding continuous Gaussian smoothing operation.

In practice, the resulting directional derivative masks will be of size $3 \times 3$ for first- and second-order derivatives and of size $5 \times 5$ for third- and fourth-order derivatives. Thus, once the underlying code for expressing these relationships has been written, these directional derivative approximation masks are extremely easy and efficient to apply in practice.

Appendix A.8 gives explicit expressions for the resulting discrete directional derivative approximation masks for spatial differentiation orders up to 4, whereas Appendix A.9 gives explicit expressions for the underlying Cartesian discrete derivative approximation masks up to order 4 of spatial differentiation.

The conceptual construction of compact directional derivative approximation masks performed in this way generalizes the notion of steerable filters [9, 25, 29, 70, 71, 80] to a wide class of filter banks, that can be computed in a very efficient manner, once an initial smoothing stage by scale-space filtering, or some approximation thereof, has been computed.

5.3 Scale-Space Properties of Directional Derivative Approximations Computed by Applying Small-Support Directional Derivative Approximation Masks to Smoothed Image Data

Note, in particular, that if we compute discrete approximations of directional derivatives based on a discrete scale-space representation computed using the discrete analogue of the Gaussian kernel according to Sect. 2.6, then discrete scale-space properties will hold also for the discrete approximations of directional derivatives, in the sense that: (i) cascade smoothing properties will hold between directional derivative approximations at different scales, and (ii) the discrete directional derivative approximations will obey non-enhancement of local extrema with increasing scale.

6 Summary and Conclusions

We have presented an in-depth treatment of different ways of discretizing the Gaussian smoothing operation and the computation of Gaussian derivatives, for purposes in scale-space analysis and deep learning. Specifically, we have considered the following three main ways of discretizing the basic scale-space operations, in terms of either:

sampling the Gaussian kernel and the Gaussian derivative kernels,
integrating the Gaussian kernel and the Gaussian derivative kernels over the support regions of the pixels, or
using a genuinely discrete scale-space theory, based on convolutions with the discrete analogue of the Gaussian kernel, complemented with derivative approximations computed by applying small-support central difference operators to the spatially smoothed image data.

To analyse the properties of these different ways of discretizing the Gaussian smoothing and Gaussian derivative computation operations, we have in Sect. 2 defined a set of quantifying performance measures, for which we have studied their behaviour as function of the scale parameter from very low to moderate levels of scale.

Regarding the purely spatial smoothing operation, the discrete analogue of the Gaussian kernel stands out as having the best theoretical properties over the entire scale range, from scale levels approaching zero to coarser scales. The results obtained from the sampled Gaussian kernel may deviate substantially from their continuous counterparts, when the scale parameter $\sigma $ is less than about 0.5 or 0.75. For $\sigma $ greater than about 1, the sampled Gaussian kernel does, on the other hand, lead to numerically very good approximations of results obtained from the corresponding continuous theory.

Regarding the computation of Gaussian derivative responses, we do also in Sects. 3 and 4 find that, when applied to polynomial input to reveal the accuracy of the numerical approximations, the sampled Gaussian derivative kernels and the integrated Gaussian derivative kernels do not lead to numerically accurate or consistent derivative estimates, when the scale parameter $\sigma $ is less than about 0.5 or 0.75. The integrated Gaussian kernels degenerate in somewhat less strong ways, for very fine scale levels than the sampled Gaussian derivative kernels, implying that the integrated Gaussian derivative kernels may have better ability to handle very fine scales than the sampled Gaussian derivative kernels. At coarser scales, the integrated Gaussian kernels do, on the other hand, lead to numerically less accurate estimates of the corresponding continuous counterparts, than the sampled Gaussian derivative kernels.

At very fine levels of scale, the discrete analogues of the Gaussian kernels stand out as giving the numerically far best estimates of derivative computations for polynomial input. When the scale parameter $\sigma $ exceeds about 1, the sampled Gaussian derivative kernels do, on the other hand, lead to the numerically closest estimates to those obtained from the fully continuous theory.

The fact that the sampled Gaussian derivative kernels for sufficiently coarse scales lead to the closest approximations of the corresponding fully continuous theory should, however, not preclude from basing the analysis on the discrete analogues of Gaussian derivatives at coarser scales. If necessary, deviations between the results obtained from the discrete analogues of Gaussian derivatives and the corresponding fully continuous theory can, in principle, be compensated for by complementary calibration procedures, or by deriving corresponding genuinely discrete analogues of the relevant entities in the analysis. Additionally, in situations when a larger number of Gaussian derivative responses are to be computed simultaneously, this can be accomplished with substantially higher computational efficiency, if basing the scale-space analysis on the discrete analogue of the Gaussian kernel, which only involves a single spatial smoothing stage of large spatial support, from which each derivative approximation can then be computed using a small-support central difference operator.

As a complement to the presented methodologies of discretizing Gaussian smoothing and Gaussian derivative computations, we have also in Sect. 5 presented a computationally very efficient ways of computing directional derivatives of different orders and of different orientations, which is highly useful for computing filter bank type responses for different purposes in computer vision. When using the discrete analogue of the Gaussian kernel for smoothing, the presented discrete directional derivative approximation masks can be applied at any scale. If using either sampled Gaussian kernels or integrated Gaussian kernels for spatial smoothing, including extensions of rotationally symmetric kernels to anisotropic affine Gaussian kernels, the discrete derivative approximation masks can be used, provided that the scale parameter is sufficiently large in relation to the desired accuracy of the resulting numerical approximation.

Concerning the orders of spatial differentiation, we have in this treatment, for the purpose of presenting explicit expressions and quantitative experimental results, limited ourselves to spatial derivatives up to order 4. A motivation for this choice is the observation by Young [100, 101] that receptive fields up to order 4 have been observed in the primary visual cortex of higher mammals, why this choice should then cover a majority of the intended use cases.

It should be noted, however, that an earlier version of the theory for discrete derivative approximations, based on convolutions with the discrete analogue of the Gaussian kernel followed by central difference operators, has, however, been demonstrated to give useful results with regard to the sign of differential invariants that depend upon derivatives up to order 5 or 6, for purposes of performing automatic scale selection, when detecting edges or ridges from spatial image data [49]. Hence, provided that appropriate care is taken in the design of the visual operations that operate on the image data, this theory could also be applied for higher orders of spatial differentiation.

6.1 Extensions of the Approach

Concerning extensions of the approach, with regard to applications in deep learning, for which the modified Bessel functions $I_n(s)$, underlying the definition of the discrete analogue of the Gaussian kernel $T_{\text{ disc }}(n;\; s)$ according to (26), are currently generally not available in standard frameworks for deep learning, a possible alternative approach consists of instead replacing the previously treated sampled Gaussian derivative kernels $T_{\text{ sampl },x^{\alpha }}(n;\; s)$ according to (53) or the integrated Gaussian derivative kernels $T_{\text{ int },x^{\alpha }}(n;\; s)$ according to (54) by the families of hybrid discretization approaches obtained by: (i) first smoothing the image with either the normalized sampled Gaussian kernel $T_{\text{ normsampl }}(n;\; s)$ according to (19) or the integrated Gaussian kernel $T_{\text{ int }}(n;\; s)$ according to (20), and then applying central difference operators $\delta _{x^{\alpha }}$ of the form (60) to the spatially smoothed data.

When multiple Gaussian derivative responses of different orders are to be computed at the same scale level, such an approach would, in analogy to the previously treated discretization approach, based on first smoothing the image with the discrete analogue of the Gaussian kernel $T_{\text{ disc }}(n;\; s)$ according to (26) and then applying central difference operators $\delta _{x^{\alpha }}$ of the form (60) to the spatially smoothed data, resulting in equivalent discrete derivative approximation kernels $T_{\text{ disc },x^{\alpha }}(n;\, s)$ according to (64), to also be computationally much more efficient, compared to explicit smoothing with a set of either sampled Gaussian derivative kernels $T_{\text{ sampl },x^{\alpha }}(n;\; s)$ according to (53) or integrated Gaussian derivative kernels $T_{\text{ int },x^{\alpha }}(n;\; s)$ according to (54).

In terms of equivalent convolution kernels for the resulting hybrid discretization approaches, the corresponding discrete derivative approximation kernels for these classes of kernels will then be given by

$$\begin{aligned}&\begin{aligned} T_{\text{ hybr-sampl },x^{\alpha }}(n;\; s)&= (\delta _{x^{\alpha }} T_{\text{ normsampl }})(n;\; s), \end{aligned} \end{aligned}$$

(116)

$$\begin{aligned}&\begin{aligned} T_{\text{ hybr-int },x^{\alpha }}(n;\; s)&= (\delta _{x^{\alpha }} T_{\text{ int }})(n;\; s), \end{aligned} \end{aligned}$$

(117)

with $T_{\text{ normsampl }}(n;\; s)$ and $T_{\text{ int }}(n;\; s)$ according to (19) and (20), respectively.^{Footnote 12}

Such an approach is in a straightforward way compatible with learning of the scale parameters by back propagation, based on automatic differentiation in deep learning environments. It would be conceptually very straightforward to extend the theoretical framework and the experimental evaluations presented in this paper to incorporating also a detailed analysis of these two additional classes of discrete derivative approximations for Gaussian derivative operators of hybrid type. For reasons of space constraints, we have, however, not been able to include corresponding in-depth analyses of those additional discretization methods here.^{Footnote 13}

Concerning the formulation of discrete approximations of affine Gaussian derivative operators, it would also be straightforward to extend the framework in Sect. 5 to replacing the initial spatial smoothing step, based on convolution with the sampled affine Gaussian derivative kernel, by instead using an integrated affine Gaussian derivative kernel, with its filter coefficients of the form

$$\begin{aligned}{} & {} T_{\text{ affint }}(m, n;\; \sigma _1, \sigma _2, \varphi ) = \nonumber \\{} & {} \quad = \int \limits _{x = m-1/2}^{m+1/2} \int \limits _{y = n-1/2}^{n+1/2} g_{\text{ aff }}(x, y;\; \sigma _1, \sigma _2, \varphi ) \, \hbox {d}x \, dy, \end{aligned}$$

(118)

with $g_{\text{ aff }}(x, y;\; \sigma _1, \sigma _2, \varphi )$ denoting the continuous affine Gaussian kernel according to (163) and (164), with the spatial scale parameters $\sigma _1$ and $\sigma _2$ in the two orthogonal principal directions of affine Gaussian kernel with orientation $\varphi $, and where the integral can in a straightforward way be approximated by numerical integration.

In analogy with the previously presented results, regarding the spatially isotropic Gaussian scale-space representation, based on discrete approximations of rotationally symmetric Gaussian kernels, for which the spatial covariance matrix $\varSigma $ in the matrix-based formulation of the affine Gaussian kernel $g_{\text{ aff }}(p;\; \varSigma )$ according to (155) is equal to the identity matrix I, such an approach, based on integrated affine Gaussian kernels, could be expected to have clear advantages compared to the sampled affine Gaussian kernel $T_{\text{ affsampl }}(m, n;\; \sigma _1, \sigma _2,\varphi )$ according to (165), for very small values of the spatial scale parameters $\sigma _1$ and $\sigma _2$.

A third line of extensions concerns to evaluate the influence with regard to performance of using the different types of treated discrete approximations of the Gaussian derivative operators, when used in specific computer vision algorithms and/or deep learning architectures, which we will address in future work.

Notes

For a number of clarifications with respect to an evaluation of what Rey-Otero and Delbracio [73] refer to as “Lindeberg’s smoothing method” in that work, see Appendix A.10.
In the implementations underlying this work, we truncate the Gaussian kernel at the tails, such that $2 \int _{x = N}^{\infty } g(x;\; s) \, \hbox {d}x \le \epsilon $, for a small value of $\epsilon $, of the order of $10^{-8}$. The reason for using such a small value of $\epsilon $ in our experiments is to ensure that the experimental results are not negatively affected by truncation effects. For practical implementations of Gaussian smoothing, one can often use a larger value of $\epsilon $, chosen so as to give a suitable trade-off between computational accuracy and computational efficiency.
When implementing this operation in practice, the infinite sum is replaced by a finite sum $E_{\text{ norm }}(T(\cdot ;\; s)) = \sum _{n = - N}^{N} T(n;\; s) - 1$, with the truncation bound N chosen such that $2 \int \limits _{x = N}^{\infty } g(x;\; s) \, \hbox {d}x \le \epsilon $ for a small $\epsilon $ chosen as $10^{-8}$, according to Footnote 2.
The motivation for using both an absolute and a relative error measure in this context, for the purpose of quantifying the differences between the estimated and the specified scale values for the discrete approximations of the Gaussian kernels, is that the definition of the integrated Gaussian kernel is associated with a specific type of scale offset, that can be seen as genuinely additive at coarse levels of scale, whereas for the purpose of comparing the scale differences computed at different scales, it may be more appropriate to quantify how large the scale differences are in relation to the scales at which they are actually computed.
It could, indeed, be interesting, to also investigate the cascade smoothing error for incremental smoothing for smaller values of $\varDelta s$.
More generally, the influence of the Gaussian smoothing operation on polynomial input can be described by diffusion polynomials, as described in Appendix A.5. Given a monomial $p_k(x) = x^k$ as input, the result of convolving this input signal with a Gaussian kernel is described by a diffusion polynomial $q_k(x;\, s)$, with explicit expressions for these up to order 4 in Eqs. (149)–(153). By combining such diffusion polynomials with spatial differentiation operations, we can obtain closed-form expressions for the results of applying Gaussian derivative operators to any polynomial.
Notably, it could, however, also be of interest to study these effects deeper for the case when the scale of the incremental smoothing is significantly lower than the scale of the discrete approximation of the Gaussian derivative response, which we, however, leave to future work.
In these experiments, we use a much denser distribution of the scale levels than would be used in an actual feature detection algorithm with automatic scale selection. The motivation for using such a dense distribution here, is to obtain accurate performance curves, that are not limited in accuracy because of a sparse scale sampling, but instead reflect the inherent properties of the underlying computational mechanisms.
In practice, this selection of a closest point should not be regarded as that important, since in most of the cases, there will be a dominant local extremum of the appropriate type. This selection is, however, a precaution, in case there would be multiple local extrema.
This is a similar scale refinement mechanism as used in the fully integrated algorithms for automatic scale selection in [48, 49].
See Appendix A.6 for a description of the affine Gaussian scale-space concept that underlies the definition of affine Gaussian kernels.
In practice, these explicit forms for the derivative approximation kernels would, however, never be used, since it is computationally much more efficient to instead first perform the spatial smoothing operation in an initial processing step, and then applying different combinations of discrete derivative approximations, in situations when multiple spatial derivatives of different orders are to be computed at the same scale level. With regard to theoretical analysis of the properties of these hybrid discretization approaches, the corresponding equivalent convolution kernels are, however, important when characterizing the properties of these methods.
In particular, regarding the theoretical properties of these hybrid discretization approaches, it should be mentioned that, due to the approximation of the spatial derivative operators $\partial _{x^{\alpha }}$ by the central difference operators $\delta _{x^{\alpha }}$, in combination with the unit $l_1$-norm normalization of the corresponding spatial smoothing operations, the hybrid derivative approximation kernels $T_{\text{ hybr-sampl },x^{\alpha }}$ and $T_{\text{ hybr-int },x^{\alpha }}(n;\; s)$ according to (116) and (117) will obey similar response properties (77) and (78) to monomial input, as the previously treated approach, based on the combination of spatial smoothing using the discrete analogue of the Gaussian kernel with central difference operators $T_{\text{ disc },x^{\alpha }}(n;\; s)$ according to (64). In other words, the hybrid kernels $T_{\text{ hybr-sampl },x^{\alpha }}(n;\; s)$ and $T_{\text{ hybr-int },x^{\alpha }}(n;\; s)$ according to (116) and (117) will with $p_k(x) = x^k$ obey $T_{\text{ hybr-sampl },x^M}(\cdot ;\; s) * p_M(\cdot ) = M!$ and $T_{\text{ hybr-int },x^M}(\cdot ;\; s) * p_M(\cdot ) = M!$, as well as $T_{\text{ hybr-sampl },x^M}(\cdot ;\; s) * p_N(\cdot ) = 0$ and $T_{\text{ hybr-int },x^M}(\cdot ;\; s) * p_N(\cdot ) = 0$ for $M > N$. In these respects, these hybrid discretization approaches $T_{\text{ hybr-sampl },x^{\alpha }}(n;\; s)$ and $T_{\text{ hybr-int },x^{\alpha }}(n;\; s)$ could be expected to constitute better approximations of Gaussian derivative operators at very fine scales, than their corresponding non-hybrid counterparts, $T_{\text{ normsampl },x^{\alpha }}(n;\; s)$ and $T_{\text{ int },x^{\alpha }}(n;\; s)$ according to (19) and (20). Simultaneously, these hybrid approaches will also be computationally much more efficient than their non-hybrid counterparts, in situations where Gaussian derivatives of multiple orders are to be computed at the same scale.
For very small values of the scale parameter, the spatial smoothing with the normalized sampled Gaussian kernel $T_{\text{ normsampl },x^{\alpha }}(n;\; s)$ can, however, again be expected to lead to systematically too small amounts of spatial smoothing (see Fig. 3). For larger values of the scale parameter, the spatial smoothing with the integrated Gaussian kernel $T_{\text{ int },x^{\alpha }}(n;\; s)$ would, however, be expected to lead to a scale offset (see Fig. 3), influenced by the variance of a box filter over each pixel support region. Furthermore, these hybrid approaches will not be guaranteed to obey information reducing properties from finer to coarser levels of scale, in terms of either non-creation of new local extrema, or non-enhancement of local extrema, as the discrete analogues of Gaussian derivatives $T_{\text{ disc },x^{\alpha }}(n;\; s)$ according to (64) obey. In situations, where the modified Bessel functions of integer order are immediately available, the approach, based on the combination of spatial smoothing with the discrete analogue of the Gaussian kernel with central differences, should therefore be preferable in relation to these hybrid approaches. In situations, where the modified Bessel functions of integer order are, however, not available as full-fledged primitive in a deep learning framework, these hybrid approaches could, on the other hand, be considered as interesting alternatives to the regular sampled or integrated Gaussian derivative kernels $T_{\text{ sampl },x^{\alpha }}(n;\; s)$ and $T_{\text{ int },x^{\alpha }}(n;\; s)$, because of their substantially higher computational efficiency, in situations when spatial derivatives of multiple orders are to be computed at the same scale.
What remains to explore, is how these hybrid discretization approaches compare to the previously treated three main classes of discretization methods, with respect to the set of quantitative performance measures defined and then evaluated experimentally in Sects. 3.7–3.8 and Sects. 4.4–4.5 (see [60] for a treatment about this topic), as well as with respect to integration in different types of computer vision and/or deep learning algorithms.
Note that since these kernels are symmetric, we can avoid the compensation with respect to the mean values.

References

Abramowitz, M., Stegun, I.A. (eds.) Handbook of Mathematical Functions. Applied Mathematics Series, 55th edn. National Bureau of Standards, Gaithersburg (1964)
Åström, K., Heyden, A.: Stochastic analysis of image acquisition and scale-space smoothing. In: Sporring, J., Nielsen, M., Florack, L., Johansen, P. (eds.) Gaussian Scale-Space Theory: Proceedings of PhD School on Scale-Space Theory, pp. 129–136. Springer, Heidelberg (1997)
Chapter Google Scholar
Athalye, A., Engstrom, L., Ilyas, A., Kwok, K.: Synthesizing robust adversarial examples. In: International Conference on Machine Learning (ICML 2018), pp. 284–293 (2018)
Babaud, J., Witkin, A.P., Baudin, M., Duda, R.O.: Uniqueness of the Gaussian kernel for scale-space filtering. IEEE Trans. Pattern Anal. Mach. Intell. 8(1), 26–33 (1986)
Article Google Scholar
Baker, N., Lu, H., Erlikhman, G., Kellman, P.J.: Deep convolutional networks do not classify based on global object shape. PLoS Comput. Biol. 14(12), e1006613 (2018)
Article Google Scholar
Ballester, C., Gonzalez, M.: Affine invariant texture segmentation and shape from texture by variational methods. J. Math. Imaging Vis. 9, 141–171 (1998)
Article MathSciNet Google Scholar
Baumberg, A.: Reliable feature matching across widely separated views. In: Proceedings of Computer Vision and Pattern Recognition (CVPR’00), pp. I:1774–1781 (2000)
Bay, H., Ess, A., Tuytelaars, T., van Gool, L.: Speeded up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008)
Article Google Scholar
Beil, W.: Steerable filters and invariance theory. Pattern Recogn. Lett. 15(5), 453–460 (1994)
Article Google Scholar
Bekkers, E. J.: B-spline CNNs on Lie groups. In: International Conference on Learning Representations (ICLR 2020) (2020)
Bouma, H., Vilanova, A., Bescós, J. O., ter Haar Romeny, B., Gerritsen, F. A.: Fast and accurate Gaussian derivatives based on B-splines. In: Proceedings of Scale Space and Variational Methods in Computer Vision (SSVM 2007), vol. 4485, pp. 406–417, Springer LNCS (2007)
Bretzner, L., Lindeberg, T.: Feature tracking with automatic selection of spatial scales. Comput. Vis. Image Underst. 71(3), 385–392 (1998)
Article Google Scholar
Burt, P.J., Adelson, E.H.: The Laplacian pyramid as a compact image code. IEEE Trans. Communications 9(4), 532–540 (1983)
Article Google Scholar
Charalampidis, D.: Recursive implementation of the Gaussian filter using truncated cosine functions. IEEE Trans. Signal Process. 64(14), 3554–3565 (2016)
Article MathSciNet Google Scholar
Chomat, O., de Verdiere, V., Hall, D., Crowley, J.: Local scale selection for Gaussian based description techniques. In: Proceedings of European Conf. on Computer Vision (ECCV 2000), Springer LNCS, vol. 1842, pp. 117–133, Dublin, Ireland (2000). Springer
Crowley, J.L., Riff, O.: Fast computation of scale normalised Gaussian receptive fields. In: Griffin, L., Lillholm, M. (eds.) Proceedings of Scale-Space Methods in Computer Vision (Scale-Space’03), Springer LNCS, vol. 2695, pp. 584–598. Springer, Isle of Skye, Scotland (2003)
Crowley, J.L., Stern, R.M.: Fast computation of the Difference of Low Pass Transform. IEEE Trans. Pattern Anal. Mach. Intell. 6(2), 212–222 (1984)
Article Google Scholar
Dai, J., Jin, S., Zhang, J., Nguyen, T.Q.: Boosting feature matching accuracy with pairwise affine estimation. IEEE Trans. Image Process. 29, 8278–8291 (2020)
Article Google Scholar
Deriche, R.: Recursively implementing the Gaussian and its derivatives. In: Proceedings of International Conference on Image Processing (ICIP’92), pp. 263–267 (1992)
Duits, R., Florack, L., de Graaf, J., ter Haar Romeny, B.: On the axioms of scale space theory. J. Math. Imaging Vis. 20(3), 267–298 (2004)
Article MathSciNet Google Scholar
Eichhardt, I., Chetverikov, D.: Affine correspondences between central cameras for rapid relative pose estimation. In: Proceedings of European Conference on Computer Vision (ECCV 2018), Springer LNCS, vol. 11210, pp. 482–497 (2018)
Farnebäck, G., Westin, C.-F.: Improving Deriche-style recursive Gaussian filters. J. Math. Imaging Vis. 26(3), 293–299 (2006)
Article MathSciNet Google Scholar
Fedorov, V., Arias, P., Sadek, R., Facciolo, G., Ballester, C.: Linear multiscale analysis of similarities between images on Riemannian manifolds: Practical formula and affine covariant metrics. SIAM J. Imag. Sci. 8(3), 2021–2069 (2015)
Article MathSciNet Google Scholar
Florack, L.M.J.: Image Structure. Series in Mathematical Imaging and Vision. Springer, Cham (1997)
Book Google Scholar
Freeman, W.T., Adelson, E.H.: The design and use of steerable filters. IEEE Trans. Pattern Anal. Mach. Intell. 13(9), 891–906 (1991)
Article Google Scholar
Gavilima-Pilataxi, H., Ibarra-Fiallo, J.: Multi-channel Gaussian derivative neural networks for crowd analysis. In: Proceedings of International Conference on Pattern Recognition Systems (ICPRS 2023), pp. 1–7 (2023)
Geusebroek, J.-M., Smeulders, A.W.M., van de Weijer, J.: Fast anisotropic Gauss filtering. IEEE Trans. Image Process. 12(8), 938–943 (2003)
Article MathSciNet Google Scholar
Giannarou, S., Visentini-Scarzanella, M., Yang, G.-G.: Probabilistic tracking of affine-invariant anisotropic regions. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 130–143 (2013)
Article Google Scholar
Hel-Or, Y., Teo, P.C.: Canonical decomposition of steerable functions. J. Math. Imaging Vis. 9(1), 83–95 (1998)
Article MathSciNet Google Scholar
Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., Song, D.: Natural adversarial examples. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2021), pp. 15262–15271 (2021)
Iijima, T.: Basic theory on normalization of pattern (in case of typical one-dimensional pattern). Bull. Electrotech. Lab. 26, 368–388 (1962). ((in Japanese))
Google Scholar
Jacobsen, J.-J., van Gemert, J., Lou, Z., Smeulders, A. W. M.: Structured receptive fields in CNNs. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2016), pp. 2610–2619 (2016)
Keilmann, A., Godehardt, M., Moghiseh, A., Redenbach, C., Schladitz, K.: Improved anisotropic Gaussian filters. arXiv preprint arXiv:2303.13278, (2023)
Koenderink, J.J.: The structure of images. Biol. Cybern. 50(5), 363–370 (1984)
Article MathSciNet Google Scholar
Koenderink, J.J., van Doorn, A.J.: Representation of local geometry in the visual system. Biol. Cybern. 55(6), 367–375 (1987)
Article MathSciNet Google Scholar
Koenderink, J.J., van Doorn, A.J.: Generic neighborhood operators. IEEE Trans. Pattern Anal. Mach. Intell. 14(6), 597–605 (1992)
Article Google Scholar
Lampert, C.H., Wirjadi, O.: An optimal nonorthogonal separation of the anisotropic Gaussian convolution filter. IEEE Trans. Image Process. 15(11), 3501–3513 (2006)
Article MathSciNet Google Scholar
Lazebnik, S., Schmid, C., Ponce, J.: A sparse texture representation using local affine regions. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1265–1278 (2005)
Article Google Scholar
Li, O., Shui, P.-L.: Subpixel blob localization and shape estimation by gradient search in parameter space of anisotropic Gaussian kernels. Signal Process. 171, 107495 (2020)
Article Google Scholar
Liao, K., Liu, G., Hui, Y.: An improvement to the SIFT descriptor for image representation and matching. Pattern Recogn. Lett. 34(11), 1211–1220 (2013)
Article Google Scholar
Lim, J.-Y., Stiehl, H. S.: A generalized discrete scale-space formulation for 2-D and 3-D signals. In: International Conference on Scale-Space Theories in Computer Vision (Scale-Space’03), pp. 132–147 (2003). Springer LNCS volume (2695)
Linde, O., Lindeberg, T.: Composed complex-cue histograms: an investigation of the information content in receptive field based image descriptors for object recognition. Comput. Vis. Image Underst. 116(4), 538–560 (2012)
Article Google Scholar
Lindeberg, T.: Scale-space for discrete signals. IEEE Trans. Pattern Anal. Mach. Intell. 12(3), 234–254 (1990)
Article Google Scholar
Lindeberg, T.: Scale-Space Theory in Computer Vision. Springer (1993)
Google Scholar
Lindeberg, T.: Discrete derivative approximations with scale-space properties: a basis for low-level feature extraction. J. Math. Imaging Vis. 3(4), 349–376 (1993)
Article MathSciNet Google Scholar
Lindeberg, T.: Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics 21(2), 225–270 (1994). Also available from http://www.csc.kth.se/~tony/abstracts/Lin94-SI-abstract.html
Lindeberg, T.: On the axiomatic foundations of linear scale-space. In: Sporring, J., Nielsen, M., Florack, L., Johansen, P. (eds.) Gaussian Scale-Space Theory: Proceedings of PhD School on Scale-Space Theory, pp. 75–97, Copenhagen, Denmark (1996). Springer
Lindeberg, T.: Feature detection with automatic scale selection. Int. J. Comput. Vision 30(2), 77–116 (1998)
Google Scholar
Lindeberg, T.: Edge detection and ridge detection with automatic scale selection. Int. J. Comput. Vision 30(2), 117–154 (1998)
Article Google Scholar
Lindeberg, T.: Generalized Gaussian scale-space axiomatics comprising linear scale-space, affine scale-space and spatio-temporal scale-space. J. Math. Imaging Vis. 40(1), 36–81 (2011)
Article MathSciNet Google Scholar
Lindeberg, T.: Scale selection properties of generalized scale-space interest point detectors. J. Math. Imaging Vis. 46(2), 177–210 (2013)
Article MathSciNet Google Scholar
Lindeberg, T.: A computational theory of visual receptive fields. Biol. Cybern. 107(6), 589–635 (2013)
Article MathSciNet Google Scholar
Lindeberg, T.: Image matching using generalized scale-space interest points. J. Math. Imaging Vis. 52(1), 3–36 (2015)
Article MathSciNet Google Scholar
Lindeberg, T.: Discrete approximations of the affine Gaussian derivative model for visual receptive fields. arXiv preprint arXiv:1701.02127, (2017)
Lindeberg, T.: Scale selection. In: Ikeuchi, K. (ed.), Computer Vision, pp. 1110–1123. Springer, (2021). https://doi.org/10.1007/978-3-030-03243-2_242-1
Lindeberg, T.: Normative theory of visual receptive fields. Heliyon 7(1), 1–20 (2021). https://doi.org/10.1016/j.heliyon.2021.e05897
Article MathSciNet Google Scholar
Lindeberg, T.: Scale-covariant and scale-invariant Gaussian derivative networks. In: Proceedings of Scale Space and Variational Methods in Computer Vision (SSVM 2021), Springer LNCS, vol. 12679, pp. 3–14 (2021)
Lindeberg, T.: Scale-covariant and scale-invariant Gaussian derivative networks. J. Math. Imaging Vis. 64(3), 223–242 (2022)
Article MathSciNet Google Scholar
Lindeberg, T., Bretzner, L.: Real-time scale selection in hybrid multi-scale representations. In: Griffin, L., Lillholm, M. (eds.) Proceedings of Scale-Space Methods in Computer Vision (Scale-Space’03), Springer LNCS, vol. 2695, pp. 148–163. Springer, Isle of Skye, Scotland (2003)
Lindeberg, T.: Approximation properties relative to continuous scale space for hybrid discretizations of Gaussian derivative operators. arXiv preprint arXiv:2405.05095 (2024)
Lindeberg, T., Gårding, J.: Shape-adapted smoothing in estimation of 3-D shape cues from affine distortions of local 2-D structure. Image Vis. Comput. 15(6), 415–434 (1997)
Article Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
Article Google Scholar
Mikolajczyk, K., Schmid, C.: Scale and affine invariant interest point detectors. Int. J. Comput. Vision 60(1), 63–86 (2004)
Article Google Scholar
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., van Gool, L.: A comparison of affine region detectors. Int. J. Comput. Vision 65(1–2), 43–72 (2005)
Article Google Scholar
Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., Frossard, P.: Universal adversarial perturbations. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2017) (2017)
Morel, J.-M., Yu, G.: ASIFT: A new framework for fully affine invariant image comparison. SIAM J. Imag. Sci. 2(2), 438–469 (2009)
Article MathSciNet Google Scholar
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: Proceedings Neural Information Processing Systems (NIPS 2017), (2017)
Pauwels, E.J., Fiddelaers, P., Moons, T., van Gool, L.J.: An extended class of scale-invariant and recursive scale-space filters. IEEE Trans. Pattern Anal. Mach. Intell. 17(7), 691–701 (1995)
Article Google Scholar
Penaud-Polge, V., Velasco-Forero, S., Angulo, J.: Fully trainable Gaussian derivative convolutional layer. In: International Conference on Image Processing (ICIP 2022), pp. 2421–2425 (2022)
Perona, P.: Steerable-scalable kernels for edge detection and junction analysis. In: Proceedings European Conference on Computer Vision (ECCV’92), Springer LNCS, vol. 588, pp. 3–18, Santa Margherita Ligure, Italy, May (1992)
Perona, P.: Deformable kernels for early vision. IEEE Trans. Pattern Anal. Mach. Intell. 17(5), 488–499 (1995)
Article Google Scholar
Pintea, S.L., Tömen, N., Goes, S.F., Loog, M., van Gemert, J.C.: Resolution learning in deep convolutional networks using scale-space theory. IEEE Trans. Image Process. 30, 8342–8353 (2021)
Article Google Scholar
Rey-Otero, I., Delbracio, M.: Computing an exact Gaussian scale-space. Image Process. On Line 6, 8–26 (2016)
Article Google Scholar
Rodríguez, M., Delon, J., Morel, J.-M.: Covering the space of tilts: Application to affine invariant image comparison. SIAM J. Imag. Sci. 11(2), 1230–1267 (2018)
Article MathSciNet Google Scholar
Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. Int. J. Comput. Vision 66(3), 231–259 (2006)
Article Google Scholar
Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: Segmenting, modeling, and matching video clips containing multiple moving objects. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 477–491 (2007)
Article Google Scholar
Sadek, R., Constantinopoulos, C., Meinhardt, E., Ballester, C.C., Caselles, V.: On affine invariant descriptors related to SIFT. SIAM J. Imag. Sci. 5(2), 652–687 (2012)
Article MathSciNet Google Scholar
Sangalli, M., Blusseau, S., Velasco-Forero, S., Angulo, J.: Scale equivariant U-net. In: Proceedings of British Machine Vision Conference (BMVC 2022) (2022)
Schiele, B., Crowley, J.: Recognition without correspondence using multidimensional receptive field histograms. Int. J. Comput. Vision 36(1), 31–50 (2000)
Article Google Scholar
Simoncelli, E.P., Farid, H.: Steerable wedge filters for local orientation analysis. IEEE Trans. Image Process. 5(9), 1377–1382 (1996)
Article Google Scholar
Simoncelli, E. P., Freeman, W. T.: The steerable pyramid: A flexible architecture for multi-scale derivative computation. In: Proceedings of International Conference on Image Processing (ICIP’95), Washington DC (1995)
Simoncelli, E.P., Freeman, W.T., Adelson, E.H., Heeger, D.J.: Shiftable multi-scale transforms. IEEE Trans. Information Theory 38(2), 587–607 (1992)
Article MathSciNet Google Scholar
Slavík, A., Stehlík, P.: Dynamic diffusion-type equations on discrete-space domains. J. Math. Anal. Appl. 427(1), 525–545 (2015)
Article MathSciNet Google Scholar
Sporring, J., Nielsen, M., Florack, L., Johansen, P. (eds.) Gaussian Scale-Space Theory: Proceedings of PhD School on Scale-Space Theory. Series in Mathematical Imaging and Vision. Springer, Copenhagen, Denmark (1997)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, B.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, (2013)
ter Haar Romeny, B.: Front-End Vision and Multi-Scale Image Analysis. Springer, Cham (2003)
Book Google Scholar
Tschirsich, M., Kuijper, A.: Notes on discrete Gaussian scale space. J. Math. Imaging Vis. 51, 106–123 (2015)
Article MathSciNet Google Scholar
Turin, G.: An introduction to matched filters. IRE Trans. Inf. Theory 6(3), 311–329 (1960)
Article MathSciNet Google Scholar
Tuytelaars, T., Mikolajczyk, K.: In: A Survey on Local Invariant Features, volume 3(3) of Foundations and Trends in Computer Graphics and Vision. Now Publishers (2008)
Tuytelaars, T., van Gool, L.: Matching widely separated views based on affine invariant regions. Int. J. Comput. Vision 59(1), 61–85 (2004)
Unser, M., Aldroubi, A., Eden, M.: Fast B-spline transforms for continuous image representation and interpolation. IEEE Trans. Pattern Anal. Mach. Intell. 13(3), 277–285 (1991)
Article Google Scholar
Unser, M., Aldroubi, A., Eden, M.: B-spline signal processing. I. Theory. IEEE Trans. Signal Process. 41(2), 821–833 (1993)
Article Google Scholar
van Vliet, L.J., Young, I.T., Verbeek, P.W.: Recursive Gaussian derivative filters. In: International Conference on Pattern Recognition vol. 1, pp. 509–514 (1998)
Wang, Y.-P.: Image representations using multiscale differential operators. IEEE Trans. Image Process. 8(12), 1757–1771 (1999)
Article MathSciNet Google Scholar
Wang, Y.-P., Lee, S.L.: Scale-space derived from B-splines. IEEE Trans. Pattern Anal. Mach. Intell. 20(10), 1040–1055 (1998)
Article Google Scholar
Weickert, J., Ishikawa, S., Imiya, A.: Linear scale-space has first been proposed in Japan. J. Math. Imaging Vis. 10(3), 237–252 (1999)
Article MathSciNet Google Scholar
Witkin, A. P.: Scale-space filtering. In: Proceedings of 8th International Joint Conferences Artificial Intelligence, pp. 1019–1022, Karlsruhe, Germany (1983)
Woodward, P.M.: Probability and Information Theory, with Applications to Radar, vol. 3. Pergamon Press, Oxford (1953)
Google Scholar
Young, I.T., van Vliet, L.J.: Recursive implementation of the Gaussian filter. Signal Process. 44(2), 139–151 (1995)
Article Google Scholar
Young, R. A.: The Gaussian derivative theory of spatial vision: Analysis of cortical cell receptive field line-weighting profiles. Technical Report GMR-4920, Computer Science Department, General Motors Research Lab., Warren, Michigan (1985)
Young, R.A.: The Gaussian derivative model for spatial vision: I. Retinal mechanisms. Spat. Vis. 2(4), 273–293 (1987)
Article Google Scholar
Yu, G., Morel, J.-M.: A fully affine invariant image comparison method. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), pp. 1597–1600 (2009)
Yuille, A.L., Poggio, T.A.: Scaling theorems for zero-crossings. IEEE Trans. Pattern Anal. Mach. Intell. 8(1), 15–25 (1986)
Article Google Scholar
Zheng, Q., Gong, M., You, X., Tao, D.: A unified B-spline framework for scale-invariant keypoint detection. Int. J. Comput. Vision 130(3), 777–799 (2022)
Article Google Scholar

Download references

Acknowledgements

Python code, that implements a subset of the discretization methods for Gaussian smoothing and Gaussian derivatives in this paper, is available in the pyscsp package, available at GitHub: https://github.com/tonylindeberg/pyscsp as well as through PyPi: pip install pyscsp

Funding

Open access funding provided by Royal Institute of Technology.

Author information

Authors and Affiliations

Computational Brain Science Lab, Division of Computational Science and Technology, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden
Tony Lindeberg

Authors

Tony Lindeberg
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tony Lindeberg.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The support from the Swedish Research Council (contract 2022-02969) is gratefully acknowledged.

Appendix

1.1 Explicit Expressions for Gaussian Derivative Kernels

This section gives explicit expressions for the 1-D Gaussian derivative kernels, that we derive discrete approximations for in Sect. 3. For simplicity, we here parameterize the kernels in terms of the standard deviation $\sigma $, instead to the variance $s = \sigma ^2$.

Consider the probabilistic Hermite polynomials ${\text {He}}_n(x)$, defined by

$$\begin{aligned} {\text {He}}_n(x) = (-1)^n e^{x^2/2} \, \partial _{x^n} \left( e^{-x^2/2} \right) , \end{aligned}$$

(119)

which implies that

$$\begin{aligned} \partial _{x^n} \left( e^{-x^2/2} \right) = (-1)^n {\text {He}}_n(x) \, e^{-x^2/2} \end{aligned}$$

(120)

and

$$\begin{aligned} \partial _{x^n} \left( e^{-x^2/2\sigma ^2} \right) = (-1)^n {\text {He}}_n\left( \frac{x}{\sigma }\right) \, e^{-x^2/2\sigma ^2} \frac{1}{\sigma ^n}. \end{aligned}$$

(121)

This means that the n:th-order Gaussian derivative kernel in 1-D can be written as

$$\begin{aligned}&\begin{aligned} \partial _{x^n} \left( g(x;\; \sigma ) \right) = \frac{1}{\sqrt{2 \pi } \sigma } \partial _{x^n} \left( e^{-x^2/2\sigma ^2} \right) \end{aligned}\nonumber \\&\begin{aligned} = \frac{1}{\sqrt{2 \pi } \sigma } \frac{(-1)^n}{\sigma ^n} {\text {He}}_n\left( \frac{x}{\sigma }\right) \, e^{-x^2/2\sigma ^2} \\ = \frac{(-1)^n}{\sigma ^n} {\text {He}}_n\left( \frac{x}{\sigma }\right) \, g(x;\; \sigma ). \end{aligned} \end{aligned}$$

(122)

For n up to fourth order of spatial differentiation, we have

$$\begin{aligned}&\begin{aligned} g(x;\; \sigma )&= \frac{1}{2 \pi \sigma } \, e^{-x^2/2\sigma ^2}, \end{aligned}\end{aligned}$$

(123)

$$\begin{aligned}&\begin{aligned} g_x(x;\; \sigma )&= -\frac{x}{\sigma ^2} \, g(x;\; \sigma ), \end{aligned}\end{aligned}$$

(124)

$$\begin{aligned}&\begin{aligned} g_{xx}(x;\; \sigma )&= \frac{(x^2 - \sigma ^2)}{\sigma ^4} \, g(x;\; \sigma ), \end{aligned}\end{aligned}$$

(125)

$$\begin{aligned}&\begin{aligned} g_{xxx}(x;\; \sigma )&= - \frac{(x^3 - 3 \ \sigma ^2 \, x)}{\sigma ^6} \, g(x;\; \sigma ), \end{aligned}\end{aligned}$$

(126)

$$\begin{aligned}&\begin{aligned} g_{xxxx}(x;\; \sigma )&= \frac{(x^4 - 6 \, \sigma ^2 \, x^2 + 3 \, \sigma ^4)}{\sigma ^8} \, g(x;\; \sigma ). \end{aligned} \end{aligned}$$

(127)

1.2 Derivation of the Integrated Gaussian Kernel and the Integrated Gaussian Derivatives

In this appendix, we will give a hands-on derivation of how convolution with integrated Gaussian kernels or integrated Gaussian derivative kernels arises from an assumption of extending any discrete signal to a piecewise constant continuous signal over each pixel support region.

Consider a Gaussian derivative kernel of order $\alpha $, where the special case $\alpha = 0$ corresponds to the regular zero-order Gaussian kernel. For any one-dimensional continuous signal $f_c(x)$, the Gaussian derivative response of order $\alpha $ is given by

$$\begin{aligned} L_{x^{\alpha }}(x; s) = \int \limits _{\xi \in {{\mathbb {R}}}} g_{x^{\alpha }}(x - \xi ;\, s) \, f_c(\xi ) \, d\xi . \end{aligned}$$

(128)

Let us next assume that we have a given discrete input signal f(n), and from this discrete signal define a step-wise constant continuous signal $f_c(x)$ according to

$$\begin{aligned} f_c(x) = f(n) \quad \quad \text{ if } -\tfrac{1}{2} < x - n \le \tfrac{1}{2}, \end{aligned}$$

(129)

where n denotes the integer nearest to the real-valued coordinate x.

The result of subjecting this continuous signal to the continuous Gaussian derivative convolution integral (128) at any integer grid point $x = n$ can therefore be written as

$$\begin{aligned} L_{x^{\alpha }}(n;\; s) = \sum _{m = -\infty }^{\infty } \int \limits _{\xi = m-1/2}^{m+1/2} g_{x^{\alpha }}(n - \xi ;\, s) \, f_c(\xi ) \, d\xi . \end{aligned}$$

(130)

Now, since $f_c(n - \xi ) = f(m)$ within the pixel support region $m-1/2 < \xi \le m+1/2$, we can also write this relation as

$$\begin{aligned} L_{x^{\alpha }}(n;\; s) = \sum _{m = -\infty }^{\infty } f(m) \int \limits _{\xi = m-1/2}^{m+1/2} g_{x^{\alpha }}(n - \xi ;\, s) \, d\xi . \end{aligned}$$

(131)

Next, by defining the integrated Gaussian derivative kernel as

$$\begin{aligned} T_{\text{ int }, x^{\alpha }}(n - m;\; s) = \int \limits _{\xi = m-1/2}^{m+1/2} g_{x^{\alpha }}(n - \xi ;\, s) \, d\xi , \end{aligned}$$

(132)

it follows that the relation (131) can be written as

$$\begin{aligned} L_{x^{\alpha }}(n;\; s)= & {} \sum _{m = -\infty }^{\infty } f(m) \, T_{\text{ int }, x^{\alpha }}(n - m;\; s) = \nonumber \\= & {} \sum _{m = -\infty }^{\infty }T_{\text{ int }, x^{\alpha }}(n - m;\; s) \, f(m), \end{aligned}$$

(133)

which shows that construction of applying the continuous Gaussian derivative convolution to a stepwise constant signal, defined as being equal to the discrete signal over each pixel support region, at the discrete grid points $x = n$ corresponds to discrete convolution with the integrated Gaussian derivative kernel.

1.3 $L_1$-Norms of Gaussian Derivative Kernels

This appendix gives explicit expressions for the $L_1$-norms of 1-D Gaussian derivative kernels

$$\begin{aligned} N_{\alpha } = \Vert g_{\alpha }(\cdot ;\; \sigma ) \Vert _1 = \int \limits _{x \in {{\mathbb {R}}}} \left| g_{x^{\alpha }}(x;\; \sigma ) \right| \, \hbox {d}x \end{aligned}$$

(134)

for differentiation orders up to 4, based on Equations (74)–(77) in Lindeberg [48], while with the scale normalization underlying those equations, to constant $L_1$-norms over scale, removed:

$$\begin{aligned}&\begin{aligned} N_0(\sigma )&= 1, \end{aligned} \end{aligned}$$

(135)

$$\begin{aligned}&\begin{aligned} N_1(\sigma )&= \frac{1}{\sigma } \sqrt{\frac{2}{\pi }} \approx \frac{0.798}{\sigma }, \end{aligned}\end{aligned}$$

(136)

$$\begin{aligned}&\begin{aligned} N_2(\sigma )&= \frac{1}{\sigma ^2} \sqrt{\frac{8}{e \, \pi }} \approx \frac{0.968}{\sigma ^2}, \end{aligned}\end{aligned}$$

(137)

$$\begin{aligned}&\begin{aligned} N_3(\sigma )&= \frac{1}{\sigma ^3} \left( 1 + \frac{4}{e^{3/2}} \right) \sqrt{\frac{2}{\pi }} \approx \frac{1.510}{\sigma ^3}, \end{aligned}\end{aligned}$$

(138)

$$\begin{aligned}&\begin{aligned} N_4(\sigma )&= \frac{1}{\sigma ^4} \frac{4 \sqrt{3}}{\left( e^{3/2} + e^{\sqrt{3/2}} \sqrt{\pi } \right) }\\&\quad \times \left( \sqrt{3 - \sqrt{6}} \, e^{\sqrt{6}} +\sqrt{3 + \sqrt{6}} \right) \end{aligned}\nonumber \\&\begin{aligned}&\qquad \quad \approx \frac{2.801}{\sigma ^4}. \end{aligned} \end{aligned}$$

(139)

1.4 Spatial Spread Measures for Gaussian Derivative Kernels

This appendix gives explicit expressions for spread measures in terms of the standard deviations of the absolute values of the 1-D Gaussian derivative kernels^{Footnote 14}

$$\begin{aligned} S_{\alpha } = \sqrt{\frac{\int \limits _{x \in {{\mathbb {R}}}} x^2 \left| g_{x^{\alpha }}(x;\; \sigma ) \right| \, \hbox {d}x}{\int \limits _{x \in {{\mathbb {R}}}} \left| g_{x^{\alpha }}(x;\; \sigma ) \right| \, \hbox {d}x}} \end{aligned}$$

(141)

for differentiation orders up to 4:

$$\begin{aligned}&\begin{aligned} S_0(\sigma )&= \sigma , \end{aligned} \end{aligned}$$

(142)

$$\begin{aligned}&\begin{aligned} S_1(\sigma )&= \sqrt{2} \, \sigma \approx 1.414 \, \sigma , \end{aligned}\end{aligned}$$

(143)

$$\begin{aligned}&\begin{aligned} S_2(\sigma )&= \root 4 \of {\frac{e \, \pi }{2}} \, \sqrt{1 + 3 \, \sqrt{\frac{2}{e \, \pi }} - 2{\text {erf}}\left( \frac{1}{\sqrt{2}}\right) } \times \sigma \end{aligned}\nonumber \\&\begin{aligned}&\qquad \quad \approx 1.498 \, \sigma , \end{aligned}\end{aligned}$$

(144)

$$\begin{aligned}&\begin{aligned} S_3(\sigma )&= \sqrt{\frac{28-2 \, e^{3/2}}{4+e^{3/2}}} \times \sigma \approx 1.498 \, \sigma , \end{aligned}\end{aligned}$$

(145)

$$\begin{aligned}&\begin{aligned} S_4(\sigma )&\approx 1.481 \, \sigma . \end{aligned} \end{aligned}$$

(146)

An exact expression for $S_4(\sigma )$ is given in Fig. 22. The calculations have been performed in Mathematica.

1.5 Diffusion Polynomials in the 1-D Continuous Case

This appendix lists diffusion polynomials, that satisfy the 1-D diffusion equation

$$\begin{aligned} \partial _s L = \frac{1}{2} \, \partial _{xx} L \end{aligned}$$

(147)

with initial condition $L(x;\; 0) = f(x)$, for f(x) being monomials

$$\begin{aligned} f(x) = x^k \end{aligned}$$

(148)

of orders up to $k = 4$:

$$\begin{aligned}&\begin{aligned} q_0(x;\; s) = 1, \end{aligned}\end{aligned}$$

(149)

$$\begin{aligned}&\begin{aligned} q_1(x;\; s) = x, \end{aligned}\end{aligned}$$

(150)

$$\begin{aligned}&\begin{aligned} q_2(x;\; s) = x^2 + s, \end{aligned}\end{aligned}$$

(151)

$$\begin{aligned}&\begin{aligned} q_3(x;\; s) = x^3 + 3 \, x \, s, \end{aligned}\end{aligned}$$

(152)

$$\begin{aligned}&\begin{aligned} q_4(x;\; s) = x^4 + 6 \, x^2 \, s + 3 \, s^2. \end{aligned} \end{aligned}$$

(153)

These diffusion polynomials do, in this respect, describe how a monomial input function $f(x) = x^k$ is affected by convolution with the continuous Gaussian kernel for standard deviation $\sigma = \sqrt{s}$.

1.6 Affine Gaussian Scale Space

As a more general spatial scale-space representation for 2-D images, consider the affine Gaussian scale-space representation

$$\begin{aligned} L(x, y;\; \varSigma ) = (g_{\text{ aff }}(\cdot , \cdot ;\; \varSigma ) * f(\cdot , \cdot ))(x, y;\; \varSigma ), \end{aligned}$$

(154)

where $g_{\text{ aff }}(x, y;\; \varSigma ) = g_{\text{ aff }}(p;\; \varSigma )$ for $p = (x, y)^T$ represents a 2-D affine Gaussian kernel of the form

$$\begin{aligned} g_{\text{ aff }}(p;\; \varSigma ) = \frac{1}{2 \pi \sqrt{\det \varSigma }} \, e^{-p^T \varSigma ^{-1} p/2}, \end{aligned}$$

(155)

with $\varSigma $ denoting any positive definite $2 \times 2$ matrix. In terms of diffusion equations, this affine scale-space representation along each ray $\varSigma = s \, \varSigma _0$ in affine scale-space satisfies the affine diffusion equation

$$\begin{aligned} \partial _s = \frac{1}{2} \nabla ^T (\varSigma _0 \, \nabla L) \end{aligned}$$

(156)

with initial condition $L(\cdot , \cdot ;\; 0) = f(\cdot , \cdot )$.

A general rationale for studying and making use of this affine scale-space representation is that it is closed under affine transformations, thus leading to affine covariance or affine equivariance [44, 61].

This closedness under affine transformations has been used for computing more accurate estimates of local surface orientation from monocular and binocular cues [61, 74], for computing affine invariant image features for image matching under wide baselines [7, 18, 21, 38, 40, 63, 64, 75, 76, 90], for performing affine invariant segmentation [6], for constructing affine covariant SIFT descriptors [66, 77, 102], for modelling receptive fields in biological vision [52, 56], for affine invariant tracking [28], and for formulating affine covariant metrics [23]. Affine Gaussian kernels with their related affine Gaussian derivatives have also been used as a general filter family for a large number of purposes in computer vision [33, 37, 39].

1.7 A Convenient Parameterization of Affine Gaussian Kernels Over a 2-D Spatial Domain

For the 2-D case, which we will restrict ourselves to henceforth, we may parameterize the spatial covariance matrix

$$\begin{aligned} \varSigma = \left( \begin{array}{cc} C_{xx} &{} C_{xy} \\ C_{xy} &{} C_{yy} \end{array} \right) , \end{aligned}$$

(157)

underlying the definition of affine Gaussian kernels in the affine Gaussian scale-space theory according to Appendix A.6, in terms of its eigenvalues $\lambda _1 >0$ and $\lambda _2 > 0$ as well as an image orientation $\varphi $, as

$$\begin{aligned}&\begin{aligned} C_{xx}&= \lambda _1 \cos ^2 \varphi + \lambda _2 \sin ^2 \varphi , \end{aligned}\end{aligned}$$

(158)

$$\begin{aligned}&\begin{aligned} C_{xy}&= (\lambda _1 - \lambda _2) \cos \varphi \, \sin \varphi , \end{aligned}\end{aligned}$$

(159)

$$\begin{aligned}&\begin{aligned} C_{yy}&= \lambda _1 \sin ^2 \varphi + \lambda _2 \cos ^2 \varphi , \end{aligned} \end{aligned}$$

(160)

and where we may additionally parameterize the eigenvalues in terms of corresponding standard deviations $\sigma _1$ and $\sigma _2$ according to

$$\begin{aligned}&\begin{aligned} \lambda _1&= \sigma _1^2, \end{aligned}\end{aligned}$$

(161)

$$\begin{aligned}&\begin{aligned} \lambda _2&= \sigma _2^2, \end{aligned} \end{aligned}$$

(162)

which then leads to the following explicit expression for the affine Gaussian derivative kernel

$$\begin{aligned} g_{\text{ aff }}(x, y;\; \sigma _1, \sigma _2, \varphi ) = \frac{1}{2 \pi \sigma _1 \sigma _2} \, e^{-A/2 \, \sigma _1^2 \, \sigma _2^2}, \end{aligned}$$

(163)

where

$$\begin{aligned} A= & {} (\sigma _2^2 \, x^2 + \sigma _1^2 \, y^2) \cos ^2 \varphi + (\sigma _1^2 \, x^2 + \sigma _2^2 \, y^2) \sin ^2 \varphi \nonumber \\{} & {} - 2 \, (\sigma _1^2 - \sigma _2^2) \, \cos \varphi \, \sin \varphi \, x \, y. \end{aligned}$$

(164)

The sampled affine Gaussian kernel is then given by

$$\begin{aligned} T_{\text{ affsampl }}(m, n;\; \sigma _1, \sigma _2, \varphi ) = g_{\text{ aff }}(m, n;\; \sigma _1, \sigma _2, \varphi ). \end{aligned}$$

(165)

At very fine scales, this discrete kernel will suffer from similar problems as the sampled rotationally symmetric Gaussian kernel according to (18), in the sense that: (i) the filter coefficients may exceed 1 for very small values of $\sigma _1$ or $\sigma _2$, (ii) the filter coefficients may sum up to a value larger than 1 for very small values of $\sigma _1$ or $\sigma _2$, and (iii) it may not be a sufficiently good numerical approximation of a spatial differentiation operator. For sufficiently large values of the scale parameters $\sigma _1$ and $\sigma _2$, however, this kernel can nevertheless be expected to constitute a reasonable approximation of the continuous affine Gaussian scale-space theory, for purposes of computing coarse-scale receptive field responses, for filter-bank approaches to e.g. visual recognition.

Alternatively, it is also possible to define a genuine discrete theory for affine kernels [54]. A limitation of that theory, however, is that a positivity requirement on the resulting spatial discretization imposes an upper bound on the eccentricities of the shapes of the kernels (as determined by the ratio between the eigenvalues $\lambda _1$ and $\lambda _2$ of $\varSigma $), and implying that the kernel shapes must not be too eccentric, to be represented within the theory. For this reason, we do not consider that theory in more detail here, and refer the reader to the original source for further details.

1.8 Explicit Expressions for Discrete Directional Derivative Approximation Masks

This appendix gives explicit expressions for directional derivative operator masks $\delta _{\varphi ^{m_1} \bot \varphi ^{m_2}}$ according to (112) and (115), in terms of underlying discrete derivative approximation masks along the Cartesian coordinate directions according to Appendix A.9.

Of order 1

$$\begin{aligned}&\begin{aligned} \delta _{\varphi } =&\cos \varphi \, \delta _x + \sin \varphi \, \delta _y, \end{aligned} \end{aligned}$$

(166)

$$\begin{aligned}&\begin{aligned} \delta _{\bot \varphi } =&- \sin \varphi \, \delta _x + \cos \varphi \, \delta _y. \end{aligned} \end{aligned}$$

(167)

Of order 2

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \varphi } =&\cos ^2 \varphi \, \delta _{xx} + 2 \cos \varphi \, \sin \varphi \, \delta _{xy} + \sin ^2 \varphi \, \delta _{yy}, \end{aligned} \end{aligned}$$

(168)

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \bot \varphi } =&\cos \varphi \, \sin \varphi \, (\delta _{yy} - \delta _{xx}) + (\cos ^2 \varphi - \sin ^2 \varphi ) \, \delta _{xy}, \end{aligned}\end{aligned}$$

(169)

$$\begin{aligned}&\begin{aligned} \delta _{\bot \varphi \bot \varphi } =&\sin ^2 \varphi \, \delta _{xx} - 2 \cos \varphi \, \sin \varphi \, \delta _{xy} + \cos ^2 \varphi \, \delta _{yy}. \end{aligned} \end{aligned}$$

(170)

Of order 3

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \varphi \varphi }&= \cos ^3 \varphi \, \delta _{xxx} + 3 \cos ^2 \varphi \, \sin \varphi \, \delta _{xxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + 3 \cos \varphi \, \sin ^2 \varphi \, \delta _{xyy} + \sin ^3 \varphi \, \delta _{yyy}, \end{aligned}\end{aligned}$$

(171)

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \varphi \bot \varphi } =&- \cos ^2 \varphi \, \sin \varphi \, \delta _{xxx} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + (\cos ^3 \varphi - 2 \, \cos \varphi \, \sin ^2 \varphi ) \, \delta _{xxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad - (\sin ^3 \varphi - 2 \, \cos ^2 \varphi \, \sin \varphi ) \, \delta _{xyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + \cos \varphi \, \sin ^2 \varphi \, \delta _{yyy}, \end{aligned}\end{aligned}$$

(172)

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \bot \varphi \bot \varphi } =&\cos \varphi \, \sin ^2 \varphi \, \delta _{xxx} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + (\sin ^3 \varphi - 2 \, \cos ^2 \varphi \, * \sin \varphi ) \, \delta _{xxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + (\cos ^3 \varphi - 2 \, \cos \varphi \, \sin ^2 \varphi 2) \, \delta _{xyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + \cos ^2 \varphi \, \sin \varphi \, \delta _{yyy}, \end{aligned}\end{aligned}$$

(173)

$$\begin{aligned}&\begin{aligned} \delta _{\bot \varphi \bot \varphi \bot \varphi } =&- \sin ^3 \varphi \, \delta _{xxx} + 3 \, \sin ^2 \varphi \, \cos \varphi \, \delta _{xxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad - 3 \, \sin \varphi \, \cos ^2 \varphi \, \delta _{xyy} + \cos ^3 \varphi \, \delta _{yyy}. \end{aligned} \end{aligned}$$

(174)

Of order 4

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \varphi \varphi \varphi } =&\cos ^4 \varphi \, \delta _{xxxx} + 4 \, \cos ^3 \varphi \, \sin \varphi \, \delta _{xxxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + 6 \, \cos ^2 \varphi \, \sin ^2 \varphi \, \delta _{xxyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + 4 \, \cos \varphi \, \sin ^3 \varphi \, \delta _{xyyy} + \sin ^4 \varphi \, \delta _{yyyy}, \end{aligned}\end{aligned}$$

(175)

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \varphi \varphi \bot \varphi } =&- \cos ^3 \varphi \, \sin \varphi \, \delta _{xxxx} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + (\cos ^4 \varphi - 3 \, \cos ^2 \varphi \, \sin ^2 \varphi ) \, \delta _{xxxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + 3 \, (\cos ^3 \varphi \, \sin \varphi - \cos \varphi \, \sin ^3 \varphi ) \, \delta _{xxyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + (3 \, \cos ^2 \varphi \, \sin ^2 \varphi - \sin ^4 \varphi ) \, \delta _{xyyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + \cos \varphi \, \sin ^3 \varphi \, \delta _{yyyy}, \end{aligned}\end{aligned}$$

(176)

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \varphi \bot \varphi \bot \varphi } =&\cos ^2 \varphi \, \sin ^2 \varphi \, \delta _{xxxx} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + 2 \, (\cos \varphi \, \sin ^3 \varphi - \cos ^3 \varphi \, \sin \varphi ) \, \delta _{xxxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + (\cos ^4 \varphi - 4 \, \cos ^2 \varphi \, \sin ^2 \varphi + \sin ^4 \varphi ) \, \delta _{xxyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + 2 \, (\cos ^3 \varphi \, \sin \varphi - \cos \varphi \, \sin ^3 \varphi ) \, \delta _{xyyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + \cos ^2 \varphi \, \sin ^2 \varphi \, \delta _{yyyy}, \end{aligned}\end{aligned}$$

(177)

$$\begin{aligned}&\begin{aligned} \delta _{\varphi \bot \varphi \bot \varphi \bot \varphi } =&- \cos \varphi \, \sin ^3 \varphi \, \delta _{xxxx} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + (3 \, \cos ^2 \varphi \, \sin ^2 \varphi - \sin ^4 \varphi ) \, \delta _{xxxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + 3 \, (\cos \varphi \, \sin ^3 \varphi - \cos ^3 \varphi \, \sin \varphi ) \, \delta _{xxyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + (\cos ^4 \varphi - 3 \, \cos ^2 \varphi \, \sin ^2 \varphi ) \, \delta _{xyyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + \cos ^3 \varphi \, \sin \varphi \, \delta _{yyyy}, \end{aligned}\end{aligned}$$

(178)

$$\begin{aligned}&\begin{aligned} \delta _{\bot \varphi \bot \varphi \bot \varphi \bot \varphi } =&\sin ^4 \varphi \, \delta _{xxxx} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad - 4 \, \sin ^3 \varphi \, \cos \varphi \, \delta _{xxxy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + 6 \, \sin ^2 \varphi \, \cos ^2 \phi \, \delta _{xxyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad - 4 \, \sin \varphi \, \cos ^3 \varphi \, \delta _{\hbox {d}xyyy} \end{aligned}\nonumber \\&\begin{aligned}&\qquad \qquad + \cos ^4 \varphi \, \delta _{yyyy}. \end{aligned} \end{aligned}$$

(179)

1.9 Explicit Expressions for Discrete Derivative Approximation Masks

This appendix gives explicit expressions for discrete derivative approximation masks $\delta _{x^{\alpha } y^{\beta }}$ up to order 4 for the case of a 2-D spatial image domain.

Of order 1 embedded in masks of size $3 \times 3$:

$$\begin{aligned}&\begin{aligned} \delta _x&= \left( \begin{array}{ccc} 0 &{} 0 &{} 0 \\ -\tfrac{1}{2} &{} 0 &{} +\tfrac{1}{2} \\ 0 &{} 0 &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(180)

$$\begin{aligned}&\begin{aligned} \delta _y&= \left( \begin{array}{ccc} 0 &{} +\tfrac{1}{2} &{} 0 \\ 0 &{} 0 &{} 0 \\ 0 &{} -\tfrac{1}{2} &{} 0 \end{array} \right) \end{aligned} \end{aligned}$$

(181)

Of order 2 embedded in masks of size $3 \times 3$:

$$\begin{aligned}&\begin{aligned} \delta _{xx}&= \left( \begin{array}{ccc} 0 &{} 0 &{} 0 \\ +1 &{} -2 &{} 1 \\ 0 &{} 0 &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(182)

$$\begin{aligned}&\begin{aligned} \delta _{xy}&= \left( \begin{array}{ccc} -\tfrac{1}{4} &{} 0 &{} +\tfrac{1}{4} \\ 0 &{} 0 &{} 0 \\ +\tfrac{1}{4} &{} 0 &{} -\tfrac{1}{4} \end{array} \right) \end{aligned}\end{aligned}$$

(183)

$$\begin{aligned}&\begin{aligned} \delta _{yy}&= \left( \begin{array}{ccc} 0 &{} +1 &{} 0 \\ 0 &{} -2 &{} 0 \\ 0 &{} +1 &{} 0 \end{array} \right) \end{aligned} \end{aligned}$$

(184)

Of order 3 embedded in masks of size $5 \times 5$:

$$\begin{aligned}&\begin{aligned} \delta _{xxx}&= \left( \begin{array}{ccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ -\tfrac{1}{2} &{} +1 &{} 0 &{} -1 &{} \tfrac{1}{2} \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(185)

$$\begin{aligned}&\begin{aligned} \delta _{xxy}&= \left( \begin{array}{ccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} +\tfrac{1}{2} &{} -1 &{} +\tfrac{1}{2} &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} -\tfrac{1}{2} &{} +1 &{} -\tfrac{1}{2} &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(186)

$$\begin{aligned}&\begin{aligned} \delta _{xyy}&= \left( \begin{array}{ccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} -\tfrac{1}{2} &{} 0 &{} +\tfrac{1}{2} &{} 0 \\ 0, &{} +1 &{} 0 &{} -1 &{} 0 \\ 0 &{} -\tfrac{1}{2} &{} 0 &{} +\tfrac{1}{2} &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(187)

$$\begin{aligned}&\begin{aligned} \delta _{yyy}&= \left( \begin{array}{ccccc} 0 &{} 0 &{} +\tfrac{1}{2} &{} 0 &{} 0 \\ 0 &{} 0 &{} -1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} +1 &{} 0 &{} 0 \\ 0 &{} 0 &{} -\tfrac{1}{2} &{} 0 &{} 0 \end{array} \right) \end{aligned} \end{aligned}$$

(188)

Of order 4 embedded in masks of size $5 \times 5$:

$$\begin{aligned}&\begin{aligned} \delta _{xxxx}&= \left( \begin{array}{ccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} -4 &{} 6 &{} -4 &{} 1 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(189)

$$\begin{aligned}&\begin{aligned} \delta _{xxxy}&= \left( \begin{array}{ccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ -\tfrac{1}{4} &{} +\tfrac{1}{2} &{} 0 &{} -\tfrac{1}{2} &{} +\tfrac{1}{4} \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ +\tfrac{1}{4} &{} -\tfrac{1}{2} &{} 0 &{} +\tfrac{1}{2} &{} -\tfrac{1}{4} \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(190)

$$\begin{aligned}&\begin{aligned} \delta _{xxyy}&= \left( \begin{array}{ccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} +1 &{} -2 &{} +1 &{} 0 \\ 0 &{} -2 &{} +4 &{} -2 &{} 0 \\ 0 &{} +1 &{} -2 &{} +1 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(191)

$$\begin{aligned}&\begin{aligned} \delta _{xyyy}&= \left( \begin{array}{ccccc} 0 &{} -\tfrac{1}{4} &{} 0 &{} +\tfrac{1}{4} &{} 0 \\ 0 &{} +\tfrac{1}{2} &{} 0 &{} -\tfrac{1}{2} &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} -\tfrac{1}{2} &{} 0 &{} +\tfrac{1}{2} &{} 0 \\ 0 &{} +\tfrac{1}{4} &{} 0 &{} -\tfrac{1}{4} &{} 0 \end{array} \right) \end{aligned}\end{aligned}$$

(192)

$$\begin{aligned}&\begin{aligned} \delta _{yyyy}&= \left( \begin{array}{ccccc} 0 &{} 0 &{} +1 &{} 0 &{} 0 \\ 0 &{} 0 &{} -4 &{} 0 &{} 0 \\ 0 &{} 0 &{} +6 &{} 0 &{} 0 \\ 0 &{} 0 &{} -4 &{} 0 &{} 0 \\ 0 &{} 0 &{} +1 &{} 0 &{} 0 \end{array} \right) \end{aligned} \end{aligned}$$

(193)

1.10 Clarifications in Relation to an Evaluation of What is Referred to as “Lindeberg’s Smoothing Method” by Rey-Otero and Delbracio [73]

With regard to an evaluation of a method, referred to as “Lindeberg’s smoothing method’ in [73], some clarifications would be needed concerning the experimental comparison they perform in that work, since that comparison is not made in relation to any of the best methods that arise from the discrete scale-space theory introduced in [43] and then extended in [44, Chapters 3 and 4].

Rey-Otero and Delbracio [73] compare to an Euler-forward discretization of the semi-discrete diffusion equation (Equation (4.30) in Lindeberg [44])

$$\begin{aligned} \partial _t L = \frac{1}{2} \, ( (1 - \gamma ) \, \nabla _5^2 + \gamma \, \nabla _{\times }^2 ) \, L \end{aligned}$$

(194)

with initial condition $L(x, y;\; 0) = f(x, y)$, that determines the evolution of a 2-D discrete scale-space representation over scale. The corresponding discrete scale-space representation, according to Lindeberg’s discrete scale-space theory, can, however, be computed more accurately, using the explicit expression for the Fourier transform of the underlying discrete family of scale-space kernels, according to Equation (4.24) in [44], and thus without using any a priori restriction to discrete levels in the scale direction, as used by Rey-Otero and Delbracio [73].

Additionally, concerning the choice of the parameter value $\gamma $, that determines the relative weighting between the contributions from the five-point $\nabla _5^2$ and cross-point $\nabla _{\times }^2$ discretizations of the Laplacian operator, Rey-Otero and Delbracio [73] use a non-optimal value for this relative weighting parameter $(\gamma = 1/2$), instead of using either $\gamma = 1/3$, which gives the best numerical approximation to rotational symmetry (Proposition 4.16 in Lindeberg [44]), or $\gamma = 0$, which leads to separable convolution operators on a Cartesian grid (Proposition 4.14 in Lindeberg [44]), which then also implies that the discrete scale-space representation can be computed using separable convolution with the 1-D discrete analogue of the Gaussian kernel (26) that is used in this work.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Lindeberg, T. Discrete Approximations of Gaussian Smoothing and Gaussian Derivatives. J Math Imaging Vis (2024). https://doi.org/10.1007/s10851-024-01196-9

Download citation

Received: 17 November 2023
Accepted: 29 April 2024
Published: 17 June 2024
DOI: https://doi.org/10.1007/s10851-024-01196-9

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Discrete Approximations of Gaussian Smoothing and Gaussian Derivatives

Abstract

Similar content being viewed by others

Edge Detection Using Topological Gradients: A Scale-Space Approach

Scale and Edge Detection with Topological Derivatives

Detection of Singularities by Discrete Multiscale Directional Representations

1 Introduction

2 Discrete Approximations of Gaussian Smoothing

2.1 Theoretical Properties of Gaussian Scale-Space Representation

2.1.1 Non-Creation of New Structure with Increasing Scale

2.1.2 Cascade Smoothing Property

2.1.3 Spatial Averaging

2.1.4 Separable Gaussian Convolution

2.2 Modelling Situation for Theoretical Analysis of Different Approaches for Implementing Gaussian Smoothing Discretely

2.2.1 Measures of the Spatial Extent of Smoothing Kernels

2.3 The Sampled Gaussian Kernel

2.4 The Normalized Sampled Gaussian Kernel

2.5 The Integrated Gaussian Kernel

2.6 The Discrete Analogue of the Gaussian Kernel

2.6.1 Diffusion Equation Interpretation of the Genuinely Discrete Scale-Space Representation Concept

2.7 Performance Measures for Quantifying Deviations from Theoretical Properties of Discretizations of Gaussian Kernels

2.8 Numerical Quantifications of Performance Measures

2.8.1 Normalization Error

2.8.2 Standard Deviations of the Discrete Kernels

2.8.3 Spatial Variance Offset of the Discrete Kernels

2.8.4 Spatial Standard-Deviation-Based Relative Scale Difference

2.8.5 Cascade Smoothing Error

2.9 Summary of the Characterization Results from the Theoretical Analysis and the Quantitative Performance Measures

3 Discrete Approximations of Gaussian Derivative Operators

3.1 Theoretical Properties of Gaussian Derivatives

3.2 Separable Gaussian Derivative Operators

3.2.1 Measures of the Spatial Extent of Gaussian Derivative or Derivative Approximation Kernels

3.3 Sampled Gaussian Derivative Kernels

3.4 Integrated Gaussian Derivative Kernels

3.5 Discrete Analogues of Gaussian Derivative Kernels

3.5.1 Cascade Smoothing Property

3.6 Numerical Correctness of the Derivative Estimates

3.7 Additional Performance Measures for Quantifying Deviations from Theoretical Properties of Discretizations of Gaussian Derivative Kernels

3.8 Numerical Quantification of Deviations from Theoretical Properties of Discretizations of Gaussian Derivative Kernels

3.8.1 \(l_1\)-Norms of Discrete Approximations of Gaussian Derivative Approximation Kernels

3.8.2 Spatial Spread Measures

3.8.3 Cascade Smoothing Errors

3.9 Summary of the Characterization Results from the Theoretical Analysis and the Quantitative Performance Measures

4 Application to Scale Selection from Local Extrema Over Scale of Scale-Normalized Derivatives

4.1 Scale-Normalized Derivative Operators

4.2 Scale Covariance Property of Scale-Normalized Derivative Responses

4.3 Scale Selection from Local Extrema Over Scales of Scale-Normalized Derivative Responses

4.3.1 Interest Point Detection

4.3.2 Edge Detection

4.3.3 Ridge Detection

4.4 Measures of Scale Selection Performance

4.5 Numerical Quantification of Deviations from Theoretical Properties Resulting from Different Discretizations of Scale-Normalized Derivatives

4.5.1 Experimental Methodology

4.5.2 Scale Selection with the Scale-Normalized Laplacian Applied to Gaussian Blobs

4.5.3 Scale Selection with the Scale-Normalized Determinant of the Hessian Applied to Gaussian Blobs

4.5.4 Scale Selection with the Scale-Normalized Gradient Magnitude Applied to Diffuse Step Edges

4.5.5 Scale Selection with the Second-Order Principal Curvature Measure Applied to Diffuse Ridges

4.6 Summary of the Evaluation on Scale Selection Experiments

5 Discrete Approximations of Directional Derivatives

5.1 Small-Support Directional Derivative Approximation Masks

5.2 Method for Defining Discrete Directional Derivative Approximation Masks

5.3 Scale-Space Properties of Directional Derivative Approximations Computed by Applying Small-Support Directional Derivative Approximation Masks to Smoothed Image Data

6 Summary and Conclusions

6.1 Extensions of the Approach

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Appendix

1.1 Explicit Expressions for Gaussian Derivative Kernels

1.2 Derivation of the Integrated Gaussian Kernel and the Integrated Gaussian Derivatives

1.3 \(L_1\)-Norms of Gaussian Derivative Kernels

1.4 Spatial Spread Measures for Gaussian Derivative Kernels

1.5 Diffusion Polynomials in the 1-D Continuous Case