Scale Selection Properties of Generalized Scale-Space Interest Point Detectors
DOI: 10.1007/s10851-012-0378-3
 Cite this article as:
 Lindeberg, T. J Math Imaging Vis (2013) 46: 177. doi:10.1007/s10851-012-0378-3
Abstract
Scale-invariant interest points have found several highly successful applications in computer vision, in particular for image-based matching and recognition.

an enriched set of differential interest operators at a fixed scale including the Laplacian operator, the determinant of the Hessian, the new Hessian feature strength measures I and II and the rescaled level curve curvature operator, as well as

an enriched set of scale selection mechanisms including scale selection based on local extrema over scale, complementary post-smoothing after the computation of nonlinear differential invariants and scale selection based on weighted averaging of scale values along feature trajectories over scale.
A theoretical analysis of the sensitivity to affine image deformations is presented, and it is shown that the scale estimates obtained from the determinant of the Hessian operator are affine covariant for an anisotropic Gaussian blob model. Among the other purely second-order operators, the Hessian feature strength measure I has the lowest sensitivity to non-uniform scaling transformations, followed by the Laplacian operator and the Hessian feature strength measure II. The predictions from this theoretical analysis agree with experimental results of the repeatability properties of the different interest point detectors under affine and perspective transformations of real image data. A number of less complete results are derived for the level curve curvature operator.
Keywords
Feature detection · Interest point · Blob detection · Corner detection · Scale · Scale-space · Scale selection · Scale invariance · Scale calibration · Scale linking · Feature trajectory · Deep structure · Affine transformation · Differential invariant · Gaussian derivative · Multi-scale representation · Computer vision
1 Introduction
The notion of scale selection is essential to adapt the scale of processing to local image structures. A computer vision system equipped with an automatic scale selection mechanism will have the ability to compute scale-invariant image features and thereby handle the a priori unknown scale variations that may occur in image data because of objects and substructures of different physical size in the world as well as objects at different distances to the camera. Computing local image descriptors at integration scales proportional to the detection scales of scale-invariant image features moreover makes it possible to compute scale-invariant image descriptors (Lindeberg [35]; Bretzner and Lindeberg [4]; Mikolajczyk and Schmid [49]; Lowe [48]; Bay et al. [2]; Lindeberg [38, 43]).
A general framework for performing scale selection can be obtained by detecting local extrema over scale of γ-normalized derivative expressions (Lindeberg [35]). This approach has been applied to a large variety of feature detection tasks (Lindeberg [34]; Bretzner and Lindeberg [4]; Sato et al. [54]; Frangi et al. [11]; Krissian et al. [22]; Chomat et al. [5]; Hall et al. [15]; Mikolajczyk and Schmid [49]; Lazebnik et al. [24]; Negre et al. [52]; Tuytelaars and Mikolajczyk [58]). Specifically, highly successful applications can be found in image-based recognition (Lowe [48]; Bay et al. [2]). Alternative approaches for scale selection have also been proposed in terms of the detection of peaks over scale in weighted entropy measures (Kadir and Brady [18]) or Lyapunov functionals (Sporring et al. [56]), minimization of normalized error measures over scale (Lindeberg [36]), determining minimum reliable scales for feature detection according to a noise suppression model (Elder and Zucker [9]), determining optimal stopping times in nonlinear diffusion-based image restoration methods using similarity measurements relative to the original data (Mrázek and Navara [51]), by applying statistical classifiers for texture analysis at different scales (Kang et al. [19]) or by performing image segmentation from the scales at which a supervised classifier delivers class labels with the highest posterior (Loog et al. [47]; Li et al. [25]).

post-smoothing of differential feature responses by performing a second-stage scale-space smoothing step after the computation of nonlinear differential invariants, so as to simplify the task of linking feature responses over scale into feature trajectories, and

weighted scale selection where the scale estimates are computed by weighted averaging of scale-normalized feature responses along each feature trajectory over scale, in contrast to previous detection of local extrema or global extrema over scale.
(i) When using a set of different types of interest point detectors that are based on different linear or nonlinear combinations of scale-space derivatives, a basic question arises of how to relate thresholds on the magnitude values between different types of interest point detectors. By studying the responses of the different interest point detectors to unit contrast Gaussian blobs, we will derive a way of expressing mutually corresponding thresholds between different types of interest point detectors. Algorithmically, the resulting threshold relations lead to intuitively very reasonable results.
(ii) The new scale selection method based on weighted averaging along feature trajectories over scale raises questions of how the properties of this scale selection method can be related to the previous scale selection method based on local extrema over scale of scale-normalized derivatives. We will show that for Gaussian blobs, the scale estimates obtained by weighted averaging over scale will be similar to the scale estimates obtained from local extrema over scale. If we assume that scale calibration can be performed based on the behaviour for Gaussian blobs, this result therefore shows that no relative scale compensation is needed between the two types of scale selection approaches. In previous work on scale selection based on γ-normalized derivatives [34, 35], a similar assumption of scale calibration based on Gaussian model signals has been demonstrated to lead to highly useful results for calibrating the value of the γ-parameter with respect to the problems of blob detection, corner detection, edge detection and ridge detection, with a large number of successful computer vision applications building on the resulting feature detectors.
(iii) For the scale linking algorithm presented in [39], which is based on local gradient ascent or gradient descent starting from local extrema in the differential responses at adjacent levels of scale, it turns out that a second post-smoothing stage after the computation of nonlinear differential invariants is highly useful for increasing the performance of the scale linking algorithm, by suppressing spurious responses of low relative amplitude in the nonlinear differential responses that are used for computing interest points. This self-similar amount of post-smoothing is determined as a constant times the local scale for computing the differential expressions, and may affect the scale estimates obtained from local extrema over scale or weighted averaging over scale. We will analyze how large this effect will be for different amounts of post-smoothing and also show how relative scale normalization factors can be determined for the different differential expressions to obtain scale estimates that are unbiased with respect to the effect of the post-smoothing operation, if we again assume that scale calibration can be performed based on the scale selection properties for Gaussian blobs. Notably, different scale compensation factors for the influence of post-smoothing will be obtained for the different differential expressions that are used for defining interest points. Without post-smoothing, the scale estimates obtained from the different differential expressions are, however, all similar for Gaussian blobs, which indicates the possibilities of using different types of differential expressions for performing combined interest point detection and scale selection, so that they can be interchangeably replaced in a modular fashion.
(iv) When detecting interest points from images that are taken of an object from different viewing directions, the local image pattern will be deformed by the perspective projection. If the interest point corresponds to a point in the world that is located at a smooth surface of an object, this deformation can to first order of approximation be modelled by a local affine transformation (Gårding and Lindeberg [12]). While the notion of affine shape adaptation has been demonstrated to be a highly useful tool for computing affine invariant interest points (Lindeberg and Gårding [46]; Baumberg [1]; Mikolajczyk and Schmid [49]; Tuytelaars and van Gool [57]), the success of such an affine shape adaptation process depends on the robustness of the underlying interest points that are used for initiating the iterative affine shape adaptation process. To investigate the properties of the different interest point detectors under affine transformations, we will perform a detailed analysis of the scale selection properties for affine Gaussian blobs, for which closed-form theoretical analysis is possible. The analysis shows that the determinant of the Hessian operator and the new Hessian feature strength measure I both have significantly better behaviour under affine transformations than the Laplacian operator or the new Hessian feature strength measure II. In comparison with experimental results [39], the interest point detectors that have the best theoretical properties under affine transformations of Gaussian blobs also have significantly better repeatability properties under affine and perspective transformations than the other two. These results therefore show how experimental properties of interest points can be predicted by theoretical analysis, which contributes to an increased understanding of the relative properties of different types of interest point detectors.
In very recent work [42], these generalized scale-space interest points have been integrated with local scale-invariant image descriptors and been demonstrated to lead to highly competitive results for image-based matching and recognition.
1.1 Outline of the Presentation
The paper is organized as follows. Section 2 reviews main components of a generalized framework for detecting scale-invariant interest points from scale-space features, including a richer set of interest point detectors at a fixed scale as well as new scale selection mechanisms.
In Sect. 3 the scale selection properties of this framework are analyzed for scale selection based on local extrema over scale of γ-normalized derivatives, when applied to rotationally symmetric as well as anisotropic Gaussian blob models. Section 4 gives a corresponding analysis for scale selection by weighted averaging over scale along feature trajectories.
Section 5 summarizes and compares the results obtained from the two scale selection approaches including complementary theoretical arguments to highlight their similarities in the rotationally symmetric case. It is also shown how scale calibration factors can be determined so as to obtain comparable scale estimates from interest point detectors that have been computed from different types of differential expressions. Comparisons are also presented of the relative sensitivity of the scale estimates to affine transformations outside the similarity group, with a brief comparison to experimental results. Finally, Sect. 6 concludes with an overall summary and discussion.
2 Scale-Space Interest Points
2.1 Scale-Space Representation
2.2 Differential Entities for Detecting Scale-Space Interest Points
A common approach to image matching and object recognition consists of matching interest points with associated image descriptors. Basic requirements on the interest points on which the image matching is to be performed are that they should (i) have a clear, preferably mathematically well-founded, definition, (ii) have a well-defined position in image space, (iii) have local image structures around the interest point that are rich in information content such that the interest points carry important information to later stages and (iv) be stable under local and global deformations of the image domain, including perspective image deformations and illumination variations such that the interest points can be reliably computed with a high degree of repeatability. The image descriptors computed at the interest points should also (v) be sufficiently distinct, such that interest points corresponding to physically different points can be kept separate.
Preferably, the interest points should also have an attribute of scale, to make it possible to compute reliable interest points from real-world image data, including scale changes in the image domain. Specifically, the interest points should preferably also be scale-invariant to make it possible to match corresponding image patches under scale variations.
(i) either of the following established differential operators [35]:

the Laplacian operator
$$ \nabla^2 L = L_{xx} + L_{yy} $$ (5)

the determinant of the Hessian
$$ \det {\mathcal{H}} L = L_{xx} L_{yy} - L_{xy}^2 $$ (6)

the rescaled level curve curvature
$$ \tilde{\kappa}(L) = L_x^2 L_{yy} + L_y^2 L_{xx} - 2 L_x L_y L_{xy} $$ (7)

(ii) either of the following new differential analogues and extensions of the Harris operator [16] proposed in [39]:

the unsigned Hessian feature strength measure I
$$ {\mathcal{D}}_1 L = \begin{cases} \det {\mathcal{H}} L - k \, \operatorname{trace}^2 {\mathcal{H}} L & \mbox{if } \det {\mathcal{H}} L - k \, \operatorname{trace}^{2} {\mathcal{H}} L > 0 \\[3pt] 0 & \mbox{otherwise} \end{cases} $$ (8)

the signed Hessian feature strength measure I
$$ \tilde{\mathcal{D}}_1 L = \begin{cases} \det {\mathcal{H}} L - k \, \operatorname{trace}^2 {\mathcal{H}} L & \mbox{if } \det {\mathcal{H}} L - k \, \operatorname{trace}^{2} {\mathcal{H}} L > 0 \\[3pt] \det {\mathcal{H}} L + k \, \operatorname{trace}^2 {\mathcal{H}} L & \mbox{if } \det {\mathcal{H}} L + k \, \operatorname{trace}^{2} {\mathcal{H}} L < 0 \\[3pt] 0 & \mbox{otherwise} \end{cases} $$ (9)

where \(k \in ]0, \frac{1}{4}[\) with the preferred choice \(k \approx 0.04\), or

(iii) either of the following new differential analogues and extensions of the Shi and Tomasi operator [55] proposed in [39]:

the unsigned Hessian feature strength measure II
$$ {\mathcal{D}}_2 L = \min ( \lvert \lambda_1 \rvert, \lvert \lambda_2 \rvert ) = \min ( \lvert L_{pp} \rvert, \lvert L_{qq} \rvert ) $$ (10)

the signed Hessian feature strength measure II
$$ \tilde{\mathcal{D}}_2 L = \begin{cases} L_{pp} & \mbox{if } \lvert L_{pp} \rvert < \lvert L_{qq} \rvert \\[3pt] L_{qq} & \mbox{if } \lvert L_{qq} \rvert < \lvert L_{pp} \rvert \\[3pt] (L_{pp} + L_{qq})/2 & \mbox{otherwise} \end{cases} $$ (11)

where \(L_{pp}\) and \(L_{qq}\) denote the eigenvalues of the Hessian matrix (the principal curvatures) ordered such that \(L_{pp} \leq L_{qq}\) [34]:
$$ L_{pp}, L_{qq} = \frac{1}{2} \Bigl( L_{xx} + L_{yy} \mp \sqrt{(L_{xx} - L_{yy})^2 + 4 L_{xy}^2} \Bigr) $$
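The operator definitions in (5), (6) and (8)-(11) can be sketched pointwise as follows. This is a minimal illustration, not the implementation used in [39]: it assumes that the second-order derivatives at a point are already available as plain numbers, that the eigenvalues enter the measures II through their absolute values (as the names "unsigned" and "signed" suggest), and the function name and dictionary keys are hypothetical.

```python
import math


def hessian_interest_operators(Lxx, Lxy, Lyy, k=0.04):
    """Evaluate the second-order interest operators at a single point.

    Lxx, Lxy, Lyy are assumed to be precomputed scale-space derivatives;
    k is the Harris-like constant with 0 < k < 1/4.
    """
    trace = Lxx + Lyy                      # Laplacian, Eq. (5)
    det = Lxx * Lyy - Lxy ** 2             # determinant of the Hessian, Eq. (6)

    # Eigenvalues of the symmetric 2x2 Hessian, ordered so that Lpp <= Lqq.
    root = math.sqrt((Lxx - Lyy) ** 2 + 4.0 * Lxy ** 2)
    Lpp = 0.5 * (trace - root)
    Lqq = 0.5 * (trace + root)

    minus = det - k * trace ** 2
    plus = det + k * trace ** 2

    # Unsigned and signed Hessian feature strength measure I, Eqs. (8)-(9).
    d1 = minus if minus > 0 else 0.0
    if minus > 0:
        d1_signed = minus
    elif plus < 0:
        d1_signed = plus
    else:
        d1_signed = 0.0

    # Unsigned and signed Hessian feature strength measure II, Eqs. (10)-(11).
    d2 = min(abs(Lpp), abs(Lqq))
    if abs(Lpp) < abs(Lqq):
        d2_signed = Lpp
    elif abs(Lqq) < abs(Lpp):
        d2_signed = Lqq
    else:
        d2_signed = 0.5 * (Lpp + Lqq)

    return {
        "laplacian": trace,
        "det_hessian": det,
        "D1": d1,
        "D1_signed": d1_signed,
        "D2": d2,
        "D2_signed": d2_signed,
    }


# A bright elongated blob (both principal curvatures negative) and a saddle.
blob = hessian_interest_operators(-2.0, 0.0, -1.0)
saddle = hessian_interest_operators(1.0, 0.0, -1.0)
```

For the saddle, the unsigned measure I is zero while the signed measure I is negative, which is the distinction that the signed variants are meant to capture.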

The basic motivations for defining the new differential operators \({\mathcal{D}}_{1}\), \(\tilde{\mathcal{D}}_{1}\), \({\mathcal{D}}_{2}\) and \(\tilde{\mathcal{D}}_{2}\) from the Hessian matrix \({\mathcal{H}} L\) in a structurally similar way as the Harris and the Shi-and-Tomasi operators are defined from the second-moment matrix (structure tensor) are that: (i) under an affine transformation \(p' = A p\) with \(p = (x, y)^{T}\) and A denoting a nonsingular 2×2 matrix, it can be shown that the Hessian matrix \({\mathcal{H}} f\) transforms in a similar way \(({\mathcal{H}} f')(p') = A^{-T} \, ({\mathcal{H}} f)(p) \, A^{-1}\) as the second-moment matrix \(\mu'(p) = A^{-T} \mu(p) A^{-1}\) [31, 46] and (ii) provided that the Hessian matrix is either positive or negative definite, the Hessian matrix \({\mathcal{H}} L\) computed at a point \(p_{0}\) defines an either positive or negative definite quadratic form \(Q_{{\mathcal{H}} L}(p) = (p - p_{0})^{T} ({\mathcal{H}} L) (p - p_{0})\) in a similar way as the second-moment matrix μ computed at \(p_{0}\) does: \(Q_{\mu}(p) = (p - p_{0})^{T} \mu \, (p - p_{0})\). From these two analogies, we can conclude that provided the Hessian matrix is either positive or negative definite, these two types of descriptors should have strong qualitative similarities. Experimentally, the new differential interest point detectors \({\mathcal{D}}_{1}\), \(\tilde{\mathcal{D}}_{1}\), \({\mathcal{D}}_{2}\) and \(\tilde{\mathcal{D}}_{2}\) can be shown to perform very well and to allow for image features with better repeatability properties under affine and perspective transformations than the more traditional Laplacian or Harris operators [39].
Other ways of defining image features from the second-order differential structure of images have been proposed by Danielsson et al. [7] and Griffin [13].
2.3 Scale Selection Mechanisms
Scale Selection from γ-Normalized Derivatives
Furthermore, performing simultaneous scale selection and spatial selection by detecting scale-space extrema, where the scale-normalized differential expression \({\mathcal{D}}_{\gamma-\mathit{norm}} L\) assumes local extrema with respect to both space and scale, constitutes a general framework for detecting scale-invariant interest points. Formally, such scale-space extrema are characterized by the first-order derivatives with respect to space and scale being zero.
If some scale-normalized differential invariant \({\mathcal{D}}_{\gamma-\mathit{norm}} L\) assumes a local extremum over scale at scale \(t_{0}\) in scale-space, then under a uniform rescaling of the input pattern by a factor s there will be a local extremum over scale in the scale-space of the transformed signal at scale \(s^{2} t_{0}\).
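The scale covariance property can be illustrated numerically on a rotationally symmetric Gaussian blob. The sketch below is an illustration under stated assumptions rather than part of the original analysis: it assumes γ = 1 and a unit-mass Gaussian blob with variance t0, for which the semigroup property gives the scale-normalized Laplacian magnitude at the blob centre as t / (π (t0 + t)²). The function names and the logarithmic grid search are hypothetical choices.

```python
import math


def laplacian_response_at_center(t, t0):
    """|t (L_xx + L_yy)| at the centre of a unit-mass Gaussian blob.

    Assumes gamma = 1 and a blob of variance t0, so that the scale-space
    representation at scale t is a Gaussian of variance t0 + t.
    """
    return t / (math.pi * (t0 + t) ** 2)


def select_scale(response, t_min=1e-2, t_max=1e4, n=20001):
    """Return the scale on a logarithmic grid that maximises `response`."""
    log_min, log_max = math.log(t_min), math.log(t_max)
    best_t, best_val = None, -math.inf
    for i in range(n):
        t = math.exp(log_min + (log_max - log_min) * i / (n - 1))
        val = response(t)
        if val > best_val:
            best_t, best_val = t, val
    return best_t


t0 = 4.0   # variance of the original blob
s = 3.0    # uniform spatial rescaling factor

# Original pattern: the selected scale should be close to t0.
t_hat = select_scale(lambda t: laplacian_response_at_center(t, t0))

# Rescaled pattern: the blob variance becomes s^2 * t0. The amplitude change
# caused by the rescaling is ignored here because it does not move the maximum.
t_hat_rescaled = select_scale(lambda t: laplacian_response_at_center(t, s ** 2 * t0))
```

Up to the resolution of the grid, `t_hat` lands near t0 and `t_hat_rescaled` near s² t0, in line with the statement above.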
Generalized Scale Selection Mechanisms

by performing post-smoothing of the differential expression \({\mathcal{D}}_{\gamma-\mathit{norm}} L\) prior to the detection of local extrema over space or scale with an integration scale (post-smoothing scale) \(t_{\mathit{post}} = c^{2} t\) proportional to the differentiation scale t with c>0 (see Appendix A.1 for a brief description of the algorithmic motivations for using such a post-smoothing operation when linking image features over scale that have been computed from nonlinear differential entities) and

by performing weighted averaging of scale values along any feature trajectory T over scale in a scale-space primal sketch according to
$$ \hat{\tau}_T = \frac{\int_{\tau \in T} \tau \, \psi(({\mathcal{D}}_{\gamma-\mathit{norm}} L)(x(\tau);\; \tau)) \, d\tau}{ \int_{\tau \in T} \psi(({\mathcal{D}}_{\gamma-\mathit{norm}} L)(x(\tau);\; \tau)) \, d\tau} $$ (19)
where ψ denotes some (positive and monotonically increasing) transformation of the scale-normalized feature strength response \({\mathcal{D}}_{\gamma-\mathit{norm}} L\) and with the scale parameter parameterized in terms of effective scale [28]
$$ \tau = A \log t + B \quad \mbox{where }A \in \mathbb {R}_+\ \mbox{and}\ B \in \mathbb {R} $$ (20)
to obtain a scale covariant construction of the corresponding scale estimates
$$ \hat{t}_T = \exp \biggl( \frac{\hat{\tau}_T - B}{A} \biggr) $$ (21)
that implies that the resulting image features will be scale-invariant.
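The weighted averaging in (19)-(21) can be sketched with a simple trapezoidal rule. The code below is a hedged illustration, not the algorithm of [39]: it assumes that the feature trajectory has already been sampled as a list of effective scale values with matching scale-normalized responses, it takes ψ as the identity and A = 1, B = 0 by default, and the test trajectory reuses the assumed closed-form Laplacian response t / (π (t0 + t)²) at the centre of a unit-mass Gaussian blob (γ = 1). All names are hypothetical.

```python
import math


def weighted_scale_estimate(taus, responses, psi=lambda r: r, A=1.0, B=0.0):
    """Weighted average of effective scale along a trajectory, Eqs. (19)-(21).

    taus      : increasing samples of effective scale tau = A log t + B
    responses : scale-normalized feature strength at each sample
    psi       : positive, monotonically increasing weighting transformation
    """
    weights = [psi(r) for r in responses]
    num = 0.0
    den = 0.0
    for i in range(len(taus) - 1):          # trapezoidal rule
        dtau = taus[i + 1] - taus[i]
        num += 0.5 * dtau * (taus[i] * weights[i] + taus[i + 1] * weights[i + 1])
        den += 0.5 * dtau * (weights[i] + weights[i + 1])
    tau_hat = num / den                      # Eq. (19)
    return math.exp((tau_hat - B) / A)       # Eq. (21)


# Assumed test trajectory: centre of a Gaussian blob with variance t0 = 4,
# sampled over a wide range of effective scale with A = 1 and B = 0.
t0 = 4.0
taus = [-30.0 + 0.01 * i for i in range(6001)]
responses = [math.exp(tau) / (math.pi * (t0 + math.exp(tau)) ** 2) for tau in taus]
t_hat = weighted_scale_estimate(taus, responses)
```

In this model the response is symmetric around log t0 when expressed in effective scale, so the weighted average returns a scale estimate close to t0, the same value as the local extremum over scale.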
Experimentally, it can be shown that scale-space interest points detected by these generalized scale selection mechanisms lead to interest points with better repeatability properties under affine and perspective image deformations compared to corresponding interest points detected by regular scale-space extrema [39]. In this sense, these generalized scale selection mechanisms make it possible to detect more robust image features. Specifically, the use of scale selection by weighted averaging over scale is made possible by linking image features over scale into feature trajectories,^{2} which ensures that the scale estimates should only be influenced by responses from scale levels that correspond to qualitatively similar types of image structures along a feature trajectory over scale.
The subject of this article is to analyze properties of these generalized scale selection mechanisms theoretically when applied to the interest point detectors listed in Sect. 2.2.
3 Scale Selection Properties for Local Extrema over Scale
For theoretical analysis, we will consider a Gaussian prototype model of blob-like image structures. With such a prototype model, the semigroup property of the Gaussian kernel makes it possible to directly obtain the scale-space representations at coarser scales in terms of Gaussian functions, which simplifies theoretical analysis. Specifically, the result of computing polynomial differential invariants at different scales will be expressed in terms of Gaussian functions multiplied by polynomials. Thereby, closed-form theoretical analysis becomes tractable, which would otherwise be much harder to carry out regarding the application of the nonlinear operations that are used for defining the interest points to general image data.
The use of a Gaussian prototype model can also be motivated by conceptual simplicity. If we would like to model an image feature at some scale, then the Gaussian model is the model that requires the minimum amount of information in the sense that the Gaussian distribution is the distribution with maximum entropy ^{3} given a specification of the mean value m and the covariance matrix Σ of the distribution. Specifically, the Gaussian function with scale parameter t serves as an aperture function that measures image structures with respect to an inner scale beyond which finer-scale structures cannot be resolved.
In previous work [34, 35] it has been shown that determination of the γ-parameter in scale selection for different types of feature detection tasks, such as blob detection, corner detection, edge detection and ridge detection, can be performed based on the behaviour of these feature detectors on Gaussian-based intensity profiles. As will be shown later, the theoretical results that will be derived based on Gaussian blob models will lead to theoretical predictions that agree with the relative repeatability properties of different types of interest point detectors under affine and perspective transformations. Formally, however, further application of these results will be based on an assumption that the scale selection behaviour can be calibrated based on the behaviour for Gaussian prototype models.
3.1 Regular Scale Selection from Local Extrema over Scale

How will the selected scale levels be related between different interest point detectors?

How will the scale-normalized magnitude values be related between different interest point detectors that respond to similar image structures?
3.1.1 The Pure Second-Order Interest Point Detectors
3.1.2 Scale Invariant Feature Responses After Contrast Normalization
Table 1 Relationships between scale-normalized thresholds \(C_{{\mathcal{D}} L}\) for different types of scale-invariant interest point detectors \({\mathcal{D}} L = \nabla^{2} L\), \(\det {\mathcal{H}} L\), \({\mathcal{D}}_{1} L\), \(\tilde{\mathcal{D}}_{1} L\), \({\mathcal{D}}_{2} L\) and \(\tilde{\mathcal{D}}_{2} L\) using scale-normalized derivatives with γ=1. The complementary expression for the Harris-Laplace operator is based on the assumption of a relative integration scale of r=1

| Feature detector | \({\mathcal{D}} L\) | \(C_{{\mathcal{D}} L}\) |
| --- | --- | --- |
| Laplacian | \(\nabla^{2}_{\mathit{norm}} L = t (L_{xx} + L_{yy})\) | \(C_{\nabla^{2} L} = C\) |
| determinant of the Hessian | \(\det {\mathcal{H}}_{\mathit{norm}} L = t^{2} (L_{xx} L_{yy} - L_{xy}^{2})\) | \(C_{\det {\mathcal{H}} L} = C^{2}/4\) |
| Hessian feature strength I | \({\mathcal{D}}_{1,\mathit{norm}} L = t^{2} (L_{xx} L_{yy} - L_{xy}^{2} - k \, (L_{xx} + L_{yy})^{2})\) | \(C_{{\mathcal{D}}_{1} L} = (1 - 4k)\, C^{2}/4\) |
| Hessian feature strength Ĩ | \(\tilde{\mathcal{D}}_{1,\mathit{norm}} L = t^{2} (L_{xx} L_{yy} - L_{xy}^{2} \pm k \, (L_{xx} + L_{yy})^{2})\) | \(C_{\tilde{\mathcal{D}}_{1} L} = (1 - 4k) \, C^{2}/4\) |
| Hessian feature strength II | \({\mathcal{D}}_{2,\mathit{norm}} L = t \, \min(\lvert L_{pp} \rvert, \lvert L_{qq} \rvert)\) | \(C_{{\mathcal{D}}_{2} L} = C/2\) |
| Hessian feature strength \(\tilde{\mbox{II}}\) | \(\tilde{\mathcal{D}}_{2,\mathit{norm}} L = t (L_{pp} \; \mbox{or} \; L_{qq})\) | \(C_{\tilde{\mathcal{D}}_{2} L} = C/2\) |
| Harris-Laplace | \(H_{\mathit{norm}} = t^{2} \, (\det \mu - k \, \operatorname {trace}^{2} \mu)\) | \(C_{H} = (1 - 4k) \, C^{4}/256\) |
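The threshold relations above can be applied mechanically once a threshold C on the scale-normalized Laplacian has been chosen. The sketch below is illustrative only: the function names and dictionary keys are hypothetical, and the consistency check assumes a rotationally symmetric response with \(t L_{xx} = t L_{yy} = -C/2\) and \(L_{xy} = 0\), which is one way to see where the factors 1/4, (1 − 4k)/4 and 1/2 come from.

```python
def corresponding_thresholds(C, k=0.04):
    """Map a Laplacian threshold C to thresholds for the other detectors
    (gamma = 1), following the tabulated relations."""
    return {
        "laplacian": C,
        "det_hessian": C ** 2 / 4.0,
        "D1": (1.0 - 4.0 * k) * C ** 2 / 4.0,
        "D2": C / 2.0,
        "harris_laplace": (1.0 - 4.0 * k) * C ** 4 / 256.0,
    }


def symmetric_blob_responses(C, k=0.04):
    """Scale-normalized responses for an assumed rotationally symmetric
    Hessian with t*Lxx = t*Lyy = -C/2 and Lxy = 0."""
    n_lxx = -C / 2.0
    n_lyy = -C / 2.0
    trace = n_lxx + n_lyy
    det = n_lxx * n_lyy
    return {
        "laplacian": abs(trace),
        "det_hessian": det,
        "D1": det - k * trace ** 2,
        "D2": min(abs(n_lxx), abs(n_lyy)),
    }


thresholds = corresponding_thresholds(10.0)
responses = symmetric_blob_responses(10.0)
```

For every purely second-order operator, the response of the symmetric model with Laplacian magnitude C coincides with the corresponding tabulated threshold, so a blob that just passes the Laplacian threshold also just passes the others.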
3.1.3 The Rescaled Level Curve Curvature Operator
3.2 Scale Selection with Complementary Post-Smoothing
When linking image features at different scales into feature trajectories, the use of post-smoothing of any differential expression \({\mathcal{D}}_{\mathit{norm}} L\) according to (18) was proposed in [39] to simplify the task for the scale linking algorithm, by suppressing small local perturbations in the responses of the differential feature detectors at any single scale. Since this complementary post-smoothing operation will affect the magnitude values of the scale-normalized differential responses that are used in the different interest point detectors, one may ask how large an effect this operation will have on the resulting scale estimates.
In this section, we shall analyze the influence of the post-smoothing operation for scale selection based on local extrema over scale of scale-normalized derivatives.
3.2.1 The Laplacian and the Determinant of the Hessian Operators
3.2.2 The Hessian Feature Strength Measure I
If we restrict ourselves to the analysis of a single isolated Gaussian blob, a similar approximation holds for the signed Hessian feature strength measure \(\tilde{\mathcal{D}}_{1,\gamma-\mathit{norm}} L\).
3.2.3 The Hessian Feature Strength Measure II
If we restrict ourselves to the analysis of a single isolated Gaussian blob, a similar approximation holds for the signed Hessian feature strength measure \(\tilde{\mathcal{D}}_{2,\gamma-\mathit{norm}} L\).
3.2.4 The Rescaled Level Curve Curvature Operator
3.3 Influence of Affine Image Deformations
Note on Relation to Influence Under General Affine Transformations
3.3.1 The Laplacian Operator
3.3.2 The Determinant of the Hessian
3.3.3 The Hessian Feature Strength Measure I
3.3.4 The Hessian Feature Strength Measure II
3.3.5 The Rescaled Level Curve Curvature Operator
4 Scale Selection by Weighted Averaging Along Feature Trajectories
4.1 The Pure Second-Order Interest Point Detectors
Since these scale estimates are similar to the scale estimates obtained from local extrema over scale, it follows that the scale-normalized magnitude values will also be similar and the relationships between scale-normalized thresholds described in Table 1 will also hold for scale selection based on weighted averaging over scale.
Corresponding Scale Estimates for General Values of γ
4.2 Influence of the Post-Smoothing Operation
4.2.1 The Laplacian and the Determinant of the Hessian Operators
4.2.2 The Hessian Feature Strength Measure I
4.2.3 The Hessian Feature Strength Measure II
4.3 Influence of Affine Image Deformations
To analyze how the scale estimates \(\hat{t}\) obtained by weighted averaging along feature trajectories are affected by affine image deformations, let us again consider an anisotropic Gaussian blob (76) as a prototype model of a rotationally symmetric Gaussian blob that has been subjected to an affine image deformation and with its scale-space representation according to (78).
4.3.1 The Laplacian Operator
4.3.2 The Determinant of the Hessian
4.3.3 The Hessian Feature Strength Measure I
Specifically, a comparison with the corresponding expression for the Laplacian operator (140) shows that scale selection based on the Hessian feature strength measure I is less sensitive to affine image deformations compared to scale selection based on the Laplacian.
4.3.4 The Hessian Feature Strength Measure II
Again, the scale estimates for scale selection based on the Hessian feature strength measure II are more affected by affine image deformations compared to the scale estimates obtained by the determinant of the Hessian, the Hessian feature strength measure I or the Laplacian.
5 Relations Between the Scale Selection Methods
5.1 Rotationally Symmetric Gaussian Blob
From the above-mentioned results, we can first note that for the specific case of a rotationally symmetric Gaussian blob, the scale estimates obtained from local extrema over scale vs. weighted averaging over scale are very similar.
Table 2 Exact scale estimates obtained from local extrema over scale vs. weighted averaging over scale for the Laplacian and determinant operators applied to a rotationally symmetric Gaussian blob with scale parameter \(t_{0}\) and for a general amount of post-smoothing as determined by the post-smoothing parameter c

| Operator | Extrema over scale | Weighted averaging |
| --- | --- | --- |
| \(\nabla^{2}_{\mathit{norm}} L\) | \(t_{0}/(1 + c^{2})\) | \(t_{0}/(1 + c^{2})\) |
| \(\det {\mathcal{H}}_{\mathit{norm}} L\) | \(t_{0}/\sqrt{1 + 2 c^{2}}\) | \(t_{0}/\sqrt{1 + 2 c^{2}}\) |
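The extrema-over-scale column for the Laplacian and the determinant of the Hessian can be cross-checked numerically. The closed-form centre responses used below are not quoted from the text: they are derived here under the assumptions of a unit-mass rotationally symmetric Gaussian blob with variance t0, γ = 1, and Gaussian post-smoothing with variance c² t, using the semigroup property and the product rule for Gaussians. Function names and the grid search are hypothetical.

```python
import math


def argmax_over_scale(response, t_min=1e-3, t_max=1e3, n=20001):
    """Return the scale on a logarithmic grid that maximises `response`."""
    log_min, log_max = math.log(t_min), math.log(t_max)
    best_t, best_val = None, -math.inf
    for i in range(n):
        t = math.exp(log_min + (log_max - log_min) * i / (n - 1))
        val = response(t)
        if val > best_val:
            best_t, best_val = t, val
    return best_t


def laplacian_postsmoothed(t, t0, c):
    """Assumed magnitude of the post-smoothed scale-normalized Laplacian at
    the blob centre: the linear operator commutes with the extra smoothing,
    so the total variance is t0 + t + c^2 t."""
    return t / (math.pi * (t0 + (1.0 + c ** 2) * t) ** 2)


def det_hessian_postsmoothed(t, t0, c):
    """Assumed post-smoothed scale-normalized determinant of the Hessian at
    the blob centre (derived for this sketch, not taken from the text)."""
    return t ** 2 / (
        4.0 * math.pi ** 2 * (t0 + t) ** 2 * (t0 + (1.0 + 2.0 * c ** 2) * t) ** 2
    )


t0, c = 4.0, 0.5
t_lap = argmax_over_scale(lambda t: laplacian_postsmoothed(t, t0, c))
t_det = argmax_over_scale(lambda t: det_hessian_postsmoothed(t, t0, c))
```

With c = 1/2 the two maxima fall near t0/1.25 and t0/√1.5 respectively, which agrees with the tabulated expressions up to the grid resolution.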
Table 3 Approximate scale estimates obtained from local extrema over scale vs. weighted averaging over scale for the Hessian feature strength measures I and II applied to a rotationally symmetric Gaussian blob with scale parameter \(t_{0}\) and for a specific amount of post-smoothing with c=1/2

| Operator | Extrema over scale | Weighted averaging |
| --- | --- | --- |
| \({\mathcal{D}}_{1,\mathit{norm}} L\) | \(\approx 0.813 \, t_{0}\) | \(\approx 0.813 \, t_{0}\) |
| \({\mathcal{D}}_{2,\mathit{norm}} L\) | \(\approx 0.699 \, t_{0}\) | \(\approx 0.694 \, t_{0}\) |
5.1.1 Theoretical Symmetry Properties Between the Scale Estimates
5.1.2 Calibration Factors for Setting Scale-Invariant Integration Scales
Table 4 Calibration factors \(A_{{\mathcal{D}}_{L}}\) to obtain compensated scale estimates \(\hat{t}_{{\mathcal{D}}_{L},comp} = \hat{t}_{{\mathcal{D}}_{L}}/A_{{\mathcal{D}}_{L}}\) that lead to \(\hat{t}_{{\mathcal{D}}_{L},comp} = t_{0}\) for a rotationally symmetric Gaussian blob irrespective of the interest point operator \({\mathcal{D}} L\) or the post-smoothing parameter c

| Operator | Calibration factor \(A_{{\mathcal{D}} L}\) |
| --- | --- |
| \(\nabla^{2}_{\mathit{norm}} L\) | \(1/(1 + c^{2})\) |
| \(\det {\mathcal{H}}_{\mathit{norm}} L\) | \(1/\sqrt{1 + 2 c^{2}}\) |
| \({\mathcal{D}}_{1,\mathit{norm}} L\) | \(\approx e^{\theta_{{\mathcal{D}}_{1} L}}\) with \(\theta_{{\mathcal{D}}_{1} L}\) according to (130) |
| \({\mathcal{D}}_{2,\mathit{norm}} L\) | \(\approx e^{\theta_{{\mathcal{D}}_{2} L}}\) with \(\theta_{{\mathcal{D}}_{2} L}\) according to (135) |
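Applying the calibration factors amounts to a single division. The sketch below covers only the two operators for which the factor is given in closed form above; the factors for the Hessian feature strength measures depend on expressions (130) and (135), which are not reproduced in this section, so they are deliberately left out. The function names and operator keys are hypothetical.

```python
import math


def calibration_factor(operator, c):
    """Calibration factor A for the operators with a closed-form factor.

    operator : "laplacian" or "det_hessian" (hypothetical keys)
    c        : post-smoothing parameter, t_post = c^2 t
    """
    if operator == "laplacian":
        return 1.0 / (1.0 + c ** 2)
    if operator == "det_hessian":
        return 1.0 / math.sqrt(1.0 + 2.0 * c ** 2)
    raise ValueError("no closed-form calibration factor for " + operator)


def compensated_scale(t_hat, operator, c):
    """Compensated scale estimate t_hat / A."""
    return t_hat / calibration_factor(operator, c)


# Raw estimates for a Gaussian blob with t0 = 4 and c = 1/2, taken from the
# exact expressions for scale estimates with post-smoothing.
t0, c = 4.0, 0.5
raw_laplacian = t0 / (1.0 + c ** 2)
raw_det = t0 / math.sqrt(1.0 + 2.0 * c ** 2)
comp_laplacian = compensated_scale(raw_laplacian, "laplacian", c)
comp_det = compensated_scale(raw_det, "det_hessian", c)
```

Both compensated estimates return to t0, so image descriptors computed at integration scales proportional to the compensated estimates would use the same scale regardless of which of the two operators detected the blob.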
5.2 Anisotropic Gaussian Blob
5.2.1 Taylor Expansions for Non-uniform Scaling Factors Near s=1
From the analysis of the scale selection properties of an anisotropic Gaussian blob with scale parameters \(t_{1}\) and \(t_{2}\) in Sect. 3.3 and Sect. 4.3, we found that scale selection based on local extrema over scale or weighted scale selection lead to a similar and affine covariant scale estimate \(\sqrt{t_{1} t_{2}}\) for the determinant of the Hessian operator \(\det {\mathcal{H}}_{\mathit{norm}} L\).
For the Laplacian \(\nabla_{\mathit{norm}}^{2} L\) and the Hessian feature strength measures \({\mathcal{D}}_{1,\mathit{norm}} L\) and \({\mathcal{D}}_{2,\mathit{norm}} L\), the scale estimates are, however, not affine covariant. Moreover, the two scale selection methods may lead to different results. When performing a Taylor expansion of the scale estimate parameterized in terms of a non-uniform scaling factor s relative to a baseline scale \(t_{0}\), the Taylor expansions around s=1 did, however, agree in their lowest order terms. In this sense, the two scale selection approaches have approximately similar properties for the Gaussian blob model for affine image deformations near the similarity group.
Table 5 Taylor expansions for the scale estimates obtained for an anisotropic Gaussian blob with scale parameters \(t_{1} = s\,t_{0}\) and \(t_{2} = t_{0}/s\) around \(s=1\) (assuming \(s>1\) for the \({\mathcal{D}}_{2,\mathit{norm}} L\) operator). The table shows the terms in the Taylor expansion that are common to scale selection based on local extrema over scale and scale selection based on weighted averaging over scale

| Operator | Common terms in series expansion of scale estimate |
| --- | --- |
| \(\nabla^{2}_{\mathit{norm}} L\) | \((1 - \frac{1}{4} (s-1)^{2} + \frac{1}{4} (s-1)^{3} + {\mathcal{O}}((s-1)^{4})) \, t_{0}\) |
| \(\det {\mathcal{H}}_{\mathit{norm}} L\) | \(t_{0}\) |
| \({\mathcal{D}}_{1,\mathit{norm}} L\) | \((1 + \frac{1}{21} (s-1)^{2} - \frac{1}{21} (s-1)^{3} + {\mathcal{O}}((s-1)^{4})) \, t_{0}\) |
| \({\mathcal{D}}_{2,\mathit{norm}} L\) | \((1 + \frac{1}{2} (s-1) - \frac{1}{8} (s-1)^{2} + {\mathcal{O}}((s-1)^{3})) \, t_{0}\) |
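The series in Table 5 can be checked numerically for scale selection from local extrema over scale. The sketch below is a minimal illustration and not the procedure used in the paper. It assumes \(\gamma = 1\), a unit-mass anisotropic Gaussian blob evaluated at its centre, and the form \(t \min(|L_{xx}|, |L_{yy}|)\) for \({\mathcal{D}}_{2,\mathit{norm}} L\). All function names are hypothetical. The \({\mathcal{D}}_{1,\mathit{norm}} L\) operator is left out because its definition is not restated in this section.

```python
import math

def centre_response(name, t, t1, t2):
    """Magnitude of a scale-normalised (gamma = 1) response at the blob centre."""
    a, b = t1 + t, t2 + t                      # variances of the blob at scale t
    l0 = 1.0 / (2.0 * math.pi * math.sqrt(a * b))
    lxx, lyy = -l0 / a, -l0 / b                # L_xy vanishes at the centre
    if name == "laplacian":
        return t * abs(lxx + lyy)
    if name == "det_hessian":
        return t * t * lxx * lyy
    if name == "d2":                           # assumed form: t * min(|L_xx|, |L_yy|)
        return t * min(abs(lxx), abs(lyy))
    raise ValueError(name)

def scale_estimate(name, t1, t2, lo=1e-3, hi=1e3, tol=1e-10):
    """Golden-section search over log t for the maximum over scale.

    Assumes the signature has a single maximum in [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if centre_response(name, math.exp(c), t1, t2) > centre_response(name, math.exp(d), t1, t2):
            b = d                              # maximum lies in [a, d]
        else:
            a = c                              # maximum lies in [c, b]
    return math.exp(0.5 * (a + b))

def table5_series(name, s, t0):
    """Truncated series from Table 5 (terms common to both selection methods)."""
    h = s - 1.0
    if name == "laplacian":
        return (1.0 - h**2 / 4.0 + h**3 / 4.0) * t0
    if name == "det_hessian":
        return t0
    if name == "d2":
        return (1.0 + h / 2.0 - h**2 / 8.0) * t0
    raise ValueError(name)

if __name__ == "__main__":
    s, t0 = 1.1, 2.0
    for name in ("laplacian", "det_hessian", "d2"):
        print(name, scale_estimate(name, s * t0, t0 / s), table5_series(name, s, t0))
```

For \(s\) close to 1 the numerical extremum and the truncated series should differ only by the omitted higher-order terms, while the determinant of the Hessian should return \(t_{0}\) for any \(s\).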
5.2.2 Graphs of Nonuniform Scaling Dependencies for General s≥1
For the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\), the scale estimate coincides with the geometric average of the scale parameters for any nonsingular amount of nonuniform scaling. For the Laplacian operator \(\nabla_{\mathit{norm}}^{2} L\), the scale estimate \(\hat{t}_{\nabla^{2} L}\) is lower than the geometric average of the scale parameters in the two directions, whereas the scale estimates are higher than the geometric average for the Hessian feature strength measures \({\mathcal{D}}_{1,\mathit{norm}} L\) and \({\mathcal{D}}_{2,\mathit{norm}} L\). For moderate values of \(s \in [1,4]\), the scale estimates from the Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\) are quite close to the affine covariant geometric average. For the Hessian feature strength measure \({\mathcal{D}}_{2,\mathit{norm}} L\), on the other hand, the scale estimate increases approximately linearly with the nonuniform scaling factor \(s\).
These graphs also show that the qualitative behaviour derived for Taylor expansions near \(s=1\) (Table 5) extends to non-infinitesimal scaling factors up to at least a factor of four.
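The approximately linear growth for \({\mathcal{D}}_{2,\mathit{norm}} L\) can be made plausible with a short calculation. This is a sketch under assumptions not restated in this section: \(\gamma = 1\), scale selection from local extrema over scale, a unit-mass blob evaluated at its centre, \(t_{1} > t_{2}\), and \({\mathcal{D}}_{2,\mathit{norm}} L = t \min(|L_{xx}|, |L_{yy}|)\). With \(t_{1} > t_{2}\), the minimum selects the derivative along the direction with the larger scale parameter:

$$ {\mathcal{D}}_{2,\mathit{norm}} L = \frac{t}{2\pi \, (t_{1}+t)^{3/2} \, (t_{2}+t)^{1/2}} $$

$$ \partial_{t} \log\left({\mathcal{D}}_{2,\mathit{norm}} L\right) = \frac{1}{t} - \frac{3}{2 (t_{1}+t)} - \frac{1}{2 (t_{2}+t)} = 0 \quad\Longleftrightarrow\quad t^{2} - \frac{t_{1}-t_{2}}{2} \, t - t_{1} t_{2} = 0 $$

$$ \hat{t} = \frac{t_{1}-t_{2}}{4} + \sqrt{\left(\frac{t_{1}-t_{2}}{4}\right)^{2} + t_{1} t_{2}} $$

With \(t_{1} = s\,t_{0}\) and \(t_{2} = t_{0}/s\), this gives \(\hat{t}/t_{0} \approx 1.44\), \(1.87\) and \(2.31\) for \(s = 2\), \(3\) and \(4\), that is, increments of roughly \(0.43\) to \(0.44\) per unit step in \(s\). Expanding the same expression around \(s=1\) reproduces the terms \(1 + \frac{1}{2}(s-1) - \frac{1}{8}(s-1)^{2}\) listed for this operator in Table 5.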
5.3 Comparison with Experimental Repeatability Properties
In this section, we shall compare the above mentioned theoretical results with experimental results of the repeatability properties of the different interest point detectors under affine image transformations.
5.3.1 Experimental Methodology

a pure scaling U(s) with scaling factor s=2,

a pure rotation R(φ) with rotation angle φ=π/4, and

nonuniform scalings N(s) with scaling factors \(s = \sqrt[4]{2}\) and \(s = \sqrt{2}\), respectively, each repeated and averaged over four different orientations, with relative orientations \(\varphi_{0} = 0\), \(\pi/4\), \(\pi/2\) and \(3\pi/4\):
$$ N_{\varphi_0}(s) = R(\varphi_0) \, N(s) \, R(\varphi_0)^{-1} $$ (168)
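The set of transformations listed above can be written out as explicit 2×2 matrices. The sketch below is only an illustration. In particular, the form \(N(s) = \mathrm{diag}(s, 1/s)\) is an assumption, since the definition of \(N(s)\) is not restated in this section, and all function names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def uniform_scaling(s):
    return [[s, 0.0], [0.0, s]]

def rotation(phi):
    c, sn = math.cos(phi), math.sin(phi)
    return [[c, -sn], [sn, c]]

def nonuniform_scaling(s):
    # ASSUMPTION: area-preserving form diag(s, 1/s); not confirmed by this section.
    return [[s, 0.0], [0.0, 1.0 / s]]

def oriented_nonuniform_scaling(s, phi0):
    """N_phi0(s) = R(phi0) N(s) R(phi0)^(-1), using R(phi0)^(-1) = R(-phi0)."""
    return matmul(matmul(rotation(phi0), nonuniform_scaling(s)), rotation(-phi0))

def affine_test_set():
    """One pure scaling, one pure rotation, two nonuniform scalings x four orientations."""
    mats = [uniform_scaling(2.0), rotation(math.pi / 4.0)]
    for s in (2.0 ** 0.25, 2.0 ** 0.5):
        for phi0 in (0.0, math.pi / 4.0, math.pi / 2.0, 3.0 * math.pi / 4.0):
            mats.append(oriented_nonuniform_scaling(s, phi0))
    return mats

if __name__ == "__main__":
    transforms = affine_test_set()
    print(len(transforms), "transformations,", 14 * (1 + len(transforms)), "images")
```

Counting each orientation separately gives 1 + 1 + 2·4 = 10 transformations, which matches the 14×(1+10) images used in the evaluation.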
For each one of the resulting 14×(1+10)=154 images, the 400 most significant interest points were detected. For interest points detected based on scale-space extrema, the image features were ranked on the scale-normalized response of the differential operator at the scale-space extremum. For interest points detected by scale linking, the image features were ranked on a significance measure obtained by integrating the scale-normalized responses of the differential operator along each feature trajectory, using the methodology described in [39].
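The difference between the two ranking criteria can be illustrated with a small sketch. The exact significance measure is defined in [39] and is not reproduced here. The version below assumes a plain trapezoidal integral of the response magnitude, optionally raised to a power, over the logarithm of the scale parameter. The data layout and all names are hypothetical.

```python
import math

def extremum_strength(trajectory):
    """Largest response magnitude along a trajectory of (t, response) samples."""
    return max(abs(r) for _, r in trajectory)

def significance(trajectory, power=1.0):
    """Trapezoidal integral of |response|**power over log t.

    ASSUMPTION: an illustrative stand-in for the measure in [39].
    The trajectory must be sorted by increasing scale t."""
    total = 0.0
    for (t_a, r_a), (t_b, r_b) in zip(trajectory, trajectory[1:]):
        width = math.log(t_b) - math.log(t_a)
        total += 0.5 * (abs(r_a) ** power + abs(r_b) ** power) * width
    return total

def strongest(trajectories, score, n=400):
    """Names of the n highest-scoring features, best first."""
    ranked = sorted(trajectories, key=lambda name: score(trajectories[name]), reverse=True)
    return ranked[:n]

if __name__ == "__main__":
    trajectories = {
        "short_lived": [(4.0, 0.0), (5.0, 1.0), (6.0, 0.0)],    # high peak, short life
        "long_lived": [(4.0, 0.5), (16.0, 0.6), (64.0, 0.5)],   # lower peak, long life
    }
    print(strongest(trajectories, extremum_strength, n=1))
    print(strongest(trajectories, significance, n=1))
```

In this toy example the short-lived feature wins on extremum response, whereas the feature that persists over a wide range of scales wins on the integrated measure.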
The evaluation of the matching score was only performed for image features that are within the image domain for both images before and after the transformation. Moreover, only features within corresponding scale ranges were evaluated. In other words, if the scale range for the image \(f\) before the affine transformation was \([t_{\mathit{min}}, t_{\mathit{max}}]\), then image features were searched for in the transformed image \(f'\) within the scale range \([t'_{\mathit{min}}, t'_{\mathit{max}}] = [(\det A) \, t_{\mathit{min}}, (\det A) \, t_{\mathit{max}}]\). In addition, features in a narrow scale-dependent frame near the image boundaries were suppressed, to avoid boundary effects from influencing the results. In these experiments, we used \(t_{\mathit{min}}=4\) and \(t_{\mathit{max}}=256\).
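The scale-range mapping and the boundary suppression can be sketched as follows. The width of the boundary frame is not specified in this section, so the choice of a margin proportional to \(\sqrt{t}\) and the parameter `frame_factor` are hypothetical, as are the function names.

```python
import math

def transformed_scale_range(a, t_min=4.0, t_max=256.0):
    """Map [t_min, t_max] to [(det A) t_min, (det A) t_max] for a 2x2 matrix A."""
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return det_a * t_min, det_a * t_max

def keep_feature(x, y, t, width, height, t_range, frame_factor=2.0):
    """True if the feature lies in the scale range and outside the boundary frame.

    ASSUMPTION: the frame width is frame_factor * sqrt(t); the paper only
    states that the frame is narrow and scale dependent."""
    t_lo, t_hi = t_range
    if not (t_lo <= t <= t_hi):
        return False
    margin = frame_factor * math.sqrt(t)
    return margin <= x <= width - margin and margin <= y <= height - margin

if __name__ == "__main__":
    pure_scaling = [[2.0, 0.0], [0.0, 2.0]]          # U(s) with s = 2
    print(transformed_scale_range(pure_scaling))     # scale range in the transformed image
```

For the pure scaling with \(s = 2\), \(\det A = 4\), so the range \([4, 256]\) maps to \([16, 1024]\).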
5.3.2 Relations Between Experimental Results and Theoretical Results
Table 6 Relative ranking of 10 scale-invariant interest point detectors based on scale selection from scale-space extrema with regard to their repeatability scores under a set of 10 different affine image deformations applied to each one of the 14 images in the image dataset illustrated in Fig. 5 and the extraction of the 400 most significant interest points from each image

Scale selection from local extrema over scale

| Feature detector | Type | Complementary | p(400) |
| --- | --- | --- | --- |
| \(\tilde{\kappa}_{\gamma\mathit{norm}}(L)\) | extr | – | 0.876 |
| \({\mathcal{D}}_{1,\mathit{norm}} L\) | extr | – | 0.868 |
| \(\det {\mathcal{H}}_{\mathit{norm}} L\) | extr | \({\mathcal{D}}_{1} L > 0\) | 0.867 |
| \(\det {\mathcal{H}}_{\mathit{norm}} L\) | extr | \(\tilde{\mathcal{D}}_{1} L > 0\) | 0.852 |
| \(\tilde{\mathcal{D}}_{1,\mathit{norm}} L\) | extr | – | 0.849 |
| \(\nabla^{2}_{\mathit{norm}} L\) | extr | – | 0.844 |
| \(\tilde{\mathcal{D}}_{2,\mathit{norm}} L\) | extr | \({\mathcal{D}}_{1} L > 0\) | 0.842 |
| \({\mathcal{D}}_{2,\mathit{norm}} L\) | extr | \({\mathcal{D}}_{1} L > 0\) | 0.841 |
| \(\nabla^{2}_{\mathit{norm}} L\) | extr | \({\mathcal{D}}_{1} L > 0\) | 0.839 |
| Harris-Laplace | extr | – | 0.781 |
Table 7 Relative ranking of 10 scale-invariant interest point detectors based on scale selection by scale linking and weighted averaging over scale with regard to their repeatability scores under a set of 10 different affine image deformations applied to each one of the 14 images in the image dataset illustrated in Fig. 5 and the extraction of the 400 most significant interest points from each image

Scale selection by weighted averaging over scale

| Feature detector | Type | Complementary | p(400) |
| --- | --- | --- | --- |
| \({\mathcal{D}}_{1,\mathit{norm}} L\) | linkw | – | 0.887 |
| \(\det {\mathcal{H}}_{\mathit{norm}} L\) | linkw | \({\mathcal{D}}_{1} L > 0\) | 0.886 |
| \(\tilde{\mathcal{D}}_{2,\mathit{norm}} L\) | linkw | \({\mathcal{D}}_{1} L > 0\) | 0.880 |
| \(\det {\mathcal{H}}_{\mathit{norm}} L\) | linkw | \(\tilde{\mathcal{D}}_{1} L > 0\) | 0.878 |
| \(\tilde{\kappa}_{\gamma\mathit{norm}}(L)\) | linkw | – | 0.873 |
| \(\det {\mathcal{H}}_{\mathit{norm}} L\) | linkw | – | 0.871 |
| \(\tilde{\mathcal{D}}_{1,\mathit{norm}} L\) | linkw | – | 0.866 |
| \({\mathcal{D}}_{2,\mathit{norm}} L\) | linkw | \({\mathcal{D}}_{1} L > 0\) | 0.858 |
| \(\nabla^{2}_{\mathit{norm}} L\) | linkw | \({\mathcal{D}}_{1} L > 0\) | 0.856 |
| Harris-Laplace | linkw | – | 0.855 |
As can be seen from Table 6, the best repeatability properties for the interest point detectors based on scale selection from local extrema over scale are obtained for (i) the rescaled level curve curvature \(\tilde{\kappa}_{\gamma\mathit{norm}}(L)\), (ii) the Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\) and (iii) the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\).
From Table 7, we can see that the best repeatability properties for the interest point detectors based on scale selection using scale linking and weighted averaging over scale are obtained for (i) the Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\), (ii) the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\) and (iii) the Hessian feature strength measure \(\tilde{\mathcal{D}}_{2,\mathit{norm}} L\).
The repeatability scores are furthermore generally better for scale selection based on weighted averaging over scale compared to scale selection based on local extrema over scale.
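The comparison between the two scale selection methods can be tabulated directly from the scores in Tables 6 and 7. The sketch below only covers detector and complementary-condition combinations that appear in both tables. The dictionary keys are shorthand labels introduced here, with a trailing `~` marking a signed (tilde) operator.

```python
# Repeatability scores p(400) copied from Table 6 (extr) and Table 7 (linkw).
EXTREMA = {
    "kappa~": 0.876,
    "D1": 0.868,
    "detH | D1>0": 0.867,
    "detH | D1~>0": 0.852,
    "D1~": 0.849,
    "Laplacian": 0.844,
    "D2~ | D1>0": 0.842,
    "D2 | D1>0": 0.841,
    "Laplacian | D1>0": 0.839,
    "Harris-Laplace": 0.781,
}
LINKED = {
    "D1": 0.887,
    "detH | D1>0": 0.886,
    "D2~ | D1>0": 0.880,
    "detH | D1~>0": 0.878,
    "kappa~": 0.873,
    "detH": 0.871,
    "D1~": 0.866,
    "D2 | D1>0": 0.858,
    "Laplacian | D1>0": 0.856,
    "Harris-Laplace": 0.855,
}

def repeatability_gains():
    """Score difference (linkw minus extr) for entries present in both tables."""
    common = sorted(set(EXTREMA) & set(LINKED))
    return {name: round(LINKED[name] - EXTREMA[name], 3) for name in common}

if __name__ == "__main__":
    for name, gain in sorted(repeatability_gains().items(), key=lambda kv: -kv[1]):
        print(f"{name:18s} {gain:+.3f}")
```

Of the nine combinations present in both tables, eight have a higher score with weighted averaging over scale. The exception is the rescaled level curve curvature, whose score is marginally lower.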
In comparison with our theoretical analysis, we have previously shown that the response of the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\) to an affine Gaussian blob is affine covariant, for both scale selection based on local extrema over scale (97) and scale selection based on scale linking and weighted averaging over scale (143). For the Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\), a major contribution to this differential expression comes from the affine covariant determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\), and the deviations from affine covariance are small for both scale selection based on local extrema over scale (100) and scale selection by weighted averaging over scale (148), provided that the nonuniform image deformations are not too far from the similarity group in the sense that the nonuniform scaling factor s used in the Taylor expansions is not too far from 1. Specifically, the two interest point detectors that have the best theoretical properties under affine image deformations in the sense of having the smallest correction terms in Table 5 are also among the top three interest point detectors for both scale selection based on local extrema over scale and scale selection based on scale linking and weighted averaging over scale. In this respect, the predictions from our theoretical analysis are in very good agreement with the experimental results.
Somewhat more surprisingly, the signed Hessian feature strength measure \(\tilde{\mathcal{D}}_{2,\mathit{norm}} L\) performs very well when combined with scale selection based on weighted averaging over scale. The corresponding unsigned entity \({\mathcal{D}}_{2,\mathit{norm}} L\) does not perform as well, and is more comparable to the Laplacian operator \(\nabla_{\mathit{norm}}^{2} L\). A possible explanation for this is that keeping the signs of the principal curvatures in the nonlinear minimum operation improves the ability of this operator to distinguish between nearby competing image structures, a property that is not captured by the analysis of isolated Gaussian blobs. The repeatability properties of the unsigned version \({\mathcal{D}}_{2,\mathit{norm}} L\) are therefore in closer agreement with the presented analysis.
The rescaled level curve curvature \(\tilde{\kappa}_{\gamma\mathit{norm}}(L)\) performs comparatively well for scale selection based on local extrema over scale, whereas it does not perform as well for scale selection based on scale linking and weighted averaging over scale. For scale selection based on local extrema over scale, our analysis showed that the deviation from affine covariance is comparatively low (111) for the value of \(\gamma=7/8\) that we used in our experiments. For this scale selection method, the experimental results are therefore in agreement with our theoretical results. In contrast to the other interest point detectors, the repeatability properties of the rescaled level curve curvature operator \(\tilde{\kappa}_{\gamma\mathit{norm}}(L)\) are, however, not improved by scale linking. A possible algorithmic explanation for this could be that the rescaled level curve curvature operator \(\tilde{\kappa}_{\gamma\mathit{norm}}(L)\) contains a different type of nonlinearity that may cause difficulties for the scale linking algorithm. Calculating closed-form expressions for the scale estimates obtained by weighted averaging over scale also seems harder for this operator. We therefore leave it as an open problem to investigate whether this interest point detector could also be improved by scale linking and scale selection from weighted averaging of possibly transformed magnitude values along the corresponding feature trajectories.
Experimental results in [39] show that the Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\) and the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\) are also the two interest point detectors that give the best repeatability properties under real (calibrated) perspective image transformations. Thus, the two best interest point detectors according to our theoretical analysis are also the interest point detectors that have the best properties for real image data.
6 Summary and Discussion
We have analyzed the scale selection properties of (i) the Laplacian operator \(\nabla_{\mathit{norm}}^{2} L\), (ii) the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\), (iii)–(iv) the new Hessian feature strength measures \({\mathcal{D}}_{1,\mathit{norm}} L\) and \({\mathcal{D}}_{2,\mathit{norm}} L\) and (v) the rescaled level curve curvature operator \(\tilde{\kappa}_{\gamma\mathit{norm}}(L)\) when applied to a Gaussian prototype blob model and using scale selection from either (vi) local extrema over scale of scale-normalized derivatives or (vii) weighted averaging of scale values along feature trajectories over scale. We have also analyzed (viii) the influence of a secondary post-smoothing step after the computation of possibly nonlinear differential invariants and (ix) the sensitivity of the scale estimates to affine image deformations.
The analysis shows that the scale estimates from the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\) are affine covariant for the Gaussian blob model for both scale selection based on local extrema over scale and scale selection by weighted averaging over scale. The analysis also shows that the scale estimates from the Laplacian operator \(\nabla_{\mathit{norm}}^{2} L\) and the Hessian feature strength measures \({\mathcal{D}}_{1,\mathit{norm}} L\) and \({\mathcal{D}}_{2,\mathit{norm}} L\) are not affine covariant. Out of the latter three operators, the Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\) has the lowest sensitivity to affine image deformations outside the similarity group, whereas the Hessian feature strength measure \({\mathcal{D}}_{2,\mathit{norm}} L\) has the highest sensitivity. The stronger scale dependency of the Hessian feature strength measure \({\mathcal{D}}_{2,\mathit{norm}} L\) can be understood from the fact that it responds to the eigenvalue of the Hessian matrix corresponding to the slowest spatial variations.
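The statement about the eigenvalue corresponding to the slowest spatial variations can be illustrated with a small sketch that evaluates the second-order operators from the entries of the Hessian. The operator definitions are given in earlier sections and are not restated here, so the forms below are assumptions: \({\mathcal{D}}_{1}\) is taken as the determinant minus a constant `k` times the squared trace, clipped at zero, and \({\mathcal{D}}_{2}\) as the smaller eigenvalue magnitude, both with \(\gamma = 1\). The default value of `k` and the function name are hypothetical.

```python
import math

def hessian_operators(lxx, lxy, lyy, t, k=0.04):
    """Scale-normalised (gamma = 1) second-order operators at one point.

    ASSUMPTIONS: D1 = det H - k * trace(H)^2 clipped at zero, and
    D2 = min(|lambda_1|, |lambda_2|); k = 0.04 is an illustrative default."""
    trace = lxx + lyy
    det = lxx * lyy - lxy * lxy
    disc = math.sqrt(0.25 * (lxx - lyy) ** 2 + lxy ** 2)
    lam1, lam2 = 0.5 * trace + disc, 0.5 * trace - disc   # eigenvalues of the Hessian
    return {
        "laplacian": t * trace,
        "det_hessian": t * t * det,
        "d1": t * t * max(det - k * trace * trace, 0.0),
        "d2": t * min(abs(lam1), abs(lam2)),
    }

if __name__ == "__main__":
    # Centre of an anisotropic Gaussian blob with t1 = 9 (slow direction) and t2 = 1.
    t1, t2, t = 9.0, 1.0, 1.0
    l0 = 1.0 / (2.0 * math.pi * math.sqrt((t1 + t) * (t2 + t)))
    ops = hessian_operators(-l0 / (t1 + t), 0.0, -l0 / (t2 + t), t)
    print(ops)
```

In this example the value of `d2` equals the magnitude of the second derivative along the direction with the larger scale parameter, which is the direction of slowest spatial variation.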
Experimental results reported in Sect. 5.3 and [39] show that the interest point detectors based on the new Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\) and the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\) have significantly better repeatability properties under affine or perspective image transformations than the Laplacian \(\nabla_{\mathit{norm}}^{2} L\) or the Hessian feature strength measure \({\mathcal{D}}_{2,\mathit{norm}} L\). Corresponding advantages hold relative to the difference-of-Gaussians (DoG) approximation of the Laplacian operator or the Harris-Laplace operator. Hence, the interest point detectors that have the best theoretical properties under affine deformations of Gaussian blobs also have the best experimental properties. In this respect, the predictions from this theoretical analysis agree with corresponding experimental results.
When considering scale selection for a rotationally symmetric Gaussian blob, it is shown that the scale estimates obtained by scale selection from local extrema over scale vs. weighted averaging over scale do for \(\gamma=1\) (in the 2D case) lead to similar results for each one of these four operators. This similarity can be explained from a symmetry property of the scale-space signature under inversion transformations of the scale parameter, which correspond to reflections along the scale axis after a logarithmic transformation of the scale parameter in terms of effective scale. Because of this similarity between the scale estimates obtained from the two types of scale selection approaches, we may conclude that no additional scale compensation or scale calibration is needed between scale estimates that are obtained from weighted averaging over scale vs. local extrema over scale (provided that \(\gamma=1\)).
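The symmetry argument can be checked numerically for a rotationally symmetric Gaussian blob with scale parameter \(t_{0}\). The sketch below assumes \(\gamma = 1\), responses evaluated at the blob centre, and a weighted average taken over effective scale \(\tau = \log t\) with the response itself as weight. The exact weighting function used in the paper is defined in an earlier section and may differ. Function names are hypothetical.

```python
import math

def laplacian_signature(t, t0):
    """|t * Laplacian| at the centre of a unit-mass symmetric Gaussian blob."""
    return t / (math.pi * (t0 + t) ** 2)

def det_hessian_signature(t, t0):
    """t^2 * det Hessian at the centre of the same blob."""
    return t * t / (4.0 * math.pi ** 2 * (t0 + t) ** 4)

def weighted_scale_estimate(signature, t0, tau_lo=-20.0, tau_hi=20.0, n=4001):
    """exp of the signature-weighted mean of tau = log t (trapezoidal rule)."""
    step = (tau_hi - tau_lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        tau = tau_lo + i * step
        w = signature(math.exp(tau), t0)
        if i in (0, n - 1):
            w *= 0.5                      # trapezoidal end-point weights
        num += tau * w
        den += w
    return math.exp(num / den)

if __name__ == "__main__":
    t0 = 4.0
    for sig in (laplacian_signature, det_hessian_signature):
        # Inversion symmetry: the signature takes the same value at t and t0^2 / t.
        print(sig.__name__, sig(1.0, t0), sig(t0 * t0 / 1.0, t0),
              weighted_scale_estimate(sig, t0))
```

Because both signatures are invariant under \(t \mapsto t_{0}^{2}/t\), they are symmetric around \(\log t_{0}\) on the effective scale axis, so the weighted average coincides with the location of the extremum at \(t_{0}\).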
Since the commonly used difference-of-Gaussians operator can be seen as a discrete approximation of the Laplacian operator [41], the analysis of the scale selection properties for the Laplacian operator also provides a theoretical model for analyzing the scale selection properties of the difference-of-Gaussians keypoint detector used in the SIFT operator [48]. The above-mentioned results concerning the scale selection properties of the Laplacian operator \(\nabla^{2}_{\mathit{norm}} L\) also extend to the Harris-Laplace operator [49], for which the spatial selection is performed based on spatial extrema of the Harris measure H, whereas the scale selection properties are solely determined by the scale selection properties of the Laplacian \(\nabla^{2}_{\mathit{norm}} L\). Incorporating the scale selection properties of the determinant of the Hessian \(\det {\mathcal{H}}_{\mathit{norm}} L\), the results also extend to the Harris-detHessian, detmu-Laplace and detmu-detHessian operators proposed in [39] as well as other possible types of hybrid approaches.
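The relation between the difference-of-Gaussians operator and the Laplacian follows from the diffusion equation \(\partial_{t} L = \frac{1}{2} \nabla^{2} L\), which gives \(L(\cdot;\, t + \Delta t) - L(\cdot;\, t) \approx \frac{\Delta t}{2} \nabla^{2} L(\cdot;\, t)\) for small \(\Delta t\). The sketch below checks this at the centre of a unit-mass rotationally symmetric Gaussian blob with scale parameter \(t_{0}\). The function names are hypothetical, and the scale step is an arbitrary illustrative value rather than the scale ratio used in SIFT.

```python
import math

def blob_centre_value(t, t0):
    """L(0, 0; t) for a unit-mass symmetric Gaussian blob with variance t0."""
    return 1.0 / (2.0 * math.pi * (t0 + t))

def laplacian_at_centre(t, t0):
    """Laplacian of L at the centre: L_xx + L_yy = -2 L / (t0 + t)."""
    return -1.0 / (math.pi * (t0 + t) ** 2)

def dog_at_centre(t, dt, t0):
    """Difference of Gaussians between scale levels t + dt and t."""
    return blob_centre_value(t + dt, t0) - blob_centre_value(t, t0)

if __name__ == "__main__":
    t0, t, dt = 4.0, 4.0, 0.01
    dog = dog_at_centre(t, dt, t0)
    approx = 0.5 * dt * laplacian_at_centre(t, t0)
    print(dog, approx, dog / approx)
```

The ratio between the two quantities approaches 1 as the scale step decreases, which is why scale selection properties derived for the Laplacian carry over approximately to the difference-of-Gaussians operator.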
For scale estimates that are computed algorithmically from real-world images in an actual implementation, the robustness of image features that are obtained by scale selection from local extrema over scale or weighted scale selection over scale may, however, differ substantially. Experimental results reported in Sect. 5.3 and [39] show that weighted scale selection leads to interest points that have significantly better repeatability properties under perspective image deformations compared to interest points computed with scale selection from local extrema over scale. Theoretically, we have also seen that in several cases, weighted scale selection makes it easier to derive closed-form expressions for the scale estimate than for scale selection based on local extrema over scale. In these respects, scale selection by weighted averaging over scale can have both practical and theoretical advantages.
When making use of a complementary post-smoothing operation to suppress spurious variations in the nonlinear feature responses from the interest point detectors to simplify the task of scale linking, the influence of this post-smoothing operation on the scale estimates may, however, be different for different interest point detectors. If we assume that scale calibration can be performed based on the scale selection properties for Gaussian blobs, we have derived a set of relative calibration or compensation factors for each one of the five main types of interest point detectors studied in this paper.
To conclude, the analysis presented in this paper provides a theoretical basis for defining a richer repertoire of mechanisms for computing scale-invariant image features and image descriptors for a wide range of possible applications in computer vision. In very recent work [42], these generalized scale-space interest points have been integrated with local scale-invariant image descriptors and been demonstrated to lead to highly competitive results for image-based matching and recognition.
As outlined in Appendix A.2, these interest point detectors and the analysis of these can be extended to higher-dimensional image data in a rather straightforward manner.
Indeed, it can be shown that the definition of scale-normalized derivatives in this way captures the full degrees of freedom by which scale invariance can be obtained from local extrema over scale of scale-normalized derivatives defined from a Gaussian scale-space, as formally proved by necessity in [35, Appendix A.1].
By linking image features over scale into feature trajectories, it also becomes possible to define a significance value by integrating scale-normalized feature responses over scale. Experimentally, it can be shown that such ranking of image features leads to selections of subsets of interest points with better overall repeatability properties than selection of subsets of interest points from the extremum responses of interest point detectors at scale-space extrema. An intuitive motivation for this property is a heuristic principle that image features that are stable over large ranges of scales should be more likely to be significant than image features that only exist over a shorter life length in scale-space [27, Assumption 1 in Sect. 3 on p. 296].
Maximum entropy solutions have been argued to be the preferred default solutions for underconstrained problems [3, 59], although the applicability of these arguments has also been questioned [6, 8].
This approximation may be reasonable for small values of \(c\) for which the major contribution of the post-smoothing integration originates from values of \({\mathcal{D}}_{2} L\) near the interest point.
In this section we will in many cases restrict the analysis to the specific case of γ=1, since some of the results become significantly more complex for a general value of γ≠1. In a few cases where the corresponding results become reasonably compact, we will, however, include them.
A plausible explanation why the difference between the scale estimates is smaller for the determinant of the Hessian \(\det {\mathcal{H}} L\) and the Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\) compared to the difference in scale estimates for the Laplacian \(\nabla_{\mathit{norm}}^{2} L\) and the Hessian feature strength measure \({\mathcal{D}}_{2,\mathit{norm}} L\) is that second-order derivative responses are squared for the determinant of the Hessian \(\det {\mathcal{H}} L\) and the Hessian feature strength measure \({\mathcal{D}}_{1,\mathit{norm}} L\), whereas the Laplacian \(\nabla_{\mathit{norm}}^{2} L\) and the Hessian feature strength measure \({\mathcal{D}}_{2,\mathit{norm}} L\) operators depend on the second-order derivative responses in a linear way.
Thereby, the integrals that define the weighted scale selection estimates will get a comparably higher relative contribution from scale levels near the maximum over scale, which in turn implies that the influence due to skewness in the scale-space signature caused by values of \(\gamma \neq 1\) will be lower (compare with Sect. 5.1.1). By varying the power \(a\) in the self-similar transformation function (114), it is more generally possible to modulate this effect.
The motivation for multiplying the Gaussian curvature by a power of the gradient magnitude in (175) is that the resulting operator should assume high values when the gradient magnitude and the Gaussian curvature are simultaneously high. More generally, also other powers of the gradient magnitude could be considered (204). The current power of four is chosen because it leads to the simplest calculations, in analogy with the multiplication by the gradient magnitude raised to the power of three for the 2D rescaled level curve curvature operator (7).
Acknowledgements
I would like to thank the anonymous reviewers for valuable comments and questions that improved the presentation and Oskar Linde for valuable comments on an early version of the manuscript.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.