Scale Selection Properties of Generalized Scale-Space Interest Point Detectors

Scale-invariant interest points have found several highly successful applications in computer vision, in particular for image-based matching and recognition. This paper presents a theoretical analysis of the scale selection properties of a generalized framework for detecting interest points from scale-space features presented in Lindeberg (Int. J. Comput. Vis. 2010, under revision) and comprising:
- an enriched set of differential interest operators at a fixed scale, including the Laplacian operator, the determinant of the Hessian, the new Hessian feature strength measures I and II and the rescaled level curve curvature operator, as well as
- an enriched set of scale selection mechanisms, including scale selection based on local extrema over scale, complementary post-smoothing after the computation of non-linear differential invariants and scale selection based on weighted averaging of scale values along feature trajectories over scale.
It is shown how the selected scales of different linear and non-linear interest point detectors can be analyzed for Gaussian blob models. Specifically, it is shown that for a rotationally symmetric Gaussian blob model, the scale estimates obtained by weighted scale selection will be similar to the scale estimates obtained from local extrema over scale of scale-normalized derivatives for each one of the purely second-order operators. In this respect, no scale compensation is needed between the two types of scale selection approaches. When using post-smoothing, the scale estimates may, however, differ between different types of interest point operators, and it is shown how relative calibration factors can be derived to enable comparable scale estimates for each purely second-order operator and for different amounts of self-similar post-smoothing. A theoretical analysis of the sensitivity to affine image deformations is presented, and it is shown that the scale estimates obtained from the determinant of the Hessian operator are affine covariant for an anisotropic Gaussian blob model. Among the other purely second-order operators, the Hessian feature strength measure I has the lowest sensitivity to non-uniform scaling transformations, followed by the Laplacian operator and the Hessian feature strength measure II. The predictions from this theoretical analysis agree with experimental results of the repeatability properties of the different interest point detectors under affine and perspective transformations of real image data. A number of less complete results are derived for the level curve curvature operator.

Keywords Feature detection · Interest point · Blob detection · Corner detection · Scale · Scale-space · Scale selection · Scale invariance · Scale calibration · Scale linking · Feature trajectory · Deep structure · Affine transformation · Differential invariant · Gaussian derivative · Multi-scale representation · Computer vision

Introduction
The notion of scale selection is essential to adapt the scale of processing to local image structures. A computer vision system equipped with an automatic scale selection mechanism will have the ability to compute scale-invariant image features and thereby handle the a priori unknown scale variations that may occur in image data because of objects and substructures of different physical size in the world, as well as objects at different distances to the camera. Moreover, computing local image descriptors at integration scales proportional to the detection scales of scale-invariant image features makes it possible to compute scale-invariant image descriptors (Lindeberg [35]; Bretzner and Lindeberg [4]; Mikolajczyk and Schmid [49]; Lowe [48]; Bay et al. [2]; Lindeberg [38,43]).
A general framework for performing scale selection can be obtained by detecting local extrema over scale of γ-normalized derivative expressions (Lindeberg [35]). This approach has been applied to a large variety of feature detection tasks (Lindeberg [34]; Bretzner and Lindeberg [4]; Sato et al. [54]; Frangi et al. [11]; Krissian et al. [22]; Chomat et al. [5]; Hall et al. [15]; Mikolajczyk and Schmid [49]; Lazebnik et al. [24]; Negre et al. [52]; Tuytelaars and Mikolajczyk [58]). Specifically, highly successful applications can be found in image-based recognition (Lowe [48]; Bay et al. [2]). Alternative approaches for scale selection have also been proposed in terms of the detection of peaks over scale in weighted entropy measures (Kadir and Brady [18]) or Lyapunov functionals (Sporring et al. [56]), minimization of normalized error measures over scale (Lindeberg [36]), determining minimum reliable scales for feature detection according to a noise suppression model (Elder and Zucker [9]), determining optimal stopping times in non-linear diffusion-based image restoration methods using similarity measurements relative to the original data (Mrázek and Navara [51]), applying statistical classifiers for texture analysis at different scales (Kang et al. [19]) or performing image segmentation from the scales at which a supervised classifier delivers class labels with the highest posterior (Loog et al. [47]; Li et al. [25]).
Recently, a generalization of the differential approach for scale selection based on local extrema over scale of γ-normalized derivatives has been proposed, by linking image features over scale into feature trajectories in a generalized scale-space primal sketch [39]. Specifically, two novel scale selection mechanisms have been proposed in terms of:
- post-smoothing of differential feature responses, by performing a second-stage scale-space smoothing step after the computation of non-linear differential invariants, so as to simplify the task of linking feature responses over scale into feature trajectories, and
- weighted scale selection, where the scale estimates are computed by averaging scale values along each feature trajectory over scale, weighted by the scale-normalized feature responses, in contrast to the previous detection of local or global extrema over scale.
The subject of this article is to perform an in-depth theoretical analysis of the properties of these scale selection methods when applied to the task of computing scale-invariant interest points: (i) When using a set of different types of interest point detectors that are based on different linear or non-linear combinations of scale-space derivatives, a basic question arises of how to relate thresholds on the magnitude values between different types of interest point detectors. By studying the responses of the different interest point detectors to unit contrast Gaussian blobs, we will derive a way of expressing mutually corresponding thresholds between different types of interest point detectors. Algorithmically, the resulting threshold relations lead to intuitively reasonable results. (ii) The new scale selection method based on weighted averaging along feature trajectories over scale raises the question of how its properties relate to those of the previous scale selection method based on local extrema over scale of scale-normalized derivatives. We will show that for Gaussian blobs, the scale estimates obtained by weighted averaging over scale will be similar to the scale estimates obtained from local extrema over scale. If we assume that scale calibration can be performed based on the behaviour for Gaussian blobs, this result therefore shows that no relative scale compensation is needed between the two types of scale selection approaches. In previous work on scale selection based on γ-normalized derivatives [34,35], a similar assumption of scale calibration based on Gaussian model signals has been demonstrated to lead to highly useful results for calibrating the value of the γ-parameter for the problems of blob detection, corner detection, edge detection and ridge detection, with a large number of successful computer vision applications building on the resulting feature detectors.
(iii) For the scale linking algorithm presented in [39], which is based on local gradient ascent or gradient descent starting from local extrema in the differential responses at adjacent levels of scale, it turns out that a second post-smoothing stage after the computation of non-linear differential invariants is highly useful for increasing the performance of the scale linking algorithm, by suppressing spurious responses of low relative amplitude in the non-linear differential responses that are used for computing interest points. This self-similar amount of post-smoothing is determined as a constant times the local scale at which the differential expressions are computed, and may affect the scale estimates obtained from local extrema over scale or from weighted averaging over scale. We will analyze how large this effect will be for different amounts of post-smoothing, and also show how relative scale normalization factors can be determined for the different differential expressions to obtain scale estimates that are unbiased with respect to the effect of the post-smoothing operation, if we again assume that scale calibration can be performed based on the scale selection properties for Gaussian blobs. Notably, different scale compensation factors for the influence of post-smoothing will be obtained for the different differential expressions that are used for defining interest points. Without post-smoothing, the scale estimates obtained from the different differential expressions are, however, all similar for Gaussian blobs, which indicates the possibility of using different types of differential expressions for performing combined interest point detection and scale selection, such that they can be interchangeably replaced in a modular fashion. (iv) When detecting interest points from images that are taken of an object from different viewing directions, the local image pattern will be deformed by the perspective projection.
If the interest point corresponds to a point in the world that is located on a smooth surface of an object, this deformation can to first order of approximation be modelled by a local affine transformation (Gårding and Lindeberg [12]). While the notion of affine shape adaptation has been demonstrated to be a highly useful tool for computing affine invariant interest points (Lindeberg and Gårding [46]; Baumberg [1]; Mikolajczyk and Schmid [49]; Tuytelaars and van Gool [57]), the success of such an affine shape adaptation process depends on the robustness of the underlying interest points that are used for initiating the iterative affine shape adaptation process. To investigate the properties of the different interest point detectors under affine transformations, we will perform a detailed analysis of the scale selection properties for affine Gaussian blobs, for which closed-form theoretical analysis is possible. The analysis shows that the determinant of the Hessian operator and the new Hessian feature strength measure I both have significantly better behaviour under affine transformations than the Laplacian operator or the new Hessian feature strength measure II. In comparison with experimental results [39], the interest point detectors that have the best theoretical properties under affine transformations of Gaussian blobs also have significantly better repeatability properties under affine and perspective transformations than the other two. These results therefore show how experimental properties of interest points can be predicted by theoretical analysis, which contributes to an increased understanding of the relative properties of different types of interest point detectors.
In very recent work [42], these generalized scale-space interest points have been integrated with local scale-invariant image descriptors and been demonstrated to lead to highly competitive results for image-based matching and recognition.

Outline of the Presentation
The paper is organized as follows. Section 2 reviews the main components of a generalized framework for detecting scale-invariant interest points from scale-space features, including a richer set of interest point detectors at a fixed scale as well as new scale selection mechanisms.
In Sect. 3 the scale selection properties of this framework are analyzed for scale selection based on local extrema over scale of γ-normalized derivatives, when applied to rotationally symmetric as well as anisotropic Gaussian blob models. Section 4 gives a corresponding analysis for scale selection by weighted averaging over scale along feature trajectories.
Section 5 summarizes and compares the results obtained from the two scale selection approaches including complementary theoretical arguments to highlight their similarities in the rotationally symmetric case. It is also shown how scale calibration factors can be determined so as to obtain comparable scale estimates from interest point detectors that have been computed from different types of differential expressions. Comparisons are also presented of the relative sensitivity of the scale estimates to affine transformations outside the similarity group, with a brief comparison to experimental results. Finally, Sect. 6 concludes with an overall summary and discussion.

Differential Entities for Detecting Scale-Space Interest Points
A common approach to image matching and object recognition consists of matching interest points with associated image descriptors. Basic requirements on the interest points on which the image matching is to be performed are that they should (i) have a clear, preferably mathematically well-founded, definition, (ii) have a well-defined position in image space, (iii) have local image structures around the interest point that are rich in information content, such that the interest points carry important information to later stages, and (iv) be stable under local and global deformations of the image domain, including perspective image deformations and illumination variations, such that the interest points can be reliably computed with a high degree of repeatability. The image descriptors computed at the interest points should also (v) be sufficiently distinct, such that interest points corresponding to physically different points can be kept separate.
Preferably, the interest points should also have an attribute of scale, to make it possible to compute reliable interest points from real-world image data, including scale changes in the image domain. Specifically, the interest points should preferably also be scale-invariant to make it possible to match corresponding image patches under scale variations.
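Within this scale-space framework, interest point detectors can be defined at any level of scale using (i) either of the following established differential operators [35]:
- the Laplacian operator
$$\nabla^2 L = L_{xx} + L_{yy} \quad (5)$$
- the determinant of the Hessian
$$\det \mathcal{H} L = L_{xx} L_{yy} - L_{xy}^2 \quad (6)$$
- the rescaled level curve curvature
$$\tilde{\kappa}(L) = L_x^2 L_{yy} + L_y^2 L_{xx} - 2 L_x L_y L_{xy} \quad (7)$$
(ii) either of the following new differential analogues and extensions of the Harris operator [16] proposed in [39]:
- the unsigned Hessian feature strength measure I
$$\mathcal{D}_1 L = \begin{cases} \det \mathcal{H} L - k\, \mathrm{trace}^2 \mathcal{H} L & \text{if } \det \mathcal{H} L - k\, \mathrm{trace}^2 \mathcal{H} L > 0 \\ 0 & \text{otherwise} \end{cases} \quad (8)$$
- the signed Hessian feature strength measure I
$$\tilde{\mathcal{D}}_1 L = \begin{cases} \det \mathcal{H} L - k\, \mathrm{trace}^2 \mathcal{H} L & \text{if } \det \mathcal{H} L - k\, \mathrm{trace}^2 \mathcal{H} L > 0 \\ \det \mathcal{H} L + k\, \mathrm{trace}^2 \mathcal{H} L & \text{if } \det \mathcal{H} L + k\, \mathrm{trace}^2 \mathcal{H} L < 0 \\ 0 & \text{otherwise} \end{cases} \quad (9)$$
where $k \in\, ]0, \tfrac{1}{4}[$ with the preferred choice $k \approx 0.04$, or (iii) either of the following new differential analogues and extensions of the Shi and Tomasi operator [55] proposed in [39]:
- the unsigned Hessian feature strength measure II
$$\mathcal{D}_2 L = \min(|L_{pp}|, |L_{qq}|) \quad (10)$$
- the signed Hessian feature strength measure II
$$\tilde{\mathcal{D}}_2 L = \begin{cases} L_{pp} & \text{if } |L_{pp}| < |L_{qq}| \\ L_{qq} & \text{if } |L_{qq}| < |L_{pp}| \\ (L_{pp} + L_{qq})/2 & \text{otherwise} \end{cases} \quad (11)$$
where $L_{pp}$ and $L_{qq}$ denote the eigenvalues of the Hessian matrix (the principal curvatures), ordered such that $L_{pp} \le L_{qq}$ [34]:
$$L_{pp} = \tfrac{1}{2}\left(L_{xx} + L_{yy} - \sqrt{(L_{xx} - L_{yy})^2 + 4 L_{xy}^2}\right), \qquad L_{qq} = \tfrac{1}{2}\left(L_{xx} + L_{yy} + \sqrt{(L_{xx} - L_{yy})^2 + 4 L_{xy}^2}\right)$$
Figure 1 shows examples of detecting different types of interest points from a grey-level image. In this figure, the repetitive nature of the underlying image structures in the row of similar books illustrates the ability of the interest point detectors to respond to approximately similar structures in the image domain by corresponding responses.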
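As an illustration of the operator definitions above, the following minimal Python sketch evaluates all seven operators at a single image point. It assumes that the Gaussian derivatives Lx, Ly, Lxx, Lxy, Lyy have already been computed at some scale (and scale-normalized, if desired, by the caller); the function names `hessian_eigenvalues` and `interest_operators` are hypothetical and introduced here for illustration only.

```python
import math


def hessian_eigenvalues(Lxx, Lxy, Lyy):
    """Principal curvatures (Lpp, Lqq) of the 2x2 Hessian, ordered so that Lpp <= Lqq."""
    root = math.sqrt((Lxx - Lyy) ** 2 + 4.0 * Lxy ** 2)
    return 0.5 * (Lxx + Lyy - root), 0.5 * (Lxx + Lyy + root)


def interest_operators(Lx, Ly, Lxx, Lxy, Lyy, k=0.04):
    """Evaluate the fixed-scale interest operators at one point from its Gaussian derivatives."""
    det = Lxx * Lyy - Lxy ** 2
    trace = Lxx + Lyy
    Lpp, Lqq = hessian_eigenvalues(Lxx, Lxy, Lyy)

    # Hessian feature strength measure I (unsigned and signed variants)
    pos = det - k * trace ** 2
    neg = det + k * trace ** 2
    D1 = pos if pos > 0 else 0.0
    if pos > 0:
        D1_signed = pos
    elif neg < 0:
        D1_signed = neg
    else:
        D1_signed = 0.0

    # Hessian feature strength measure II (unsigned and signed variants)
    D2 = min(abs(Lpp), abs(Lqq))
    if abs(Lpp) < abs(Lqq):
        D2_signed = Lpp
    elif abs(Lqq) < abs(Lpp):
        D2_signed = Lqq
    else:
        D2_signed = 0.5 * (Lpp + Lqq)

    return {
        "laplacian": trace,
        "det_hessian": det,
        "kappa_tilde": Lx ** 2 * Lyy + Ly ** 2 * Lxx - 2.0 * Lx * Ly * Lxy,
        "D1": D1,
        "D1_signed": D1_signed,
        "D2": D2,
        "D2_signed": D2_signed,
    }
```

For example, a bright blob centre with Lxx = Lyy = -1 and Lxy = 0 gives a Laplacian of -2, a determinant of 1, D1 = 1 - 4k = 0.84 for k = 0.04, and D2 = 1 with D2_signed = -1, whereas a symmetric saddle (Lxx = 1, Lyy = -1) gives D1 = 0 but a negative D1_signed.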
Figure 2 illustrates the repeatability properties of such interest points.
A basic motivation for defining the new differential operators $\mathcal{D}_1$, $\tilde{\mathcal{D}}_1$, $\mathcal{D}_2$ and $\tilde{\mathcal{D}}_2$ from the Hessian matrix $\mathcal{H}L$ in a structurally similar way as the Harris and the Shi-and-Tomasi operators are defined from the second-moment matrix (structure tensor) is that: (i) under an affine transformation $p' = A p$ with $p = (x, y)^T$, $A$ a non-singular $2 \times 2$ matrix and $f'(p') = f(p)$, it can be shown that the Hessian matrix transforms in a similar way, $\mathcal{H}f = A^T (\mathcal{H}f') A$, as the second-moment matrix, $\mu = A^T \mu' A$ [31,46], and (ii) provided that the Hessian matrix is either positive or negative definite, the Hessian matrix $\mathcal{H}L$ computed at a point $p_0$ defines an either positive or negative definite quadratic form $Q_{\mathcal{H}L}(p) = (p - p_0)^T (\mathcal{H}L)(p - p_0)$ in a similar way as the second-moment matrix $\mu$ computed at $p_0$ does: $Q_\mu(p) = (p - p_0)^T \mu\, (p - p_0)$. From these two analogies, we can conclude that, provided the Hessian matrix is either positive or negative definite, these two types of descriptors should have strong qualitative similarities. Experimentally, the new differential interest point detectors $\mathcal{D}_1$, $\tilde{\mathcal{D}}_1$, $\mathcal{D}_2$ and $\tilde{\mathcal{D}}_2$ can be shown to perform very well and to allow for image features with better repeatability properties under affine and perspective transformations than the more traditional Laplacian or Harris operators [39].
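The chain-rule step behind the transformation property (i) can be written out as follows, assuming $f'(p') = f(p)$ with $p' = A p$, so that $f(p) = f'(A p)$; this is a standard derivation given here as a reading aid, not a reproduction of the derivation in [31,46]:

```latex
\begin{aligned}
\partial_{x_i} f(p) &= \sum_j A_{ji}\, \partial_{x'_j} f'(p')
  &&\Longrightarrow\quad \nabla f = A^T \nabla' f', \\
\partial_{x_i x_k} f(p) &= \sum_{j,l} A_{ji}\, A_{lk}\, \partial_{x'_j x'_l} f'(p')
  &&\Longrightarrow\quad \mathcal{H} f = A^T \, (\mathcal{H}' f') \, A, \\
\nabla f \, (\nabla f)^T &= A^T \left( \nabla' f' \, (\nabla' f')^T \right) A, \\
\det \mathcal{H} f &= (\det A)^2 \, \det \mathcal{H}' f'.
\end{aligned}
```

The third line is the pointwise version of $\mu = A^T \mu' A$ (before windowing), and the last line follows by taking determinants of the second line.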
The Laplacian $\nabla^2 L$ responds to bright and dark blobs, as formalized in terms of local minima or maxima of the Laplacian operator. The determinant of the Hessian $\det \mathcal{H} L$ responds to bright and dark blobs by positive responses and, in addition, to saddle-like image features by negative responses, as well as to corners. The unsigned Hessian feature strength measure $\mathcal{D}_1 L$ responds to bright and dark blobs as well as to corners, with the complementary requirement that the ratio of the eigenvalues $\lambda_1$ and $\lambda_2$ of the Hessian matrix (with $|\lambda_1| \le |\lambda_2|$) should be sufficiently close to one, as specified by the parameter $k$ according to:
$$k < \frac{\lambda_1 \lambda_2}{(\lambda_1 + \lambda_2)^2} = \frac{\lambda_1/\lambda_2}{(1 + \lambda_1/\lambda_2)^2}$$
For this entity to respond, it is therefore necessary that there are strong intensity variations along two different directions in the image domain. The signed Hessian feature strength measure $\tilde{\mathcal{D}}_1 L$ responds to similar image features as the unsigned entity $\mathcal{D}_1 L$, and in addition to saddle-like image features with a corresponding constraint on the ratio between the eigenvalues. The Hessian feature strength measures $\mathcal{D}_2 L$ and $\tilde{\mathcal{D}}_2 L$ respond strongly when both of the principal curvatures are strong, and the local image pattern therefore contains strong intensity variations in two orthogonal directions. The unsigned entity $\mathcal{D}_2 L$ disregards the sign of the principal curvatures, whereas the signed entity $\tilde{\mathcal{D}}_2 L$ preserves the sign of the principal curvature of the lowest magnitude.
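Solving the eigenvalue condition for the ratio r = λ1/λ2 shows how strict the requirement is for a given k. The sketch below is a small illustration under the assumption that the two eigenvalues have equal sign, so that r lies in ]0, 1]; the function names are hypothetical. The condition r/(1 + r)² > k is equivalent to k r² + (2k - 1) r + k < 0, whose smaller root gives the minimum admissible ratio.

```python
import math


def min_eigenvalue_ratio(k):
    """Smallest ratio r = lambda1 / lambda2 in (0, 1] for which det - k * trace^2 > 0.

    Returns the smaller root of k r^2 + (2k - 1) r + k = 0, i.e. r / (1 + r)^2 = k.
    """
    if not 0.0 < k < 0.25:
        raise ValueError("k must lie in ]0, 1/4[")
    return ((1.0 - 2.0 * k) - math.sqrt(1.0 - 4.0 * k)) / (2.0 * k)


def d1_is_positive(lam1, lam2, k=0.04):
    """True if the unsigned measure D1 gives a non-zero response for Hessian eigenvalues lam1, lam2."""
    return lam1 * lam2 - k * (lam1 + lam2) ** 2 > 0.0
```

With the preferred k = 0.04, `min_eigenvalue_ratio(0.04)` is about 0.0436, so the weaker principal curvature must be at least roughly 4.4 % of the stronger one; as k approaches 1/4 the admissible ratio approaches one, which is consistent with the upper bound k < 1/4 on the parameter.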
Other ways of defining image features from the second-order differential structure of images have been proposed by Danielsson et al. [7] and Griffin [13].

Scale Selection Mechanisms
Scale Selection from γ-Normalized Derivatives
In (Lindeberg [29,31,35,37]) a general framework for automatic scale selection was proposed, based on the idea of detecting local extrema over scale of γ-normalized derivatives defined according to
$$\partial_\xi = t^{\gamma/2}\, \partial_x, \qquad \partial_\eta = t^{\gamma/2}\, \partial_y$$
where γ > 0 is a free parameter that can be related to the dimensionality of the image features that the feature detector is designed to respond to, e.g., in terms of their evolution properties over scale [53]; see also Appendix A.3 for an explicit interpretation of the parameter γ in terms of the dimensionality D of second-order image features according to (213). Specifically, it was shown in [35] that local extrema over scale of homogeneous polynomial differential invariants $\mathcal{D}_{\gamma-\mathrm{norm}} L$ expressed in terms of γ-normalized Gaussian derivatives are transformed in a scale-covariant way: if some scale-normalized differential invariant $\mathcal{D}_{\gamma-\mathrm{norm}} L$ assumes a local extremum over scale at scale $t_0$ in scale-space, then under a uniform rescaling of the input pattern by a factor $s$ there will be a local extremum over scale in the scale-space of the transformed signal at scale $s^2 t_0$.
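The scale covariance property can be checked numerically on a simple closed-form response. The sketch below assumes the magnitude of the γ-normalized Laplacian at the centre of a rotationally symmetric Gaussian blob of variance t0, which works out to t^γ / (π (t0 + t)²); the golden-section search over log t and the helper names are illustrative choices, not part of the framework described in the text.

```python
import math


def argmax_over_log_scale(response, t_min, t_max, tol=1e-9):
    """Golden-section search for the scale t in [t_min, t_max] maximizing a unimodal response(t).

    The search runs in tau = log t, so that scale ratios are treated uniformly.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(t_min), math.log(t_max)
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if response(math.exp(c)) > response(math.exp(d)):
            b, d = d, c                  # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                  # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    return math.exp(0.5 * (a + b))


def abs_laplacian_at_blob_center(t, t0, gamma):
    """|gamma-normalized Laplacian| at the centre of a Gaussian blob of variance t0."""
    return t ** gamma / (math.pi * (t0 + t) ** 2)


# Rescaling the blob by s = 3 multiplies its variance t0 by s**2 = 9.
gamma = 0.75
t_small = argmax_over_log_scale(lambda t: abs_laplacian_at_blob_center(t, 4.0, gamma), 1e-3, 1e4)
t_large = argmax_over_log_scale(lambda t: abs_laplacian_at_blob_center(t, 36.0, gamma), 1e-3, 1e4)
```

Here `t_small` comes out at about 2.4 and `t_large / t_small` at about 9, i.e. the selected scale follows the rescaling by s² as stated above.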
Furthermore, performing simultaneous scale selection and spatial selection by detecting scale-space extrema, i.e., points where the scale-normalized differential expression $\mathcal{D}_{\gamma-\mathrm{norm}} L$ assumes local extrema with respect to both space and scale, constitutes a general framework for detecting scale-invariant interest points. Formally, such scale-space extrema are characterized by the first-order derivatives with respect to space and scale being zero
$$\nabla(\mathcal{D}_{\gamma-\mathrm{norm}} L) = 0 \quad\text{and}\quad \partial_t(\mathcal{D}_{\gamma-\mathrm{norm}} L) = 0 \quad (16)$$
and, in addition, by the composed Hessian matrix computed over both space and scale being either positive or negative definite.
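In a discrete implementation, conditions of this type are commonly approximated by comparing each sample of the scale-normalized response with its 26 neighbours over space and scale. The sketch below shows that common discrete stand-in, assuming the response has been sampled into nested lists `volume[s][i][j]` (scale level s, pixel (i, j)); it is an illustration, not necessarily the discretization used in [39], and the function names are hypothetical.

```python
def is_scale_space_extremum(volume, s, i, j):
    """True if volume[s][i][j] is strictly larger or strictly smaller than all 26 neighbours."""
    centre = volume[s][i][j]
    neighbours = [
        volume[s + ds][i + di][j + dj]
        for ds in (-1, 0, 1)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (ds, di, dj) != (0, 0, 0)
    ]
    return centre > max(neighbours) or centre < min(neighbours)


def detect_scale_space_extrema(volume):
    """Return (scale index, row, column) of all interior discrete scale-space extrema.

    volume[s][i][j] is assumed to hold a scale-normalized response sampled at
    scale level s and pixel (i, j); boundary samples are skipped.
    """
    S, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [
        (s, i, j)
        for s in range(1, S - 1)
        for i in range(1, H - 1)
        for j in range(1, W - 1)
        if is_scale_space_extremum(volume, s, i, j)
    ]
```

A strict comparison is used so that flat regions do not produce spurious detections; in practice a magnitude threshold would also be applied to the detected extrema.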

Generalized Scale Selection Mechanisms
In [39] this approach was extended in the following ways:
- by performing post-smoothing of the differential expression $\mathcal{D}_{\gamma-\mathrm{norm}} L$ prior to the detection of local extrema over space or scale
$$\overline{(\mathcal{D}_{\gamma-\mathrm{norm}} L)}(\cdot;\, t) = g(\cdot;\, t_{post}) * (\mathcal{D}_{\gamma-\mathrm{norm}} L)(\cdot;\, t) \quad (18)$$
with an integration scale (post-smoothing scale) $t_{post} = c^2 t$ proportional to the differentiation scale $t$ with $c > 0$ (see Appendix A.1 for a brief description of the algorithmic motivations for using such a post-smoothing operation when linking image features over scale that have been computed from non-linear differential entities), and
- by performing weighted averaging of scale values along any feature trajectory $T$ over scale in a scale-space primal sketch according to
$$\hat{\tau}_T = \frac{\int_{\tau \in T} \tau\, \psi\big((\mathcal{D}_{\gamma-\mathrm{norm}} L)(x(\tau);\, \tau)\big)\, d\tau}{\int_{\tau \in T} \psi\big((\mathcal{D}_{\gamma-\mathrm{norm}} L)(x(\tau);\, \tau)\big)\, d\tau}$$
where ψ denotes some (positive and monotonically increasing) transformation of the scale-normalized feature strength response $\mathcal{D}_{\gamma-\mathrm{norm}} L$, and with the scale parameter parameterized in terms of effective scale $\tau$ [28], to obtain a scale-covariant construction of the corresponding scale estimates
$$\hat{t}_T = e^{\hat{\tau}_T}$$
which implies that the resulting image features will be scale-invariant.
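The weighted averaging over effective scale can be sketched as follows. This is a minimal illustration, assuming that a feature trajectory has already been extracted and sampled as a list of effective scales τ = log t with the corresponding scale-normalized responses; the trapezoidal rule, the default choice ψ = |·| and the function name `weighted_scale_estimate` are illustrative assumptions.

```python
import math


def weighted_scale_estimate(taus, responses, psi=abs):
    """Weighted scale selection along one feature trajectory.

    taus      -- increasing samples of effective scale tau = log t along the trajectory
    responses -- scale-normalized feature responses at those samples
    psi       -- positive weighting of the response; abs (response magnitude) is an
                 illustrative choice

    Both integrals in the weighted average are approximated with the trapezoidal rule,
    and the estimate is mapped back from effective scale via t_hat = exp(tau_hat).
    """
    num = den = 0.0
    for m in range(len(taus) - 1):
        dtau = taus[m + 1] - taus[m]
        w0, w1 = psi(responses[m]), psi(responses[m + 1])
        num += 0.5 * dtau * (taus[m] * w0 + taus[m + 1] * w1)
        den += 0.5 * dtau * (w0 + w1)
    return math.exp(num / den)


# Example: the centre of a rotationally symmetric Gaussian blob of variance t0 stays at
# the origin over scale, so its trajectory is the gamma = 1 normalized Laplacian
# -t / (pi (t0 + t)^2) sampled over tau.
t0 = 10.0
lo, hi = math.log(t0 / 1000.0), math.log(t0 * 1000.0)
taus = [lo + (hi - lo) * m / 2000 for m in range(2001)]
lap = [-math.exp(tau) / (math.pi * (t0 + math.exp(tau)) ** 2) for tau in taus]
t_hat = weighted_scale_estimate(taus, lap)
```

For γ = 1 the magnitude of this response is symmetric in τ around log t0, so the weighted estimate `t_hat` coincides with t0 here (also with ψ(v) = v²), in line with the similarity between weighted and extremum-based scale estimates stated in the introduction.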
The motivation for performing scale selection by weighted averaging of scale-normalized differential responses over scale is analogous to the motivation for scale selection from local extrema over scale, in the sense that interesting characteristic scale levels for further analysis should be obtained from the scales at which the differential operator assumes its strongest scale-normalized magnitude values over scale. Contrary to scale selection based on local extrema over scale, however, scale selection by weighted averaging over scale implies that the scale estimate will be obtained not only from the behaviour around the local extremum over scale, but from the responses at all scales along a feature trajectory over scale. The intention behind this choice is that the scale estimates should thereby be more robust and less sensitive to local image perturbations. Experimentally, it can be shown that scale-space interest points detected by these generalized scale selection mechanisms have better repeatability properties under affine and perspective image deformations than corresponding interest points detected by regular scale-space extrema [39]. In this sense, these generalized scale selection mechanisms make it possible to detect more robust image features. Specifically, the use of scale selection by weighted averaging over scale is made possible by linking image features over scale into feature trajectories, which ensures that the scale estimates are only influenced by responses from scale levels that correspond to qualitatively similar types of image structures along a feature trajectory over scale.
The subject of this article is to analyze properties of these generalized scale selection mechanisms theoretically when applied to the interest point detectors listed in Sect. 2.2.

Scale Selection Properties for Local Extrema over Scale
For theoretical analysis, we will consider a Gaussian prototype model of blob-like image structures. With such a prototype model, the semi-group property of the Gaussian kernel makes it possible to directly obtain the scale-space representations at coarser scales in terms of Gaussian functions, which simplifies theoretical analysis. Specifically, the result of computing polynomial differential invariants at different scales will be expressed in terms of Gaussian functions multiplied by polynomials. Thereby, closed-form theoretical analysis becomes tractable, whereas such analysis would be much harder to carry out for general image data, given the non-linear operations that are used for defining the interest points. The use of a Gaussian prototype model can also be motivated by conceptual simplicity. If we would like to model an image feature at some scale, then the Gaussian model is the model that requires the minimum amount of information, in the sense that the Gaussian distribution is the distribution with maximum entropy³ given a specification of the mean value m and the covariance matrix Σ of the distribution. Specifically, the Gaussian function with scale parameter t serves as an aperture function that measures image structures with respect to an inner scale beyond which finer-scale structures cannot be resolved.
In previous work [34,35] it has been shown that the γ-parameter in scale selection for different types of feature detection tasks, such as blob detection, corner detection, edge detection and ridge detection, can be determined based on the behaviour of these feature detectors on Gaussian-based intensity profiles. As will be shown later, the theoretical results derived from Gaussian blob models lead to predictions that agree with the relative repeatability properties of different types of interest point detectors under affine and perspective transformations. Formally, however, further application of these results will be based on the assumption that the scale selection behaviour can be calibrated based on the behaviour for Gaussian prototype models.

Regular Scale Selection from Local Extrema over Scale
Two basic questions in relation to the different interest point detectors reviewed in Sect. 2.2 concern:
- How will the selected scale levels be related between different interest point detectors?
- How will the scale-normalized magnitude values be related between different interest point detectors that respond to similar image structures?
Ideally, we would like similar scale estimates to be obtained for different interest point detectors, so that the interest point detectors could be modularly replaceable in the computer vision algorithms they are part of. Since the interest point detectors are expressed in terms of different types of linear or non-linear combinations of scale-space derivatives, a basic question concerns how to express comparable thresholds on the magnitude values for the different interest point detectors. In this section, we will relate these entities by applying scale selection from local extrema of scale-normalized derivatives over scale to a single Gaussian blob:
$$f(x, y) = g(x, y;\, t_0) = \frac{1}{2\pi t_0}\, e^{-(x^2+y^2)/(2 t_0)} \quad (22)$$
Due to the semi-group property of the Gaussian kernel, the scale-space representation of $f$ obtained by Gaussian smoothing is given by
$$L(\cdot;\, t) = g(\cdot;\, t) * f = g(\cdot;\, t_0 + t) \quad (23)$$
$$L(x, y;\, t) = \frac{1}{2\pi (t_0 + t)}\, e^{-(x^2+y^2)/(2 (t_0 + t))} \quad (24)$$
³ Maximum entropy solutions have been argued to be taken as preferred default solutions for underconstrained problems [3,59], although the applicability of these arguments has also been questioned [6,8].

The Pure Second-Order Interest Point Detectors
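By differentiation, it follows that the scale-normalized feature strength measures at the center $(x, y) = (0, 0)$ of the blob are, for the Laplacian (5), the determinant of the Hessian (6) and the Hessian feature strength measures I (8) and II (10), given by
$$\nabla^2_{\gamma-\mathrm{norm}} L = -\frac{t^{\gamma}}{\pi (t_0 + t)^2}$$
$$\det \mathcal{H}_{\gamma-\mathrm{norm}} L = \frac{t^{2\gamma}}{4\pi^2 (t_0 + t)^4}$$
$$\mathcal{D}_{1,\gamma-\mathrm{norm}} L = \frac{(1 - 4k)\, t^{2\gamma}}{4\pi^2 (t_0 + t)^4}$$
$$\mathcal{D}_{2,\gamma-\mathrm{norm}} L = \frac{t^{\gamma}}{2\pi (t_0 + t)^2}$$
(at the blob center, the signed measure $\tilde{\mathcal{D}}_1 L$ takes the same value as $\mathcal{D}_1 L$, whereas $\tilde{\mathcal{D}}_2 L$ has the same magnitude as $\mathcal{D}_2 L$ with negative sign). By differentiating these expressions with respect to the scale parameter $t$ and setting the derivative to zero, it follows that the extremum value over scale will for all these descriptors be assumed at the same scale
$$\hat{t} = \frac{\gamma}{2 - \gamma}\, t_0$$
For the specific choice of γ = 1, the selected scale $\hat{t}$ will be equal to the scale of the Gaussian blob, i.e. $\hat{t} = t_0$, and the extremum value over scale for each one of the respective feature detectors is
$$\nabla^2_{\mathrm{norm}} L = -\frac{1}{4\pi t_0}, \qquad \det \mathcal{H}_{\mathrm{norm}} L = \frac{1}{64\pi^2 t_0^2}, \qquad \mathcal{D}_{1,\mathrm{norm}} L = \frac{1 - 4k}{64\pi^2 t_0^2}, \qquad \mathcal{D}_{2,\mathrm{norm}} L = \frac{1}{8\pi t_0}$$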
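These centre responses can be checked numerically. The sketch below (hypothetical helper names, k = 0.04 assumed) encodes the four centre responses t^γ/(π(t0+t)²) (with negative sign for the Laplacian), t^(2γ)/(4π²(t0+t)⁴), (1-4k) t^(2γ)/(4π²(t0+t)⁴) and t^γ/(2π(t0+t)²), and verifies by central differences in log t that each has zero slope at t̂ = γ t0/(2 - γ).

```python
import math


def center_responses(t, t0, gamma, k=0.04):
    """Scale-normalized responses at the centre of a Gaussian blob of variance t0."""
    T = t0 + t
    return {
        "laplacian": -t ** gamma / (math.pi * T ** 2),
        "det_hessian": t ** (2 * gamma) / (4 * math.pi ** 2 * T ** 4),
        "D1": (1 - 4 * k) * t ** (2 * gamma) / (4 * math.pi ** 2 * T ** 4),
        "D2": t ** gamma / (2 * math.pi * T ** 2),
    }


def selected_scale(t0, gamma):
    """Common extremum over scale, t_hat = gamma * t0 / (2 - gamma), valid for 0 < gamma < 2."""
    return gamma * t0 / (2.0 - gamma)


def log_slope(name, t, t0, gamma, h=1e-5):
    """d log|response| / d log t by central differences; zero at an extremum over scale."""
    up = abs(center_responses(t * math.exp(h), t0, gamma)[name])
    down = abs(center_responses(t * math.exp(-h), t0, gamma)[name])
    return (math.log(up) - math.log(down)) / (2.0 * h)
```

For instance, with t0 = 2 and γ = 1 the Laplacian response at t = t0 is -1/(8π), i.e. -1/(4π t0).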

Scale Invariant Feature Responses After Contrast Normalization
When applying different types of interest point detectors in parallel, some approach is needed for expressing comparable thresholds between them. Let us assume that such calibration of corresponding thresholds between different interest point detectors can be performed based on their responses to Gaussian blobs. If we would like to present a Gaussian blob on a screen and vary its size (spatial extent) without affecting its perceived brightness on the screen, let us assume that this can be achieved by keeping the contrast between the maximum and the minimum values constant. Let us therefore multiply the amplitude of the original Gaussian blob $f$ by a factor $2\pi t_0$, so as to obtain an input signal with unit contrast as measured by the range between the minimum and maximum values. Then, for γ = 1, the maximum value over scale of the contrast-normalized Gaussian blob will be given by
$$\nabla^2_{\mathrm{norm}} L = -\frac{1}{2}, \qquad \det \mathcal{H}_{\mathrm{norm}} L = \frac{1}{16}, \qquad \mathcal{D}_{1,\mathrm{norm}} L = \frac{1 - 4k}{16}, \qquad \mathcal{D}_{2,\mathrm{norm}} L = \frac{1}{4}$$
Since these values do not depend on $t_0$, these expressions provide a way to express mutually related magnitude thresholds for the different interest point detectors, as shown in Table 1.
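The following sketch reproduces these unit-contrast values from the centre responses for γ = 1, assuming k = 0.04 and a hypothetical function name. The Laplacian and D2 are linear in the image intensity and therefore scale with the contrast factor 2π t0, whereas the determinant and D1 are quadratic and scale with its square.

```python
import math


def unit_contrast_peak_values(t0, k=0.04):
    """Peak responses over scale (gamma = 1, attained at t = t0) for a Gaussian blob
    of variance t0 rescaled by 2 * pi * t0 to unit contrast."""
    t = t0                        # selected scale for gamma = 1
    T = t0 + t
    amp = 2 * math.pi * t0        # contrast normalization factor
    lap = amp * (-t / (math.pi * T ** 2))                         # linear in intensity
    det = amp ** 2 * (t ** 2 / (4 * math.pi ** 2 * T ** 4))       # quadratic in intensity
    D1 = (1 - 4 * k) * det
    D2 = amp * (t / (2 * math.pi * T ** 2))                       # linear in intensity
    return lap, det, D1, D2
```

The returned values are -1/2, 1/16, (1 - 4k)/16 and 1/4 for every t0. Because of the different homogeneity degrees, a blob of contrast C would, under the same assumptions, give |∇²L| = C/2 but det HL = C²/16, so corresponding thresholds are related quadratically rather than linearly.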
Note: For the Harris operator [16], which is determined from the second-moment matrix according to
$$H = \det \mu - k\, \mathrm{trace}^2 \mu, \qquad \mu(\cdot;\, t, s) = g(\cdot;\, s) * \left( (\nabla_{\mathrm{norm}} L)(\nabla_{\mathrm{norm}} L)^T \right)$$
for some $k \in\, ]0, \tfrac{1}{4}[$, a corresponding analysis shows that the response at the center $(x, y) = (0, 0)$ of a Gaussian blob with unit contrast is at scale $t = t_0$ given by
$$H = \frac{(1 - 4k)\, r^4}{256\, (1 + r^2)^4}$$
if we let the integration scale $s$ be related to the local scale $t$ according to $s = r^2 t$. This value therefore expresses the magnitude value that will be obtained by applying the Harris-Laplace operator [49] to a Gaussian blob with unit contrast, provided that scale selection is performed using scale-normalized derivatives with γ = 1. In all other respects, the scale selection properties of the Harris-Laplace operator are similar to the scale selection properties of the Laplacian operator.
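The closed-form unit-contrast Harris value (1 - 4k) r⁴ / (256 (1 + r²)⁴) can be cross-checked by brute-force numerical integration. The sketch below is a verification aid under stated assumptions: a unit-contrast blob (so that L = (t0/T) exp(-ρ²/(2T)) with T = t0 + t), γ = 1 normalization (a factor t on squared first derivatives), evaluation at the blob centre at t = t0 with s = r² t, and a plain Riemann sum on a square grid. None of this is taken from an implementation in the paper, and the function names are hypothetical.

```python
import math


def harris_center_closed_form(r, k=0.04):
    """Closed-form unit-contrast Harris response at the blob centre (t = t0, s = r^2 t)."""
    return (1 - 4 * k) * r ** 4 / (256 * (1 + r ** 2) ** 4)


def harris_center_numeric(t0, r, k=0.04, half_width=8.0, n=321):
    """Riemann-sum approximation of det(mu) - k trace(mu)^2 at the blob centre."""
    t = t0                        # selected scale for gamma = 1
    s = r ** 2 * t                # integration scale
    T = t0 + t                    # variance of the smoothed blob
    W = half_width * math.sqrt(t0)   # integrand variance is at most T / 2 = t0
    h = 2 * W / (n - 1)
    m11 = m12 = m22 = 0.0
    for i in range(n):
        x = -W + i * h
        for j in range(n):
            y = -W + j * h
            rho2 = x * x + y * y
            window = math.exp(-rho2 / (2 * s)) / (2 * math.pi * s)   # g(x, y; s)
            L = (t0 / T) * math.exp(-rho2 / (2 * T))                  # unit-contrast blob at scale t
            Lx, Ly = -x / T * L, -y / T * L
            w = window * t * h * h    # factor t: gamma = 1 normalization of squared derivatives
            m11 += w * Lx * Lx
            m12 += w * Lx * Ly
            m22 += w * Ly * Ly
    return (m11 * m22 - m12 ** 2) - k * (m11 + m22) ** 2
```

For r = 1 both functions give about 2.05e-4; the double loop is deliberately naive and is meant only for checking, not for processing images.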

The Rescaled Level Curve Curvature Operator
When applying the rescaled level curve curvature operator κ γ −norm (L) to a rotationally symmetric Gaussian blob we obtaiñ This expression assumes its spatial extremum on the circle where the extremum value is 5 (43) and this entity assumes its extremum over scale at In the special case when γ = 7/8 [4] this corresponds to with the corresponding scale-normalized response (46) and the following approximate relation for γ = 7/8 if the Gaussian blob is normalized to unit contrast Due to the use of a γ -value not equal to one, this magnitude measure is not fully scale invariant. The scale dependency can, however, be compensated for by multiplying the maximum feature response over scale by a scale-dependent compensation factor t 2(1−γ ) .

Scale Selection with Complementary Post-smoothing
When linking image features at different scales into feature trajectories, the use of post-smoothing of any differential expression D norm L according to (18) was proposed in [39] to simplify the task for the scale linking algorithm, by suppressing small local perturbations in the responses of the differential feature detectors at any single scale. Since this complementary post-smoothing operation will affect the magnitude values of the scale-normalized differential responses that are used in the different interest point detectors, one may ask how large effect this operation will have on the resulting scale estimates.
In this section, we shall analyze the influence of the postsmoothing operation for scale selection based on local extrema over scale of scale-normalized derivatives.

The Laplacian and the Determinant of the Hessian Operators
Consider again a rotationally symmetric Gaussian blob (22) with its scale-space representation of the form (24). Then, the scale-normalized Laplacian ∇ 2 γ −norm L and the scalenormalized determinant of the Hessian det H γ −norm L are given by With complementary Gaussian post-smoothing with scale parameter t post = c 2 t, the resulting differential expressions assume the form and assume their extremal scale-normalized responses over scale at In the specific case when γ = 1 and c = 1/2, these local extrema over scale are given bŷ In other words, by comparison with the results in Sect. 3.1.1, we find that the use of a post-smoothing operation with integration scale determined by c = 1/2, the scale estimates will be about 10 % lower when measured in units of σ = √ t . To obtain unbiased scale estimates that lead tot = t 0 for a Gaussian blob, we can either multiply the scale estimates by correction factors from (52) and (53) or choose γ as function of c according to With c = 1/2, the latter settings correspond to the following values of γ : γ det HL = 11 10 = 1.10 (59)

The Hessian Feature Strength Measure I
To analyze the effect of the post-smoothing operation for the Hessian feature strength measure I computed for a Gaussian blob, which is given by provided that this entity is positive, let us initially disregard the effect of the local condition det HL − k trace 2 HL > 0 in (8) and integrate the closed-form expression (60) over the entire image plane R 2 instead of over only the finite region where this entity is positive Then, complementary post-smoothing with integration scale t post = c 2 t implies that this approximation of the postsmoothed differential entity is given by Corresponding integration within the finite support region (61) where D 1 > 0 gives an expression that is too complex to be written out here. Unfortunately, it is hard to analyze the scales at which these entities assumes local extrema over scale, since differentiation of the above mentioned expression and solving for its roots leads to fourth-order equations. In the case of γ = 1, c = 1/2 and κ = 0.04, we can, however, find the numerical solution For these parameter settings, the use of a spatial postsmoothing operation does again lead to scale estimates that are about 10 % lower. If we restrict ourselves to the analysis of a single isolated Gaussian blob, a similar approximation holds for the signed Hessian feature strength measureD 1,γ −norm L.

The Hessian Feature Strength Measure II
For the Hessian feature strength measure II (10), we also have a corresponding situation with a logical switching between two differential entities |L pp | and |L qq | with L pp and L qq determined by (12) and (13). Solving for boundary between these domains, which is determined by L pp + L qq = 0, gives that we should select |L qq | within the circular region and |L pp | outside. Solving for the corresponding integrals gives with Unfortunately, it is again hard to solve for the local extrema over scale of the post-smoothed derivative expressions in closed form. For this reason, let us approximate the composed expression D 2 L by the contribution from its first term 4 L qq inside and with the integral extended from the circular region (64) to the entire image plane Then, the local extrema over scale are given by the solutions of the third-order equation where the special case with γ = 1 and c = 1/2 has the numerical solution For the D 2,norm L operator and these parameter values, the use of a spatial post-smoothing operation does therefore lead to scale estimates that are about 16 % lower, and the influence is therefore stronger than for the Laplacian ∇ 2 norm L, determinant of the Hessian det H norm L or the Hessian feature strength D 1,norm L operators.
If we restrict ourselves to the analysis of a single isolated Gaussian blob, a similar approximation holds for the signed Hessian feature strength measureD 2,γ −norm L.

The Rescaled Level Curve Curvature Operator
If we apply post-smoothing to the rescaled level curve curvature computed for a rotationally symmetric Gaussian blob (41) with post-smoothing scale t post = c 2 t, we obtaiñ This entity assumes it spatial extremum on the circle and the extremum value on this circle is By differentiating this expression with respect to the scale parameter t, it follows that the selected scale level will be a solution of the third-order equation Unfortunately, the closed form expression for the solution is rather complex. Nevertheless, we can note that due to the homogeneity of this equation, the solution will always be proportional the scale t 0 of the original Gaussian blob. In the specific case with γ = 7/8 and c = 1/2 we obtain In other words, compared to the case without post-smoothing (45), the relative difference between the selected scale levels is here less than 5 %, when measured in units of σ = √ t .

Influence of Affine Image Deformations
To analyze the behaviour of the different interest point detectors under image deformations, let us next consider an anisotropic Gaussian blob as a prototype model of a rotationally symmetric Gaussian blob that has been subjected to an affine image deformation that we can see as representing a local linearization of the perspective mapping from a surface patch in the world to the image plane. Specifically, we can model the effect of foreshortening by different spatial extents t 1 and t 2 along the different coordinate directions where the ratio between the scale parameter t 1 and t 2 is related to the angle θ between the normal directions of the surface patch and the image plane according to if we without loss off generality assume that t 1 ≥ t 2 . Since all the feature detectors we consider are based on rotationally invariant differential expressions, it is sufficient to study the case when the anisotropic Gaussian blob is aligned to one the coordinate directions. Due to the semi-group property of the (one-dimensional) Gaussian kernel, the scalespace representation of f is then given by Note on Relation to Influence Under General Affine Transformations A general argument for studying the influence of non-uniform scaling transformations can be obtained by decomposing a general two-dimensional affine transformation matrix A into [32] where R 1 and R 2 can be forced to be rotation matrices, if we relax the requirement of non-negative entries in the diagonal elements σ 1 and σ 2 of a regular singular value decomposition. With this model, the geometric average of the absolute values of the diagonal entries corresponds to a uniform scaling transformation. We know that the Gaussian scale-space is closed under uniform scaling transformations, rotations and reflections. The differential expressions we use for detecting interest points are based on rotationally invariant differential invariants, which implies that the scale estimates will also be rotationally invariant. 
Furthermore, our scale estimates are transformed in a scale covariant way under uniform scaling transformations. Hence, if we without essential loss of generality disregard reflections and assume that σ 1 and σ 2 are both positive, the degree of freedom that remains to be studied concerns non-uniform scaling transformations of the form whose influence on the scale estimates will be investigated in this section.

The Laplacian operator
For the Laplacian operator, the γ -normalized response as function of space and scale is given by This entity has critical points at the origin (x, y) = (0, 0) and at where the first pair of roots corresponds to saddle points if t 1 > t 2 , while the other pair of roots correspond to local extrema. Unfortunately, the critical points outside the origin lead to rather complex expressions. We shall therefore focus on the critical point at the origin, for which the selected scale(s) will be the root(s) of the third-order equation For a general value of γ , the explicit solution is too complex to be written out here. In the specific case of γ = 1, however, we obtain for t 1 ≥ t 2 where which in the special case of t 1 = t 2 = t 0 reduces to If we on the other hand reparameterize the scale parameters t 1 and t 2 of the Gaussian blob as t 1 = s t 0 and t 2 = t 0 /s, corresponding to a non-uniform scaling transformation with relative scaling factor s > 1 renormalized such the determinant of the transformation matrix is equal to one, then a Taylor expansion of t ∇ 2 L around s = 1 gives From this result we get an approximate expression for how the Laplacian scale selection method is affected by affine transformations outside the similarity group. Specifically, we can note that the scales selected from local extrema over scale of the scale-normalized Laplacian operator are not invariant under general affine transformations.

The Determinant of the Hessian
By differentiation of (78) it follows that the scale-normalized determinant of the Hessian is given by This expression does also have multiple critical points. Again, however, we focus on the central point (x, y) = (0, 0), for which the derivative with respect to scale is of the form This equation has a positive root at which in the special case of γ = 1 simplifies to the affine covariant expression Notably, if we again reparameterize the scale parameters according to t 1 = s t 0 and t 2 = t 0 /s, then for any non-uniform scaling transformation renormalized such that the determinant of the transformation matrix is one, it holds that which implies that (in this specific case and with γ = 1) scale selection based on the scale-normalized determinant of the Hessian leads to affine covariant scale estimates for the Gaussian blob model. In this respect, there is a significant difference to scale selection based on the scale-normalized Laplacian, for which the scale estimates will be biased according to (86) and (92).
For other values of γ , a Taylor expansion of t det HL around s = 1 gives implying a certain dependency on the relative scaling factor s. Provided that |γ − 1| < 1/2, this dependency will, however, be lower than for the Laplacian scale selection method (92) with γ = 1.

The Hessian Feature Strength Measure I
For the Hessian feature strength measure I, the behaviour of the scale-normalized response at the origin is given by provided that this entity is positive. If we differentiate this expression with respect to the scale parameter and set the derivative to zero, we obtain a fourth-order equation, which in principle can be solved in closed form, but leads to very complex expressions, even when restricted to γ = 1.
If we reparameterize the scale parameters according to t 1 = s t 0 and t 2 = t 0 /s and then restrict the parameter k in D 1,γ −norm to k = 0.04, however, we can obtain a manageable expression for the Taylor expansion of the selected scale t D 1 L as function of the non-uniform scaling factor s in the specific case of γ = 1 From this expression we can see that the scales selected from local extrema over scale of the scale-normalized Hessian feature strength measure I are not invariant under nonuniform scaling transformations. For values of s reasonably close to one, however, the deviation from affine invariance is quite low, and significantly smaller than for the Laplacian operator (92). This could also be expected, since a major contribution to the Hessian feature strength measure D 1,γ −norm originates from the affine covariant determinant of the Hessian det H γ −norm L.

The Hessian Feature Strength Measure II
With t 1 > t 2 , the Hessian feature strength measure II at the origin is given by This entity assumes its local extremum over scale at which in the case when γ = 1 reduces to If we again reparameterize the scale parameters according to t 1 = s t 0 and t 2 = t 0 /s and in order to obtain more compact expressions restrict ourselves to the case when γ = 1, then a Taylor expansion of t D 2 L around s = 1 gives From a comparison with (92), (97) and (100) we can see that the scales selected from the scale-normalized Hessian feature strength measure II are more sensitive to non-uniform scaling transformations than the scales selected by the scalenormalized Laplacian ∇ 2 norm L, the determinant of the Hessian det H norm L or the Hessian feature strength measure D 1,norm .

The Rescaled Level Curve Curvature Operator
For the anisotropic Gaussian blob model, a computation of the rescaled level curve curvature operator κ γ −norm (L) gives This entity assumes its spatial maximum on the ellipse and on this ellipse it holds that By differentiating this expression with respect to t and setting the derivative to zero, it follows that the extremum over scale is assumed at which in the special case when γ = 5/4 reduces to the affine covariant expression By again reparameterizing the scale parameters in the Gaussian blob model according to t 1 = s t 0 and t 2 = t 0 /s and performing a Taylor expansion around s = 1 for a general value of γ , it follows that which in the case with γ = 7/8 assumes the form with second-and third-order relative bias terms about three times the magnitude compared to the Hessian feature strength measure I in (100).

Scale Selection by Weighted Averaging Along Feature Trajectories
The Given that image features x(t) at different scales t have been linked into a feature trajectory T over scale scale selection by weighted averaging over scale implies that the scale estimate is computed as [39] for some positive and monotonically increasing transformation function ψ of the magnitude values |(D γ −norm L)(x(τ ); τ )| of the differential feature responses.
Specifically, the following family of scale invariant transformation functions was considered where w DL ∈ [0, 1] is a so-called feature weighting function that measures the relative strength of the feature detector DL compared to other possible competing types of feature detectors and a is the scalar parameter in the self-similar power law.
In this section, we shall analyze the scale selection properties of this construction for the differential feature detectors defined in Sect. 2.2 under the simplifying assumptions of w DL = 1 and a = 1. With respect to the analysis at the center of a Gaussian blob, the assumption of w DL = 1 is particularly relevant for the weighting functions of the form considered in [39] if we make use of the fact that L ξ = L η = 0 at any critical point (as at the center of a Gaussian blob) and disregard the influence of the noise suppression parameter ε.

The Pure Second-Order Interest Point Detectors
From the explicit expression for the magnitude of the scalenormalized Laplacian response at the center of rotationally symmetric Gaussian blob (25) it follows that the weighted scale selection estimate according to (113) will in the case of γ = 1 5 and with the effective scale parameter τ defined as τ = log t be given bŷ Similarly, from the explicit expression for the determinant of the Hessian at the center of the Gaussian blob (26), it follows that the weighted scale estimate will be determined bŷ Due to the similarity between the explicit expressions for the Hessian feature strength measure I (27) and the determinant of the Hessian response (26) as well as the similarity 5 In this section we will in many cases restrict the analysis to the specific case of γ = 1, since some of the results become significantly more complex for a general value of γ = 1. In a few cases where the corresponding results become reasonably compact, we will, however, include them. between the Hessian feature strength measure II (28) and the Laplacian response (27) at the center of a Gaussian blob, the scale estimates for D 1,γ −norm L and D 2,γ −norm L will be analogous: When expressed in terms of the regular scale parameter the weighted scale selection method does hence for a rotationally symmetric Gaussian blob lead to similar scale estimates as are obtained from local extrema over scale of γnormalized derivatives (29) when γ = 1. Since these scale estimates are similar to the scale estimates obtained form local extrema over scale, it follows that the scale-normalized magnitude values will also be similar and the relationships between scale-normalized thresholds described in Table 1 will also hold for scale selection based on weighted averaging over scale.

Corresponding Scale Estimates for General Values of γ
For a general value of γ ∈]0, 2[, the corresponding scale estimates become as follows in terms of effective scale τ = log t: where both expressions have the limit value log t 0 when γ → 1. (Note that | cot(πγ )| → ∞ when γ → 1.) By comparing these scale estimates to the corresponding scale estimateτ = log γ 2−γ in (29) obtained from local extrema over scale, we can compute a Taylor expansions of the difference in the scale estimates: Notably, the difference in scale estimates between the two types of scale selection approaches is smaller 6 for scale selection using the determinant of the Hessian det HL or the Hessian feature strength measure D 1,norm L compared to scale selection based on the Laplacian ∇ 2 norm L or the Hessian feature strength measure D 2,norm L.

The Laplacian and the Determinant of the Hessian Operators
From the explicit expressions for the post-smoothed Laplacian (50) and the post-smoothed determinant of the Hessian (51), it follows that the weighted scale estimates are 6 A plausible explanation why the difference between the scale estimated is smaller for the determinant of the Hessian det HL and the Hessian feature strength measure D 1,norm L compared to difference in scale estimates for the Laplacian ∇ 2 norm L and the Hessian feature strength measure D 2,norm L is that second-order derivative responses are squared for the determinant of the Hessian det HL and the Hessian feature strength measure D 1,norm L, whereas the Laplacian ∇ 2 norm L and the Hessian feature strength measure D 2,norm L operators depend on the second-order derivative responses in a linear way.
Thereby, the integrals that define the weighted scale selection estimates will get a comparably higher relative contribution from scale levels near the maximum over scale, which in turn implies that the influence due to skewness in the scale-space signature caused by values of γ = 1 will be lower (compare with Sect. 5.1.1). By varying the power a in the self-similar transformation function (114), it is more generally possible to modulate this effect. for γ = 1 given bŷ which agree with the corresponding scale estimates (52) and (53)

The Hessian Feature Strength Measure II
To analyze the effect of post-smoothing of the Hessian feature strength measure II in (10), let us again approximate the composed post-smoothed differential expression in (67) by the contribution from its first term (68) with the spatial integration extended to the entire plane. Then, the weighted scale estimate can be approximated bŷ In the specific case when c = 1/2 this scale estimate reduces tô

Influence of Affine Image Deformations
To analyze how the scale estimatest obtained by weighted averaging along feature trajectories are affected by affine image deformations, let us again consider an anisotropic Gaussian blob (76) as a prototype model of a rotationally symmetric Gaussian blob that has been subjected to an affine image deformation and with its scale-space representation according to (78).

The Laplacian Operator
At the origin, the scale-normalized Laplacian response according to (82) reduces to and the scale estimate obtained by weighted scale selection is given bŷ With a reparameterization of the scale parameters t 1 and t 2 of the Gaussian blob as t 1 = s t 0 and t 2 = t 0 /s, corresponding to a non-uniform scaling transformation with relative scaling factor s > 1 renormalized such the determinant of the transformation matrix is equal to one, the scale estimate in units of t can be written Notably, this scale estimate is not identical to the scale estimate (86) obtained from local extrema over scale. The Taylor expansion of t ∇ 2 L around s = 1 is in turn given by and is, however, similar until the third-order terms to the Taylor expansion (92) of the corresponding scale estimate obtained from local extrema over scale. In this respect, the behaviour of the two scale selection methods is qualitatively rather similar when applied to the anisotropic Gaussian blob model.

The Determinant of the Hessian
At the origin, the response of the determinant of the Hessian operator (93) simplifies to 2 (141) and the scale estimate obtained by weighted scale selection iŝ corresponding to the affine covariant scale estimatê and in agreement with our earlier result (96) for scale selection from local extrema over scale.

The Hessian Feature Strength Measure I
With the Hessian feature strength measure I at the origin given by (99), the scale estimate obtained by weighted scale selection is determined bŷ With a reparameterization of the scale parameters according to t 1 = s t 0 and t 2 = t 0 /s, this expression simplifies tô A Taylor expansion around s = 1 of the scale estimate expressed in units of t = exp t gives which simplifies to the following form for κ = 0.04 and agreeing until the third-order terms with the corresponding Taylor expansion (100) for the scale estimate obtained from local extrema over scale. Specifically, a comparison with the corresponding expression for the Laplacian operator (140) shows that scale selection based on the Hessian feature strength measure I is less sensitive to affine image deformations compared to scale selection based on the Laplacian.

The Hessian Feature Strength Measure II
Assuming that t 1 ≥ t 2 , the Hessian feature strength measure II at the origin is given by and the weighted scale estimatê With the scale parameters reparameterized according to t 1 = s t 0 and t 2 = t 0 /s, the corresponding scale estimate can be written for which a Taylor expansion around s = 1 gives and agreeing until the second-order terms with the corresponding Taylor expansion (104) for the scale estimate obtained from local extrema over scale.
Again, the scale estimates for scale selection based on the Hessian feature strength measure II are more affected by affine image deformations compared to the scale estimates obtained by the determinant of the Hessian, the Hessian feature strength measure I or the Laplacian.

Rotationally Symmetric Gaussian Blob
From the above mentioned results, we can first note that for the specific case of a rotationally symmetric Gaussian blob, the scale estimates obtained from local extrema over scale vs. weighted averaging over scale are very similar. Table 2 shows the scales that are selected for the Laplacian ∇ 2 norm L and the determinant of the Hessian det H norm L in the presence of a general post-smoothing operation. Table 3 shows corresponding approximate estimates for the Hessian feature strength measure D 1,norm L and the Hessian feature strength measure D 2,norm L for c = 1/2. Notably, the exact scale estimates agree perfectly, whereas the approximate estimates are very similar. In this sense, the two scale selection methods have rather similar effects when applied to a rotationally symmetric Gaussian blob.

Theoretical Symmetry Properties Between the Scale Estimates
The similarity between the results of the two scale selection methods can generally be understood by studying the scalespace signatures that show how the Laplacian and the determinant of the Hessian responses evolve as function of scale at the center of the blob (below assuming no post-smoothing corresponding to c = 0): 4 (155) The left column in the upper and middle rows in Fig. 3 shows these graphs with a linear scaling of the regular scale parameter t and the right column shows corresponding graphs with a logarithmic scaling of the scale parameter in terms of effective scale τ . As can be seen from the latter graphs, the scale-space signatures assume a symmetric shape when expressed in terms of effective scale, which implies that the weighted scale estimates, which correspond to the center of gravity of the graphs, will be assumed at a similar position as the global extremum over scale. This property can also be understood algebraically, due to the functional symmetry of (154) and (155) under mappings of the form corresponding to the symmetry Since the response properties of the Hessian feature strength measures D 1,norm L and D 2,norm L are of similar forms corresponding symmetry properties follow also for these operators. These symmetry properties do also extend to monotonically increasing transformations ψ of the differential responses of the form These symmetry properties do, however, not extend to general values of γ = 1, since such values may lead to a skewness in the scale-space signature (see the bottom row in Fig. 3).

Calibration Factors for Setting Scale-Invariant Integration Scales
The scale estimates may, however, differ depending on what differential expression the interest point detector is based on. Hence, if we would like to set an integration scale t int for computing a local image descriptor from the scale estimatê t DL , in such a way that the integration scale should be the same for any interest point detector DL applied to a rotationally symmetric Gaussian blob, irrespective of whether the interest points are computed from scale-space extrema or feature trajectories in a scale-space primal sketch, we can parameterize the integration scale according to with the calibration factor A DL determined from the results in Table 4.

Taylor Expansions for Non-uniform Scaling Factors
Near s = 1 From the analysis of the scale selection properties of an anisotropic Gaussian blob with scale parameters t 1 and t 2 in Sect. 3.3 and Sect. 4.2, we found that scale selection based  For the Laplacian ∇ 2 norm L and the Hessian feature strength measures D 1,norm L and D 2,norm L, the scale estimates are, however, not affine covariant. Moreover, the two scale selection methods may lead to different results. When performing a Taylor expansion of the scale estimate parameterized in terms of a non-uniform scaling factor s relative to a base-line scale t 0 , the Taylor expansions around s = 1 did, however, agree in their lowest order terms. In this sense, the two scale selection approaches have approximately similar properties for the Gaussian blob model for affine image deformations near the similarity group.
From a comparison between the Taylor expansions for the scale estimates for the different interest point detectors in Table 5, we can conclude that after the affine covariant determinant of the Hessian det H norm L, the scale estimate obtained from Hessian feature strength measure D 1,norm L has the lowest sensitive to affine image deformations followed by the Laplacian ∇ 2 norm L and the Hessian feature strength measure D 2,norm L. Corresponding results hold for the corresponding signed Hessian feature strength measures D 1,norm L andD 2,norm L.

Graphs of Non-uniform Scaling Dependencies for
General s ≥ 1 From the analysis in Sect. 4.3 it follows from (139), (142), (144) and (152) that for an anisotropic Gaussian blob with scale parameters t 1 = s t 0 and t 2 = t 0 /s, the scale estimates for weighted scale selection using the Laplacian ∇ 2 norm L, determinant of the Hessian det H norm L and the Hessian feature strength measures D 1,norm L and D 2,norm L are in the absence Table 5 Taylor expansions for the scale estimates obtained for an anisotropic Gaussian blob with scale parameters t 1 = s t 0 and t 2 = t 0 /s around s = 1 (assuming s > 1 for the D 2,norm L operator). The table shows the terms in the Taylor expansion that are common for scale selection based on local extrema over scale and scale selection based on weighted averaging over scale

Operator
Common terms in series expansion of scale estimate of post-smoothing (c = 0) given bŷ (167) Figure 4 shows graphs of how the scale estimates depend on the non-uniform scaling parameter s for scale selection by weighted averaging over scale. As can be seen from these graphs, the behaviour is qualitatively somewhat different for the four differential expressions. For the determinant of the Hessian det H norm L, the scale estimate coincides with the geometric average of the scale parameters for any non-singular amount of non-uniform scaling. For the Laplacian operator ∇ 2 norm L, the scale estimatet ∇ 2 L is lower than the geometric average of the scale parameters in the two directions, whereas the scale estimates are higher than the geometric average for the Hessian feature strength measures D 1,norm L and D 2,norm L. For moderate values of s ∈ [1,4], the scale estimates from the Hessian feature strength measure D 1,norm L, are quite close to the affine covariant geometric average. For the Hessian feature strength measure D 2,norm L on the other hand, the scale estimate increases approximately linearly with the non-uniform scaling factor s.
These graphs also show that the qualitative behaviour derived for Taylor expansions near s = 1 (Table 5) extend to non-infinitesimal scaling factors up to at least a factor of four.

Comparison with Experimental Repeatability Properties
In this section, we shall compare the above mentioned theoretical results with experimental results of the repeatability properties of the different interest point detectors under affine image transformations. Figure 5 shows a few examples of images from an image data set with 14 images from natural environments. Each such image was subjected to 10 different types of affine image transformations encompassing:
For a locally planar surface patch viewed by a scaled orthographic projection model, the non-uniform rescalings correspond to the amount of foreshortening that arises with slant angles equal to 32.8 • and 45 • , respectively. In this respect, the chosen deformations reflect reasonable requirements of robustness to viewing variations for image-based matching and recognition. For each one of the resulting 14 × (1 + 10) = 154 images, the 400 most significant interest points were detected. For interest points detected based on scale-space extrema, the image features were ranked on the scale-normalized response of the differential operator at the scale-space extremum. For interest points detected by scale linking, the image features were ranked on a significance measure obtained by integrating the scale-normalized responses of the differential operator along each feature trajectory, using the methodology described in [39].
To judge whether two image features A and B detected in two differently transformed images f and f′ should be regarded as the same feature, we associated a scale-dependent circle C_A and C_B with each feature, with the radius of each circle equal to the detection scale of the corresponding feature measured in units of the standard deviation σ = √t of the Gaussian kernel used for scale-space smoothing to the selected scale, in a similar way as in the graphical illustrations of scale-dependent image features in previous sections. Then, each such feature was transformed to the other image domain, using the affine transformation applied to the image coordinates of the center of the circle and with the scale value transformed to be proportional to the determinant of the affine transformation matrix, t′ = (det A) t, resulting in two new circular features C′_A and C′_B. The relative amount of overlap between any pair of circles was defined as the ratio between the intersection and the union of the two circles, m(C_A, C_B) = |C_A ∩ C_B| / |C_A ∪ C_B|, in a similar way as [50] define a corresponding ratio for ellipses. Matching relations were computed in both directions, and a match was permitted only if a pair of image features maximized this ratio over replacements of either image feature by other image features in the same domain and, in addition, the value of this ratio was above a threshold m(C_A, C_B) > m_0, where we have chosen m_0 = 0.40. Furthermore, only one match was permitted for each image feature, and matching candidates were evaluated in decreasing order of significance.
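The overlap criterion and matching rule above can be sketched in Python as follows. This is a simplified, one-directional illustration under our own assumptions (features given as (x, y, t) tuples, circle radius √t, an affine map x′ = A x + b, significance values supplied by the caller); the helper names are hypothetical and this is not the evaluation code used for the experiments.

```python
import math
import numpy as np

def circle_overlap(c1, c2):
    """Intersection-over-union of two discs given as (x, y, r)."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                      # disjoint discs
        inter = 0.0
    elif d <= abs(r1 - r2):               # one disc inside the other
        inter = math.pi * min(r1, r2) ** 2
    else:                                 # lens-shaped intersection
        a1 = math.acos(max(-1.0, min(1.0, (d*d + r1*r1 - r2*r2) / (2*d*r1))))
        a2 = math.acos(max(-1.0, min(1.0, (d*d + r2*r2 - r1*r1) / (2*d*r2))))
        k = 0.5 * math.sqrt(max(0.0, (-d + r1 + r2) * (d + r1 - r2)
                                * (d - r1 + r2) * (d + r1 + r2)))
        inter = r1*r1*a1 + r2*r2*a2 - k
    union = math.pi * (r1*r1 + r2*r2) - inter
    return inter / union

def transform_feature(feat, A, b):
    """Map (x, y, t) to (x', y', t') with x' = A x + b and t' = |det A| t."""
    x, y, t = feat
    xp = A @ np.array([x, y]) + b
    return xp[0], xp[1], abs(np.linalg.det(A)) * t

def match_features(feats_a, feats_b, sig_a, A, b, m0=0.40):
    """Greedy mutual-best matching of features in f (mapped to f') to features in f'."""
    mapped = [transform_feature(f, A, b) for f in feats_a]
    discs_a = [(x, y, math.sqrt(t)) for x, y, t in mapped]
    discs_b = [(x, y, math.sqrt(t)) for x, y, t in feats_b]
    M = np.array([[circle_overlap(ca, cb) for cb in discs_b] for ca in discs_a])
    used_b, matches = set(), []
    for i in np.argsort(-np.asarray(sig_a)):        # decreasing significance
        j = int(np.argmax(M[i]))
        # accept only mutual maxima above the threshold, one match per feature
        if M[i, j] > m0 and int(np.argmax(M[:, j])) == i and j not in used_b:
            used_b.add(j)
            matches.append((int(i), j))
    return matches

A = np.array([[1.2, 0.0], [0.0, 1.2]])
b = np.array([5.0, 0.0])
feats_a = [(10.0, 10.0, 4.0), (40.0, 40.0, 9.0)]
sig_a = [1.0, 2.0]
feats_b = [(53.0, 48.0, 12.96), (17.0, 12.0, 5.76), (100.0, 100.0, 4.0)]
print(match_features(feats_a, feats_b, sig_a, A, b))   # [(1, 0), (0, 1)]
```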
Finally, given that a total number of N_matches matches have been found from N features detected in the image f and N′ features detected in the transformed image f′, the matching performance was computed from N_matches, N and N′. The matching performance was computed in both directions, from f to f′ as well as from f′ to f, and the average of these two performance measures was reported. The evaluation of the matching score was only performed for image features that are within the image domain of both images before and after the transformation. Moreover, only features within corresponding scale ranges were evaluated. In other words, if the scale range for the image f before the affine transformation was [t_min, t_max], then image features were searched for in the transformed image f′ within the scale range [t′_min, t′_max] = [(det A) t_min, (det A) t_max]. In addition, features in a narrow scale-dependent frame near the image boundaries were suppressed, to prevent boundary effects from influencing the results. In these experiments, we used t_min = 4 and t_max = 256.

As can be seen from Table 6, the best repeatability properties for the interest point detectors based on scale selection from local extrema over scale are obtained for (i) the rescaled level curve curvature κ̃_{γ-norm}(L), (ii) the Hessian feature strength measure D_{1,norm} L and (iii) the determinant of the Hessian det H_norm L.

Relations Between Experimental Results and Theoretical Results
From Table 7, we can see that the best repeatability properties for the interest point detectors based on scale selection using scale linking and weighted averaging over scale are obtained for (i) the Hessian feature strength measure D_{1,norm} L, (ii) the determinant of the Hessian det H_norm L and (iii) the signed Hessian feature strength measure D̃_{2,norm} L.
The repeatability scores are furthermore generally better for scale selection based on weighted averaging over scale compared to scale selection based on local extrema over scale.
Turning to our theoretical analysis, we have previously shown that the scale estimates obtained from the determinant of the Hessian det H_norm L for an affine Gaussian blob are affine covariant, both for scale selection based on local extrema over scale (97) and for scale selection based on scale linking and weighted averaging over scale (143). For the Hessian feature strength measure D_{1,norm} L, a major contribution to this differential expression comes from the affine covariant determinant of the Hessian det H_norm L, and the deviations from affine covariance are small both for scale selection based on local extrema over scale (100) and for scale selection by weighted averaging over scale (148), provided that the non-uniform image deformations are not too far from the similarity group, in the sense that the non-uniform scaling factor s used in the Taylor expansions is not too far from 1. Specifically, the two interest point detectors with the best theoretical properties under affine image deformations, in the sense of having the smallest correction terms in Table 5, are also among the top three interest point detectors both for scale selection based on local extrema over scale and for scale selection based on scale linking and weighted averaging over scale. In this respect, the predictions from our theoretical analysis are in very good agreement with the experimental results.
Somewhat more surprisingly, the signed Hessian feature strength measure D̃_{2,norm} L performs very well when combined with scale selection based on weighted averaging over scale. The corresponding unsigned entity D_{2,norm} L does not perform as well, and its performance is more comparable to that of the Laplacian operator ∇²_norm L. A possible explanation is that keeping the signs of the principal curvatures in the non-linear minimum operation improves the ability of this operator to distinguish between nearby competing image structures, a property that is not captured by the analysis of isolated Gaussian blobs. The repeatability properties of the unsigned version D_{2,norm} L are therefore in closer agreement with the presented analysis.
The rescaled level curve curvature κ̃_{γ-norm}(L) performs very well for scale selection based on local extrema over scale, whereas it does not perform as well for scale selection based on scale linking and weighted averaging over scale. For scale selection based on local extrema over scale, our analysis showed that the deviation from affine covariance is comparably low (111) for the value γ = 7/8 used in our experiments. For this scale selection method, the experimental results are therefore in agreement with our theoretical results. Contrary to the other interest point detectors, however, the repeatability properties of the rescaled level curve curvature operator κ̃_{γ-norm}(L) are not improved by scale linking. A possible algorithmic explanation for this could be that κ̃_{γ-norm}(L) contains a different type of non-linearity that may cause difficulties for the scale linking algorithm. Calculating closed-form expressions for the scale estimates obtained by weighted averaging over scale also seems harder for this operator. We therefore leave it as an open problem to investigate whether this interest point detector could also be improved by scale linking and scale selection from weighted averaging of possibly transformed magnitude values along the corresponding feature trajectories.
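One way to see the different type of non-linearity is to write the operator out. The sketch below assumes the definition κ̃_{γ-norm}(L) = t^{2γ}(L_x² L_yy + L_y² L_xx − 2 L_x L_y L_xy) from earlier work on scale selection (our assumption, since the definition is not restated in this section). It mixes first- and second-order derivatives and is cubic in the image intensities, whereas the Laplacian is linear and det H is quadratic; the derivative values below are arbitrary.

```python
import numpy as np

def kappa_tilde_norm(g, H, t, gamma=7/8):
    """Assumed rescaled level curve curvature
    t^(2 gamma) (Lx^2 Lyy + Ly^2 Lxx - 2 Lx Ly Lxy),
    from the gradient g = (Lx, Ly) and Hessian H = [[Lxx, Lxy], [Lxy, Lyy]]."""
    Lx, Ly = g
    (Lxx, Lxy), (_, Lyy) = H
    return t ** (2 * gamma) * (Lx**2 * Lyy + Ly**2 * Lxx - 2 * Lx * Ly * Lxy)

g = np.array([0.5, -1.2])
H = np.array([[0.3, 0.7], [0.7, -0.4]])
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])

k = kappa_tilde_norm(g, H, t=4.0)
k_rot = kappa_tilde_norm(R @ g, R @ H @ R.T, t=4.0)   # rotated image coordinates
k_scaled = kappa_tilde_norm(2 * g, 2 * H, t=4.0)      # intensities doubled
print(np.isclose(k, k_rot), np.isclose(k_scaled, 8 * k))  # True True
```

The operator is rotationally invariant, but doubling the intensities multiplies it by eight, which illustrates why its magnitude values behave differently along feature trajectories than those of the pure second-order operators.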
Experimental results in [39] show that the Hessian feature strength measure D_{1,norm} L and the determinant of the Hessian det H_norm L are also the two interest point detectors with the best repeatability properties under real (calibrated) perspective image transformations. Thus, the two best interest point detectors according to our theoretical analysis are also the interest point detectors with the best properties on real image data.

Summary and Discussion
We have analyzed the scale selection properties of (i) the Laplacian operator ∇²_norm L, (ii) the determinant of the Hessian det H_norm L, (iii)-(iv) the new Hessian feature strength measures D_{1,norm} L and D_{2,norm} L and (v) the rescaled level curve curvature operator κ̃_{γ-norm}(L) when applied to a Gaussian prototype blob model and using scale selection from either (vi) local extrema over scale of scale-normalized derivatives or (vii) weighted averaging of scale values along feature trajectories over scale. We have also analyzed (viii) the influence of a secondary post-smoothing step after the computation of possibly non-linear differential invariants and (ix) the sensitivity of the scale estimates to affine image deformations.
The analysis shows that the scale estimates from the determinant of the Hessian det H_norm L are affine covariant for the Gaussian blob model, both for scale selection based on local extrema over scale and for scale selection by weighted averaging over scale. The analysis also shows that the scale estimates from the Laplacian operator ∇²_norm L and the Hessian feature strength measures D_{1,norm} L and D_{2,norm} L are not affine covariant. Out of the latter three operators, the Hessian feature strength measure D_{1,norm} L has the lowest sensitivity to affine image deformations outside the similarity group, whereas the Hessian feature strength measure D_{2,norm} L has the highest sensitivity. The stronger scale dependency of D_{2,norm} L can be understood from the fact that it responds to the eigenvalue of the Hessian matrix corresponding to the slowest spatial variations.
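To make the contrast between the Hessian feature strength measures concrete, the following sketch evaluates them from the second-order derivatives at a single point. It assumes the definitions D_{1,norm} L = t²(det H L − k trace² H L) where this quantity is positive (and 0 otherwise) and D_{2,norm} L = t min(|L_pp|, |L_qq|), with L_pp, L_qq the Hessian eigenvalues; these are our reading of the original framework, and k = 0.04 is an arbitrary illustrative value in ]0, 1/4[.

```python
import math

def hessian_eigenvalues(Lxx, Lxy, Lyy):
    """Eigenvalues (Lpp, Lqq), Lpp <= Lqq, of the 2x2 Hessian matrix."""
    mean = 0.5 * (Lxx + Lyy)
    rad = math.hypot(0.5 * (Lxx - Lyy), Lxy)
    return mean - rad, mean + rad

def D1_norm(Lxx, Lxy, Lyy, t, k=0.04):
    """Assumed Hessian feature strength measure I (zero where not positive)."""
    v = (Lxx * Lyy - Lxy ** 2) - k * (Lxx + Lyy) ** 2
    return t ** 2 * v if v > 0 else 0.0

def D2_norm(Lxx, Lxy, Lyy, t):
    """Assumed unsigned Hessian feature strength measure II."""
    Lpp, Lqq = hessian_eigenvalues(Lxx, Lxy, Lyy)
    return t * min(abs(Lpp), abs(Lqq))

# Isotropic blob-like Hessian: D1 equals (1 - 4k) det H, D2 equals |Laplacian| / 2.
print(D1_norm(-1.0, 0.0, -1.0, t=1.0), D2_norm(-1.0, 0.0, -1.0, t=1.0))
# Elongated structure: D2 follows the smaller-magnitude (slowly varying) eigenvalue.
print(D2_norm(-1.0, 0.0, -0.25, t=1.0))
```

Because D_{2,norm} L picks the eigenvalue of smallest magnitude, it is driven by the direction of slowest variation, which is the direction that is stretched most by a non-uniform scaling.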
Experimental results reported in Sect. 5.3 and [39] show that the interest point detectors based on the new Hessian feature strength measure D_{1,norm} L and the determinant of the Hessian det H_norm L have significantly better repeatability properties under affine or perspective image transformations than the Laplacian ∇²_norm L or the Hessian feature strength measure D_{2,norm} L. Corresponding advantages hold relative to the difference-of-Gaussians (DoG) approximation of the Laplacian operator and the Harris-Laplace operator. Hence, the interest point detectors with the best theoretical properties under affine deformations of Gaussian blobs also have the best experimental properties. In this respect, the predictions from this theoretical analysis agree with corresponding experimental results.
When considering scale selection for a rotationally symmetric Gaussian blob, we have shown that, for γ = 1 (in the 2-D case), scale selection from local extrema over scale and scale selection by weighted averaging over scale lead to similar scale estimates for each one of these four operators. This similarity can be explained from a symmetry property of the scale-space signature under inversion transformations of the scale parameter, which correspond to reflections along the scale axis after a logarithmic transformation of the scale parameter in terms of effective scale. Because of this similarity, we may conclude that no additional scale compensation or scale calibration is needed between scale estimates obtained from weighted averaging over scale and from local extrema over scale (provided that γ = 1).
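The inversion symmetry can be checked numerically. At the center of a unit-mass rotationally symmetric Gaussian blob with variance t0, the γ = 1 signatures are proportional to t/(t0 + t)² for the Laplacian and t²/(t0 + t)⁴ for the determinant of the Hessian, and both are unchanged under t ↦ t0²/t. The sketch assumes that the weighted estimate averages τ = log t with the response magnitude as weight; the helper names are illustrative.

```python
import math
import numpy as np

def lap_weight(t, t0):
    """|Laplacian_norm L| at the blob center (gamma = 1), up to a constant."""
    return t / (t0 + t) ** 2

def det_weight(t, t0):
    """|det H_norm L| at the blob center (gamma = 1), up to a constant."""
    return t ** 2 / (t0 + t) ** 4

def weighted_scale(weight, t0):
    """exp of the weighted mean of tau = log t, weights given by the signature."""
    tau = np.linspace(-40.0, 40.0, 400001)
    w = weight(np.exp(tau), t0)
    return math.exp(np.sum(tau * w) / np.sum(w))

t0 = 3.0
t = np.exp(np.linspace(-5.0, 5.0, 11))
# The inversion t -> t0^2 / t leaves both signatures unchanged ...
print(np.allclose(lap_weight(t0**2 / t, t0), lap_weight(t, t0)))
print(np.allclose(det_weight(t0**2 / t, t0), det_weight(t, t0)))
# ... so the weighted mean of log t is log t0, which is also where both maxima lie.
print(weighted_scale(lap_weight, t0), weighted_scale(det_weight, t0))
```

Since each signature is symmetric around log t0 on the logarithmic scale axis, its weighted mean and its maximum both fall at t0, which is the reason no calibration is needed between the two scale selection approaches in this case.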
Since the commonly used difference-of-Gaussians operator can be seen as a discrete approximation of the Laplacian operator [41], the analysis of the scale selection properties of the Laplacian operator also provides a theoretical model for analyzing the scale selection properties of the difference-of-Gaussians keypoint detector used in the SIFT operator [48]. The above results concerning the scale selection properties of the Laplacian operator ∇²_norm L also extend to the Harris-Laplace operator [49], for which spatial selection is performed from spatial extrema of the Harris measure H, whereas the scale selection properties are solely determined by those of the Laplacian ∇²_norm L. By incorporating the scale selection properties of the determinant of the Hessian det H_norm L, the results also extend to the Harris-detHessian, detmu-Laplace and detmu-detHessian operators proposed in [39], as well as to other possible types of hybrid approaches.
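The DoG-Laplacian relation can be checked at the center of a unit-mass Gaussian blob g(·; t0): since the scale-space representation satisfies the diffusion equation ∂_t L = ½ ∇²L, the difference between two adjacent scale levels approximates (Δt/2) ∇²L. The values of t0, t and Δt below are arbitrary illustrative choices.

```python
import math

def L_center(t, t0):
    """L(0, 0; t) for a unit-mass 2-D Gaussian blob of variance t0."""
    return 1.0 / (2.0 * math.pi * (t0 + t))

def laplacian_center(t, t0):
    """Laplacian of L at the blob center: -1 / (pi (t0 + t)^2)."""
    return -1.0 / (math.pi * (t0 + t) ** 2)

t0, t, dt = 4.0, 2.0, 0.01
dog = L_center(t + dt, t0) - L_center(t, t0)      # difference of Gaussians
approx = 0.5 * dt * laplacian_center(t, t0)       # (dt / 2) * Laplacian
print(dog, approx, dog / approx)   # ratio equals (t0 + t) / (t0 + t + dt)
```

The ratio tends to 1 as Δt/(t0 + t) tends to 0, so a DoG detector inherits the scale selection behaviour of ∇²L up to the factor Δt/2.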
For scale estimates computed algorithmically from real-world images in an actual implementation, the robustness of image features obtained by scale selection from local extrema over scale or by weighted scale selection over scale may, however, differ substantially. Experimental results reported in Sect. 5.3 and [39] show that weighted scale selection leads to interest points with significantly better repeatability properties under perspective image deformations compared to interest points computed with scale selection from local extrema over scale. Theoretically, we have also seen that in several cases, weighted scale selection makes it easier to derive closed-form expressions for the scale estimate than scale selection based on local extrema over scale. In these respects, scale selection by weighted averaging over scale can have both practical and theoretical advantages.
When a complementary post-smoothing operation is used to suppress spurious variations in the non-linear feature responses of the interest point detectors, thereby simplifying the task of scale linking, the influence of this post-smoothing operation on the scale estimates may, however, differ between interest point detectors. If we assume that scale calibration can be performed based on the scale selection properties for Gaussian blobs, we have derived a set of relative calibration or compensation factors for each one of the five main types of interest point detectors studied in this paper.
To conclude, the analysis presented in this paper provides a theoretical basis for defining a richer repertoire of mechanisms for computing scale-invariant image features and image descriptors for a wide range of possible applications in computer vision. In very recent work [42], these generalized scale-space interest points have been integrated with local scale-invariant image descriptors and shown to lead to highly competitive results for image-based matching and recognition.
As outlined in Appendix A.2, these interest point detectors and their analysis can be extended to higher-dimensional image data in a rather straightforward manner. In the original work on a scale-space primal sketch for intensity data (as obtained from zero-order scale-space operations without derivative computations) [27,31], a scale linking algorithm was proposed based on (i) the detection of grey-level blobs at any level of scale and (ii) adaptive scale sampling refinements. By theoretical analysis of singularities in scale-space, the generic types of blob events in scale-space were classified [26]. Specifically, the theoretical analysis showed that although the local drift velocity in scale-space may momentarily tend to infinity near bifurcation events, it can be regarded as unlikely that an extremum point moves outside the support region of the grey-level blob. For this reason, the matching of image structures over scales in the scale linking algorithm was based on detecting overlaps between the support regions of the grey-level blobs at adjacent scales, and local refinements were performed if the relations between partially overlapping grey-level blobs could not be decomposed into the generic types of blob events. Experimentally, it was shown that this approach allowed for the extraction of intuitively reasonable image structures based on very few assumptions. Due to the adaptive scale refinements and the explicit computation of grey-level blobs, this type of algorithm does not, however, appear suitable for real-time implementation on a regular processor.
In [39], a simplified and more efficient type of scale linking algorithm was proposed, in which scale refinements and explicit detection of grey-level blobs are avoided. To still make use of the highly useful property that a local extremum is unlikely to drift outside the support region of the grey-level blob, a local descent/ascent search is instead initiated at the position of the local extremum at the adjacent scale. If the descent/ascent search is initiated within the support region of a grey-level blob, this procedure should therefore proceed towards the extremum. This approach thereby avoids the complexity problems that would otherwise arise when matching a large number of image features between adjacent scale levels.
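The descent/ascent step can be sketched as a greedy hill climb over the 8-neighbourhood, started at the extremum position found at the adjacent scale level. The function name, the 8-neighbourhood choice and the synthetic response below are hypothetical illustrations, not the implementation used in [39].

```python
import numpy as np

def ascend_to_maximum(R, start, max_steps=10_000):
    """Greedy steepest ascent over the 8-neighbourhood of a 2-D response R.

    Starting from start = (row, col), repeatedly move to the largest
    neighbouring value until no neighbour is larger (a local maximum).
    For minima, call it on -R (descent)."""
    r, c = start
    rows, cols = R.shape
    for _ in range(max_steps):
        best_val, best_pos = R[r, c], (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and R[rr, cc] > best_val:
                    best_val, best_pos = R[rr, cc], (rr, cc)
        if best_pos == (r, c):
            break
        r, c = best_pos
    return r, c

# Hypothetical example: a blob response whose maximum drifts from (20, 30)
# at one scale level to (22, 31) at the adjacent scale level.
yy, xx = np.mgrid[0:50, 0:60]
R_next = np.exp(-((yy - 22) ** 2 + (xx - 31) ** 2) / (2 * 5.0 ** 2))
print(ascend_to_maximum(R_next, start=(20, 30)))   # (22, 31)
```

The search cost depends only on the drift between adjacent scale levels, not on the total number of features, which is the efficiency argument made above.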
When computing non-linear differential expressions from image data, in particular at fine or moderate scales, it turns out, however, that one may obtain several local extrema of low amplitude relative to neighbouring local extrema, which do not correspond to perceptually relevant image structures. Since such local perturbations cannot be expected to be stable under natural imaging conditions, they cannot be expected to be perceptually or algorithmically useful. By applying a small amount of additional Gaussian smoothing to the computed non-linear differential expression, a large number of such spurious responses can be suppressed (see Fig. 6). This simplifies the task of the scale linking algorithm and, owing to the suppression of a large number of irrelevant features, also makes it run significantly faster. Experimentally, the repeatability properties of the resulting interest points also improve.
From scale-invariance arguments, it is natural to let the post-smoothing scale t_post be proportional to the local scale level t used for computing derivatives, with the proportionality determined by a fixed value of the relative post-smoothing scale parameter c. In our first work on generalized scale-space interest points [39], we used c = 1/2 when linking image structures over scale into feature trajectories. Specifically, the experiments reported in Sect. 5.3 were performed using c = 1/2. In connection with our more recent work on integrating the generalized scale-space interest points with local image descriptors [42], we found that c ≈ 3/8 is a better choice when matching image descriptors under affine and perspective image deformations. Increasing c above 1/2 decreases the repeatability properties for image deformations outside the similarity group, whereas decreasing c below 1/4 leads to weaker suppression of irrelevant features, which affects the repeatability properties of the resulting interest points. Beyond this, we have not tried to optimize the performance, bearing in mind that one could also consider other types of algorithms for scale linking, which could lead to different trade-offs between the complexity of the local matching step and efficiency or accuracy considerations.
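The effect of post-smoothing on spurious extrema can be illustrated with a small NumPy experiment: compute the scale-normalized determinant of the Hessian of a noise image at scale t, then apply Gaussian post-smoothing and count strict local maxima before and after. Since the exact relation between t_post and t is not restated here, the sketch assumes the self-similar choice t_post = c² t (that is, σ_post = c σ) with c = 1/2; the finite-difference derivatives, the noise image and all helper names are illustrative assumptions.

```python
import numpy as np

def gaussian_smooth(img, sigma):
    """Separable Gaussian smoothing with reflecting borders."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img, radius, mode="reflect")
    p = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")     # rows
    return np.apply_along_axis(np.convolve, 0, p, k, mode="valid")  # columns

def det_hessian_norm(img, t):
    """t^2 (Lxx Lyy - Lxy^2) from central finite differences of L at scale t."""
    L = gaussian_smooth(img, np.sqrt(t))
    Ly, Lx = np.gradient(L)
    Lxy, Lxx = np.gradient(Lx)
    Lyy, _ = np.gradient(Ly)
    return t ** 2 * (Lxx * Lyy - Lxy ** 2)

def count_local_maxima(R):
    """Number of interior pixels strictly larger than all 8 neighbours."""
    c = R[1:-1, 1:-1]
    is_max = np.ones_like(c, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                nb = R[1 + dy:R.shape[0] - 1 + dy, 1 + dx:R.shape[1] - 1 + dx]
                is_max &= c > nb
    return int(is_max.sum())

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
t, c = 4.0, 0.5
R = det_hessian_norm(img, t)
R_post = gaussian_smooth(R, c * np.sqrt(t))   # assumed t_post = c^2 t
print(count_local_maxima(R), count_local_maxima(R_post))
```

The post-smoothed response has fewer local maxima to link over scale, which mirrors the qualitative effect shown in Fig. 6.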

A.2 3-D Generalizations of the Scale-Space Interest Point Detectors
For image data f : R³ → R defined over a three-dimensional image domain indexed by the image coordinates (x, y, z), there are natural ways to extend the interest point detectors considered in this work based on the scale-space representation L : R³ × R₊ → R of f generated by convolution with the three-dimensional Gaussian kernel. A spatio-temporal extension of the Harris operator has been previously demonstrated to be effective for detecting sparse spatio-temporal interest points in video data (Laptev and Lindeberg [23]). The three-dimensional Hessian feature strength measure D_{1,γ-norm} L can be seen as a differential analogue of the spatio-temporal Harris operator, defined from second-order derivatives of image intensities only, which should allow for the computation of denser sets of interest points, in analogy with the denser sets of interest points obtained from the 2-D Hessian feature strength measure D_{1,γ-norm} L compared to the interest points detected by the 2-D Harris operator. For 3-D image data, the parameter k should be in the interval ]0, 1/27[. The spatio-temporal determinant of the Hessian has also been used for detecting spatio-temporal interest points (Willems et al. [61]) and been demonstrated to allow for denser sets of spatio-temporal interest points than the spatio-temporal Harris operator. The new 3-D interest point detectors D_{1,γ-norm} L, D_{2,γ-norm} L and G̃_{γ-norm}(L) provide a way to extend this repertoire of 3-D interest point detectors. When applying these operators to spatio-temporal image data, they should be combined with a spatio-temporal scale-space representation (Lindeberg [33,40]; Lindeberg and Fagerström [45]) that allows for different scale parameters over space vs. time. Moreover, the specific coupling between space and time should be explicitly considered when expressing invariance properties over space-time (Lindeberg et al. [40,44]).
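The upper bound 1/27 on k can be motivated by a short calculation. Assuming, by analogy with the spatio-temporal Harris operator, that the 3-D measure is built from det H L − k trace³ H L (an assumption on our part), consider an isotropic Hessian H = λI. Requiring the expression to keep the sign of det H gives the 3-D bound, and the same argument gives the familiar 2-D bound k < 1/4:

```latex
\begin{aligned}
\text{3-D: } & \det(\lambda I_3) - k\,\operatorname{trace}^3(\lambda I_3)
  = \lambda^3 - 27k\lambda^3 = (1 - 27k)\,\lambda^3
  \quad\Rightarrow\quad 0 < k < \tfrac{1}{27},\\
\text{2-D: } & \det(\lambda I_2) - k\,\operatorname{trace}^2(\lambda I_2)
  = \lambda^2 - 4k\lambda^2 = (1 - 4k)\,\lambda^2
  \quad\Rightarrow\quad 0 < k < \tfrac{1}{4}.
\end{aligned}
```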
In the following, we shall analyze the scale-selection properties of these 3-D interest point detectors when applied to volumetric (spatial) image data using the same amount of scale-space smoothing over all spatial dimensions.

A.2.2 The Pure Second-Order Interest Point Detectors
By differentiation, it follows that the scale-normalized feature strength measures at the center (x, y, z) = (0, 0, 0) of the Gaussian blob are, for the Laplacian (173), the determinant of the Hessian (174) and the Hessian feature strength measures I (176) and II (177), given by (183)-(186). Differentiating these expressions with respect to the scale parameter t and setting the derivative to zero shows that, for all these interest point detectors, the extremum over scale is assumed at the same scale t̂ = 2γ t0/(5 − 2γ). If we would like these interest point detectors to return a scale estimate corresponding to the scale parameter t0 of the Gaussian blob, we should therefore choose γ = 5/4. Using γ = 1 would otherwise lead to a lower scale estimate t̂ = 2t0/3. For scale selection based on weighted averaging over scale of the scale-normalized Laplacian response (183), the corresponding scale estimates are, for γ ∈ ]0, 5/2[, given by t̂ = e^{ψ(γ)−ψ(5/2−γ)} t0, where ψ(u) denotes the digamma function, i.e., the logarithmic derivative of the Gamma function, ψ(u) = Γ′(u)/Γ(u). Similarly, from the explicit expression for the determinant of the Hessian at the center of the Gaussian blob (184), it follows that the weighted scale estimate will, for γ ∈ ]0, 5/2[, be given by t̂ = e^{ψ(3γ)−ψ(15/2−3γ)} t0. Due to the similarity between the explicit expressions for the Hessian feature strength measure I (185) and the determinant of the Hessian response (184), as well as the similarity between the Hessian feature strength measure II (186) and the Laplacian response (183) at the center of a Gaussian blob, the scale estimates for D_{1,γ-norm} L and D_{2,γ-norm} L will be analogous. In summary, the weighted scale estimates are of the form t̂ = C t0, with a bias factor C1 = e^{ψ(γ)−ψ(5/2−γ)} for the Laplacian ∇²_norm L and the Hessian feature strength measure D_{2,norm} L operators, and a bias factor C2 = e^{ψ(3γ)−ψ(15/2−3γ)} (197) for the determinant of the Hessian det H_norm L and the Hessian feature strength measure D_{1,norm} L operators.
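The digamma-based bias factors can be cross-checked numerically. As a modelling assumption consistent with the bias factor (197), the sketch takes the weighted estimate to be the exponential of the average of τ = log t, weighted by a center response proportional to t^a (t0 + t)^{−(a+b)}; the averaged logarithm then has the beta-prime mean ψ(a) − ψ(b). For the 3-D Laplacian a = γ, b = 5/2 − γ, and for the 3-D determinant of the Hessian a = 3γ, b = 15/2 − 3γ, while the extremum-based estimate is 2γ t0/(5 − 2γ). The digamma function is implemented locally to keep the example dependency-free (scipy.special.digamma would serve equally well).

```python
import math
import numpy as np

def digamma(x):
    """psi(x) = Gamma'(x) / Gamma(x) for x > 0 via recurrence + asymptotic series."""
    result = 0.0
    while x < 6.0:                 # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def C1(gamma):   # Laplacian and D_2 type: exp(psi(gamma) - psi(5/2 - gamma))
    return math.exp(digamma(gamma) - digamma(2.5 - gamma))

def C2(gamma):   # det H and D_1 type: exp(psi(3 gamma) - psi(15/2 - 3 gamma))
    return math.exp(digamma(3 * gamma) - digamma(7.5 - 3 * gamma))

def weighted_estimate(a, b, t0=1.0):
    """exp of the mean of tau = log t under the weight t^a (t0 + t)^-(a + b)."""
    tau = np.linspace(-60.0, 60.0, 400001)
    logw = a * tau - (a + b) * np.logaddexp(math.log(t0), tau)
    w = np.exp(logw - logw.max())
    return t0 * math.exp(np.sum((tau - math.log(t0)) * w) / np.sum(w))

def extremum_estimate(gamma, t0=1.0):
    return 2 * gamma * t0 / (5 - 2 * gamma)

g = 1.0
print(C1(g), weighted_estimate(g, 2.5 - g))           # Laplacian type, gamma = 1
print(C2(g), weighted_estimate(3 * g, 7.5 - 3 * g))   # det H type, gamma = 1
print(extremum_estimate(g), C1(1.25), C2(1.25))       # 2/3, 1, 1
```

With γ = 5/4 both the extremum-based estimate and the two weighted estimates return t0, whereas γ = 1 gives biased estimates in the 3-D case.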
By following the methodology outlined in Sects. 3.2, 3.3, 4.2 and 4.3, a corresponding more detailed analysis can be performed concerning the influence of the post-smoothing operation and affine image deformations for these 3-D interest point detectors.
Consider the scale-space representation of a D-dimensional Gaussian intensity profile g(·; t0) of diffuseness t0 (possibly embedded in a higher-dimensional space) [34], given by L(·; t) = g(·; t0 + t), with the corresponding γ-normalized Laplacian response ∇²_{γ-norm} L = t^γ ∇²L, which assumes the following value at the origin: ∇²_{γ-norm} L(0; t) = −D t^γ / ((2π)^{D/2} (t0 + t)^{D/2+1}). Differentiating this expression with respect to t gives ∂_t(∇²_{γ-norm} L(0; t)) = −D t^{γ−1} (γ (t0 + t) − (D/2 + 1) t) / ((2π)^{D/2} (t0 + t)^{D/2+2}), and setting the derivative to zero gives t̂ = 2γ t0/(D + 2 − 2γ). Requiring this scale estimate to be equal to t0 implies that the γ-value for a pure second-order operator should be related to the dimensionality D of the image features it should respond to according to γ = (D + 2)/4. Note that the 2-D case is special in the sense that only for D = 2 does scale selection based on the most scale-invariant choice γ = 1 lead to scale estimates equal to the diffuseness parameter t0 of a Gaussian intensity profile. It can also be noted that only for two-dimensional blobs will the corresponding γ-normalized magnitude values of scale-space extrema be independent of the size of the blob, whereas in other dimensionalities the corresponding magnitude values need to be normalized by a scale-dependent correction factor t̂^{1−γ} to lead to scale-invariant magnitude values that are independent of the diffuseness t0 of the D-dimensional Gaussian intensity profile.

His research interests in computer vision relate to scale-space representation, image features, object recognition, spatio-temporal recognition, focus-of-attention and shape. He has developed theories and methodologies for continuous and discrete scale-space representation, detection of salient image structures, automatic scale selection, scale-invariant image features, affine invariant features, affine and Galilean normalization, temporal and spatio-temporal scale-space concepts as well as spatial and spatio-temporal image descriptors for image-based recognition. He also works on computational modelling of biological vision and has previously worked on applications in medical image analysis and gesture recognition. He is the author of the book Scale-Space Theory in Computer Vision.