
A Unified B-Spline Framework for Scale-Invariant Keypoint Detection

Published in: International Journal of Computer Vision

Abstract

Scale-invariant keypoint detection is a fundamental problem in low-level vision. To accelerate keypoint detectors developed in Gaussian scale-space (e.g., DoG, Harris-Laplace, and Hessian-Laplace), various fast detectors (e.g., SURF, CenSurE, and BRISK) approximate Gaussian filters with simple box filters. However, there is no principled way to design the shape and scale of these box filters. Moreover, the integral image technique they rely on makes it difficult to identify the continuous kernels that correspond to the discrete ones actually used, so there is no guarantee that desirable properties of the original Gaussian scale-space, such as causality, are inherited. To address these issues, we propose a unified B-spline framework for scale-invariant keypoint detection. Owing to the approximate relationship between B-splines and Gaussian kernels, the framework provides a mathematical interpretation of existing fast detectors based on integral images. In addition, drawing on B-spline theory, we expose a problem in repeated integration, the generalized version of the integral image technique. Finally, following the dominant measures for keypoint detection and automatic scale selection, we develop the B-spline determinant of Hessian (B-DoH) and the B-spline Laplacian-of-Gaussian (B-LoG) as two instantiations within the unified framework. For efficient computation, we convolve images with fixed-order B-spline kernels using repeated running-sums, which avoids the problem of integral images by introducing an extra interpolation kernel. Our B-spline detectors can be designed in a principled way, without heuristic choices of kernel shape and scale, and naturally extend the popular SURF and CenSurE detectors to more complex kernels. Extensive experiments on the standard benchmark dataset demonstrate that the proposed detectors outperform existing ones in terms of repeatability and efficiency.
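As background for the integral-image technique discussed above, the following is a minimal sketch of a summed-area table, the structure SURF and CenSurE use to evaluate box filters in constant time per window (function names are illustrative, not taken from the paper):

```python
import numpy as np

def integral_image(img):
    # One extra leading row/column of zeros so that
    # ii[y, x] = sum of img[:y, :x].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    # Sum over the half-open window img[y0:y1, x0:x1] with
    # four lookups, regardless of the window size.
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
assert box_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()
```

Because every box evaluation costs four lookups independently of scale, these detectors are fast; the price, as argued above, is that the effective continuous kernel behind the discrete computation is hard to characterize.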



Notes

  1. Available from http://www.robots.ox.ac.uk/~vgg/research/affine/


Acknowledgements

This research was supported by the NSFC (No. 61772220 and 62172177).

Author information

Correspondence to Xinge You.

Additional information

Communicated by Tinne Tuytelaars.


Appendices

Appendix A: Proof of Proposition 1

Proof

The first derivative of the scaled zeroth-order B-spline \(\varphi ^0_s(x)\) is \(\varDelta _s(x+\frac{s}{2})\). Using (2), the scaled B-spline of degree \(n\) can be expressed as

$$\begin{aligned} \varphi _s^n(x)=\underbrace{\varphi _s^0*\varphi _s^0*\ldots *\varphi _s^0(x)}_{n+1\ times}. \end{aligned}$$
(36)

Thus, the \((n+1)^{th}\) derivative of \(\varphi ^n_s(x)\) can be calculated using (36), i.e.,

$$\begin{aligned} D^{n+1}\varphi _s^n(x)&= \underbrace{\left( D\varphi _s^0\right) *\left( D\varphi _s^0\right) *\ldots *\left( D\varphi _s^0\right) (x)}_{n+1\ times} \nonumber \\&= \underbrace{\left( \varDelta _s\left( \cdot +\frac{s}{2}\right) \right) *\left( \varDelta _s\left( \cdot +\frac{s}{2}\right) \right) *\ldots *\left( \varDelta _s\left( \cdot +\frac{s}{2}\right) \right) (x)}_{n+1\ times}\nonumber \\&= \varDelta _s^{n+1}\left( x+s\frac{n+1}{2}\right) , \end{aligned}$$
(37)

where \(D^{n+1}\) denotes the \((n+1)\)-fold iteration of the differential operator \(Df(x)=\frac{\partial {f(x)}}{\partial {x}}\). Finally, using (37),

$$\begin{aligned} f*\varphi _{s,d}^n(x)&= \left( D^{-(n+1-d)}f\right) *\left( D^{n+1-d}\varphi _{s,d}^n\right) (x)\nonumber \\&= \left( D^{-(n+1-d)}f\right) *\left( D^{n+1}\varphi _s^n\right) (x)\nonumber \\&= \left( D^{-(n+1-d)}f\right) *\varDelta _s^{n+1}\left( x+s\frac{n+1}{2}\right) . \end{aligned}$$
(38)

\(\square \)
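Identity (37) has a direct discrete analogue: the \((n+1)\)-th finite difference of an \((n+1)\)-fold box convolution equals the \((n+1)\)-fold convolution of the scaled difference kernel \(\varDelta _s\). The following numerical check is a sketch with illustrative parameters, not the paper's implementation:

```python
import numpy as np

def conv_power(k, m):
    # m-fold convolution of kernel k with itself: k * k * ... * k.
    out = np.array([1.0])            # identity for discrete convolution
    for _ in range(m):
        out = np.convolve(out, k)
    return out

s, n = 4, 2                          # illustrative box width and spline degree
box = np.ones(s)                     # discrete box (zeroth-order B-spline, unnormalized)
d1 = np.array([1.0, -1.0])          # first finite difference
d_s = np.convolve(box, d1)          # Delta_s = delta_0 - delta_s (length s + 1)

bspline = conv_power(box, n + 1)    # degree-n B-spline as in (36)
lhs = np.convolve(bspline, conv_power(d1, n + 1))  # (n+1)-th finite difference of the spline
rhs = conv_power(d_s, n + 1)        # (n+1)-fold convolution of Delta_s, cf. (37)
assert np.allclose(lhs, rhs)
```

The check succeeds by commutativity and associativity of convolution, which is exactly the structure the continuous proof exploits.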

Appendix B: Proof of Theorem 1

Proof

Using (6), the \((n+1)^{th}\) finite difference of the discrete B-spline \(\phi _s^n(k)\) can be calculated as

$$\begin{aligned} \varDelta ^{n+1}\phi _s^n(k)&= \underbrace{\left( \varDelta \phi _s^0\right) *\left( \varDelta \phi _s^0\right) *\ldots *\left( \varDelta \phi _s^0\right) }_{n+1\ times}(k)\nonumber \\&= \underbrace{\varDelta _s*\varDelta _s*\ldots *\varDelta _s\left( k+\bigg \lfloor \frac{(s-1)(n+1)}{2}\bigg \rfloor \right) }_{n+1\ times}\nonumber \\&= \varDelta _s^{n+1}\left( k+\bigg \lfloor \gamma \bigg \rfloor \right) . \end{aligned}$$
(39)

Replacing \(D^{-(n+1-d)}\) with the \((n+1-d)\)-fold iteration of the running-sum operator \(\varDelta ^{-(n+1-d)}\), we reformulate (21) as

$$\begin{aligned}&f*h(k)=\left( \varDelta ^{-(n+1-d)}*f\right) *\varDelta _s^{n+1}\left( k+\lfloor \gamma ^{'}\rfloor \right) \nonumber \\&\quad = f*\varDelta ^d*\left( \varDelta ^{-(n+1)}*\varDelta _s^{n+1}\left( \cdot +\lfloor \gamma \rfloor \right) \right) \left( k+\lfloor \gamma ^{'}\rfloor -\lfloor \gamma \rfloor \right) \nonumber \\&\quad = f*\varDelta ^d*\phi _{s}^n\left( k+\lfloor \gamma ^{'}\rfloor -\lfloor \gamma \rfloor \right) \qquad \text {using}~(39)\nonumber \\&\quad = f*\phi _{s,d}^n(k) \qquad \text {using}~(10). \end{aligned}$$
(40)

\(\square \)
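Theorem 1 is what makes the running-sum scheme practical: convolving with a discrete B-spline reduces to \(n+1\) cumulative sums followed by \(n+1\) sparse differences, at a cost independent of the scale \(s\). The following is a minimal NumPy sketch of this idea (function names and boundary handling are our own; a production implementation would subtract shifted arrays in place rather than call `np.convolve`):

```python
import numpy as np

def bspline_kernel(s, n):
    # Discrete degree-n B-spline: (n+1)-fold convolution of a width-s box.
    k = np.array([1.0])
    for _ in range(n + 1):
        k = np.convolve(k, np.ones(s))
    return k

def bspline_filter(f, s, n):
    # Convolve f with the discrete B-spline via repeated running-sums
    # (the Delta^{-(n+1)} operator) followed by sparse differences Delta_s.
    g = f.astype(float)
    for _ in range(n + 1):           # (n+1)-fold running-sum
        g = np.cumsum(g)
    d = np.zeros(s + 1)
    d[0], d[-1] = 1.0, -1.0          # Delta_s = delta_0 - delta_s
    for _ in range(n + 1):           # (n+1)-fold sparse difference
        g = np.convolve(g, d)
    return g                         # the first len(f) samples are boundary-exact

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
direct = np.convolve(f, bspline_kernel(3, 2))       # brute-force reference
fast = bspline_filter(f, 3, 2)
assert np.allclose(fast[:len(f)], direct[:len(f)])
```

Since all operators involved are causal, the truncation introduced by `np.cumsum` on a finite array only corrupts samples beyond the signal length, which is why the comparison is restricted to the first `len(f)` samples.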

Appendix C: Proof of Proposition 2

Proof

From (9), we can establish the relation between \(\varphi _{s,d}^n(k)\) and \(\phi _{s,d}^n(k)\):

$$\begin{aligned} \varphi _{s,d}^n(k) = \phi _{s,d}^n*\varphi ^{n-d}\left( k+\{\gamma ^{'}\}\right) . \end{aligned}$$
(41)

Similar to the idea of repeated integration, the result remains unchanged if the finite-difference operator \(\varDelta \) is applied to the kernel and the running-sum operator \(\varDelta ^{-1}\) is applied to the original signal, i.e.,

$$\begin{aligned} Wf(s,k)&=f*\varphi _{s,d}^n(k) \nonumber \\&=\left( \varDelta ^{-(n+1-d)}f\right) *\left( \varDelta ^{n+1-d}\varphi _{s,d}^n\right) (k)\nonumber \\&=\left( \varDelta ^{-(n+1-d)}f\right) *\varphi ^{n-d}\left( \cdot +\{\gamma ^{'}\}\right) *\left( \varDelta ^{n+1-d}\phi _{s,d}^n\right) (k) \nonumber \\&=\left( \varDelta ^{-(n+1-d)}f\right) *\varphi ^{n-d}\left( \cdot +\{\gamma ^{'}\}\right) *\left( \varDelta ^{n+1}\phi _{s}^n\right) \left( k+\lfloor \gamma ^{'}\rfloor -\lfloor \gamma \rfloor \right) \nonumber \\&=\left( \varDelta ^{-(n+1-d)}f\right) *\varphi ^{n-d}\left( \cdot +\{\gamma ^{'}\}\right) *\varDelta _s^{n+1}\left( k+\lfloor \gamma ^{'}\rfloor \right) , \end{aligned}$$
(42)

where the third equality follows from (41) and the last from (39). \(\square \)

About this article

Cite this article

Zheng, Q., Gong, M., You, X. et al. A Unified B-Spline Framework for Scale-Invariant Keypoint Detection. Int J Comput Vis 130, 777–799 (2022). https://doi.org/10.1007/s11263-021-01568-3
