
S2DNet: Learning Image Features for Accurate Sparse-to-Dense Matching

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12348)

Abstract

Establishing robust and accurate correspondences is a fundamental building block of many computer vision algorithms. While recent learning-based feature matching methods have shown promising results in providing robust correspondences under challenging conditions, they are often limited in terms of precision. In this paper, we introduce S2DNet, a novel feature matching pipeline, designed and trained to efficiently establish both robust and accurate correspondences. By leveraging a sparse-to-dense matching paradigm, we cast the correspondence learning problem as a supervised classification task, training the network to output highly peaked correspondence maps. We show that S2DNet achieves state-of-the-art results on the HPatches benchmark, as well as on several long-term visual localization datasets.
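To make the sparse-to-dense classification formulation concrete, the following is a minimal sketch in PyTorch, using made-up tensor shapes, names, and random data; it is an illustration of the general paradigm, not the authors' implementation. Sparse keypoint descriptors from image A are correlated against a dense feature map of image B, and each resulting correspondence map is treated as a classification distribution over the pixel locations of B.

```python
# Minimal sketch of sparse-to-dense matching as classification.
# All shapes and names are hypothetical, for illustration only.
import torch
import torch.nn.functional as F

def sparse_to_dense_match(desc_a, feat_b):
    """desc_a: (N, C) descriptors at N keypoints of image A.
    feat_b: (C, H, W) dense feature map of image B.
    Returns (N, 2) matched (x, y) locations in B and (N, H*W) log-scores."""
    C, H, W = feat_b.shape
    # Correlate every sparse descriptor with every dense location: (N, H*W).
    corr = desc_a @ feat_b.reshape(C, H * W)
    # Viewing each row as a distribution over pixels casts matching as
    # classification; training with cross-entropy against the ground-truth
    # pixel index encourages highly peaked correspondence maps.
    log_probs = F.log_softmax(corr, dim=1)
    # At test time, the match is simply the argmax over all locations of B.
    idx = corr.argmax(dim=1)
    x = idx % W
    y = torch.div(idx, W, rounding_mode="floor")
    return torch.stack((x, y), dim=1), log_probs

# Toy usage: 128 keypoints, 64-dim features, a 60x80 dense map.
desc_a = torch.randn(128, 64)
feat_b = torch.randn(64, 60, 80)
matches, log_probs = sparse_to_dense_match(desc_a, feat_b)
# Hypothetical training step against ground-truth pixel indices gt:
gt = torch.randint(0, 60 * 80, (128,))
loss = F.nll_loss(log_probs, gt)
```

Note the asymmetry that gives the paradigm its name: detection is only needed in one image, while the other is searched densely, which avoids the repeatability problem of detecting the same keypoint in both views.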

Keywords

Feature matching · Classification · Visual localization

Notes

Acknowledgement

This project has received funding from the Bosch Research Foundation (Bosch Forschungsstiftung).

Supplementary material

Supplementary material 1: 504435_1_En_37_MOESM1_ESM.pdf (PDF, 13.8 MB)

References

  1. Phototourism Challenge: CVPR 2019 Image Matching Workshop (2019)
  2. Arandjelovic, R., Gronát, P., Torii, A., Pajdla, T., Sivic, J.: NetVLAD: CNN architecture for weakly supervised place recognition. In: Conference on Computer Vision and Pattern Recognition (2016)
  3. Arandjelovic, R., Zisserman, A.: Three things everyone should know to improve object retrieval. In: Conference on Computer Vision and Pattern Recognition (2012)
  4. Balntas, V., Johns, E., Tang, L., Mikolajczyk, K.: PN-Net: conjoined triple deep network for learning local image descriptors. arXiv preprint (2016)
  5. Balntas, V., Lenc, K., Vedaldi, A., Mikolajczyk, K.: HPatches: a benchmark and evaluation of handcrafted and learned local descriptors. In: Conference on Computer Vision and Pattern Recognition (2017)
  6. Balntas, V., Riba, E., Ponsa, D., Mikolajczyk, K.: Learning local feature descriptors with triplets and shallow convolutional neural networks. In: British Machine Vision Conference (2016)
  7. Bay, H., Tuytelaars, T., Gool, L.V.: SURF: speeded up robust features. In: European Conference on Computer Vision (2006)
  8. Brachmann, E., Rother, C.: Neural-guided RANSAC: learning where to sample model hypotheses. In: International Conference on Computer Vision (2019)
  9. Brown, M.A., Hua, G., Winder, S.A.J.: Discriminative learning of local image descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 33, 43–57 (2011)
  10. Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: European Conference on Computer Vision (2010)
  11. Choy, C.B., Gwak, J., Savarese, S., Chandraker, M.: Universal correspondence network. In: Advances in Neural Information Processing Systems (2016)
  12. Cummins, M.J., Newman, P.: FAB-MAP: probabilistic localization and mapping in the space of appearance. Int. J. Robot. Res. 27, 647–665 (2008)
  13. DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperPoint: self-supervised interest point detection and description. In: Conference on Computer Vision and Pattern Recognition Workshops (2018)
  14. Dong, J., Soatto, S.: Domain-size pooling in local descriptors: DSP-SIFT. In: Conference on Computer Vision and Pattern Recognition (2014)
  15. Dusmanu, M., et al.: D2-Net: a trainable CNN for joint description and detection of local features. In: Conference on Computer Vision and Pattern Recognition (2019)
  16. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 381–395 (1981)
  17. Gauglitz, S., Höllerer, T., Turk, M.: Evaluation of interest point detectors and feature descriptors for visual tracking. Int. J. Comput. Vision 94, 335 (2011)
  18. Germain, H., Bourmaud, G., Lepetit, V.: Sparse-to-dense hypercolumn matching for long-term visual localization. In: International Conference on 3D Vision (2019)
  19. Hariharan, B., Arbeláez, P.A., Girshick, R.B., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (2014)
  20. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proceedings of the Fourth Alvey Vision Conference (1988)
  21. Heinly, J., Schönberger, J.L., Dunn, E., Frahm, J.M.: Reconstructing the world* in six days *(as captured by the Yahoo 100 million image dataset). In: Conference on Computer Vision and Pattern Recognition (2015)
  22. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint (2015)
  23. Kim, H., Lee, D., Sim, J., Kim, C.: SOWP: spatially ordered and weighted patch descriptor for visual tracking. In: International Conference on Computer Vision (2015)
  24. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. CoRR abs/1412.6980 (2014)
  25. Lebeda, K., Matas, J., Chum, O.: Fixing the locally optimized RANSAC. In: British Machine Vision Conference (2012)
  26. Li, Z., Snavely, N.: MegaDepth: learning single-view depth prediction from internet photos. In: Conference on Computer Vision and Pattern Recognition (2018)
  27. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
  28. Maddern, W., Pascoe, G., Linegar, C., Newman, P.: 1 year, 1000 km: the Oxford RobotCar dataset. Int. J. Robot. Res. 36, 3–15 (2017)
  29. Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vision 60, 63–86 (2004)
  30. Mikolajczyk, K., et al.: A comparison of affine region detectors. Int. J. Comput. Vision 65, 43–72 (2005)
  31. Mishchuk, A., Mishkin, D., Radenović, F., Matas, J.: Working hard to know your neighbor's margins: local descriptor learning loss. In: Advances in Neural Information Processing Systems (2017)
  32. Mishkin, D., Radenović, F., Matas, J.: Repeatability is not enough: learning affine regions via discriminability. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 287–304. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_18
  33. Noh, H., Araujo, A., Sim, J., Weyand, T., Han, B.: Large-scale image retrieval with attentive deep local features. In: International Conference on Computer Vision (2016)
  34. Ono, Y., Trulls, E., Fua, P., Yi, K.M.: LF-Net: learning local features from images. In: Advances in Neural Information Processing Systems (2018)
  35. Perdoch, M., Chum, O., Matas, J.: Efficient representation of local geometry for large scale object retrieval. In: Conference on Computer Vision and Pattern Recognition (2009)
  36. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: Conference on Computer Vision and Pattern Recognition (2007)
  37. Revaud, J., et al.: R2D2: repeatable and reliable detector and descriptor. In: Advances in Neural Information Processing Systems (2019)
  38. Rocco, I., Cimpoi, M., Arandjelović, R., Torii, A., Pajdla, T., Sivic, J.: Neighbourhood consensus networks. In: Advances in Neural Information Processing Systems (2018)
  39. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.R.: ORB: an efficient alternative to SIFT or SURF. In: International Conference on Computer Vision (2011)
  40. Sarlin, P.E., Cadena, C., Siegwart, R., Dymczyk, M.: From coarse to fine: robust hierarchical localization at large scale. In: Conference on Computer Vision and Pattern Recognition (2019)
  41. Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: Conference on Computer Vision and Pattern Recognition (2020). https://arxiv.org/abs/1911.11763
  42. Sattler, T., Havlena, M., Radenovic, F., Schindler, K., Pollefeys, M.: Hyperpoints and fine vocabularies for large-scale location recognition. In: International Conference on Computer Vision (2015)
  43. Sattler, T., Leibe, B., Kobbelt, L.: Efficient & effective prioritized matching for large-scale image-based localization. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1744–1756 (2017)
  44. Sattler, T., et al.: Benchmarking 6DOF outdoor visual localization in changing conditions. In: Conference on Computer Vision and Pattern Recognition (2018)
  45. Sattler, T., et al.: Are large-scale 3D models really necessary for accurate visual localization? In: Conference on Computer Vision and Pattern Recognition (2017)
  46. Sattler, T., Weyand, T., Leibe, B., Kobbelt, L.: Image retrieval for image-based localization revisited. In: British Machine Vision Conference (2012)
  47. Savinov, N., Seki, A., Ladicky, L., Sattler, T., Pollefeys, M.: Quad-networks: unsupervised learning to rank for interest point detection. In: Conference on Computer Vision and Pattern Recognition (2016)
  48. Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Conference on Computer Vision and Pattern Recognition (2016)
  49. Schönberger, J.L., Zheng, E., Pollefeys, M., Frahm, J.M.: Pixelwise view selection for unstructured multi-view stereo. In: European Conference on Computer Vision (2016)
  50. Simo-Serra, E., Trulls, E., Ferraz, L., Kokkinos, I., Fua, P., Moreno-Noguer, F.: Discriminative learning of deep convolutional feature point descriptors. In: International Conference on Computer Vision (2015)
  51. Simonyan, K., Vedaldi, A., Zisserman, A.: Learning local feature descriptors using convex optimisation. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1573–1585 (2014)
  52. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
  53. Svärm, L., Enqvist, O., Kahl, F., Oskarsson, M.: City-scale localization for cameras with known vertical direction. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1455–1461 (2017)
  54. Svärm, L., Enqvist, O., Oskarsson, M., Kahl, F.: Accurate localization and pose estimation for large 3D models. In: Conference on Computer Vision and Pattern Recognition (2014)
  55. Sweeney, C., Fragoso, V., Höllerer, T., Turk, M.: Large scale SfM with the distributed camera model. In: International Conference on 3D Vision (2016)
  56. Taira, H., et al.: InLoc: indoor visual localization with dense matching and view synthesis. CoRR abs/1803.10368 (2018)
  57. Thomee, B., et al.: YFCC100M: the new data in multimedia research. Commun. ACM 59, 64–73 (2016)
  58. Tian, Y., Fan, B., Wu, F.: L2-Net: deep learning of discriminative patch descriptor in Euclidean space. In: Conference on Computer Vision and Pattern Recognition (2017)
  59. Tian, Y., Yu, X., Fan, B., Wu, F., Heijnen, H., Balntas, V.: SOSNet: second order similarity regularization for local descriptor learning. In: Conference on Computer Vision and Pattern Recognition (2019)
  60. Toft, C., et al.: Semantic match consistency for long-term visual localization. In: European Conference on Computer Vision (2018)
  61. Torii, A., Arandjelovic, R., Sivic, J., Okutomi, M., Pajdla, T.: 24/7 place recognition by view synthesis. IEEE Trans. Pattern Anal. Mach. Intell. 40 (2015)
  62. Tuytelaars, T., Mikolajczyk, K.: Local invariant feature detectors: a survey. Found. Trends Comput. Graph. Vision 3 (2007)
  63. Wijmans, E., Furukawa, Y.: Exploiting 2D floorplan for building-scale panorama RGBD alignment. In: Conference on Computer Vision and Pattern Recognition (2016)
  64. Yang, H., Shao, L., Zheng, F., Wang, L., Song, Z.: Recent advances and trends in visual tracking: a review. Neurocomputing 74, 3823–3831 (2011)
  65. Yi, K.M., Trulls, E., Lepetit, V., Fua, P.: LIFT: learned invariant feature transform. In: European Conference on Computer Vision (2016)
  66. Yi, K.M., Trulls, E., Ono, Y., Lepetit, V., Salzmann, M., Fua, P.: Learning to find good correspondences. In: Conference on Computer Vision and Pattern Recognition (2018)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. LIGM, École des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-Vallée, France
  2. Laboratoire IMS, Université de Bordeaux, Bordeaux, France
