Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12354)

Abstract

In this work we target the problem of estimating accurately localized correspondences between a pair of images. We adopt the recent Neighbourhood Consensus Networks, which have demonstrated promising performance on difficult correspondence problems, and propose modifications to overcome their main limitations: large memory consumption, long inference time and poorly localized correspondences. Our proposed modifications can reduce the memory footprint and execution time by more than \(10\times\), with equivalent results. This is achieved by sparsifying the correlation tensor containing tentative matches and then processing it with a 4D CNN using submanifold sparse convolutions. Localization accuracy is significantly improved by processing the input images at higher resolution, which is possible due to the reduced memory footprint, and by a novel two-stage correspondence relocalization module. The proposed Sparse-NCNet method obtains state-of-the-art results on the HPatches Sequences and InLoc visual localization benchmarks, and competitive results on the Aachen Day-Night benchmark.
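To make the sparsification step concrete, the following is a minimal NumPy sketch of one way a 4D correlation tensor of tentative matches could be sparsified. It is an illustration under stated assumptions, not the authors' implementation: the function name `sparse_correlation`, the top-k selection rule, the default `k=10`, and the coordinate-list output layout are all hypothetical choices made for this example.

```python
import numpy as np

def sparse_correlation(feat_a, feat_b, k=10):
    """Keep only the k strongest matches in image B for each position in A.

    Hypothetical helper for illustration; the selection rule is an assumption.
    feat_a: (C, Ha, Wa) and feat_b: (C, Hb, Wb) dense feature maps.
    Returns coords of shape (N, 4) holding (row_a, col_a, row_b, col_b)
    and values of shape (N,), with N = Ha * Wa * k.
    """
    c, ha, wa = feat_a.shape
    _, hb, wb = feat_b.shape

    # Flatten the spatial dimensions and L2-normalise each descriptor,
    # so that a dot product is a cosine similarity.
    fa = feat_a.reshape(c, -1)
    fb = feat_b.reshape(c, -1)
    fa = fa / (np.linalg.norm(fa, axis=0, keepdims=True) + 1e-8)
    fb = fb / (np.linalg.norm(fb, axis=0, keepdims=True) + 1e-8)

    # Dense correlation between all pairs of positions: (Ha*Wa, Hb*Wb).
    corr = fa.T @ fb

    # Column indices of the k largest scores in every row (in no particular order).
    top = np.argpartition(-corr, k - 1, axis=1)[:, :k]

    rows = np.repeat(np.arange(ha * wa), k)
    cols = top.reshape(-1)
    values = corr[rows, cols]

    # Convert flat indices back to 4D coordinates.
    coords = np.stack([rows // wa, rows % wa, cols // wb, cols % wb], axis=1)
    return coords, values
```

With this rule the dense tensor holds Ha·Wa·Hb·Wb scores, while the sparse version holds only Ha·Wa·k of them. A coordinate list with matching values is the kind of representation that sparse convolution layers typically operate on, which is why it is used as the output here.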

Acknowledgements

This work was partially supported by the European Regional Development Fund under project IMPACT (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000468), the Louis Vuitton ENS Chair on Artificial Intelligence, and the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).

Author information

Corresponding author

Correspondence to Ignacio Rocco.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Rocco, I., Arandjelović, R., Sivic, J. (2020). Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12354. Springer, Cham. https://doi.org/10.1007/978-3-030-58545-7_35

  • DOI: https://doi.org/10.1007/978-3-030-58545-7_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58544-0

  • Online ISBN: 978-3-030-58545-7

  • eBook Packages: Computer Science, Computer Science (R0)
