
Multi-view Optimization of Local Feature Geometry

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12346)

Abstract

In this work, we address the problem of refining the geometry of local image features from multiple views without known scene or camera geometry. Current approaches to local feature detection are inherently limited in their keypoint localization accuracy because they operate on a single view only. This limitation has a negative impact on downstream tasks such as Structure-from-Motion, where inaccurate keypoints lead to large errors in triangulation and camera localization. Our proposed method naturally complements the traditional feature extraction and matching paradigm. We first estimate local geometric transformations between tentative matches and then jointly optimize the keypoint locations over multiple views according to a non-linear least-squares formulation. Across a variety of experiments, we show that our method consistently improves triangulation and camera localization performance for both hand-crafted and learned local features.
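The joint refinement step can be illustrated with a small sketch. It assumes, as a simplification that is not necessarily the paper's formulation, that every tentative match already carries an estimated 2D offset (a translation-only stand-in for the local geometric transformation), and it holds one keypoint of the track fixed, because with pure offsets shifting all keypoints by the same amount leaves the cost unchanged. `Match`, `refine_track` and `robust_scale` are hypothetical names, and the Cauchy loss solved by iteratively reweighted least squares with Gauss-Seidel sweeps is swapped in for illustration in place of a general-purpose non-linear least-squares solver.

```python
"""Minimal sketch: jointly refine one feature track's keypoints from pairwise offsets."""
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Match:
    """Hypothetical tentative match between keypoints u and v of the same track.

    `offset` is the estimated displacement such that x_v ~= x_u + offset
    (translation-only assumption, standing in for a local transformation).
    """
    u: int
    v: int
    offset: Point
    weight: float = 1.0


def refine_track(init: List[Point], matches: List[Match], anchor: int = 0,
                 iters: int = 200, robust_scale: Optional[float] = None) -> List[Point]:
    """Minimize sum_e w_e * rho(||x_v - x_u - d_e||^2) over all non-anchor keypoints.

    Iteratively reweighted least squares (Cauchy loss if robust_scale is set),
    with one Gauss-Seidel sweep per reweighting. The anchor keypoint stays fixed.
    """
    x = [list(p) for p in init]
    for _ in range(iters):
        # IRLS: Cauchy weight rho'(s) = 1 / (1 + s / c^2), s = squared residual.
        weights = []
        for m in matches:
            rx = x[m.v][0] - x[m.u][0] - m.offset[0]
            ry = x[m.v][1] - x[m.u][1] - m.offset[1]
            w = m.weight
            if robust_scale is not None:
                w /= 1.0 + (rx * rx + ry * ry) / robust_scale ** 2
            weights.append(w)
        # Gauss-Seidel: each free keypoint moves to the weighted mean of the
        # positions its matched neighbours predict for it.
        for k in range(len(x)):
            if k == anchor:
                continue
            sx = sy = total = 0.0
            for m, w in zip(matches, weights):
                if m.v == k:    # x_k ~= x_u + d
                    px, py = x[m.u][0] + m.offset[0], x[m.u][1] + m.offset[1]
                elif m.u == k:  # x_k ~= x_v - d
                    px, py = x[m.v][0] - m.offset[0], x[m.v][1] - m.offset[1]
                else:
                    continue
                sx += w * px
                sy += w * py
                total += w
            if total > 0.0:
                x[k] = [sx / total, sy / total]
    return [(p[0], p[1]) for p in x]


# Usage: a synthetic 4-view track with one corrupted offset (match 1 -> 2).
truth = [(10.0, 10.0), (20.0, 15.0), (30.0, 5.0), (12.0, 25.0)]
matches = []
for u, v in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]:
    dx, dy = truth[v][0] - truth[u][0], truth[v][1] - truth[u][1]
    if (u, v) == (1, 2):
        dx += 50.0  # simulated outlier offset
    matches.append(Match(u, v, (dx, dy)))
noisy = [truth[0]] + [(px + 1.0, py - 0.5) for px, py in truth[1:]]
plain = refine_track(noisy, matches)
robust = refine_track(noisy, matches, robust_scale=1.0)
```

With the corrupted offset, plain least squares spreads the error and shifts keypoints 1 and 2 by 12.5 px along x, whereas the robust variant down-weights the bad match and lands within a fraction of a pixel of the ground truth. A fuller treatment would use richer local transformations than pure offsets and refine many tracks at once.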

Keywords

3D reconstruction, Local features

Notes

Acknowledgements

This work was supported by the Microsoft Mixed Reality & AI Zürich Lab PhD scholarship.

Supplementary material

Supplementary material 1 (pdf 3348 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science, ETH Zürich, Zürich, Switzerland
  2. Microsoft, Zürich, Switzerland
