Skip to main content

Map-Free Visual Relocalization: Metric Pose Relative to a Single Image

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Can we relocalize in a scene represented by a single reference image? Standard visual relocalization requires hundreds of images and scale calibration to build a scene-specific 3D map. In contrast, we propose Map-free Relocalization, i.e., using only one photo of a scene to enable instant, metric scaled relocalization. Existing datasets are not suitable to benchmark map-free relocalization, due to their focus on large scenes or their limited variability. Thus, we have constructed a new dataset of 655 small places of interest, such as sculptures, murals and fountains, collected worldwide. Each place comes with a reference image to serve as a relocalization anchor, and dozens of query images with known, metric camera poses. The dataset features changing conditions, stark viewpoint changes, high variability across places, and queries with low to no visual overlap with the reference image. We identify two viable families of existing methods to provide baseline results: relative pose regression, and feature matching combined with single-image depth prediction. While these methods show reasonable performance on some favorable scenes in our dataset, map-free relocalization proves to be a challenge that requires new, innovative solutions.

E. Arnold—Work done during internship at Niantic.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Abouelnaga, Y., Bui, M., Ilic, S.: DistillPose: lightweight camera localization using auxiliary learning. In: IROS (2021)

    Google Scholar 

  2. Apple: ARKit. https://developer.apple.com/documentation/arkit/configuration_objects/understanding_world_tracking. Accessed 6 Mar 2022

  3. Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J.: NetVLAD: CNN architecture for weakly supervised place recognition. In: CVPR (2016)

    Google Scholar 

  4. Balntas, V., Li, S., Prisacariu, V.: RelocNet: continuous metric learning relocalisation using neural nets. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 782–799. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_46

    Chapter  Google Scholar 

  5. Barath, D., Matas, J., Noskova, J.: MAGSAC: marginalizing sample consensus. In: CVPR (2019)

    Google Scholar 

  6. Barath, D., Noskova, J., Ivashechkin, M., Matas, J.: MAGSAC++, a fast, reliable and accurate robust estimator. In: CVPR (2020)

    Google Scholar 

  7. Besl, P., McKay, N.D.: A method for registration of 3-D shapes. IEEE TPAMI (1992)

    Google Scholar 

  8. Bhowmik, A., Gumhold, S., Rother, C., Brachmann, E.: Reinforced feature points: optimizing feature detection and description for a high-level task. In: CVPR (2020)

    Google Scholar 

  9. Brachmann, E., Humenberger, M., Rother, C., Sattler, T.: On the limits of pseudo ground truth in visual camera re-localisation. In: ICCV (2021)

    Google Scholar 

  10. Brachmann, E., et al.: DSAC-differentiable RANSAC for camera localization. In: CVPR (2017)

    Google Scholar 

  11. Brachmann, E., Rother, C.: Learning less is more - 6D camera localization via 3D surface regression. In: CVPR (2018)

    Google Scholar 

  12. Brachmann, E., Rother, C.: Expert sample consensus applied to camera re-localization. In: ICCV (2019)

    Google Scholar 

  13. Brachmann, E., Rother, C.: Neural-guided RANSAC: learning where to sample model hypotheses. In: ICCV (2019)

    Google Scholar 

  14. Brachmann, E., Rother, C.: Visual camera re-localization from RGB and RGB-D images using DSAC. IEEE TPAMI (2021)

    Google Scholar 

  15. Brahmbhatt, S., Gu, J., Kim, K., Hays, J., Kautz, J.: Geometry-aware learning of maps for camera localization. In: CVPR (2018)

    Google Scholar 

  16. Cai, R., Hariharan, B., Snavely, N., Averbuch-Elor, H.: Extreme rotation estimation using dense correlation volumes. In: CVPR (2021)

    Google Scholar 

  17. Campos, C., Tardós, J.D.: Scale-aware direct monocular odometry. In: ICUS (2021)

    Google Scholar 

  18. Chen, K., Snavely, N., Makadia, A.: Wide-baseline relative camera pose estimation with directional learning. In: CVPR (2021)

    Google Scholar 

  19. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In: CVPR (2017)

    Google Scholar 

  20. Dai, A., Nießner, M., Zollhöfer, M., Izadi, S., Theobalt, C.: Bundlefusion: real-time globally consistent 3D reconstruction using on-the-fly surface reintegration. ACM TOG (2017)

    Google Scholar 

  21. Dusmanu, M., et al.: D2-net: a trainable CNN for joint detection and description of local features. In: CVPR (2019)

    Google Scholar 

  22. Eggert, D.W., Lorusso, A., Fisher, R.B.: Estimating 3-D rigid body transformations: a comparison of four major algorithms. Mach. Vision Appl. (1997)

    Google Scholar 

  23. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: NeurIPS (2014)

    Google Scholar 

  24. En, S., Lechervy, A., Jurie, F.: RPNet: an end-to-end network for relative camera pose estimation. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11129, pp. 738–745. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11009-3_46

    Chapter  Google Scholar 

  25. Facil, J.M., Ummenhofer, B., Zhou, H., Montesano, L., Brox, T., Civera, J.: CAM-Convs: camera-aware multi-scale convolutions for single-view depth. In: CVPR (2019)

    Google Scholar 

  26. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM (1981)

    Google Scholar 

  27. Gao, X.S., Hou, X.R., Tang, J., Cheng, H.F.: Complete solution classification for the perspective-three-point problem. IEEE TPAMI (2003)

    Google Scholar 

  28. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR (2012)

    Google Scholar 

  29. Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: CVPR (2017)

    Google Scholar 

  30. Godard, C., Mac Aodha, O., Firman, M., Brostow, G.J.: Digging into self-supervised monocular depth prediction. In: ICCV (2019)

    Google Scholar 

  31. Google: ARCore. https://developers.google.com/ar/develop/fundamentals. Accessed 6 Mar 2022

  32. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2003)

    Google Scholar 

  33. Humenberger, M., et al.: Robust image retrieval-based visual localization using kapture (2020)

    Google Scholar 

  34. Izadi, S., et al.: Kinectfusion: real-time 3D reconstruction and interaction using a moving depth camera. In: ACM UIST (2011)

    Google Scholar 

  35. Jin, Y., Mishkin, D., Mishchuk, A., Matas, J., Fua, P., Yi, K.M., Trulls, E.: Image matching across wide baselines: from paper to practice. IJCV (2020)

    Google Scholar 

  36. Kendall, A., Cipolla, R.: Geometric loss functions for camera pose regression with deep learning. In: CVPR (2017)

    Google Scholar 

  37. Kendall, A., Grimes, M., Cipolla, R.: PoseNet: a convolutional network for real-time 6-DOF camera relocalization. In: CVPR (2015)

    Google Scholar 

  38. Li, X., Wang, S., Zhao, Y., Verbeek, J., Kannala, J.: Hierarchical scene coordinate classification and regression for visual localization. In: CVPR (2020)

    Google Scholar 

  39. Li, Z., Snavely, N.: MegaDepth: learning single-view depth prediction from internet photos. In: CVPR (2018)

    Google Scholar 

  40. Liu, C., Kim, K., Gu, J., Furukawa, Y., Kautz, J.: PlaneRCNN: 3D plane detection and reconstruction from a single image. In: CVPR (2019)

    Google Scholar 

  41. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV (2004)

    Google Scholar 

  42. Melekhov, I., Ylioinas, J., Kannala, J., Rahtu, E.: Relative camera pose estimation using convolutional neural networks. In: Blanc-Talon, J., Penne, R., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2017. LNCS, vol. 10617, pp. 675–687. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70353-4_57

    Chapter  Google Scholar 

  43. Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. In: ISMAR (2011)

    Google Scholar 

  44. Nister, D.: An efficient solution to the five-point relative pose problem. IEEE TPAMI (2004)

    Google Scholar 

  45. Pion, N., Humenberger, M., Csurka, G., Cabon, Y., Sattler, T.: Benchmarking image retrieval for visual localization. In: 3DV (2020)

    Google Scholar 

  46. Radenović, F., Tolias, G., Chum, O.: CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 3–20. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_1

    Chapter  Google Scholar 

  47. Raguram, R., Frahm, J.-M., Pollefeys, M.: A comparative analysis of RANSAC techniques leading to adaptive real-time random sample consensus. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 500–513. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88688-4_37

    Chapter  Google Scholar 

  48. Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV (2021)

    Google Scholar 

  49. Ranftl, R., Koltun, V.: Deep fundamental matrix estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 292–309. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_18

    Chapter  Google Scholar 

  50. Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., Koltun, V.: Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer. IEEE TPAMI (2020)

    Google Scholar 

  51. Rau, A., Garcia-Hernando, G., Stoyanov, D., Brostow, G.J., Turmukhambetov, D.: Predicting visual overlap of images through interpretable non-metric box embeddings. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 629–646. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_37

    Chapter  Google Scholar 

  52. Revaud, J., Weinzaepfel, P., de Souza, C.R., Humenberger, M.: R2D2: repeatable and reliable detector and descriptor. In: NeurIPS (2019)

    Google Scholar 

  53. Sarlin, P.E., Cadena, C., Siegwart, R., Dymczyk, M.: From coarse to fine: robust hierarchical localization at large scale. In: CVPR (2019)

    Google Scholar 

  54. Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: CVPR (2020)

    Google Scholar 

  55. Sattler, T., Leibe, B., Kobbelt, L.: Fast image-based localization using direct 2D-to-3D matching. In: ICCV (2011)

    Google Scholar 

  56. Sattler, T., Leibe, B., Kobbelt, L.: Improving image-based localization by active correspondence search. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 752–765. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33718-5_54

    Chapter  Google Scholar 

  57. Sattler, T., Leibe, B., Kobbelt, L.: Efficient & effective prioritized matching for large-scale image-based localization. IEEE TPAMI (2017)

    Google Scholar 

  58. Sattler, T., et al.: Benchmarking 6DOF outdoor visual localization in changing conditions. In: CVPR (2018)

    Google Scholar 

  59. Sattler, T., Weyand, T., Leibe, B., Kobbelt, L.: Image retrieval for image-based localization revisited. In: BMVC (2012)

    Google Scholar 

  60. Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: CVPR (2016)

    Google Scholar 

  61. Shavit, Y., Ferens, R., Keller, Y.: Learning multi-scene absolute pose regression with transformers. In: ICCV (2021)

    Google Scholar 

  62. Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., Fitzgibbon, A.: Scene coordinate regression forests for camera relocalization in RGB-D images. In: CVPR (2013)

    Google Scholar 

  63. Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54

    Chapter  Google Scholar 

  64. Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3D. In: ACM SIGGRAPH (2006)

    Google Scholar 

  65. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: CVPR (2017)

    Google Scholar 

  66. Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: LoFTR: detector-free local feature matching with transformers. In: CVPR (2021)

    Google Scholar 

  67. Sun, W., Jiang, W., Trulls, E., Tagliasacchi, A., Yi, K.M.: ACNe: attentive context normalization for robust permutation-equivariant learning. In: CVPR (2020)

    Google Scholar 

  68. Suwajanakorn, S., Snavely, N., Tompson, J., Norouzi, M.: Discovery of latent 3D keypoints via end-to-end geometric reasoning. In: NeurIPS (2018)

    Google Scholar 

  69. Tateno, K., Tombari, F., Laina, I., Navab, N.: CNN-SLAM: real-time dense monocular slam with learned depth prediction. In: CVPR (2017)

    Google Scholar 

  70. Tiwari, L., Ji, P., Tran, Q.-H., Zhuang, B., Anand, S., Chandraker, M.: Pseudo RGB-D for self-improving monocular SLAM and depth prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 437–455. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_26

    Chapter  Google Scholar 

  71. Toft, C., Turmukhambetov, D., Sattler, T., Kahl, F., Brostow, G.J.: Single-image depth prediction makes feature matching easier. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12361, pp. 473–492. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58517-4_28

    Chapter  Google Scholar 

  72. Torii, A., Arandjelovic, R., Sivic, J., Okutomi, M., Pajdla, T.: 24/7 place recognition by view synthesis. In: CVPR (2015)

    Google Scholar 

  73. Türkoğlu, M.Ö., Brachmann, E., Schindler, K., Brostow, G., Monszpart, A.: Visual camera re-localization using graph neural networks and relative pose supervision. In: 3DV (2021)

    Google Scholar 

  74. Tyszkiewicz, M., Fua, P., Trulls, E.: Disk: learning local features with policy gradient. In: NeurIPS (2020)

    Google Scholar 

  75. Ummenhofer, B., et al.: Demon: depth and motion network for learning monocular stereo. In: CVPR (2017)

    Google Scholar 

  76. Valentin, J., et al.: Learning to navigate the energy landscape. In: 3DV (2016)

    Google Scholar 

  77. Walch, F., Hazirbas, C., Leal-Taixé, L., Sattler, T., Hilsenbeck, S., Cremers, D.: Image-based localization using LSTMs for structured feature correlation. In: ICCV (2017)

    Google Scholar 

  78. Wald, J., Sattler, T., Golodetz, S., Cavallari, T., Tombari, F.: Beyond controlled environments: 3D camera re-localization in changing indoor scenes. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12352, pp. 467–487. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58571-6_28

    Chapter  Google Scholar 

  79. Watson, J., Firman, M., Brostow, G.J., Turmukhambetov, D.: Self-supervised monocular depth hints. In: ICCV (2019)

    Google Scholar 

  80. Winkelbauer, D., Denninger, M., Triebel, R.: Learning to localize in new environments from synthetic training data. In: ICRA (2021)

    Google Scholar 

  81. Wu, C.: VisualSFM: a visual structure from motion system (2011). http://ccwu.me/vsfm/

  82. Wu, C.: Towards linear-time incremental structure from motion. In: 3DV (2013)

    Google Scholar 

  83. Xian, K., et al.: Monocular relative depth perception with web stereo data supervision. In: CVPR (2018)

    Google Scholar 

  84. Yang, N., Wang, R., Stückler, J., Cremers, D.: Deep virtual stereo odometry: leveraging deep depth prediction for monocular direct sparse odometry. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 835–852. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_50

    Chapter  Google Scholar 

  85. Yi, K.M., Trulls, E., Ono, Y., Lepetit, V., Salzmann, M., Fua, P.: Learning to find good correspondences. In: CVPR (2018)

    Google Scholar 

  86. Zhang, W., Kosecka, J.: Image based localization in urban environments. In: 3DV (2006)

    Google Scholar 

  87. Zhou, Q., Sattler, T., Pollefeys, M., Leal-Taixe, L.: To learn or not to learn: visual localization from essential matrices. In: ICRA (2020)

    Google Scholar 

  88. Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. In: CVPR (2017)

    Google Scholar 

  89. Zhou, Y., Barnes, C., Lu, J., Yang, J., Li, H.: On the continuity of rotation representations in neural networks. In: CVPR (2019)

    Google Scholar 

Download references

Acknowledgements

We thank Pooja Srinivas, Amy Duxbury, Camille François, Brian McClendon and their teams, for help with validating and anonymizing the dataset and Galen Han for his help in building the bundle adjustment pipeline. We also thank players of Niantic games for scanning places of interests.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Eduardo Arnold .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Arnold, E. et al. (2022). Map-Free Visual Relocalization: Metric Pose Relative to a Single Image. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_40

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19769-7_40

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19768-0

  • Online ISBN: 978-3-031-19769-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics