LaMAR: Benchmarking Localization and Mapping for Augmented Reality

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Localization and mapping is the foundational technology for augmented reality (AR) that enables sharing and persistence of digital content in the real world. While significant progress has been made, researchers are still mostly driven by unrealistic benchmarks that are not representative of real-world AR scenarios. In particular, benchmarks are often based on small-scale datasets with low scene diversity, captured from stationary cameras, and lacking other sensor inputs like inertial, radio, or depth data. Furthermore, their ground-truth (GT) accuracy is mostly insufficient to satisfy AR requirements. To close this gap, we introduce a new benchmark with a comprehensive capture and GT pipeline, which allows us to co-register realistic AR trajectories in diverse scenes and from heterogeneous devices at scale. To establish accurate GT, our pipeline robustly aligns the captured trajectories against laser scans in a fully automatic manner. Based on this pipeline, we publish a benchmark dataset of diverse and large-scale scenes recorded with head-mounted and hand-held AR devices. We extend several state-of-the-art methods to take advantage of the AR-specific setup and evaluate them on our benchmark. Based on the results, we present novel insights on current research gaps to provide avenues for future work in the community.
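The abstract describes co-registering device trajectories with laser scans only at a high level. As a hedged illustration of one generic building block for this kind of alignment (not the LaMAR pipeline itself), the sketch below estimates the least-squares similarity transform (scale s, rotation R, translation t) that maps one set of 3D points onto corresponding points, following Umeyama's closed-form method. It assumes the correspondences are already known and outlier-free; a robust pipeline would typically wrap such a solver in an outlier-rejection scheme such as RANSAC. The helper names `umeyama_alignment` and `alignment_rmse` are hypothetical.

```python
import numpy as np


def umeyama_alignment(src, dst, with_scale=True):
    """Least-squares similarity transform such that dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points (N >= 3, non-degenerate).
    Returns (s, R, t). Hypothetical helper; correspondences assumed outlier-free.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets on their centroids.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source points, then its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    if with_scale:
        var_src = (src_c ** 2).sum() / n
        s = np.trace(np.diag(D) @ S) / var_src
    else:
        s = 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t


def alignment_rmse(src, dst, s, R, t):
    """Root-mean-square point residual after applying (s, R, t) to src."""
    pred = s * np.asarray(src, dtype=float) @ R.T + t
    residuals = pred - np.asarray(dst, dtype=float)
    return float(np.sqrt((residuals ** 2).sum(axis=1).mean()))
```

Setting `with_scale=False` gives a purely rigid fit (rotation plus translation), which is the natural choice when both inputs are already in metric units; the scaled variant is useful when one trajectory has an unknown scale, as with monocular reconstructions.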

P.-E. Sarlin and M. Dusmanu—Equal contribution.

V. Larsson—Now at Lund University, Sweden.

Acknowledgements

This paper would not have been possible without the hard work and contributions of Gabriela Evrova, Silvano Galliani, Michael Baumgartner, Cedric Cagniart, Jeffrey Delmerico, Jonas Hein, Dawid Jeczmionek, Mirlan Karimov, Maximilian Mews, Patrick Misteli, Juan Nieto, Sònia Batllori Pallarès, Rémi Pautrat, Songyou Peng, Iago Suarez, Rui Wang, Jeremy Wanner, Silvan Weder and our colleagues in CVG at ETH Zürich and the wider Microsoft Mixed Reality & AI team.

Author information

Correspondence to Paul-Edouard Sarlin.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sarlin, PE. et al. (2022). LaMAR: Benchmarking Localization and Mapping for Augmented Reality. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_40

  • DOI: https://doi.org/10.1007/978-3-031-20071-7_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20070-0

  • Online ISBN: 978-3-031-20071-7

  • eBook Packages: Computer Science, Computer Science (R0)