
Visual and Positioning Information Fusion Towards Urban Place Recognition

  • Original Research
  • Published in SN Computer Science

Abstract

Recognizing unfamiliar places is a challenging task for humans. Smartphones equipped with sensors (e.g., camera, GPS) and advances in computer vision create opportunities for intelligent solutions. Recent studies have focused on landmark recognition. Compared with landmarks, however, points of interest (POIs) in urban areas pose distinct challenges: repetitive features, high POI density, appearance variations, and perceptual aliasing. This study presents a hierarchical place recognition pipeline that assists human users in exploring an urban environment. Our key contributions are (1) a novel place-wise re-ranker that fuses visual similarity and distance measures, (2) a hierarchical pipeline comprising POI detection, location-based filtering, image retrieval, and re-ranking to address these urban challenges, and (3) a new densely distributed dataset of day and night images from urban Tokyo. The proposed hierarchical pipeline achieves a recall@1 of 96.86% for day images and 94.85% for night images, outperforming the baselines. The novel lightweight re-ranking method improves recall and runs faster than the baselines.
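The location-based filtering and place-wise re-ranking stages can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so everything below is an assumption made for illustration: the haversine distance, the linear proximity term, the weight `alpha`, the max-over-images aggregation per place, and the names `filter_by_location` and `rerank_by_place` are hypothetical and are not the authors' method. Visual similarities (`sim`) are assumed to come from an earlier image retrieval stage.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def filter_by_location(query_pos, database, radius_m):
    """Keep only database images within radius_m of the query position (assumed filter rule)."""
    kept = []
    for entry in database:
        d = haversine_m(query_pos[0], query_pos[1], entry["lat"], entry["lon"])
        if d <= radius_m:
            kept.append({**entry, "dist_m": d})
    return kept


def rerank_by_place(candidates, radius_m, alpha=0.7):
    """Fuse similarity and proximity per image, then keep the best score per place.

    The weighted sum and the max aggregation are illustrative assumptions.
    """
    best = {}
    for c in candidates:
        proximity = max(0.0, 1.0 - c["dist_m"] / radius_m)  # 1 at the query, 0 at the radius
        score = alpha * c["sim"] + (1.0 - alpha) * proximity
        if score > best.get(c["place"], float("-inf")):
            best[c["place"]] = score
    # Places sorted by fused score, highest first
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: a query position and three geotagged database images
query_pos = (35.6812, 139.7671)
database = [
    {"place": "cafe_a", "lat": 35.6813, "lon": 139.7672, "sim": 0.80},
    {"place": "cafe_b", "lat": 35.6820, "lon": 139.7680, "sim": 0.82},
    {"place": "far_shop", "lat": 35.6900, "lon": 139.7800, "sim": 0.95},
]
radius_m = 200.0
candidates = filter_by_location(query_pos, database, radius_m)
ranked = rerank_by_place(candidates, radius_m)
```

In this toy setup the visually strongest match lies well outside the search radius and is dropped by the filter, and the nearer of the two remaining places ranks first even though its visual similarity is slightly lower. This is the kind of disambiguation a distance-aware re-ranker is meant to provide in dense, visually repetitive areas.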



Author information

Corresponding author

Correspondence to Dulmini Hettiarachchi.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

About this article

Cite this article

Hettiarachchi, D., Kamijo, S. Visual and Positioning Information Fusion Towards Urban Place Recognition. SN COMPUT. SCI. 4, 44 (2023). https://doi.org/10.1007/s42979-022-01472-8

  • DOI: https://doi.org/10.1007/s42979-022-01472-8

Keywords

Navigation