Abstract
Recognizing unfamiliar places is a challenging task for humans. Smartphones equipped with sensors (e.g., camera, GPS) and advances in computer vision provide opportunities for creating intelligent solutions. Recent studies have focused on landmark recognition. However, compared to landmarks, points of interest (POIs) in urban areas pose unique challenges (i.e., repetitive features, high POI density, appearance variations, and perceptual aliasing). This study presents a hierarchical place recognition pipeline that can assist human users in exploring an urban environment. Our key contributions are (1) a novel place-wise re-ranker that fuses visual similarity and distance measures, (2) a hierarchical pipeline comprising POI detection, location-based filtering, image retrieval, and re-ranking to address urban challenges, and (3) a new densely distributed dataset comprising day and night images from urban Tokyo. The proposed hierarchical pipeline achieves recall@1 of 96.86% and 94.85% on day and night images, respectively, outperforming the baselines. The novel lightweight re-ranking method improves recall and runs faster than the baselines.
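To make the idea of fusing visual similarity with a distance measure concrete, the following is a minimal sketch only, not the authors' re-ranker. It assumes a haversine distance between the query GPS fix and each candidate, a fixed search radius for location-based filtering, a linear fusion weight, and max-pooling of image scores per place. The names `haversine_m`, `rerank_places`, `recall_at_1`, `radius_m`, and `alpha` are all hypothetical, as are the default values.

```python
import math
from collections import defaultdict


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def rerank_places(query_pos, candidates, radius_m=100.0, alpha=0.7):
    """Rank places by a fused score (assumed form, for illustration).

    query_pos:  (lat, lon) of the query.
    candidates: iterable of (place_id, visual_similarity in [0, 1], lat, lon),
                one entry per retrieved database image.
    """
    best = defaultdict(float)
    for place_id, sim, lat, lon in candidates:
        d = haversine_m(query_pos[0], query_pos[1], lat, lon)
        if d > radius_m:
            continue  # location-based filtering: drop candidates outside the radius
        proximity = 1.0 - d / radius_m  # 1 at the query position, 0 at the radius
        score = alpha * sim + (1.0 - alpha) * proximity  # assumed linear fusion
        best[place_id] = max(best[place_id], score)  # place-wise: keep best image per place
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


def recall_at_1(rankings, ground_truth):
    """Fraction of queries whose top-ranked place matches the ground truth."""
    hits = sum(1 for r, gt in zip(rankings, ground_truth) if r and r[0][0] == gt)
    return hits / len(ground_truth)


if __name__ == "__main__":
    query = (35.6595, 139.7005)
    retrieved = [
        ("cafe", 0.80, 35.6595, 139.7005),
        ("shop", 0.85, 35.6600, 139.7005),
        ("far", 0.99, 35.7000, 139.7000),
    ]
    ranked = rerank_places(query, retrieved)
    print(ranked)
    print(recall_at_1([ranked], ["cafe"]))
```

In this toy input the visually closest candidate lies several kilometres away and is filtered out, while the nearer of the two remaining places overtakes the one with slightly higher visual similarity. How the paper actually weights and combines the two cues is not stated in the abstract.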
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Hettiarachchi, D., Kamijo, S. Visual and Positioning Information Fusion Towards Urban Place Recognition. SN COMPUT. SCI. 4, 44 (2023). https://doi.org/10.1007/s42979-022-01472-8
DOI: https://doi.org/10.1007/s42979-022-01472-8