Visual and Positioning Information Fusion Towards Urban Place Recognition

Hettiarachchi, Dulmini; Kamijo, Shunsuke

doi:10.1007/s42979-022-01472-8

Original Research
Published: 07 November 2022

Volume 4, article number 44, (2023)

SN Computer Science

Abstract

Recognizing unfamiliar places is a challenging task for humans. Smartphones equipped with sensors (e.g., camera, GPS) and advances in computer vision create opportunities for intelligent solutions. Recent studies have focused on landmark recognition. Compared to landmarks, however, points of interest (POIs) in urban areas pose unique challenges, such as repetitive features, high POI density, appearance variance, and perceptual aliasing. This study presents a hierarchical place recognition pipeline that can assist human users in exploring an urban environment. Our key contributions are (1) a novel place-wise re-ranker that fuses visual similarity and distance measures, (2) a hierarchical pipeline comprising POI detection, location-based filtering, image retrieval, and re-ranking to address urban challenges, and (3) a new densely distributed dataset comprising day and night images from urban Tokyo. The proposed hierarchical pipeline achieves recall@1 of 96.86% for day images and 94.85% for night images, outperforming the baselines. The novel lightweight re-ranking method improves recall and runs faster than the baselines.
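
The abstract does not give the formulas behind the location-based filtering or the place-wise re-ranker, so the following is only a minimal sketch of how such stages could fit together, not the authors' implementation. It assumes a haversine distance for the GPS filter, a linear fusion of the best visual similarity and a normalized proximity term per place, and a simple recall@k helper. The function names, the dictionary fields (`place`, `lat`, `lon`, `sim`), the weight `alpha`, and the 100 m radius are all hypothetical choices made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def filter_by_location(query_pos, database, radius_m):
    """Keep database images within radius_m of the query and attach their distance."""
    kept = []
    for item in database:
        d = haversine_m(query_pos[0], query_pos[1], item["lat"], item["lon"])
        if d <= radius_m:
            kept.append({**item, "dist_m": d})
    return kept


def rerank_places(candidates, radius_m, alpha=0.7):
    """Rank places by alpha * best visual similarity + (1 - alpha) * proximity.

    Assumed fusion rule (hypothetical): proximity = 1 - min_distance / radius_m.
    """
    best_sim, min_dist = {}, {}
    for c in candidates:
        p = c["place"]
        best_sim[p] = max(best_sim.get(p, float("-inf")), c["sim"])
        min_dist[p] = min(min_dist.get(p, float("inf")), c["dist_m"])
    scores = {
        p: alpha * best_sim[p] + (1 - alpha) * (1 - min_dist[p] / radius_m)
        for p in best_sim
    }
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(rankings, ground_truth, k=1):
    """Fraction of queries whose true place appears in the top-k ranked places."""
    hits = sum(1 for ranked, gt in zip(rankings, ground_truth) if gt in ranked[:k])
    return hits / len(ground_truth)


# Toy data: "sim" stands in for a visual similarity from an image-retrieval stage.
database = [
    {"place": "cafe_a", "lat": 35.68120, "lon": 139.76710, "sim": 0.80},
    {"place": "shop_b", "lat": 35.68150, "lon": 139.76710, "sim": 0.82},
    {"place": "far_c", "lat": 35.70000, "lon": 139.76710, "sim": 0.95},
]
query_pos = (35.68121, 139.76710)
radius_m = 100.0

candidates = filter_by_location(query_pos, database, radius_m)
ranking = rerank_places(candidates, radius_m)
print(ranking)
print(recall_at_k([ranking], ["cafe_a"], k=1))
```

In this toy example the distant place with the highest visual similarity is removed by the location filter, and the fused score places the nearer `cafe_a` ahead of `shop_b` even though `shop_b` has a slightly higher visual similarity. A visual-only ranking would have ordered them the other way, which is the kind of ambiguity a distance term is meant to resolve in dense urban scenes.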

Author information

Authors and Affiliations

Graduate School of Interdisciplinary Information Studies, The University of Tokyo, Tokyo, Japan
Dulmini Hettiarachchi
Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
Dulmini Hettiarachchi & Shunsuke Kamijo
Interfaculty Initiative in Information Studies, The University of Tokyo, Tokyo, Japan
Shunsuke Kamijo

Corresponding author

Correspondence to Dulmini Hettiarachchi.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hettiarachchi, D., Kamijo, S. Visual and Positioning Information Fusion Towards Urban Place Recognition. SN COMPUT. SCI. 4, 44 (2023). https://doi.org/10.1007/s42979-022-01472-8

Received: 20 June 2022
Accepted: 21 October 2022
Published: 07 November 2022
DOI: https://doi.org/10.1007/s42979-022-01472-8
