Semantic signatures for large-scale visual localization

Weng, Li; Gouet-Brunet, Valérie; Soheilian, Bahman

doi:10.1007/s11042-020-08992-6

Published: 07 May 2020

Volume 80, pages 22347–22372 (2021)

Multimedia Tools and Applications

Abstract

Visual localization is a useful alternative to standard localization techniques; it relies on cameras. In a typical scenario, features are extracted from captured images and compared with geo-referenced databases, and location information is then inferred from the matching results. Conventional schemes mainly use low-level visual features. These approaches offer good accuracy but suffer from scalability issues. To assist localization in large urban areas, this work explores a different path by utilizing high-level semantic information. Object information in a street view is found to facilitate localization. A novel descriptor scheme called “semantic signature” is proposed to summarize this information. A semantic signature consists of type and angle information of visible objects at a spatial location. Several metrics and protocols are proposed for signature comparison and retrieval; they illustrate different trade-offs between accuracy and complexity. Extensive simulation results confirm the potential of the proposed scheme in large-scale applications. This paper is an extended version of a conference paper presented at CBMI’18; it adds a more efficient retrieval protocol and additional experimental results.
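
To make the idea of a semantic signature more concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a signature is a list of (object type, angle) observations at one location. The abstract does not name the concrete metrics, so the two distances used here (Jaccard distance over the set of object types, and edit distance over the type sequence ordered by angle) are illustrative assumptions, as are all class and function names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ObjectObservation:
    """Hypothetical record: one visible object with its type and bearing."""
    obj_type: str      # e.g. "tree", "lamp", "sign"
    angle_deg: float   # direction of the object as seen from the location


Signature = Sequence[ObjectObservation]


def type_sequence(signature: Signature) -> List[str]:
    """Order observations by angle (mod 360) and keep only the type labels."""
    ordered = sorted(signature, key=lambda o: o.angle_deg % 360.0)
    return [o.obj_type for o in ordered]


def jaccard_distance(a: Signature, b: Signature) -> float:
    """Cheap comparison: 1 - |A ∩ B| / |A ∪ B| over the sets of object types."""
    set_a = {o.obj_type for o in a}
    set_b = {o.obj_type for o in b}
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def edit_distance(s: Sequence, t: Sequence) -> int:
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, 1):
        curr = [i]
        for j, y in enumerate(t, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def retrieve(query: Signature, database: Dict[str, Signature], k: int = 1) -> List[str]:
    """Rank database locations by edit distance between type sequences."""
    q = type_sequence(query)
    ranked = sorted(database,
                    key=lambda loc: edit_distance(q, type_sequence(database[loc])))
    return ranked[:k]
```

In this sketch the set-based distance ignores angles entirely and is cheap to evaluate, while the sequence-based distance uses the angular ordering at a higher cost, which is one way to picture the accuracy versus complexity trade-off mentioned in the abstract.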

Notes

OpenStreetMap: https://www.openstreetmap.org/
Mapillary: https://www.mapillary.com
Open Data Paris (https://opendata.paris.fr) hosts a collection of more than 200 public datasets provided by the city of Paris and its partners.

References

Agarwal S, Snavely N, Simon I, Seitz SM, Szeliski R (2009) Building Rome in a day. In: Proc. of international conference on computer vision (ICCV), pp. 72–79, https://doi.org/10.1109/ICCV.2009.5459148
Arandjelović R, Gronat P, Torii A, Pajdla T, Sivic J (2016) NetVLAD: CNN architecture for weakly supervised place recognition. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 5297–5307
Arandjelović R, Zisserman A (2014) DisLocation: Scalable descriptor distinctiveness for location recognition. In: Proc. of Asian conference on computer vision (ACCV), pp. 188–204
Ardeshir S, Zamir AR, Torroella A, Shah M (2014) GIS-Assisted object detection and geospatial localization. In: Proc. of European conference on computer vision (ECCV), pp. 602–617
Arth C, Pirchheim C, Ventura J, Schmalstieg D, Lepetit V (2015) Instant outdoor localization and SLAM initialization from 2.5D maps. In: Proc. of international symposium on mixed and augmented reality (ISMAR)
Arya S, Mount DM (1993) Approximate nearest neighbor queries in fixed dimensions. In: Proc. of ACM-SIAM symposium on discrete algorithms (SODA), pp. 271–280
Bhowmik N, Weng L, Gouet-Brunet V, Soheilian B (2017) Cross-domain image localization by adaptive feature fusion. In: Proc. of joint urban remote sensing event, p. 4
Brachmann E, Rother C (2018) Learning less is more - 6D camera localization via 3D surface regression. In: IEEE Conference on computer vision and pattern recognition (CVPR)
Calonder M, Lepetit V, Strecha C, Fua P (2010) BRIEF: Binary robust independent elementary features. In: ECCV, pp. 778–792
Chen DM, Baatz G, Köser K, Tsai SS, Vedantham R, Pylvänäinen T, Roimela K, Chen X, Bach J, Pollefeys M, Girod B, Grzeszczuk R (2011) City-scale landmark identification on mobile devices. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 737–744, https://doi.org/10.1109/CVPR.2011.5995610
Crandall DJ, Backstrom L, Huttenlocher D, Kleinberg J (2009) Mapping the world’s photos. In: Proc. of international conference on world wide web (WWW), pp. 761–770. ACM, https://doi.org/10.1145/1526709.1526812
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6):381–395. https://doi.org/10.1145/358669.358692
Girshick R (2015) Fast R-CNN. In: Proc. of IEEE international conference on computer vision (ICCV), pp. 1440–1448
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9):1904–1916
Irschara A, Zach C, Frahm JM, Bischof H (2009) From structure-from-motion point clouds to fast location recognition. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 2599–2606, https://doi.org/10.1109/CVPR.2009.5206587
Iscen A, Tolias G, Avrithis Y, Furon T, Chum O (2017) Panorama to panorama matching for location recognition. In: Proc. of ACM international conference on multimedia retrieval, pp. 392–396
Jaccard P (1912) The distribution of the flora in the alpine zone. New Phytol 11(2):37–50
Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 3304–3311
Li Y, Snavely N, Huttenlocher D, Fua P (2012) Worldwide pose estimation using 3D point clouds. In: Proc. of European conference on computer vision (ECCV), pp. 15–29, https://doi.org/10.1007/978-3-642-33718-5_2
Li Y, Snavely N, Huttenlocher DP (2010) Location recognition using prioritized feature matching. In: Proc. of European conference on computer vision (ECCV), pp. 791–804
Lim H, Sinha SN, Cohen MF, Uyttendaele M (2012) Real-time image-based 6-DOF localization in large-scale environments. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1043–1050, https://doi.org/10.1109/CVPR.2012.6247782
Lin T, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: 2017 IEEE International conference on computer vision (ICCV), pp. 2999–3007, https://doi.org/10.1109/ICCV.2017.324
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. International Journal on Computer Vision (IJCV) 60(2):91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Lowry S, Sünderhauf N, Newman P, Leonard JJ, Cox D, Corke P, Milford MJ (2016) Visual place recognition: a survey. IEEE Trans Robot 32(1):1–19
Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv 33(1):31–88
Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), vol. 2, pp. 2161–2168, https://doi.org/10.1109/CVPR.2006.264
Piasco N, Sidibé D, Demonceaux C, Gouet-Brunet V (2018) A survey on visual-based localization: on the benefit of heterogeneous data. Pattern Recogn 74:90–109
Qu X, Soheilian B, Paparoditis N (2015) Vehicle localization using mono-camera and geo-referenced traffic signs. In: Proc. of IEEE intelligent vehicles symposium, pp. 605–610, https://doi.org/10.1109/IVS.2015.7225751
Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. CoRR abs/1804.02767
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS), pp. 91–99
Sattler T, Leibe B, Kobbelt L (2011) Fast image-based localization using direct 2D-to-3D matching. In: Proc. of international conference on computer vision (ICCV), pp. 667–674, https://doi.org/10.1109/ICCV.2011.6126302
Sattler T, Torii A, Sivic J, Pollefeys M, Taira H, Okutomi M, Pajdla T (2017) Are large-scale 3D models really necessary for accurate visual localization? In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), p. 10
Schindler G, Brown M, Szeliski R (2007) City-scale location recognition. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–7, https://doi.org/10.1109/CVPR.2007.383150
Shrivastava A, Malisiewicz T, Gupta A, Efros AA (2011) Data-driven visual similarity for cross-domain image matching. ACM Trans Graph. 30(6):10. https://doi.org/10.1145/2070781.2024188
Snavely N, Seitz SM, Szeliski R (2006) Photo tourism: Exploring photo collections in 3D. ACM Trans Graph. 25(3):835–846. https://doi.org/10.1145/1141911.1141964
Snoek CGM, Worring M, Smeulders AWM (2005) Early versus late fusion in semantic video analysis. In: ACM International Conference on Multimedia, pp. 399–402. New York, NY, USA, https://doi.org/10.1145/1101149.1101236
Song Y, Chen X, Wang X, Zhang Y, Li J (2016) 6-DOF image localization from massive geo-tagged reference images. IEEE Transactions on Multimedia 18(8):1542–1554. https://doi.org/10.1109/TMM.2016.2568743
Tola E, Lepetit V, Fua P (2008) A fast local descriptor for dense matching. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–8, https://doi.org/10.1109/CVPR.2008.4587673
Torii A, Arandjelović R, Sivic J, Okutomi M, Pajdla T (2015) 24/7 place recognition by view synthesis. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1808–1817
Vrochidis S, Huet B, Chang EY, Kompatsiaris I (eds) (2019) Big Data Analytics for Large-Scale Multimedia Search. Wiley
Weng L, Soheilian B, Gouet-Brunet V (2018) Semantic signatures for urban visual localization. In: International conference on content-based multimedia indexing (CBMI), pp. 1–6, https://doi.org/10.1109/CBMI.2018.8516492
Zamir AR, Shah M (2010) Accurate image localization based on Google Maps Street View. In: Proc. of European conference on computer vision (ECCV), pp. 255–268
Zamir AR, Shah M (2014) Image geo-localization based on multiple nearest neighbor feature matching using generalized graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(8):1546–1558. https://doi.org/10.1109/TPAMI.2014.2299799
Zhang J, Hallquist A, Liang E, Zakhor A (2011) Location-based image retrieval for urban environments. In: Proc. of IEEE international conference on image processing (ICIP), pp. 3677–3680

Author information

Authors and Affiliations

Department of Automation (Artificial Intelligence), Hangzhou Dianzi University, 310018, Hangzhou, China
Li Weng
LaSTIG Lab., Univ. Gustave Eiffel, ENSG, IGN, 94160, Saint-Mande, France
Valérie Gouet-Brunet & Bahman Soheilian

Corresponding author

Correspondence to Li Weng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. LY19F030022, National Natural Science Foundation of China under Grant No. 61873077, and the European project KET ENIAC Things2Do under ENIAC JU grant agreement No. 621221.

About this article

Cite this article

Weng, L., Gouet-Brunet, V. & Soheilian, B. Semantic signatures for large-scale visual localization. Multimed Tools Appl 80, 22347–22372 (2021). https://doi.org/10.1007/s11042-020-08992-6

Received: 28 February 2019
Revised: 28 December 2019
Accepted: 23 April 2020
Published: 07 May 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-020-08992-6
