Abstract
Visual localization, which relies on cameras, is a useful alternative to standard localization techniques. In a typical scenario, features are extracted from captured images and compared with geo-referenced databases, and location information is then inferred from the matching results. Conventional schemes mainly use low-level visual features; these approaches offer good accuracy but scale poorly. To assist localization in large urban areas, this work explores a different path that exploits high-level semantic information. Object information in a street view is found to facilitate localization, and a novel descriptor scheme called “semantic signature” is proposed to summarize this information. A semantic signature consists of the type and angle of each object visible from a given spatial location. Several metrics and protocols are proposed for signature comparison and retrieval, illustrating different trade-offs between accuracy and complexity. Extensive simulation results confirm the potential of the proposed scheme for large-scale applications. This paper is an extended version of a conference paper presented at CBMI’18; it presents a more efficient retrieval protocol together with additional experimental results.
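The abstract describes semantic signatures only at a high level, so the following Python sketch illustrates the general idea under assumptions that do not come from the paper. Map objects are typed 2D points in a local metric frame. "Visible" is approximated by a fixed radius with no occlusion test. Angles are absolute bearings, which means the query heading is assumed known (for example from a compass). The two similarity functions (a Jaccard index over object types and a greedy angle-tolerant match) and the two-stage retrieval are illustrative choices, not the metrics and protocols proposed by the authors. All names (`MapObject`, `semantic_signature`, `type_jaccard`, `angular_match_score`, `retrieve`) and the object labels are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MapObject:
    """A geo-referenced object; `kind` labels are hypothetical examples."""
    kind: str
    x: float  # metres, in an assumed local planar frame
    y: float


def semantic_signature(x, y, objects, radius=30.0):
    """(type, bearing in degrees) for each object within `radius` metres of
    (x, y), sorted by bearing. Visibility is approximated by distance only;
    occlusion is ignored (an assumption, not the paper's rule)."""
    sig = []
    for o in objects:
        dx, dy = o.x - x, o.y - y
        if math.hypot(dx, dy) <= radius:
            bearing = math.degrees(math.atan2(dy, dx)) % 360.0
            sig.append((o.kind, bearing))
    return sorted(sig, key=lambda entry: entry[1])


def type_jaccard(sig_a, sig_b):
    """Jaccard index over the multisets of object types; angles are ignored."""
    ca = Counter(kind for kind, _ in sig_a)
    cb = Counter(kind for kind, _ in sig_b)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 1.0


def angle_diff(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def angular_match_score(sig_a, sig_b, tol=10.0):
    """Greedy one-to-one matching of same-type entries whose bearings differ
    by at most `tol` degrees, normalised by the larger signature size."""
    used = [False] * len(sig_b)
    matched = 0
    for kind, ang in sig_a:
        for j, (kind_b, ang_b) in enumerate(sig_b):
            if not used[j] and kind == kind_b and angle_diff(ang, ang_b) <= tol:
                used[j] = True
                matched += 1
                break
    denom = max(len(sig_a), len(sig_b))
    return matched / denom if denom else 1.0


def retrieve(query_sig, database, shortlist=10, k=3):
    """Two-stage retrieval: keep the `shortlist` locations with the highest
    type-only score, then re-rank them with the angle-aware score."""
    coarse = sorted(database.items(),
                    key=lambda item: type_jaccard(query_sig, item[1]),
                    reverse=True)[:shortlist]
    fine = sorted(coarse,
                  key=lambda item: angular_match_score(query_sig, item[1]),
                  reverse=True)
    return [location for location, _ in fine[:k]]


# Usage: signatures for two database locations built from the same toy map.
objects = [MapObject("traffic_sign", 10, 0), MapObject("tree", 0, 10),
           MapObject("bench", -10, 0), MapObject("tree", 100, 100)]
database = {loc: semantic_signature(*loc, objects) for loc in [(0, 0), (100, 95)]}
query = [("traffic_sign", 2.0), ("tree", 88.0), ("bench", 181.0)]
print(retrieve(query, database, k=1))  # → [(0, 0)]
```

The two stages echo, in spirit, the accuracy/complexity trade-off mentioned in the abstract. The type-only score ignores geometry and costs time linear in the signature sizes, so it can act as a cheap filter. The stricter angle-aware comparison, which is quadratic in this greedy form, then runs only on the shortlist.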
Notes
OpenStreetMap: https://www.openstreetmap.org/
Mapillary: https://www.mapillary.com
Open Data Paris (https://opendata.paris.fr) hosts a collection of more than 200 public datasets provided by the city of Paris and its partners.
References
Agarwal S, Snavely N, Simon I, Seitz SM, Szeliski R (2009) Building Rome in a day. In: Proc. of international conference on computer vision (ICCV), pp. 72–79, https://doi.org/10.1109/ICCV.2009.5459148
Arandjelovic R, Gronat P, Torii A, Pajdla T, Sivic J (2016) NetVLAD: CNN architecture for weakly supervised place recognition. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 5297–5307
Arandjelović R, Zisserman A (2014) Dislocation: Scalable descriptor distinctiveness for location recognition. In: Proc. of asian conference on computer vision (ACCV), pp. 188–204
Ardeshir S, Zamir AR, Torroella A, Shah M (2014) GIS-assisted object detection and geospatial localization. In: Proc. of european conference on computer vision (ECCV), pp. 602–617
Arth C, Pirchheim C, Ventura J, Schmalstieg D, Lepetit V (2015) Instant outdoor localization and SLAM initialization from 2.5D maps. In: Proc. of international symposium on mixed and augmented reality (ISMAR)
Arya S, Mount DM (1993) Approximate nearest neighbor queries in fixed dimensions. In: Proc. of ACM-SIAM symposium on discrete algorithms (SODA), pp. 271–280
Bhowmik N, Weng L, Gouet-Brunet V, Soheilian B (2017) Cross-domain image localization by adaptive feature fusion. In: Proc. of joint urban remote sensing event, p. 4
Brachmann E, Rother C (2018) Learning less is more - 6D camera localization via 3D surface regression. In: IEEE Conference on computer vision and pattern recognition (CVPR)
Calonder M, Lepetit V, Strecha C, Fua P (2010) BRIEF: Binary robust independent elementary features. In: Proc. of european conference on computer vision (ECCV), pp. 778–792
Chen DM, Baatz G, Köser K, Tsai SS, Vedantham R, Pylvänäinen T, Roimela K, Chen X, Bach J, Pollefeys M, Girod B, Grzeszczuk R (2011) City-scale landmark identification on mobile devices. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 737–744, https://doi.org/10.1109/CVPR.2011.5995610
Crandall DJ, Backstrom L, Huttenlocher D, Kleinberg J (2009) Mapping the world’s photos. In: Proc. of international conference on world wide web (WWW), pp. 761–770. ACM, https://doi.org/10.1145/1526709.1526812
Fischler MA, Bolles RC (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6):381–395. https://doi.org/10.1145/358669.358692
Girshick R (2015) Fast R-CNN. In: Proc. of IEEE international conference on computer vision (ICCV), pp. 1440–1448
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9):1904–1916
Irschara A, Zach C, Frahm JM, Bischof H (2009) From structure-from-motion point clouds to fast location recognition. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 2599–2606, https://doi.org/10.1109/CVPR.2009.5206587
Iscen A, Tolias G, Avrithis Y, Furon T, Chum O (2017) Panorama to panorama matching for location recognition. In: Proc. of ACM international conference on multimedia retrieval, pp. 392–396
Jaccard P (1912) The distribution of the flora in the alpine zone. New Phytol 11(2):37–50
Jégou H, Douze M, Schmid C, Pérez P (2010) Aggregating local descriptors into a compact image representation. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 3304–3311
Li Y, Snavely N, Huttenlocher D, Fua P (2012) Worldwide pose estimation using 3D point clouds. In: Proc. of european conference on computer vision (ECCV), pp. 15–29, https://doi.org/10.1007/978-3-642-33718-5_2
Li Y, Snavely N, Huttenlocher DP (2010) Location recognition using prioritized feature matching. In: Proc. of european conference on computer vision (ECCV), pp. 791–804
Lim H, Sinha SN, Cohen MF, Uyttendaele M (2012) Real-time image-based 6-DOF localization in large-scale environments. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1043–1050, https://doi.org/10.1109/CVPR.2012.6247782
Lin T, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proc. of IEEE international conference on computer vision (ICCV), pp. 2999–3007, https://doi.org/10.1109/ICCV.2017.324
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV) 60(2):91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Lowry S, Sünderhauf N, Newman P, Leonard JJ, Cox D, Corke P, Milford MJ (2016) Visual place recognition: a survey. IEEE Trans Robot 32(1):1–19
Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv 33(1):31–88
Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), vol. 2, pp. 2161–2168, https://doi.org/10.1109/CVPR.2006.264
Piasco N, Sidibé D, Demonceaux C, Gouet-Brunet V (2018) A survey on visual-based localization: on the benefit of heterogeneous data. Pattern Recogn 74:90–109
Qu X, Soheilian B, Paparoditis N (2015) Vehicle localization using mono-camera and geo-referenced traffic signs. In: Proc. of IEEE intelligent vehicles symposium, pp. 605–610, https://doi.org/10.1109/IVS.2015.7225751
Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. CoRR abs/1804.02767
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS), pp. 91–99
Sattler T, Leibe B, Kobbelt L (2011) Fast image-based localization using direct 2D-to-3D matching. In: Proc. of international conference on computer vision (ICCV), pp. 667–674, https://doi.org/10.1109/ICCV.2011.6126302
Sattler T, Torii A, Sivic J, Pollefeys M, Taira H, Okutomi M, Pajdla T (2017) Are large-scale 3D models really necessary for accurate visual localization? In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), p. 10
Schindler G, Brown M, Szeliski R (2007) City-scale location recognition. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–7, https://doi.org/10.1109/CVPR.2007.383150
Shrivastava A, Malisiewicz T, Gupta A, Efros AA (2011) Data-driven visual similarity for cross-domain image matching. ACM Trans Graph. 30(6):10. https://doi.org/10.1145/2070781.2024188
Snavely N, Seitz SM, Szeliski R (2006) Photo tourism: Exploring photo collections in 3D. ACM Trans Graph. 25(3):835–846. https://doi.org/10.1145/1141911.1141964
Snoek CGM, Worring M, Smeulders AWM (2005) Early versus late fusion in semantic video analysis. In: Proc. of ACM international conference on multimedia, pp. 399–402. ACM, New York, NY, USA, https://doi.org/10.1145/1101149.1101236
Song Y, Chen X, Wang X, Zhang Y, Li J (2016) 6-DOF image localization from massive geo-tagged reference images. IEEE Transactions on Multimedia 18(8):1542–1554. https://doi.org/10.1109/TMM.2016.2568743
Tola E, Lepetit V, Fua P (2008) A fast local descriptor for dense matching. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1–8, https://doi.org/10.1109/CVPR.2008.4587673
Torii A, Arandjelovic R, Sivic J, Okutomi M, Pajdla T (2015) 24/7 place recognition by view synthesis. In: Proc. of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1808–1817
Vrochidis S, Huet B, Chang EY, Kompatsiaris I (eds) (2019) Big Data Analytics for Large-Scale Multimedia Search. Wiley
Weng L, Soheilian B, Gouet-Brunet V (2018) Semantic signatures for urban visual localization. In: Proc. of international conference on content-based multimedia indexing (CBMI), pp. 1–6, https://doi.org/10.1109/CBMI.2018.8516492
Zamir AR, Shah M (2010) Accurate image localization based on google maps street view. In: Proc. of european conference on computer vision (ECCV), pp. 255–268
Zamir AR, Shah M (2014) Image geo-localization based on multiple nearest neighbor feature matching using generalized graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(8):1546–1558. https://doi.org/10.1109/TPAMI.2014.2299799
Zhang J, Hallquist A, Liang E, Zakhor A (2011) Location-based image retrieval for urban environments. In: Proc. of IEEE international conference on image processing (ICIP), pp. 3677–3680
Additional information
This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. LY19F030022, National Natural Science Foundation of China under Grant No. 61873077, and the European project KET ENIAC Things2Do under ENIAC JU grant agreement No. 621221.
About this article
Cite this article
Weng, L., Gouet-Brunet, V. & Soheilian, B. Semantic signatures for large-scale visual localization. Multimed Tools Appl 80, 22347–22372 (2021). https://doi.org/10.1007/s11042-020-08992-6
DOI: https://doi.org/10.1007/s11042-020-08992-6