Semantic signatures for large-scale visual localization

Multimedia Tools and Applications

Abstract

Visual localization uses cameras and is a useful alternative to standard localization techniques. In a typical scenario, features are extracted from captured images and compared against geo-referenced databases, and location is then inferred from the matching results. Conventional schemes mainly use low-level visual features; they offer good accuracy but scale poorly. To assist localization in large urban areas, this work explores a different path based on high-level semantic information: object information in a street view is found to facilitate localization. A novel descriptor scheme called “semantic signature” is proposed to summarize this information. A semantic signature consists of the type and angle information of the objects visible from a spatial location. Several metrics and protocols are proposed for signature comparison and retrieval, illustrating different trade-offs between accuracy and complexity. Extensive simulation results confirm the potential of the proposed scheme in large-scale applications. This paper is an extended version of a conference paper presented at CBMI’18; it adds a more efficient retrieval protocol and additional experimental results.
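To make the idea of a signature built from object types and angles more concrete, the following is a minimal sketch, not the method of the paper. It assumes a signature can be held as a list of (type, angle) pairs, and it swaps in two generic measures for illustration: Jaccard similarity on the sets of object types, and edit distance on the type sequences ordered by angle. The abstract does not specify the actual metrics or retrieval protocols, so all names below (`type_sequence`, `jaccard_similarity`, `edit_distance`, `retrieve`) and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: one (object_type, angle_in_degrees) pair per visible object.
Signature = List[Tuple[str, float]]


def type_sequence(sig: Signature) -> List[str]:
    """Return object types ordered by viewing angle, wrapped into [0, 360)."""
    return [obj_type for obj_type, _ in sorted(sig, key=lambda o: o[1] % 360.0)]


def jaccard_similarity(a: Signature, b: Signature) -> float:
    """Jaccard similarity between the sets of object types (angles ignored)."""
    types_a = {t for t, _ in a}
    types_b = {t for t, _ in b}
    if not types_a and not types_b:
        return 1.0
    return len(types_a & types_b) / len(types_a | types_b)


def edit_distance(s: List[str], t: List[str]) -> int:
    """Levenshtein distance between two sequences, using unit costs."""
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, 1):
        curr = [i]
        for j, y in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def retrieve(query: Signature, database: Dict[str, Signature], k: int = 1) -> List[str]:
    """Rank database locations by edit distance between type sequences."""
    q = type_sequence(query)
    ranked = sorted(database, key=lambda name: edit_distance(q, type_sequence(database[name])))
    return ranked[:k]


# Hypothetical data: object types and angles are invented for illustration.
query = [("tree", 10.0), ("lamp", 95.0), ("sign", 200.0)]
database = {
    "A": [("tree", 12.0), ("lamp", 90.0), ("sign", 210.0)],
    "B": [("lamp", 5.0), ("bench", 100.0), ("tree", 250.0)],
}
best = retrieve(query, database)
```

In this sketch the set-based measure is cheap but ignores the angular layout, while the sequence-based measure uses the ordering of objects at a higher cost, which is one plausible way such a trade-off between accuracy and complexity could arise.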

Notes

  1. OpenStreetMap: https://www.openstreetmap.org/

  2. Mapillary: https://www.mapillary.com

  3. Open Data Paris (https://opendata.paris.fr) hosts a collection of more than 200 public datasets provided by the city of Paris and its partners.

Author information

Corresponding author

Correspondence to Li Weng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. LY19F030022, National Natural Science Foundation of China under Grant No. 61873077, and the European project KET ENIAC Things2Do under ENIAC JU grant agreement No. 621221.

About this article

Cite this article

Weng, L., Gouet-Brunet, V. & Soheilian, B. Semantic signatures for large-scale visual localization. Multimed Tools Appl 80, 22347–22372 (2021). https://doi.org/10.1007/s11042-020-08992-6

  • DOI: https://doi.org/10.1007/s11042-020-08992-6
