Technologies for Visual Localization and Augmented Reality in Smart Cities

Sensing the Past

Part of the book series: Geotechnologies and the Environment (GEOTECH, volume 16)

Abstract

The widespread diffusion of smart devices, such as smartphones and tablets, and the emerging trend of wearable devices, such as smart glasses and smart watches, have pushed forward the development of applications in which users interact on the basis of their position and field of view. Such applications can also deliver additional information in augmented reality: the user views the information through the smart device, overlaid on the real scene. GPS and the compass can be used to localize the user when augmented reality is applied to large scenes, for instance, squares or large buildings. However, when augmented reality is used to enrich the view of small objects or of small details of larger objects, for instance, statues, paintings, or epigraphs, more precise positioning is needed. Visual object recognition and tracking technologies provide this fine-grained positioning. This chapter discusses the techniques that enable precise positioning of the user and the subsequent augmented reality experience, focusing on algorithms for image matching and for homography estimation between the images seen by smart devices and images representing objects of interest.
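As a rough illustration of the homography estimation mentioned above, the following Python sketch computes the 3×3 homography that maps four points of a reference image (for instance, the corners of a painting) onto their positions in a camera frame, and then uses it to project an overlay point. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names `solve_linear`, `homography_from_4_points`, and `project` are hypothetical, the four correspondences are assumed to be exact and in general position, and the last matrix entry is fixed to 1. Practical systems typically work from many noisy feature matches and a robust estimator such as RANSAC.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Return H (3x3, with H[2][2] = 1) such that dst ~ H * src for four point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h21 x + h22 y + h23) / (h31 x + h32 y + 1), rearranged likewise.
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, point):
    """Map a point from the reference image into the camera frame using H."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical example: corners of a reference image and where they appear in a frame.
reference_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
frame_corners = [(12, 8), (95, 15), (88, 110), (5, 90)]
H = homography_from_4_points(reference_corners, frame_corners)
overlay_anchor = project(H, (0.5, 0.5))  # where the image centre lands in the frame
```

Once `H` is known, any annotation defined in the coordinates of the reference image can be drawn at the corresponding position in the live camera view, which is the basic mechanism behind overlaying information on a recognized planar object.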

Notes

  1. CPU: central processing unit; GPU: graphics processing unit.
  2. http://artoolkit.org/
  3. http://www.openscenegraph.org/
  4. https://unity3d.com/
  5. https://www.qualcomm.com/products/vuforia
  6. http://www.t-immersion.com/products/dfusion-suite
  7. http://www.wikitude.com/blog/dev2dev/
  8. http://vcg.isti.cnr.it/vcglib/
  9. https://libgdx.badlogicgames.com/
  10. http://opencv.org/
  11. http://www.cs.cornell.edu/~snavely/bundler/
  12. https://photosynth.net/

Corresponding author

Correspondence to Giuseppe Amato.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Amato, G., Cardillo, F.A., Falchi, F. (2017). Technologies for Visual Localization and Augmented Reality in Smart Cities. In: Masini, N., Soldovieri, F. (eds) Sensing the Past. Geotechnologies and the Environment, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-50518-3_20
