Technologies for Visual Localization and Augmented Reality in Smart Cities

  • Giuseppe Amato
  • Franco Alberto Cardillo
  • Fabrizio Falchi
Chapter
Part of the Geotechnologies and the Environment book series (GEOTECH, volume 16)

Abstract

The widespread diffusion of smart devices, such as smartphones and tablets, and the emerging trend of wearable devices, such as smart glasses and smart watches, have pushed forward the development of applications in which users interact on the basis of their position and field of view. Such applications can also deliver additional information in augmented reality: the user sees the information through the smart device, overlaid on the real scene. The GPS or the compass can be used to localize the user when augmented reality is provided for large scenes, for instance squares or large buildings. However, when augmented reality is meant to enrich the view of small objects or of small details of larger objects, for instance statues, paintings, or epigraphs, more precise positioning is needed. Visual object recognition and tracking technologies offer such fine-grained positioning capabilities. This chapter discusses the techniques that enable precise positioning of the user and the subsequent augmented reality experience, focusing on algorithms for image matching and for homography estimation between the images seen by smart devices and images representing objects of interest.

Keywords

Localization, Augmented reality, Deep learning, Smart cities, Landmark recognition

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Giuseppe Amato (1)
  • Franco Alberto Cardillo (1)
  • Fabrizio Falchi (1)

  1. CNR-ISTI, Pisa, Italy
