Multimedia Tools and Applications, Volume 51, Issue 2, pp 555–592

VIRaL: Visual Image Retrieval and Localization

  • Yannis Kalantidis
  • Giorgos Tolias
  • Yannis Avrithis
  • Marios Phinikettos
  • Evaggelos Spyrou
  • Phivos Mylonas
  • Stefanos Kollias

Abstract

New applications that exploit the huge volume of data in community photo collections are emerging every day. Most focus on popular subsets, e.g., images containing landmarks or associated with Wikipedia articles. In this work we are concerned with the problem of accurately finding the location where a photo was taken without needing any metadata, that is, solely from its visual content. We also recognize landmarks where applicable, automatically linking them to Wikipedia. We show that the time is right for automating the geo-tagging process, and we show how this can work at large scale. In doing so, we do exploit redundancy of content in popular locations, but unlike most existing solutions, we do not restrict ourselves to landmarks. In other words, we can compactly represent the visual content of all the thousands of images depicting, e.g., the Parthenon and still retrieve any single, isolated, non-landmark image such as a house or graffiti on a wall. Starting from an existing, geo-tagged dataset, we cluster images into sets of different views of the same scene. This is a very efficient, scalable, and fully automated mining process. We then align all views in a set to one reference image and construct a 2D scene map. Our indexing scheme operates directly on scene maps. We evaluate our solution on a challenging dataset of one million urban images and provide public access to our service through our online application, VIRaL.
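
As a rough illustration of the alignment step described in the abstract, the sketch below merges the local features of several views into the coordinate frame of one reference image. It is not the authors' implementation. It assumes that each view comes with an already estimated 3x3 planar homography to the reference image, that each feature is a position plus a visual word id, and that two features with the same visual word within `radius` pixels are duplicates. The names `project`, `build_scene_map`, and `radius` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Feature = Tuple[Point, int]          # (position, visual word id): an assumed format
Homography = List[List[float]]       # 3x3 matrix in row-major order


def project(h: Homography, p: Point) -> Point:
    """Map a point through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def build_scene_map(
    views: List[Tuple[Homography, List[Feature]]], radius: float = 2.0
) -> List[Feature]:
    """Merge the features of all views into the reference frame.

    A feature is skipped when the map already holds a feature with the same
    visual word within `radius` pixels (an assumed, simplistic merge rule).
    """
    scene_map: List[Feature] = []
    for homography, features in views:
        for position, word in features:
            q = project(homography, position)
            duplicate = any(
                w == word and math.hypot(q[0] - s[0], q[1] - s[1]) <= radius
                for s, w in scene_map
            )
            if not duplicate:
                scene_map.append((q, word))
    return scene_map


# Hypothetical data: the reference view uses the identity homography,
# the second view is shifted by (10, 10) pixels relative to the reference.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
shift = [[1, 0, 10], [0, 1, 10], [0, 0, 1]]
views = [
    (identity, [((10.0, 10.0), 5)]),
    (shift, [((0.0, 0.0), 5), ((3.0, 4.0), 7)]),
]
scene_map = build_scene_map(views)
```

In this toy input the feature with visual word 5 is seen in both views and kept once, which mirrors the idea of compactly representing redundant content from many photos of the same scene.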

Keywords

Image retrieval, Image clustering, Sub-linear indexing, Geotagging, Location recognition, Landmark recognition, Image localization

Notes

Acknowledgements

This work was partially supported by the European Commission under contract FP7-215453 WeKnowIt.


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Yannis Kalantidis (1)
  • Giorgos Tolias (1)
  • Yannis Avrithis (1)
  • Marios Phinikettos (1)
  • Evaggelos Spyrou (1)
  • Phivos Mylonas (1)
  • Stefanos Kollias (1)

  1. National Technical University of Athens, Athens, Greece
