Multimedia Tools and Applications, Volume 51, Issue 1, pp 187–211

Geotagging in multimedia and computer vision—a survey

Abstract

Geo-tagging is a fast-emerging trend in digital photography and community photo sharing. The presence of geographically relevant metadata with images and videos has opened up interesting research avenues within the multimedia and computer vision domains. In this paper, we survey geo-tagging related research within the context of multimedia along three dimensions: (1) modalities from which geographical information can be extracted, (2) applications that can benefit from the use of geographical information, and (3) the interplay between modalities and applications. We introduce the research problems and discuss significant approaches. We also discuss the nature of the different modalities and lay out factors that are expected to govern the choices made in multimedia and vision applications. Finally, we discuss future research directions in this field.

Keywords

Geotagging GPS Multimedia Context Location Recognition 

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Jiebo Luo (1)
  • Dhiraj Joshi (1)
  • Jie Yu (1)
  • Andrew Gallagher (1)
  1. Kodak Research Laboratories, Eastman Kodak Company, Rochester, USA