Geotagging in multimedia and computer vision—a survey

Multimedia Tools and Applications

Abstract

Geotagging is a fast-emerging trend in digital photography and community photo sharing. The presence of geographically relevant metadata with images and videos has opened up interesting research avenues within the multimedia and computer vision domains. In this paper, we survey geotagging-related research in the context of multimedia along three dimensions: (1) the modalities from which geographical information can be extracted, (2) the applications that can benefit from the use of geographical information, and (3) the interplay between modalities and applications. The survey introduces the research problems and reviews significant approaches. We discuss the nature of the different modalities and lay out the factors expected to govern the choice among them for multimedia and vision applications. Finally, we discuss future research directions in this field.
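
As a concrete illustration of the metadata modality, the sketch below (not taken from the survey) shows how EXIF-style GPS tags, which store latitude and longitude as degree/minute/second rationals together with a hemisphere reference, can be converted to signed decimal degrees. It also shows how the great-circle distance between two geotags can be computed with the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km. The names `dms_to_decimal`, `haversine_km` and `EARTH_RADIUS_KM` are hypothetical helpers, not part of any library or of the surveyed systems, and the coordinates in the usage section are made up.

```python
import math
from fractions import Fraction

# Mean Earth radius in kilometres (spherical approximation, an assumption).
EARTH_RADIUS_KM = 6371.0


def dms_to_decimal(dms, ref):
    """Convert a (degrees, minutes, seconds) triple, as stored in EXIF
    GPSLatitude/GPSLongitude, plus its hemisphere reference ('N', 'S',
    'E' or 'W') to signed decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # By convention, southern latitudes and western longitudes are negative.
    return -value if ref in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in
    decimal degrees, using the haversine formula on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 so rounding error near antipodal points cannot break asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


if __name__ == "__main__":
    # Hypothetical geotags for two photos; EXIF stores each DMS part as a rational.
    lat_a = dms_to_decimal((Fraction(48), Fraction(51), Fraction(2964, 100)), "N")
    lon_a = dms_to_decimal((Fraction(2), Fraction(17), Fraction(4044, 100)), "E")
    lat_b = dms_to_decimal((48, 51, 0), "N")
    lon_b = dms_to_decimal((2, 20, 0), "E")
    print(f"photo A: ({lat_a:.6f}, {lon_a:.6f})")
    print(f"distance A-B: {haversine_km(lat_a, lon_a, lat_b, lon_b):.3f} km")
```

A spherical model is generally adequate for comparing photo locations at city or landmark scale. Applications that need survey-grade accuracy would use an ellipsoidal Earth model instead.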

Figs. 1–5 (images not reproduced)

Author information

Correspondence to Dhiraj Joshi.

About this article

Cite this article

Luo, J., Joshi, D., Yu, J. et al. Geotagging in multimedia and computer vision—a survey. Multimed Tools Appl 51, 187–211 (2011). https://doi.org/10.1007/s11042-010-0623-y

  • DOI: https://doi.org/10.1007/s11042-010-0623-y
