Abstract
Geo-tagging is a fast-emerging trend in digital photography and community photo sharing. The presence of geographically relevant metadata with images and videos has opened up interesting research avenues within the multimedia and computer vision domains. In this paper, we survey geo-tagging related research within the context of multimedia along three dimensions: (1) modalities from which geographical information can be extracted, (2) applications that can benefit from the use of geographical information, and (3) the interplay between modalities and applications. The survey introduces the research problems and discusses significant approaches. We describe the nature of the different modalities and lay out the factors that are expected to govern the choices made for multimedia and vision applications. Finally, we discuss future research directions in this field.

Cite this article
Luo, J., Joshi, D., Yu, J. et al. Geotagging in multimedia and computer vision—a survey. Multimed Tools Appl 51, 187–211 (2011). https://doi.org/10.1007/s11042-010-0623-y
Keywords
- Geotagging
- GPS
- Multimedia
- Context
- Location
- Recognition