Large-Scale Image Geolocalization



In this chapter, we explore the task of global image geolocalization: estimating where on Earth a photograph was captured. We examine variants of the “im2gps” algorithm using millions of “geotagged” Internet photographs as training data. We first discuss an easy-to-understand nearest-neighbor baseline. Next, we introduce a lazy-learning approach with more sophisticated features that doubles the performance of the original “im2gps” algorithm. Beyond quantifying geolocalization accuracy, we also analyze (a) how the nonuniform distribution of training data impacts the algorithm, (b) how performance compares to baselines such as random guessing and land-cover recognition, and (c) whether geolocalization is simply landmark or “instance-level” recognition at a large scale. We also show that geolocation estimates can provide the basis for image understanding tasks such as population density estimation or land-cover estimation. This work was originally described, in part, in “im2gps” [9], which was the first attempt at global geolocalization using Internet-derived training data.
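The nearest-neighbor baseline mentioned above can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the toy three-dimensional feature vectors, the squared Euclidean feature distance, and the use of the haversine formula to measure geolocation error are all assumptions made for illustration. The idea shown is only that the query image inherits the GPS coordinates of its closest match in a geotagged database.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_neighbor_geolocate(query_feature, database):
    """Return the (lat, lon) of the database image closest in feature space.

    database is a list of (feature_vector, (lat, lon)) pairs.
    Squared Euclidean distance is an assumed stand-in for a real image
    feature distance.
    """
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    _, location = min(database, key=lambda entry: sq_dist(query_feature, entry[0]))
    return location


# Hypothetical toy database: made-up features paired with real coordinates.
database = [
    ([0.9, 0.1, 0.0], (48.8584, 2.2945)),     # Paris
    ([0.1, 0.8, 0.1], (40.6892, -74.0445)),   # New York
    ([0.0, 0.2, 0.9], (-33.8568, 151.2153)),  # Sydney
]

query = [0.85, 0.15, 0.05]             # made-up query feature
true_location = (48.8606, 2.3376)      # assumed ground truth, also in Paris

estimate = nearest_neighbor_geolocate(query, database)
error_km = haversine_km(estimate, true_location)
print(estimate, round(error_km, 1))
```

With millions of database images, the linear scan in `nearest_neighbor_geolocate` would normally be replaced by an approximate nearest-neighbor index; the sketch keeps the scan so that the logic stays visible.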


Keywords: Visual word, Query image, Scene category, Lazy learning, Gist descriptor



We thank Steve Schlosser, Julio Lopez, and Intel Research Pittsburgh for helping us overcome the logistical and computational challenges of this project. All visualizations and geographic data sources are derived from NASA data. Funding for this work was provided by an NSF fellowship to James Hays and NSF grants CAREER 1149853, CAREER 0546547, and CCF-0541230.


  1. G. Baatz, O. Saurer, K. Köser, M. Pollefeys, Large scale visual geo-localization of images in mountainous terrain, in Proceedings of the 12th European Conference on Computer Vision - Volume Part II (2012), pp. 517–530
  2. M. Bar, The proactive brain: using analogies and associations to generate predictions. Trends Cogn. Sci. 11(7), 280–289 (2007)
  3. C. Atkeson, A. Moore, S. Schaal, Locally weighted learning. AI Review 11, 11–73 (1997)
  4. O. Chum, J. Philbin, J. Sivic, M. Isard, A. Zisserman, Total recall: automatic query expansion with a generative feature model for object retrieval, in Proceedings of ICCV (2007)
  5. D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
  6. D.J. Crandall, L. Backstrom, D. Huttenlocher, J. Kleinberg, Mapping the world’s photos, in WWW ’09: Proceedings of the 18th International Conference on World Wide Web (2009), pp. 761–770
  7. J. Hays, A. Efros, Where in the world? Human and computer geolocation of images, in Vision Sciences Society Meeting (2009)
  8. J. Hays, A.A. Efros, Scene completion using millions of photographs. ACM Transactions on Graphics (SIGGRAPH 2007) 26(3) (2007)
  9. J. Hays, A.A. Efros, im2gps: estimating geographic information from a single image, in CVPR (2008)
  10. D. Hoiem, A. Efros, M. Hebert, Recovering surface layout from an image. Int. J. Comput. Vis. 75(1), 151–172 (2007)
  11. N. Jacobs, S. Satkin, N. Roman, R. Speyer, R. Pless, Geolocating static cameras, in Proceedings, ICCV (2007)
  12. E. Kalogerakis, O. Vesselova, J. Hays, A.A. Efros, A. Hertzmann, Image sequence geolocation with human travel priors, in Proceedings of the IEEE International Conference on Computer Vision (ICCV ’09) (2009)
  13. J. Kosecka, W. Zhang, Video compass, in ECCV ’02: Proceedings of the 7th European Conference on Computer Vision-Part IV (2002), pp. 476–490
  14. J.-F. Lalonde, D. Hoiem, A.A. Efros, C. Rother, J. Winn, A. Criminisi, Photo clip art. ACM Transactions on Graphics (SIGGRAPH 2007) 26(3) (August 2007)
  15. S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, in CVPR (2006)
  16. L.-J. Li, L. Fei-Fei, What, where and who? Classifying events by scene and object recognition, in Proceedings, ICCV (2007)
  17. T.-Y. Lin, S. Belongie, J. Hays, Cross-view image geolocalization, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Portland, OR, June 2013)
  18. D. Lowe, Object recognition from local scale-invariant features. ICCV 2, 1150–1157 (1999)
  19. J. Luo, D. Joshi, J. Yu, A. Gallagher, Geotagging in multimedia and computer vision: a survey. Multimed. Tools Appl. 51, 187–211 (2011)
  20. D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in Proceedings ICCV (July 2001)
  21. J. Matas, O. Chum, M. Urban, T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)
  22. A. Oliva, A. Torralba, Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
  23. A. Oliva, A. Torralba, Building the gist of a scene: the role of global image features in recognition, in Visual Perception, Progress in Brain Research, vol. 155 (2006)
  24. J. Philbin, O. Chum, M. Isard, J. Sivic, A. Zisserman, Object retrieval with large vocabularies and fast spatial matching, in CVPR (2007)
  25. J. Philbin, O. Chum, M. Isard, J. Sivic, A. Zisserman, Lost in quantization: improving particular object retrieval in large scale image databases, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2008)
  26. T. Quack, B. Leibe, L. Van Gool, World-scale mining of objects and events from community photo collections, in CIVR ’08: Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval (2008)
  27. L.W. Renninger, J. Malik, When is scene recognition just texture recognition? Vis. Res. 44, 2301–2311 (2004)
  28. I. Simon, N. Snavely, S.M. Seitz, Scene summarization for online image collections, in Proceedings, ICCV (2007)
  29. J. Sivic, A. Zisserman, Video Google: a text retrieval approach to object matching in videos. ICCV 2, 1470–1477 (2003)
  30. N. Snavely, S.M. Seitz, R. Szeliski, Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25(3), 835–846 (2006)
  31. R. Szeliski, “Where am I?”: ICCV 2005 Computer Vision Contest
  32. W. Thompson, C. Valiquette, B. Bennett, K. Sutherland, Geometric reasoning for map-based localization. Spatial Cogn. Comput. 1(3), 291–321 (1999)
  33. A. Torralba, R. Fergus, W.T. Freeman, 80 million tiny images: a large dataset for non-parametric object and scene recognition. IEEE PAMI 30(11), 1958–1970 (2008)
  34. J. Vogel, B. Schiele, Semantic modeling of natural scenes for content-based image retrieval. Int. J. Comput. Vis. 72(2), 133–157 (2007)
  35. J. Xiao, J. Hays, K. Ehinger, A. Oliva, A. Torralba, SUN database: large-scale scene recognition from abbey to zoo, in CVPR (2010)
  36. H. Zhang, A.C. Berg, M. Maire, J. Malik, SVM-KNN: discriminative nearest neighbor classification for visual category recognition, in CVPR ’06 (2006)
  37. W. Zhang, J. Kosecka, Image based localization in urban environments, in 3DPVT ’06 (2006)
  38. Y. Zheng, M. Zhao, Y. Song, H. Adam, U. Buddemeier, A. Bissacco, F. Brucher, T.-S. Chua, H. Neven, Tour the world: building a web-scale landmark recognition engine, in CVPR (2009)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Brown University, Providence, USA
  2. University of California, Berkeley, USA
