Cross-View Image Geo-localization

  • Tsung-Yi Lin
  • Serge Belongie
  • James Hays
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


The recent availability of large amounts of geo-tagged imagery has inspired a number of data-driven solutions to the image geo-localization problem. Existing approaches predict the location of a query image by matching it to a database of geo-referenced photographs. While there are many geo-tagged images available on photo sharing and Street View sites, most are clustered around landmarks and urban areas. The vast majority of the Earth’s land area has no ground-level reference photos available, which limits the applicability of all existing image geo-localization methods. On the other hand, there is no shortage of visual and geographic data that densely covers the Earth—we examine overhead imagery and land cover survey data—but the relationship between this data and ground-level query photographs is complex. In this chapter, we introduce a cross-view feature translation approach to greatly extend the reach of image geo-localization methods. We can often localize a query even if it has no corresponding ground-level images in the database. A key idea is to learn a mapping from ground-level appearance to overhead appearance and land cover attributes. This relationship is learned from sparsely available geo-tagged ground-level images and the corresponding aerial and land cover data at those locations. We perform experiments over a 1135 km\(^2\) region containing a variety of scenes and land cover types. For each query, our algorithm produces a probability density over the region of interest.
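
As a rough illustration of the key idea, and not the chapter's actual method, the sketch below learns a linear ridge-regression map from ground-level descriptors to overhead descriptors using co-located training pairs. It then compares a translated query against the overhead descriptor of every cell in the region and normalizes the scores into a distribution over cells. The function names, feature dimensions, ridge regularizer, softmax temperature, and synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge_translation(ground_feats, aerial_feats, reg=1.0):
    """Fit a linear map W so that ground_feats @ W approximates aerial_feats.

    Hypothetical stand-in for a cross-view translation model; ridge
    regression is an assumption, not the method used in the chapter.
    """
    dim = ground_feats.shape[1]
    gram = ground_feats.T @ ground_feats + reg * np.eye(dim)
    return np.linalg.solve(gram, ground_feats.T @ aerial_feats)


def localization_density(query_feat, W, cell_aerial_feats, temperature=1.0):
    """Return a normalized score over map cells for one ground-level query."""
    predicted = query_feat @ W  # predicted overhead descriptor for the query
    sq_dist = np.sum((cell_aerial_feats - predicted) ** 2, axis=1)
    logits = -sq_dist / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Synthetic demo: ground and overhead descriptors are linked by a hidden linear map.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(8, 5))

# Sparse geo-tagged training pairs (ground descriptor, overhead descriptor).
train_ground = rng.normal(size=(200, 8))
train_aerial = train_ground @ true_map + 0.01 * rng.normal(size=(200, 5))
W = fit_ridge_translation(train_ground, train_aerial, reg=1e-3)

# Overhead descriptors are available for every cell; ground photos are not.
cell_ground = rng.normal(size=(50, 8))
cell_aerial = cell_ground @ true_map

# A query taken in cell 17, which has no ground-level reference photo.
query = cell_ground[17]
density = localization_density(query, W, cell_aerial)
best_cell = int(np.argmax(density))
```

In this toy setup the query's cell contributes no ground-level image to training, yet it can still be ranked because only overhead descriptors are needed at test time. This mirrors the motivation stated in the abstract.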


Land Cover, Training Image, Canonical Correlation Analysis, Query Image, Aerial Image



This research was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory, contract FA8650-12-C-7212. The U.S. government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. government.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Cornell University, Ithaca, USA
  2. Brown University, Providence, USA
