State-of-the-art in visual geo-localization

Abstract

Large-scale visual geo-localization has recently attracted considerable attention in computer vision research, and new methods are proposed steadily. However, surveys of visual geo-localization methods are rare, and they focus mainly on city-scale localization. We present a comprehensive and balanced study of existing visual geo-localization domains, including city-scale methods, global approaches, and methods for natural environments. We describe the methods to show their pros and cons, application domains, datasets, and evaluation techniques. We categorize the reviewed methods by two criteria. The first is the type of data the method uses for geo-location estimation. The second is the target environment for which the method has been proposed and validated. Based on this categorization, we analyze important conditions that must be considered when solving geo-localization problems. Each category is in a different state of research: while city-scale image-based methods have received a lot of attention, other categories, such as natural environments using cross-domain data sources, remain challenging problems under active research. We also discuss future research in large-scale visual geo-localization, primarily in the new and challenging category of geo-localization in natural environments.

Notes

  1. Author’s note: in a reference database of geo-tagged images.
  2. http://crcv.ucf.edu/projects/GMCP_Geolocalization/.
  3. http://graphics.cs.cmu.edu/projects/im2gps/.
  4. http://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67.
  5. https://purl.stanford.edu/vn158kj2087.
  6. https://roboticvision.atlassian.net/wiki/pages/viewpage.action?pageId=14188617.
  7. http://cphoto.fit.vutbr.cz/elevation/.
  8. http://www.cs.cornell.edu/projects/bigsfm/#data.
  9. http://vision.soic.indiana.edu/projects/disco/.
  10. https://landmark3d.codeplex.com/.
  11. http://mi.eng.cam.ac.uk/projects/relocalisation/#results.
  12. http://cvg.ethz.ch/research/mountain-localization/.
  13. http://cs.uky.edu/~scott/research/deeplyfound/.
  14. http://nationalmap.gov/elevation.html.
  15. http://nationalmap.gov/elevation.html.
  16. http://www.mrlc.gov/nlcd11_data.php.
  17. http://gapanalysis.usgs.gov/gaplandcover/.
  18. https://code.google.com/p/s2-geometry-library/.
  19. http://www.geoguessr.com.
  20. http://purl.stanford.edu/vn158kj2087.
  21. https://www.geoguessr.com/.
  22. http://dish.andrewsullivan.com/vfyw-contest/.
  23. http://mi.eng.cam.ac.uk/projects/relocalisation/#results.
  24. http://www.google.com/mobile/goggles.

Acknowledgements

This work was supported by SoMoPro II grant (financial contribution from the EU 7 FP People Programme Marie Curie Actions, REA 291782, and from the South Moravian Region). The content of this article does not reflect the official opinion of the European Union. Responsibility for the information and views expressed therein lies entirely with the authors. This work was also supported by The Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project "IT4Innovations National Supercomputing Center - LM2015070".

Author information

Correspondence to Jan Brejcha.

About this article

Cite this article

Brejcha, J., Čadík, M. State-of-the-art in visual geo-localization. Pattern Anal Applic 20, 613–637 (2017). https://doi.org/10.1007/s10044-017-0611-1

Keywords

  • Visual geo-localization
  • City-scale localization
  • Natural environments
  • Image geo-location
  • Visual odometry
  • Geo-tagging
  • Image to model registration
  • 3D alignment
  • Cross-domain registration
  • Extrinsic calibration
  • 6 DOF