PlaNet - Photo Geolocation with Convolutional Neural Networks

  • Tobias Weyand
  • Ilya Kostrikov
  • James Philbin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9912)


Is it possible to determine the location of a photo from just its pixels? While the general problem seems exceptionally difficult, photos often contain cues such as landmarks, weather patterns, vegetation, road markings, or architectural details, which in combination allow one to infer where the photo was taken. Previously, this problem has been approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, this model achieves a 50% performance improvement over the single-image model.
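
To make the classification framing concrete, the following is a minimal sketch of how geotagged photos could be turned into class labels over multi-scale cells. The abstract does not specify the cell geometry or the subdivision rule, so everything here is an assumption for illustration: the cells are plain latitude/longitude rectangles split as a quadtree, and the names `Cell`, `build_cells`, `cell_label`, and the threshold `max_photos` are hypothetical. Dense regions end up with small cells and sparse regions with large ones, which is one simple way to obtain a multi-scale partition.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Cell:
    """Axis-aligned lat/lng rectangle, half-open on the upper edges."""
    lat_min: float
    lat_max: float
    lng_min: float
    lng_max: float

    def contains(self, p: Point) -> bool:
        lat, lng = p
        return (self.lat_min <= lat < self.lat_max
                and self.lng_min <= lng < self.lng_max)

    def split(self) -> List["Cell"]:
        """Split into four children at the midpoint (quadtree step)."""
        lat_mid = (self.lat_min + self.lat_max) / 2
        lng_mid = (self.lng_min + self.lng_max) / 2
        return [
            Cell(self.lat_min, lat_mid, self.lng_min, lng_mid),
            Cell(self.lat_min, lat_mid, lng_mid, self.lng_max),
            Cell(lat_mid, self.lat_max, self.lng_min, lng_mid),
            Cell(lat_mid, self.lat_max, lng_mid, self.lng_max),
        ]


def build_cells(points: List[Point], max_photos: int,
                max_depth: int = 20) -> List[Cell]:
    """Subdivide the globe until no cell holds more than max_photos photos.

    Empty cells are discarded, so the result only covers photographed areas.
    This is an illustrative assumption, not the rule used in the paper.
    """
    # Upper bounds are padded slightly so lat=90 / lng=180 fall inside.
    root = Cell(-90.0, 90.0 + 1e-9, -180.0, 180.0 + 1e-9)
    leaves: List[Cell] = []
    stack = [(root, list(points), 0)]
    while stack:
        cell, pts, depth = stack.pop()
        if not pts:
            continue
        if len(pts) <= max_photos or depth == max_depth:
            leaves.append(cell)
            continue
        for child in cell.split():
            inside = [p for p in pts if child.contains(p)]
            stack.append((child, inside, depth + 1))
    return leaves


def cell_label(cells: List[Cell], p: Point) -> Optional[int]:
    """Return the class index of the cell containing p, or None."""
    for i, cell in enumerate(cells):
        if cell.contains(p):
            return i
    return None


photos = [
    (48.8566, 2.3522), (48.8584, 2.2945), (48.8606, 2.3376),  # Paris
    (40.7128, -74.0060),                                      # New York
    (-33.8688, 151.2093),                                     # Sydney
]
cells = build_cells(photos, max_photos=2)
labels = [cell_label(cells, p) for p in photos]
```

Each label in `labels` would serve as the classification target for the corresponding photo; a network trained on such targets outputs a probability distribution over cells rather than a single coordinate.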


Hidden Markov Model, Image Retrieval, Query Image, Convolutional Neural Network, Street Level
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Google, Los Angeles, USA
  2. RWTH Aachen University, Aachen, Germany
  3. Zoox, Menlo Park, USA
