International Journal of Computer Vision, Volume 118, Issue 3, pp 319–336

Learning and Calibrating Per-Location Classifiers for Visual Place Recognition

  • Petr Gronát
  • Josef Sivic
  • Guillaume Obozinski
  • Tomas Pajdla


The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only one or a few positive training examples are available for each location, we propose two methods to calibrate all the per-location SVM classifiers without the need for additional positive training data. The first method relies on p-values from statistical hypothesis testing and uses only the available negative training data. The second method performs an affine calibration by appropriately normalizing the learnt classifier hyperplane and does not need any additional labelled training data. We test the proposed place recognition method with the bag-of-visual-words and Fisher vector image representations suitable for large scale indexing. Experiments are performed on three datasets: 25,000 and 55,000 geotagged street view images of Pittsburgh, and the 24/7 Tokyo benchmark containing 76,000 images with varying illumination conditions. The results show improved place recognition accuracy of the learnt image representation over direct matching of raw image descriptors.
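The first calibration method can be illustrated with a minimal sketch. It assumes each location already has a linear classifier (w, b) and that a shared pool of negative descriptors is available. The helper names `dot`, `p_value` and `rank_locations`, the toy data, and the use of an empirical p-value (the fraction of negative scores at least as large as the query score) are assumptions made for illustration, not the authors' implementation.

```python
from bisect import bisect_left


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def p_value(score, sorted_negative_scores):
    """Empirical p-value: fraction of negative scores that are >= score."""
    n = len(sorted_negative_scores)
    # bisect_left counts the negatives strictly below `score`.
    below = bisect_left(sorted_negative_scores, score)
    return (n - below) / n


def rank_locations(query, classifiers, negatives):
    """Rank location ids by ascending p-value of the query score.

    classifiers: dict mapping a location id to (w, b) of its linear classifier.
    negatives:   list of descriptors known not to depict any of the locations.
    """
    results = []
    for loc, (w, b) in classifiers.items():
        neg_scores = sorted(dot(w, x) + b for x in negatives)
        results.append((p_value(dot(w, query) + b, neg_scores), loc))
    results.sort()
    return [loc for _, loc in results]


# Toy data (hypothetical): classifier "A" has a much larger score scale than "B".
classifiers = {"A": ((10, 0), 0), "B": ((0, 1), 0)}
negatives = [(1, 1), (2, 3), (3, 2), (4, 9)]
query = (3, 10)

raw = sorted(classifiers, key=lambda k: -(dot(classifiers[k][0], query) + classifiers[k][1]))
calibrated = rank_locations(query, classifiers, negatives)
print(raw)         # ranking by raw SVM scores
print(calibrated)  # ranking by p-values
```

In this toy case the raw scores favour "A" only because its weights are larger, whereas the p-values favour "B", whose query score exceeds every negative score. This is the kind of cross-classifier comparability that calibration is meant to provide.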


Place recognition, Exemplar SVM, Geo-localization, Classifier calibration



This work was supported by the MSR-INRIA laboratory, the EIT-ICT labs, Google, the ERC project LEAP and the EC project RVO13000 - Conceptual development of research organization. Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory, contract FA8650-12-C-7212. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. Government.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Petr Gronát (1, 3)
  • Josef Sivic (1)
  • Guillaume Obozinski (2)
  • Tomas Pajdla (3)

  1. Inria, Willow project, Departement d’Informatique de l’Ecole Normale Superieure, Paris, France
  2. Ecole des Ponts ParisTech, Marne-la-Vallée, France
  3. Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Cybernetics, Praha 2, Czech Republic
