Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database in a similar manner to per-exemplar SVMs in object recognition. Second, as only one or a few positive training examples are available for each location, we propose two methods to calibrate all the per-location SVM classifiers without the need for additional positive training data. The first method relies on p-values from statistical hypothesis testing and uses only the available negative training data. The second method performs an affine calibration by appropriately normalizing the learnt classifier hyperplane and does not need any additional labelled training data. We test the proposed place recognition method with the bag-of-visual-words and Fisher vector image representations, which are suitable for large-scale indexing. Experiments are performed on three datasets: 25,000 and 55,000 geotagged street view images of Pittsburgh, and the 24/7 Tokyo benchmark containing 76,000 images with varying illumination conditions. The results show improved place recognition accuracy of the learnt image representation over direct matching of raw image descriptors.
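To make the first calibration idea more concrete, the following is a minimal sketch of how raw scores from different per-location classifiers might be made comparable using only negative data. It is not the authors' implementation: the class `PValueCalibrator`, the function `rank_locations`, the location names and all scores are hypothetical, and the empirical p-value used here (the fraction of negative scores at least as high as the query score) is one simple reading of "p-values from statistical hypothesis testing".

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


class PValueCalibrator:
    """Empirical p-value calibration for one per-location classifier.

    Illustrative only: the abstract does not specify these details.
    """

    def __init__(self, negative_scores: Sequence[float]) -> None:
        if not negative_scores:
            raise ValueError("need at least one negative score")
        # Scores this location's classifier assigns to negative images, sorted.
        self._sorted = sorted(negative_scores)

    def p_value(self, score: float) -> float:
        """Fraction of negatives scoring at least as high as `score`."""
        n = len(self._sorted)
        # bisect_left gives the number of negatives strictly below `score`.
        strictly_below = bisect_left(self._sorted, score)
        return (n - strictly_below) / n

    def calibrated(self, score: float) -> float:
        """Higher is better: 1 minus the empirical p-value."""
        return 1.0 - self.p_value(score)


def rank_locations(
    raw_scores: Dict[str, float],
    calibrators: Dict[str, PValueCalibrator],
) -> List[Tuple[str, float]]:
    """Rank locations by calibrated score instead of raw classifier score."""
    calibrated = [
        (loc, calibrators[loc].calibrated(score))
        for loc, score in raw_scores.items()
    ]
    return sorted(calibrated, key=lambda pair: pair[1], reverse=True)


# Hypothetical example: two classifiers whose raw scores live on different scales.
calibrators = {
    "A": PValueCalibrator([-3.0, -2.0, -1.0, 0.0]),
    "B": PValueCalibrator([0.0, 5.0, 10.0, 15.0]),
}
ranking = rank_locations({"A": 0.5, "B": 7.0}, calibrators)
print(ranking)
```

In this toy example classifier B produces the larger raw score, yet half of its negatives score at least as high, so after calibration location A ranks first. This is the kind of cross-classifier comparison that calibration is meant to enable when no additional positive examples are available.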
Keywords: Place recognition, Exemplar SVM, Geo-localization, Classifier calibration
This work was supported by the MSR-INRIA laboratory, the EIT-ICT labs, Google, the ERC project LEAP and the EC project RVO13000 - Conceptual development of research organization. Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory, contract FA8650-12-C-7212. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL or the U.S. Government.