Generative Methods for Long-Term Place Recognition in Dynamic Scenes


This paper proposes a new framework for visual place recognition that incrementally learns a model of each place and adapts to dynamic elements in the scene. Traditional Bag-Of-Words (BOW) image-retrieval approaches to place recognition typically treat images holistically and cannot deal with sub-scene dynamics, such as structural changes to a building façade or seasonal effects on foliage. However, by treating local features as observations of real-world landmarks in a scene that is observed repeatedly over a period of time, such dynamics can be modelled at a local level, and the spatio-temporal properties of each landmark can be updated independently and incrementally. The proposed method models each place as a set of such landmarks and their geometric relationships. A new BOW filtering stage and geometric verification scheme are introduced to compute a similarity score between a query image and each scene model. As further training images are acquired for each place, the landmark properties are updated, so that in the long term the model can adapt to dynamic behaviour in the scene. Results on an outdoor dataset of images captured along a 7 km path over a period of 5 months show an improvement in recognition performance compared to state-of-the-art image-retrieval approaches to place recognition.
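The idea of updating each landmark independently as new training images arrive can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the update rule, so the sketch assumes a simple Beta-Bernoulli hit/miss count per landmark, identifies each landmark with a single visual word, and omits the geometric relationships and verification stage entirely. The names `Landmark`, `PlaceModel`, `update`, and `score` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Landmark:
    """Hypothetical landmark: a visual word plus detection counts."""
    word_id: int
    hits: float = 1.0    # prior pseudo-count: landmark detected
    misses: float = 1.0  # prior pseudo-count: landmark not detected

    @property
    def occurrence(self) -> float:
        # Posterior mean of an assumed Beta(hits, misses) distribution.
        return self.hits / (self.hits + self.misses)


@dataclass
class PlaceModel:
    """Hypothetical place model: one landmark per visual word."""
    landmarks: dict = field(default_factory=dict)

    def update(self, observed_words: set) -> None:
        # Landmarks already in the model gain a hit or a miss.
        for word_id, lm in self.landmarks.items():
            if word_id in observed_words:
                lm.hits += 1.0
            else:
                lm.misses += 1.0
        # Words seen for the first time become new landmarks
        # (prior pseudo-count plus this first detection).
        for word_id in observed_words - set(self.landmarks):
            self.landmarks[word_id] = Landmark(word_id, hits=2.0)

    def score(self, query_words: set) -> float:
        # Occurrence mass of matched landmarks over total occurrence mass.
        total = sum(lm.occurrence for lm in self.landmarks.values())
        if total == 0.0:
            return 0.0
        matched = sum(lm.occurrence
                      for w, lm in self.landmarks.items()
                      if w in query_words)
        return matched / total


# Three visits to the same place; word 3 disappears after the first visit.
model = PlaceModel()
for visit in ({1, 2, 3}, {1, 2}, {1, 4}):
    model.update(visit)
similarity = model.score({1, 4})
```

Under these assumptions, a landmark that stops being observed (word 3 above) gradually loses weight in the similarity score, while a persistently observed one (word 1) gains it, which is the kind of long-term adaptation the abstract describes.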




Author information



Corresponding author

Correspondence to Edward Johns.


About this article

Cite this article

Johns, E., Yang, GZ. Generative Methods for Long-Term Place Recognition in Dynamic Scenes. Int J Comput Vis 106, 297–314 (2014).



Keywords

  • Scene recognition
  • Appearance-based localization
  • Topological localization
  • Image retrieval
  • Simultaneous localization and mapping