Scene Recognition for Indoor Localization of Mobile Robots Using Deep CNN

  • Piotr Wozniak
  • Hadha Afrisal
  • Rigel Galindo Esparza
  • Bogdan Kwolek
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11114)


In this paper we propose a deep neural network based algorithm for indoor place recognition. It uses transfer learning to retrain VGG-F, a pretrained convolutional neural network, to classify places in images acquired by a humanoid robot. The network was trained and evaluated on a dataset of 8000 images recorded in sixteen rooms. The dataset is freely available from our website. We demonstrate experimentally that the proposed algorithm considerably outperforms bag-of-words (BoW) algorithms, which are frequently used in loop closure. It also outperforms an algorithm in which features extracted by the FC-6 layer of VGG-F are classified by a linear SVM.
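To make the second baseline concrete, the following is a minimal sketch of a linear SVM trained on fixed feature vectors. It is not the authors' implementation: the feature vectors are small synthetic Gaussian clusters standing in for FC-6 activations, the room count is reduced to four, and all names (`train_linear_svm`, `predict`, `make_synthetic_features`) and hyperparameters are assumptions chosen for illustration. The classifier is a one-vs-rest linear SVM fitted by subgradient descent on the hinge loss.

```python
import random


def train_linear_svm(features, labels, num_classes, epochs=50, lr=0.1, reg=0.01, seed=0):
    """One-vs-rest linear SVM fitted by subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x = features[i]
            for c in range(num_classes):
                # Target is +1 for the sample's own class, -1 for every other class.
                y = 1.0 if labels[i] == c else -1.0
                score = sum(w * v for w, v in zip(weights[c], x)) + biases[c]
                # L2 regularization: shrink the weights on every step.
                weights[c] = [w * (1.0 - lr * reg) for w in weights[c]]
                # Hinge-loss subgradient: update only when the margin is violated.
                if y * score < 1.0:
                    weights[c] = [w + lr * y * v for w, v in zip(weights[c], x)]
                    biases[c] += lr * y
    return weights, biases


def predict(weights, biases, x):
    """Return the class whose linear score is highest."""
    scores = [sum(w * v for w, v in zip(wc, x)) + b for wc, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)


def make_synthetic_features(num_classes=4, per_class=30, dim=8, seed=0):
    """Hypothetical stand-in for CNN features: one Gaussian cluster per room."""
    rng = random.Random(seed)
    features, labels = [], []
    for c in range(num_classes):
        center = [3.0 if d == c else 0.0 for d in range(dim)]
        for _ in range(per_class):
            features.append([m + rng.gauss(0.0, 0.3) for m in center])
            labels.append(c)
    return features, labels


features, labels = make_synthetic_features()
weights, biases = train_linear_svm(features, labels, num_classes=4)
accuracy = sum(predict(weights, biases, x) == y for x, y in zip(features, labels)) / len(labels)
print(f"training accuracy on synthetic features: {accuracy:.2f}")
```

In this baseline the network acts only as a fixed feature extractor, whereas the proposed algorithm retrains the network itself on the place images.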



This work was supported by the Polish National Science Center (NCN) under research grant 2014/15/B/ST6/02808.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Piotr Wozniak (4)
  • Hadha Afrisal (2)
  • Rigel Galindo Esparza (3)
  • Bogdan Kwolek (1)

  1. AGH University of Science and Technology, Kraków, Poland
  2. Universitas Gadjah Mada, Yogyakarta, Indonesia
  3. Monterrey Institute of Technology and Higher Education, Monterrey, Mexico
  4. Rzeszów University of Technology, Rzeszów, Poland
