ORB-SLAM-CNN: Lessons in Adding Semantic Map Construction to Feature-Based SLAM

  • Andrew M. Webb
  • Gavin Brown
  • Mikel Luján
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11649)


Recent work has integrated semantics into the 3D scene models produced by visual SLAM systems. Although these systems operate close to real time, no study has examined how to achieve real-time performance by trading off semantic model accuracy against computational requirements. ORB-SLAM2 provides good scene accuracy and real-time processing without requiring GPUs [1]. Following a ‘single view’ approach of overlaying a dense semantic map on the sparse SLAM scene model, we explore a method for automatically tuning the parameters of the system so that it operates in real time while maximizing prediction accuracy and map density.


Online parameter tuning, SLAM, Semantic segmentation



The authors gratefully acknowledge the support of the EPSRC grants LAMBDA (EP/N035127/1), PAMELA (EP/K008730/1), and RAIN (EP/R026084/1).


  1. Bodin, B., et al.: SLAMBench2: multi-objective head-to-head benchmarking for visual SLAM. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–8 (2018)
  2. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. In: ICLR (2015)
  3. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: Proceedings of the Computer Vision and Pattern Recognition (CVPR). IEEE (2017)
  4. Häne, C., Zach, C., Cohen, A., Angst, R., Pollefeys, M.: Joint 3D scene reconstruction and class segmentation. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp. 97–104, June 2013
  5. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
  6. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861 (2017)
  7. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24, pp. 109–117. Curran Associates, Inc. (2011)
  8. Kundu, A., Li, Y., Dellaert, F., Li, F., Rehg, J.M.: Joint semantic segmentation and 3D reconstruction from monocular video. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 703–718. Springer, Cham (2014)
  9. Li, X., Belaroussi, R.: Semi-dense 3D semantic mapping from monocular SLAM. CoRR abs/1611.04144 (2016)
  10. McCormac, J., Handa, A., Davison, A.J., Leutenegger, S.: SemanticFusion: dense 3D semantic mapping with convolutional neural networks. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 4628–4635 (2017)
  11. Mur-Artal, R., Tardós, J.D.: ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGB-D cameras. IEEE Trans. Robot. 33(5), 1255–1262 (2017)
  12. Pillai, S., Leonard, J.: Monocular SLAM supported object recognition. In: Proceedings of Robotics: Science and Systems (RSS), Rome, Italy, July 2015
  13. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or SURF. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2564–2571. IEEE (2011)
  14. Sünderhauf, N., et al.: Place categorization and semantic mapping on a mobile robot. In: IEEE International Conference on Robotics and Automation (ICRA 2016), Stockholm, Sweden. IEEE, May 2016
  15. Vineet, V., et al.: Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 75–82, May 2015
  16. Whelan, T., Leutenegger, S., Moreno, R.S., Glocker, B., Davison, A.: ElasticFusion: dense SLAM without a pose graph. In: Proceedings of Robotics: Science and Systems, Rome, Italy, July 2015
  17. Zheng, S., et al.: Conditional random fields as recurrent neural networks. In: ICCV, pp. 1529–1537 (2015)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science, University of Manchester, Manchester, UK
