SemanticFusion: Joint Labeling, Tracking and Mapping

  • Tommaso Cavallari
  • Luigi Di Stefano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9915)


Kick-started by the release of the well-known KinectFusion, recent research on RGBD-based dense volumetric reconstruction has focused on addressing various shortcomings of the original algorithm. In this paper we tackle two of them: drift in the camera trajectory, caused by the accumulation of small per-frame tracking errors, and the lack of semantic information in the algorithm's output. Accordingly, we present an extended KinectFusion pipeline that takes into account per-pixel semantic labels gathered from the input frames. Using these cues, we extend the memory structure holding the reconstructed environment so as to store per-voxel information on the kinds of object likely to appear at each spatial location. We then exploit this information during the camera localization step to increase the accuracy of the estimated camera trajectory. Thus, we realize a SemanticFusion loop whereby per-frame labels help track the camera more accurately, and successful tracking makes it possible to consolidate instantaneous semantic observations into a coherent volumetric map.
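To make the idea of per-voxel semantic storage more concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes each voxel keeps a truncated signed distance value together with a running average of per-class label probabilities, and that agreement between the stored and the observed label distributions could serve as a weight on tracking residuals. All names (`Voxel`, `integrate`, `tracking_weight`, `NUM_CLASSES`) and the averaging and weighting rules are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

NUM_CLASSES = 3  # hypothetical size of the label set


@dataclass
class Voxel:
    """One cell of the volume: geometry plus a label distribution (assumed layout)."""
    tsdf: float = 1.0            # truncated signed distance, 1.0 = empty space
    weight: float = 0.0          # number of fused depth observations (capped)
    label_scores: List[float] = field(
        default_factory=lambda: [0.0] * NUM_CLASSES
    )                            # running average of per-class probabilities
    label_weight: float = 0.0    # number of fused label observations (capped)


def integrate(voxel: Voxel, sdf_obs: float, label_probs: List[float],
              max_weight: float = 100.0) -> None:
    """Fuse one depth and one label observation with running averages."""
    w = voxel.weight
    voxel.tsdf = (voxel.tsdf * w + sdf_obs) / (w + 1.0)
    voxel.weight = min(w + 1.0, max_weight)

    lw = voxel.label_weight
    voxel.label_scores = [
        (s * lw + p) / (lw + 1.0)
        for s, p in zip(voxel.label_scores, label_probs)
    ]
    voxel.label_weight = min(lw + 1.0, max_weight)


def most_likely_label(voxel: Voxel) -> int:
    """Index of the class with the highest accumulated score."""
    return max(range(NUM_CLASSES), key=lambda c: voxel.label_scores[c])


def tracking_weight(voxel: Voxel, pixel_label_probs: List[float]) -> float:
    """Assumed agreement measure: dot product of stored and observed distributions.

    A tracker could multiply a pixel's geometric residual by this value so that
    semantically inconsistent correspondences contribute less.
    """
    return sum(s * p for s, p in zip(voxel.label_scores, pixel_label_probs))


# Usage: fuse two frames into one voxel, then query it.
v = Voxel()
integrate(v, 0.2, [0.8, 0.1, 0.1])
integrate(v, 0.0, [0.6, 0.3, 0.1])
label = most_likely_label(v)
agreement = tracking_weight(v, [1.0, 0.0, 0.0])
```

The running average keeps the per-voxel memory cost fixed regardless of how many frames are fused, which is one plausible reason to store a distribution rather than a single hard label; the weighting scheme the paper actually applies during camera localization may differ.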


SLAM, Deep learning, Semantic segmentation, Semantic fusion, Semantic camera tracking



We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.




Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of Bologna, Bologna, Italy
