RGB-D joint modelling with scene geometric information for indoor semantic segmentation

Multimedia Tools and Applications

Abstract

This paper addresses RGB-D semantic segmentation of indoor scenes. We introduce a novel gravity direction detection method, based on vertical-line fitting that combines 2D visual information with 3D geometric information, to improve the original HHA depth encoding. Then, to fuse the two streams of deep convolutional networks operating on RGB and on the depth encoding, we propose a joint modelling method that learns a weighted summing layer to combine their prediction results. Finally, to refine the pixel-wise score maps, we adopt a fully connected CRF as a post-processing step and propose a pairwise potential function that incorporates a normal kernel to exploit geometric information. Experimental results show that the proposed approach achieves state-of-the-art RGB-D semantic segmentation performance on a public dataset.
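The fusion step described above can be illustrated with a minimal sketch. It assumes the weighted summing layer amounts to a per-class weighted sum of the RGB-stream and depth-stream score maps; the abstract does not give the exact form, so the function names (`fuse_scores`, `argmax_labels`), the per-class weighting, and the fixed weight values below are all hypothetical. In the paper the weights are learned rather than set by hand.

```python
from typing import List, Sequence


def fuse_scores(rgb: Sequence[Sequence[float]],
                depth: Sequence[Sequence[float]],
                w_rgb: Sequence[float],
                w_depth: Sequence[float]) -> List[List[float]]:
    """Per-class weighted sum of two score maps.

    rgb, depth: scores indexed as [class][pixel] (image flattened to 1D).
    w_rgb, w_depth: one weight per class (hypothetical; learned in practice).
    """
    fused = []
    for c, (rgb_row, depth_row) in enumerate(zip(rgb, depth)):
        fused.append([w_rgb[c] * r + w_depth[c] * d
                      for r, d in zip(rgb_row, depth_row)])
    return fused


def argmax_labels(scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring class index for every pixel."""
    n_classes = len(scores)
    n_pixels = len(scores[0])
    return [max(range(n_classes), key=lambda c: scores[c][p])
            for p in range(n_pixels)]


# Toy example: 2 classes x 2 pixels, equal weights for both streams.
rgb_scores = [[2.0, 0.5], [1.0, 1.5]]
depth_scores = [[0.0, 3.0], [4.0, 0.0]]
fused = fuse_scores(rgb_scores, depth_scores,
                    w_rgb=[0.5, 0.5], w_depth=[0.5, 0.5])
labels = argmax_labels(fused)
```

With equal weights the fused map is simply the average of the two streams. Weights learned per class would let the depth stream dominate for classes where geometry is more informative, which is the intuition behind learning the layer instead of fixing it.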

Acknowledgments

This work was supported in part by the Beijing Natural Science Foundation (grant 4142051).

Author information

Corresponding author

Correspondence to Hong Liu.

About this article

Cite this article

Liu, H., Wu, W., Wang, X. et al. RGB-D joint modelling with scene geometric information for indoor semantic segmentation. Multimed Tools Appl 77, 22475–22488 (2018). https://doi.org/10.1007/s11042-018-6056-8

  • DOI: https://doi.org/10.1007/s11042-018-6056-8
