Overview of RGBD semantic segmentation based on deep learning

Zhang, Hongyan; Sheng, Victor S.; Xi, Xuefeng; Cui, Zhiming; Rong, Huan

doi:10.1007/s12652-022-03829-6

Original Research
Published: 07 April 2022

Volume 14, pages 13627–13645, (2023)

Journal of Ambient Intelligence and Humanized Computing

Hongyan Zhang (ORCID: orcid.org/0000-0003-4342-7015)¹, Victor S. Sheng², Xuefeng Xi¹, Zhiming Cui¹ & Huan Rong³


Abstract

Semantic segmentation is one of the basic tasks in computer vision; its purpose is pixel-level scene segmentation. With the growing popularity of depth sensors, combining depth data with RGB images can improve the accuracy of semantic segmentation. This paper first summarizes approaches for fusing RGB and depth information. It then describes RGBD semantic segmentation methods, evaluation metrics, and datasets, and compares results on the two mainstream datasets. Finally, it outlines possible future research directions and draws conclusions. This work provides guidance for future research on RGBD semantic segmentation and lays a foundation for later studies.
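The abstract mentions evaluation metrics without naming them. As a hedged illustration, assuming the metrics commonly reported on RGBD segmentation benchmarks (pixel accuracy, mean class accuracy, and mean intersection-over-union, mIoU), the sketch below computes them from a confusion matrix with NumPy. The function names and the `ignore_index=255` convention for unlabeled pixels are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes confusion matrix.

    Rows index ground-truth classes, columns index predicted classes.
    Pixels whose label equals ignore_index (an assumed convention) are skipped.
    Predicted labels are assumed to lie in [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index
    # Encode each (gt, pred) pair as a single index, then count occurrences.
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Pixel accuracy, mean class accuracy and mIoU from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    gt_per_class = cm.sum(axis=1)    # pixels labeled as each class
    pred_per_class = cm.sum(axis=0)  # pixels predicted as each class
    pixel_acc = tp.sum() / cm.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        class_acc = tp / gt_per_class
        iou = tp / (gt_per_class + pred_per_class - tp)
    # Classes absent from both prediction and ground truth yield NaN and are skipped.
    return {
        "pixel_acc": float(pixel_acc),
        "mean_acc": float(np.nanmean(class_acc)),
        "mIoU": float(np.nanmean(iou)),
    }


if __name__ == "__main__":
    target = np.array([[0, 0, 1], [1, 2, 2]])
    pred = np.array([[0, 1, 1], [1, 2, 0]])
    cm = confusion_matrix(pred, target, num_classes=3)
    print(segmentation_metrics(cm))
```

For this toy 2x3 label map, four of six pixels are correct (pixel accuracy 2/3), and the per-class IoUs are 1/3, 2/3 and 1/2, giving an mIoU of 0.5. Accumulating one confusion matrix over the whole test set before computing the ratios, rather than averaging per-image scores, is a common choice because it weights every pixel equally.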



Author information

Authors and Affiliations

School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, Jiangsu, China
Hongyan Zhang, Xuefeng Xi & Zhiming Cui
Department of Computer Science, University of Central Arkansas, Conway, AR, USA
Victor S. Sheng
School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
Huan Rong

Corresponding authors

Correspondence to Victor S. Sheng or Xuefeng Xi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, H., Sheng, V.S., Xi, X. et al. Overview of RGBD semantic segmentation based on deep learning. J Ambient Intell Human Comput 14, 13627–13645 (2023). https://doi.org/10.1007/s12652-022-03829-6

Received: 20 April 2021
Accepted: 10 March 2022
Published: 07 April 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s12652-022-03829-6
