Abstract
Semantic segmentation is a fundamental task in computer vision: it assigns a semantic class label to every pixel of an image, yielding a pixel-level segmentation of the scene. As depth sensors have become widespread, combining depth data with RGB images can improve the accuracy of semantic segmentation. This paper first summarizes approaches to fusing RGB and depth information. It then describes RGBD semantic segmentation methods, evaluation metrics, and datasets, and compares reported results on the two mainstream datasets. Finally, it outlines possible future research directions and draws conclusions. The survey is intended to guide future research on RGBD semantic segmentation and to lay a foundation for later work.
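The abstract mentions evaluation metrics without defining them. As a hedged illustration, the sketch below assumes the standard confusion-matrix definitions of pixel accuracy, mean class accuracy, and mean intersection-over-union (mIoU) that are widely reported for semantic segmentation. The survey's own definitions may differ in details such as how unlabeled pixels or absent classes are handled. The function names and the ignore label 255 are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Build a num_classes x num_classes matrix; cm[g][p] counts pixels whose
    ground-truth class is g and whose predicted class is p.
    ignore_index (assumed to be 255 here) marks unlabeled pixels."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must contain the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:  # unlabeled pixels are excluded from all metrics
            continue
        cm[g][p] += 1
    return cm


def segmentation_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Pixel accuracy, mean class accuracy and mean IoU from a confusion matrix."""
    n = len(cm)
    tp = [cm[i][i] for i in range(n)]
    gt_count = [sum(cm[i]) for i in range(n)]                         # row sums
    pred_count = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # column sums
    total = sum(gt_count)
    if total == 0:
        raise ValueError("no labeled pixels to evaluate")

    # Mean accuracy averages per-class recall over classes present in the ground truth.
    present = [i for i in range(n) if gt_count[i] > 0]
    mean_acc = sum(tp[i] / gt_count[i] for i in present) / len(present)

    # IoU = TP / (TP + FP + FN); classes absent from both gt and pred are skipped.
    ious = [tp[i] / (gt_count[i] + pred_count[i] - tp[i])
            for i in range(n) if gt_count[i] + pred_count[i] - tp[i] > 0]

    return {
        "pixel_acc": sum(tp) / total,
        "mean_acc": mean_acc,
        "miou": sum(ious) / len(ious),
    }


if __name__ == "__main__":
    # Toy flattened label maps for 3 classes; 255 marks an unlabeled pixel.
    gt = [0, 0, 1, 1, 2, 2, 255]
    pred = [0, 1, 1, 1, 2, 0, 0]
    print(segmentation_metrics(confusion_matrix(gt, pred, num_classes=3)))
```

On this toy input, four of the six labeled pixels are correct, so pixel accuracy is 2/3, while the per-class IoUs of 1/3, 2/3 and 1/2 average to an mIoU of 0.5. mIoU penalizes errors on every class equally, which is why it is usually reported alongside pixel accuracy, a metric dominated by large classes such as walls and floors in indoor scenes.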
Cite this article
Zhang, H., Sheng, V.S., Xi, X. et al. Overview of RGBD semantic segmentation based on deep learning. J Ambient Intell Human Comput 14, 13627–13645 (2023). https://doi.org/10.1007/s12652-022-03829-6
DOI: https://doi.org/10.1007/s12652-022-03829-6