Abstract
Fusing RGB and depth information can significantly improve semantic segmentation, since depth data encode geometric structure that complements appearance cues. In this paper, we propose a novel Gate-guided Edge Distillation (GED) approach that fuses RGB and depth data to generate edge-aware features, which in turn assist high-level semantic prediction. The proposed GED consists of two modules: gated fusion and edge distillation. The gated fusion module adaptively learns the relationship between the RGB and depth data to generate complementary features. To suppress the adverse effects of redundant information in the edge-aware features, the edge distillation module enhances the consistency of semantic features within the same object while preserving the discrimination between semantic features of different objects. Using the distilled edge-aware features as detailed guidance, the proposed edge-guided fusion module then fuses them with the semantic features. In addition, the complementary features are leveraged in a multi-level feature fusion module to further enhance detailed information. Extensive experiments on the widely used SUN RGB-D and NYUDv2 datasets demonstrate that the proposed approach with a ResNet-50 backbone achieves state-of-the-art performance.
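The gated fusion idea above — learning a gate that adaptively blends the two modalities — can be sketched in NumPy. This is a minimal illustration, not the paper's exact formulation: the gate parameters `w` and `b`, the feature shapes, and the sigmoid convex-combination form are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_rgb, f_depth, w, b):
    """Blend RGB and depth features with a learned per-element gate.

    The gate is predicted from the concatenated modality features;
    values near 1 favor the RGB branch, values near 0 favor depth.
    """
    gate = sigmoid(np.concatenate([f_rgb, f_depth], axis=-1) @ w + b)
    return gate * f_rgb + (1.0 - gate) * f_depth

# Toy features: 4 spatial positions, 8 channels per modality.
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((4, 8))
f_depth = rng.standard_normal((4, 8))
w = rng.standard_normal((16, 8)) * 0.1  # gate weights (hypothetical)
b = np.zeros(8)                         # gate bias (hypothetical)

fused = gated_fusion(f_rgb, f_depth, w, b)
```

Because the sigmoid gate lies in (0, 1), each fused value is a convex combination of the corresponding RGB and depth features, so the fusion can never stray outside the range spanned by the two inputs.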
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants 62101344, 62171294, 61771321, 61871273, and 61872429; in part by the Key Project of DEGP under Grant 2018KCXTD027; in part by the Natural Science Foundation of Guangdong Province, China, under Grant 2020A1515010959; in part by the Natural Science Foundation of Shenzhen under Grants JCYJ20200109105832261, JSGG20180508152022006, and JCYJ20190808122409660; and in part by the Interdisciplinary Innovation Team of Shenzhen University.
Cite this article
Zou, W., Peng, Y., Zhang, Z. et al. RGB-D Gate-guided edge distillation for indoor semantic segmentation. Multimed Tools Appl 81, 35815–35830 (2022). https://doi.org/10.1007/s11042-021-11395-w