
RGB-D Gate-guided edge distillation for indoor semantic segmentation

  • 1190: Depth-Related Processing and Applications in Visual Systems
  • Published:
Multimedia Tools and Applications

Abstract

Fusing RGB and depth information can significantly improve the performance of semantic segmentation, since depth data captures the geometric structure of the scene. In this paper, we propose a novel Gate-guided Edge Distillation (GED) approach that effectively generates edge-aware features by fusing RGB and depth data to assist high-level semantic prediction. The proposed GED consists of two modules: gated fusion and edge distillation. The gated fusion module adaptively learns the relationship between the RGB and depth data to generate complementary features. To address the adverse effects of redundant information in the edge-aware features, the edge distillation module enhances the semantic features within the same object while preserving the discrimination between semantic features belonging to different objects. Using the distilled edge-aware features as detailed guidance, the proposed edge-guided fusion module then fuses them with the semantic features. In addition, the complementary features are leveraged in a multi-level feature fusion module to further enhance detailed information. Extensive experiments on the widely used SUN-RGBD and NYU-Dv2 datasets demonstrate that the proposed approach with a ResNet-50 backbone achieves state-of-the-art performance.
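The gated fusion idea described in the abstract can be sketched as follows. The paper's exact gating function is not given in this excerpt, so the learned parameters `w`, `b` and the element-wise convex combination of the two streams are illustrative assumptions, not the authors' implementation; the sketch only shows the general pattern of a learned gate weighting RGB against depth features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb_feat, depth_feat, w, b):
    """Sketch of gated fusion: a gate computed from both modalities
    decides, per element, how much RGB vs. depth feature to keep.
    Shapes: rgb_feat, depth_feat are (N, C); w is (2C, C); b is (C,)."""
    # Gate is conditioned on the concatenation of both feature streams
    gate = sigmoid(np.concatenate([rgb_feat, depth_feat], axis=-1) @ w + b)
    # Convex combination: gate -> 1 favors RGB, gate -> 0 favors depth
    return gate * rgb_feat + (1.0 - gate) * depth_feat

# With zero weights the gate is sigmoid(0) = 0.5, an equal blend
rgb = np.ones((2, 4))
depth = np.zeros((2, 4))
fused = gated_fusion(rgb, depth, np.zeros((8, 4)), np.zeros(4))
print(fused)  # every entry is 0.5
```

In a real network the gate would be produced by a learned convolution over feature maps rather than a dense product over flat vectors, but the fusion pattern is the same.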




Funding

This work was supported in part by the National Natural Science Foundation of China under grants 62101344, 62171294, 61771321, 61871273, and 61872429; in part by the Key Project of DEGP under grant 2018KCXTD027; in part by the Natural Science Foundation of Guangdong Province, China under grant 2020A1515010959; in part by the Natural Science Foundation of Shenzhen under grants JCYJ20200109105832261, JSGG20180508152022006, and JCYJ20190808122409660; and in part by the Interdisciplinary Innovation Team of Shenzhen University.

Author information


Corresponding author

Correspondence to Shishun Tian.



About this article


Cite this article

Zou, W., Peng, Y., Zhang, Z. et al. RGB-D Gate-guided edge distillation for indoor semantic segmentation. Multimed Tools Appl 81, 35815–35830 (2022). https://doi.org/10.1007/s11042-021-11395-w

