RGB-D indoor semantic segmentation network based on wavelet transform

Fan, Runze; Liu, Yuhong; Jiang, Shiyi; Zhang, Rongfen

doi:10.1007/s12530-022-09479-5

Original Paper
Published: 19 December 2022

Volume 14, pages 981–991, (2023)
Evolving Systems

Abstract

In computer vision, convolution and pooling operations often discard high-frequency information, and contour details fade as networks become deeper, which is especially harmful in image semantic segmentation. In RGB-D semantic segmentation, existing networks do not fully exploit the useful information in the RGB and depth images, whereas the wavelet transform preserves both the low-frequency and the high-frequency information of the original image. To address information loss in RGB-D indoor semantic segmentation networks, we propose an RGB-D indoor semantic segmentation network based on the wavelet transform. A wavelet transform fusion module is designed to preserve contour details, with discrete wavelet transform blocks used in place of pooling operations. A wavelet transform connection module links contextual information between the encoder and the decoder. Together, the two modules exploit the complementarity of high- and low-frequency information to improve segmentation accuracy along object edge contours. The proposed method is evaluated on the commonly used indoor datasets NYUv2 and SUNRGB-D, and the results show that it achieves state-of-the-art performance with real-time inference.
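To illustrate why a discrete wavelet transform can stand in for pooling without discarding detail, here is a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet basis or the layer layout, so the sketch assumes a single-level orthonormal Haar transform on one single-channel feature map, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. Like 2x2 pooling, the transform halves the resolution, but it also keeps three high-frequency sub-bands from which the input can be recovered exactly.

```python
def haar_dwt2(x):
    """Single-level 2D Haar DWT of a 2D list with even height and width.

    Returns four half-resolution sub-bands: (low, horiz, vert, diag).
    Assumption: Haar basis; the source does not specify the wavelet.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, horiz, vert, diag = [], [], [], []
    for i in range(0, h, 2):
        r_low, r_horiz, r_vert, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_low.append((a + b + c + d) / 2)    # scaled 2x2 average (low frequency)
            r_horiz.append((a - b + c - d) / 2)  # left column minus right column
            r_vert.append((a + b - c - d) / 2)   # top row minus bottom row
            r_diag.append((a - b - c + d) / 2)   # diagonal difference
        low.append(r_low)
        horiz.append(r_horiz)
        vert.append(r_vert)
        diag.append(r_diag)
    return low, horiz, vert, diag


def haar_idwt2(low, horiz, vert, diag):
    """Inverse of haar_dwt2: rebuilds the full-resolution 2D list."""
    out = []
    for i in range(len(low)):
        top, bottom = [], []
        for j in range(len(low[0])):
            l, p, q, r = low[i][j], horiz[i][j], vert[i][j], diag[i][j]
            top.extend([(l + p + q + r) / 2, (l - p + q - r) / 2])
            bottom.extend([(l + p - q - r) / 2, (l - p - q + r) / 2])
        out.append(top)
        out.append(bottom)
    return out


# A 4x4 toy "feature map" is reduced to four 2x2 sub-bands.
feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
low, horiz, vert, diag = haar_dwt2(feature_map)
restored = haar_idwt2(low, horiz, vert, diag)
print(low)       # half-resolution low-frequency band
print(restored)  # matches feature_map (as floats)
```

Max or average pooling would keep only something like the `low` band; the other three bands carry the edge information that a network could pass on to later layers or to a decoder.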

Availability of data and materials

Publicly available datasets were used in this study. The NYUv2 dataset can be found here: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html. The SUNRGB-D dataset can be found here: http://rgbd.cs.princeton.edu/.

Funding

This work was supported by Guizhou Provincial Science and Technology Foundation under Grant no. QKHJC-ZK[2021]Key001.

Author information

Yuhong Liu, Shiyi Jiang and Rongfen Zhang contributed equally to this work.

Authors and Affiliations

College of Big Data and Information, Guizhou University, Guiyang, 550025, Guizhou, People’s Republic of China
Runze Fan, Yuhong Liu, Shiyi Jiang & Rongfen Zhang

Contributions

All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Rongfen Zhang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, R., Liu, Y., Jiang, S. et al. RGB-D indoor semantic segmentation network based on wavelet transform. Evolving Systems 14, 981–991 (2023). https://doi.org/10.1007/s12530-022-09479-5

Received: 07 March 2022
Accepted: 03 December 2022
Published: 19 December 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s12530-022-09479-5
