Hybrid Shunted Transformer embedding UNet for remote sensing image semantic segmentation

Zhou, Huacong; Xiao, Xiangling; Li, Huihui; Liu, Xiaoyong; Liang, Peng

doi:10.1007/s00521-024-09888-4

Original Article
Published: 18 May 2024

Neural Computing and Applications

Huacong Zhou^1,5,
Xiangling Xiao^1,
Huihui Li^1,2 (ORCID: orcid.org/0000-0003-0463-8178),
Xiaoyong Liu^3,4 &
Peng Liang^1

Abstract

With the development of deep learning, semantic segmentation of Remote Sensing Images (RSI) has advanced significantly. However, because objects are sparsely distributed and classes are highly similar, the task remains extremely challenging. In this paper, we propose HST-UNet, a novel hybrid semantic segmentation framework for RSI that uses a Shunted Transformer as the encoder and a Multi-Scale Convolutional Attention Network (MSCAN) as the decoder, so that it can extract and recover both the global and local features of RSI and overcome the shortcomings of existing models. To better fuse the information from the encoder and the decoder and to alleviate ambiguity, we design a Learnable Weighted Fusion (LWF) module that effectively connects the encoder features to the decoder features. Extensive experiments demonstrate that the proposed HST-UNet outperforms state-of-the-art methods, achieving an MIoU/F1 score of 71.44%/83.00% on the ISPRS Vaihingen dataset and 77.36%/87.09% on the ISPRS Potsdam dataset. The code will be available at https://github.com/HC-Zhou/HST-UNet.
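
As a point of reference for the metrics reported above, the following minimal sketch shows how mean IoU and mean F1 are commonly derived from a pixel-level confusion matrix. The function name `per_class_iou_f1` and the toy 3-class matrix are illustrative assumptions and are not taken from the paper or its repository; the paper's actual evaluation protocol (for example, which classes are averaged) may differ.

```python
from typing import List, Tuple


def per_class_iou_f1(conf: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Return per-class IoU and F1 lists.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j (hypothetical layout chosen for this sketch).
    """
    n = len(conf)
    ious, f1s = [], []
    for c in range(n):
        tp = conf[c][c]                               # correctly labeled pixels
        fn = sum(conf[c]) - tp                        # class c pixels that were missed
        fp = sum(conf[r][c] for r in range(n)) - tp   # pixels wrongly labeled c
        union = tp + fp + fn
        ious.append(tp / union if union else 0.0)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return ious, f1s


# Toy 3-class confusion matrix (made-up numbers, for illustration only).
toy = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]

ious, f1s = per_class_iou_f1(toy)
miou = sum(ious) / len(ious)
mean_f1 = sum(f1s) / len(f1s)
print(f"mIoU = {miou:.4f}, mean F1 = {mean_f1:.4f}")
```

For any single class the two metrics are tied by F1 = 2·IoU / (1 + IoU), so a class's F1 is never lower than its IoU, and the same ordering holds for their means over the same set of classes.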

Data availability

The data that support the findings of this study are available at http://www2.isprs.org/commissions/comm3/wg4/semantic-labeling.html with corresponding permission.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (Grant Nos. 62006049, 62172113, and 62072123), the Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515010939), the Project of Education Department of Guangdong Province (Grant Nos. 2022KTSCX068 and 2020ZDZX3059), the Ministry of Education of Humanities and Social Science Project (Grant No. 18JDGC012), the Guangdong Science and Technology Project (Grant Nos. KTP20210197 and 2017A040403068), and the Guangdong Science and Technology Innovation Strategy Special Fund Project (Climbing Plan) (No. pdjh2022b0302).

Author information

Huacong Zhou and Xiangling Xiao have contributed equally to this work.

Authors and Affiliations

School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510630, Guangdong, China
Huacong Zhou, Xiangling Xiao, Huihui Li & Peng Liang
Guangdong Provincial Key Laboratory of Intellectual Property and Big Data, Guangdong Polytechnic Normal University, Guangzhou, 510665, Guangdong, China
Huihui Li
School of Data Science and Engineering, Guangdong Polytechnic Normal University, Guangzhou, 510630, Guangdong, China
Xiaoyong Liu
Academy of Heyuan, Guangdong Polytechnic Normal University, Heyuan, 517099, Guangdong, China
Xiaoyong Liu
The Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006, Guangdong, China
Huacong Zhou

Contributions

Huacong Zhou performed conceptualization, software, and writing—original draft; Xiangling Xiao performed validation, formal analysis, and writing—review and editing; Huihui Li performed methodology and funding acquisition; Xiaoyong Liu performed supervision and funding acquisition; and Peng Liang performed resources, validation, investigation, and funding acquisition.

Corresponding authors

Correspondence to Huihui Li or Xiaoyong Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, H., Xiao, X., Li, H. et al. Hybrid Shunted Transformer embedding UNet for remote sensing image semantic segmentation. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09888-4

Received: 04 March 2023
Accepted: 23 April 2024
Published: 18 May 2024
DOI: https://doi.org/10.1007/s00521-024-09888-4
