Abstract
With the growing adoption of artificial intelligence in medicine, U-shaped convolutional neural networks (CNNs) have garnered significant attention for their efficacy in medical image analysis tasks. However, the intrinsic limitations of the convolution operation, particularly its restricted receptive field, impede the modeling of long-range semantic dependencies and holistic contextual information, leaving segmentation results insensitive to edge and contour details. To mitigate these shortcomings, Transformer architectures equipped with a self-attention mechanism offer a potential alternative for encoding long-range semantic features and capturing global contextual information. Motivated by these insights, this paper proposes a novel U-shaped Transformer architecture, denoted STU3, specifically engineered for medical image segmentation. First, a parallel training paradigm is employed that distinguishes between global fine-grained and local coarse-grained image features, optimizing the feature extraction process. Second, to relieve the restrictions that peer-level skip connections impose on fine-grained feature fusion, we propose a Residual Full-scale Feature Fusion (RFFF) module as the global decoder component. Finally, a Global-Local Feature Fusion Block (GLFB) seamlessly integrates the fine-grained and coarse-grained features, thereby constructing a comprehensive global information dependency network and ensuring a high level of accuracy in medical image segmentation tasks. Experimental evaluations on abdominal and cervical multi-organ CT datasets substantiate the superiority of the proposed STU3 model over most current models, particularly in terms of the Dice Similarity Coefficient evaluation metric.
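The evaluation metric named in the abstract, the Dice Similarity Coefficient, is standard for segmentation overlap: DSC = 2|A∩B| / (|A|+|B|) for predicted mask A and ground-truth mask B. A minimal sketch of this metric (the function name, NumPy implementation, and smoothing term `eps` are illustrative choices, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A| + |B|).

    `eps` guards against division by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Toy 4x4 masks: the prediction recovers 2 of the 3 ground-truth foreground pixels.
pred = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[1, 1, 1, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
print(round(dice_coefficient(pred, target), 3))  # 2*2 / (2+3) = 0.8
```

In multi-organ CT segmentation, the coefficient is typically computed per organ class and then averaged.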
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zheng, W., Li, B., Chen, W. (2024). STU3: Multi-organ CT Medical Image Segmentation Model Based on Transformer and UNet. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science(), vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8849-5
Online ISBN: 978-981-99-8850-1
eBook Packages: Computer Science (R0)