Abstract
With the growing adoption of artificial intelligence in medicine, U-shaped convolutional neural networks (CNNs) have garnered significant attention for their efficacy in medical image analysis tasks. However, the intrinsic limitations of the convolution operation, particularly its restricted receptive field, impede the modeling of long-range semantic dependencies and holistic contextual information, leaving segmentation results insensitive to edge and contour details. To mitigate these shortcomings, Transformer architectures equipped with a self-attention mechanism offer a potential alternative for encoding long-range semantic features and capturing global contextual information. Motivated by these insights, this paper proposes a novel U-shaped Transformer architecture, denoted STU3, specifically engineered for medical image segmentation. First, a parallel training paradigm is employed that distinguishes between global fine-grained and local coarse-grained image features, optimizing the feature extraction process. Second, to relieve the restrictions that peer-level skip connections impose on fine-grained feature fusion, we propose a Residual Full-scale Feature Fusion (RFFF) module as the global decoder component. Finally, a Global-Local Feature Fusion Block (GLFB) seamlessly integrates the fine-grained and coarse-grained features, thereby constructing a comprehensive global information dependency network and ensuring a high level of accuracy in medical image segmentation tasks. Experimental evaluations on abdominal and cervical multi-organ CT datasets substantiate the superiority of the proposed STU3 model over most current models, particularly in terms of the Dice Similarity Coefficient evaluation metric.
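The evaluation metric named in the abstract, the Dice Similarity Coefficient, is standard for segmentation overlap: DSC = 2|A∩B| / (|A|+|B|) for predicted mask A and ground-truth mask B. A minimal sketch of this metric (the function name, NumPy implementation, and smoothing term `eps` are illustrative choices, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A| + |B|).

    `eps` guards against division by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Toy 4x4 masks: the prediction recovers 2 of the 3 ground-truth foreground pixels.
pred = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[1, 1, 1, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
print(round(dice_coefficient(pred, target), 3))  # 2*2 / (2+3) = 0.8
```

In multi-organ CT segmentation, the coefficient is typically computed per organ class and then averaged.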
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zheng, W., Li, B., Chen, W. (2024). STU3: Multi-organ CT Medical Image Segmentation Model Based on Transformer and UNet. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science(), vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8849-5
Online ISBN: 978-981-99-8850-1
eBook Packages: Computer Science (R0)