STU3: Multi-organ CT Medical Image Segmentation Model Based on Transformer and UNet

  • Conference paper
Artificial Intelligence (CICAI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14473)


Abstract

With the growing adoption of artificial intelligence in the medical field, U-shaped convolutional neural networks (CNNs) have garnered significant attention for their efficacy in medical image analysis tasks. However, the intrinsic limitations of the convolution operation, particularly its restricted receptive field, impede the establishment of long-range semantic dependencies and holistic contextual connections; as a result, segmentation outputs are insensitive to edge and contour details. To mitigate these shortcomings, Transformer architectures equipped with self-attention mechanisms offer a potential alternative for encoding long-range semantic features and capturing global contextual information. Motivated by these insights, this paper proposes a novel U-shaped Transformer architecture, denoted STU3, specifically engineered for medical image segmentation. First, a parallel training paradigm is employed that separates global fine-grained from local coarse-grained image features, optimizing the feature extraction process. Second, to alleviate the restrictions that same-level skip connections place on fine-grained feature fusion, we propose a Residual Full-scale Feature Fusion (RFFF) module as the global decoder component. Finally, a Global-Local Feature Fusion Block (GLFB) is implemented to seamlessly integrate the fine-grained and coarse-grained features, constructing a comprehensive global information dependency network and ensuring high accuracy in medical image segmentation tasks. Experimental evaluations conducted on abdominal and cervical multi-organ CT datasets substantiate the superiority of the proposed STU3 model over most current models, particularly in terms of the Dice Similarity Coefficient evaluation metric.
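The Dice Similarity Coefficient (DSC) used as the evaluation metric is defined as DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal sketch of how it is typically computed for binary segmentation masks follows; this illustration is ours, not the authors' implementation.

```python
import numpy as np

def dice_similarity(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice Similarity Coefficient: 2 * |pred ∩ target| / (|pred| + |target|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Two 4x4 masks whose first column overlaps: DSC = 2*4 / (8 + 4) ≈ 0.667
pred = np.array([[1, 1, 0, 0]] * 4)
target = np.array([[1, 0, 0, 0]] * 4)
print(f"DSC = {dice_similarity(pred, target):.3f}")
```

The internals of the Global-Local Feature Fusion Block are not reproduced on this page, so the sketch below is purely hypothetical: it merges a coarse-grained and a fine-grained feature map of equal size by channel concatenation followed by a 1x1 convolution, one common fusion strategy. The module name, channel arithmetic, and fusion choice are our assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    """Hypothetical global-local fusion: concatenate two same-sized feature
    maps along the channel axis, then project back with a 1x1 convolution.
    An illustrative guess, not the GLFB defined in the paper."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([global_feat, local_feat], dim=1))

# Fuse two 64-channel feature maps at 56x56 resolution.
glfb = GlobalLocalFusion(channels=64)
out = glfb(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```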

Author information

Corresponding author

Correspondence to Bo Li.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zheng, W., Li, B., Chen, W. (2024). STU3: Multi-organ CT Medical Image Segmentation Model Based on Transformer and UNet. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_14

  • DOI: https://doi.org/10.1007/978-981-99-8850-1_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8849-5

  • Online ISBN: 978-981-99-8850-1

  • eBook Packages: Computer Science, Computer Science (R0)
