A 3D Medical Image Segmentation Framework Fusing Convolution and Transformer Features

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2022)

Abstract

Accurate segmentation of medical images provides a reliable basis for clinical diagnosis and pathology research and helps doctors reach more accurate diagnoses; deep learning can accelerate this process. Convolutional Neural Networks (CNNs) and Transformers have become the two mainstream deep learning architectures for medical image segmentation. However, the Transformer architecture has limited ability to capture local inductive bias and is at a disadvantage on small-sample datasets. Many theoretical and experimental results show that these problems can be effectively mitigated by fusing convolution and Transformer features. In this paper, a new U-shaped segmentation model based on convolution and the Swin Transformer, called CST-UNET, is proposed. Its encoder combines the advantages of dilated convolution and the Transformer, allowing the model to capture both semantic inductive bias and long-range information while requiring fewer parameters and lower FLOPs. Even when trained on a small-sample dataset, the framework retains strong generalization ability. On the BraTS2021 dataset, the Dice coefficients for ET, TC, and WT are 85.46%, 89.38%, and 92.35%, respectively, and the HD95 results are 7.95, 5.06, and 4.07, respectively.
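The Dice coefficient reported above measures the overlap between a predicted segmentation mask and the ground truth: Dice = 2|A ∩ B| / (|A| + |B|). A minimal sketch of this metric (not the authors' evaluation code; masks are represented here as sets of voxel indices for illustration):

```python
def dice_coefficient(pred_voxels, target_voxels):
    """Dice = 2|A ∩ B| / (|A| + |B|) over two binary voxel masks,
    each given as an iterable of voxel indices."""
    pred, target = set(pred_voxels), set(target_voxels)
    intersection = len(pred & target)
    denom = len(pred) + len(target)
    # Convention: two empty masks agree perfectly.
    return 2.0 * intersection / denom if denom else 1.0

# A prediction sharing 2 of 3 voxels with the target scores 2*2/(3+3) ≈ 0.667.
score = dice_coefficient([1, 2, 3], [2, 3, 4])
```

A Dice of 1.0 means perfect overlap; the complementary HD95 metric instead measures boundary distance (the 95th percentile of surface-to-surface distances), so lower is better.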



Acknowledgement

This work was supported by the National Natural Science Foundation of China (Nos. 62172004, 62072002, and 61872004), Educational Commission of Anhui Province (No. KJ2019ZD05).

Author information

Correspondence to Ziheng Wu.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhu, F. et al. (2022). A 3D Medical Image Segmentation Framework Fusing Convolution and Transformer Features. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13393. Springer, Cham. https://doi.org/10.1007/978-3-031-13870-6_63


  • DOI: https://doi.org/10.1007/978-3-031-13870-6_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13869-0

  • Online ISBN: 978-3-031-13870-6

  • eBook Packages: Computer Science, Computer Science (R0)
