LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14432))

Abstract

Medical image segmentation plays an essential role in developing computer-assisted diagnosis and treatment systems, yet it still faces numerous challenges. In the past few years, Convolutional Neural Networks (CNNs) have been successfully applied to medical image segmentation. However, because convolution operations are local, CNN-based architectures are limited in learning the global context of an image, which can be crucial to segmentation quality. Vision Transformer (ViT) architectures, by contrast, have a remarkable ability to extract long-range semantic features, but at the cost of high computational complexity. To make medical image segmentation more efficient and accurate, we present a novel lightweight architecture named LeViT-UNet, which integrates multi-stage Transformer blocks into the encoder via LeViT, with the aim of exploring how effectively local and global features can be fused. Our experiments on two challenging segmentation benchmarks indicate that the proposed LeViT-UNet achieves competitive performance compared with various state-of-the-art methods in terms of both efficiency and accuracy, suggesting that LeViT can serve as a faster feature encoder for medical image segmentation. LeViT-UNet-384, for instance, achieves a Dice similarity coefficient (DSC) of 78.53% and 90.32% on the Synapse and ACDC datasets, respectively, at a segmentation speed of 85 frames per second (FPS). The proposed architecture could therefore be beneficial for prospective clinical trials conducted by radiologists. Our source code is publicly available at https://github.com/apple1986/LeViT_UNet.
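For readers unfamiliar with the Dice similarity coefficient reported above, the following is a minimal sketch of how a per-class and mean DSC can be computed from flattened label masks, using the standard definition DSC = 2|P ∩ T| / (|P| + |T|). The function names, the toy masks, and the convention of scoring 1.0 when a class is absent from both masks are assumptions made for illustration; this is not the evaluation code from the authors' repository.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice similarity coefficient for one class over flattened label masks."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")
    pred_count = sum(1 for p in pred if p == label)      # |P|
    target_count = sum(1 for t in target if t == label)  # |T|
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    denom = pred_count + target_count
    # Assumed convention: a class absent from both masks counts as a perfect match.
    return 1.0 if denom == 0 else 2.0 * overlap / denom


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: Sequence[int]) -> float:
    """Average the per-class Dice over the given foreground labels."""
    return sum(dice_coefficient(pred, target, c) for c in labels) / len(labels)


# Toy example: six pixels, background 0 and two foreground classes 1 and 2.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
print(dice_coefficient(pred, target, 1))  # class 1: 2*1 / (2+1)
print(dice_coefficient(pred, target, 2))  # class 2: 2*2 / (2+3)
print(mean_dice(pred, target, [1, 2]))
```

Averaging over foreground classes only, as done here, is one common choice; whether background is included and how scores are aggregated across cases varies between benchmarks.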

Notes

  1. https://www.synapse.org/#!Synapse:syn3193805/wiki/217789

  2. https://www.creatis.insa-lyon.fr/Challenge/acdc/

Acknowledgments

This work is supported by the Guangdong Provincial Key Laboratory of Human Digital Twin (No. 2022B1212010004), and the Hubei Key Laboratory of Intelligent Robot in Wuhan Institute of Technology (No. HBIRL202202 and No. HBIRL202206).

Corresponding author

Correspondence to Xinglong Wu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Xu, G., Zhang, X., He, X., Wu, X. (2024). LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_4

  • DOI: https://doi.org/10.1007/978-981-99-8543-2_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8542-5

  • Online ISBN: 978-981-99-8543-2

  • eBook Packages: Computer Science (R0)
