LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation

Xu, Guoping; Zhang, Xuan; He, Xinwei; Wu, Xinglong

doi:10.1007/978-981-99-8543-2_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14432)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Medical image segmentation plays an essential role in developing computer-assisted diagnosis and treatment systems, yet it still faces numerous challenges. In the past few years, Convolutional Neural Networks (CNNs) have been successfully applied to medical image segmentation. However, because convolution operations are local, CNN-based architectures are limited in their ability to learn global context information, which can be crucial to accurate segmentation. Vision Transformer (ViT) architectures, by contrast, are remarkably good at extracting long-range semantic features, but at the cost of high computational complexity. To make medical image segmentation more efficient and accurate, we present a novel lightweight architecture named LeViT-UNet, which integrates multi-stage Transformer blocks into the encoder via LeViT, with the aim of exploring how effectively local and global features can be fused. Our experiments on two challenging segmentation benchmarks indicate that the proposed LeViT-UNet achieves competitive performance compared with various state-of-the-art methods in terms of efficiency and accuracy, suggesting that LeViT can serve as a faster feature encoder for medical image segmentation. LeViT-UNet-384, for instance, achieves a Dice similarity coefficient (DSC) of 78.53% and 90.32% on the Synapse and ACDC datasets, respectively, with a segmentation speed of 85 frames per second (FPS). The proposed architecture could therefore be beneficial for prospective clinical trials conducted by radiologists. Our source code is publicly available at https://github.com/apple1986/LeViT_UNet.
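
For readers unfamiliar with the metric quoted above, the Dice similarity coefficient between a predicted region P and a ground-truth region G is 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of a per-class DSC on integer label maps, not the authors' evaluation code. The function name `dice_per_class`, the flat-sequence input format, the treatment of label 0 as background, and the choice to skip classes absent from both maps are all assumptions made for illustration; the abstract does not specify the evaluation protocol.

```python
def dice_per_class(pred, target, num_classes):
    """Per-class Dice similarity coefficient for two label maps.

    pred, target: flat sequences of integer class labels, equal length
                  (hypothetical input format chosen for this sketch).
    num_classes:  number of labels including background (label 0).
    Returns a dict {class_label: DSC}. Classes absent from both maps
    are skipped, which is one convention among several (an assumption).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    scores = {}
    for c in range(1, num_classes):  # label 0 is assumed to be background
        p = [label == c for label in pred]
        t = [label == c for label in target]
        intersection = sum(a and b for a, b in zip(p, t))
        denom = sum(p) + sum(t)
        if denom == 0:
            continue  # class missing from both prediction and ground truth
        scores[c] = 2.0 * intersection / denom
    return scores


if __name__ == "__main__":
    # Toy 6-pixel example with two foreground classes.
    prediction = [0, 1, 1, 2, 2, 0]
    ground_truth = [0, 1, 0, 2, 2, 2]
    per_class = dice_per_class(prediction, ground_truth, num_classes=3)
    mean_dsc = sum(per_class.values()) / len(per_class)
    print(per_class, mean_dsc)
```

A reported figure such as 78.53% would typically be a mean of such per-class scores over organs and test cases, though the exact averaging used in the paper is not stated here.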

Acknowledgments

This work was supported by the Guangdong Provincial Key Laboratory of Human Digital Twin (No. 2022B1212010004) and the Hubei Key Laboratory of Intelligent Robot at Wuhan Institute of Technology (No. HBIRL202202 and No. HBIRL202206).

Author information

Authors and Affiliations

School of Computer Sciences and Engineering, Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, 430205, Hubei, China
Guoping Xu, Xuan Zhang & Xinglong Wu
College of Informatics, Huazhong Agricultural University, Wuhan, 430070, Hubei, China
Xinwei He

Corresponding author

Correspondence to Xinglong Wu.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Xu, G., Zhang, X., He, X., Wu, X. (2024). LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_4

DOI: https://doi.org/10.1007/978-981-99-8543-2_4
Published: 29 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8542-5
Online ISBN: 978-981-99-8543-2
eBook Packages: Computer Science, Computer Science (R0)
