Abstract
Medical image segmentation plays an essential role in developing computer-assisted diagnosis and treatment systems, yet it still faces numerous challenges. In the past few years, Convolutional Neural Networks (CNNs) have been successfully applied to medical image segmentation. Unfortunately, because convolution operations are local, these CNN-based architectures are limited in their ability to learn global context information in images, which may be crucial to the success of medical image segmentation. Meanwhile, vision Transformer (ViT) architectures have a remarkable ability to extract long-range semantic features, but at the cost of high computational complexity. To make medical image segmentation more efficient and accurate, we present a novel lightweight architecture named LeViT-UNet, which integrates multi-stage Transformer blocks into the encoder via LeViT, aiming to explore the effectiveness of fusing local and global features. Our experiments on two challenging segmentation benchmarks indicate that the proposed LeViT-UNet achieves competitive efficiency and accuracy compared with various state-of-the-art methods, suggesting that LeViT can serve as a faster feature encoder for medical image segmentation. LeViT-UNet-384, for instance, achieves Dice similarity coefficients (DSC) of 78.53% and 90.32% on the Synapse and ACDC datasets, respectively, with a segmentation speed of 85 frames per second (FPS). Therefore, the proposed architecture could be beneficial for prospective clinical trials conducted by radiologists. Our source code is publicly available at https://github.com/apple1986/LeViT_UNet.
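The Dice similarity coefficient used to report these results measures the overlap between a predicted mask P and a ground-truth mask G as DSC = 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch that computes it per class on flattened label maps in plain Python. The function names, the toy masks, and the convention of scoring a class that is absent from both masks as 1.0 are illustrative assumptions, not the evaluation code from the LeViT-UNet repository.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice similarity coefficient for a single class label.

    DSC = 2 * |P ∩ G| / (|P| + |G|), where P and G are the pixels assigned
    to `label` in the prediction and in the ground truth, respectively.
    """
    if len(pred) != len(target):
        raise ValueError("prediction and ground truth must have the same number of pixels")
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_count + target_count == 0:
        # Assumption: a class absent from both masks is treated as a perfect match.
        return 1.0
    return 2.0 * overlap / (pred_count + target_count)


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: Sequence[int]) -> float:
    """Average the per-class DSC over the given foreground labels."""
    scores = [dice_coefficient(pred, target, c) for c in labels]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical 2x4 label maps flattened row by row
    # (0 = background, 1 and 2 = two foreground structures).
    prediction = [0, 1, 1, 0,
                  2, 2, 0, 0]
    ground_truth = [0, 1, 1, 1,
                    2, 0, 0, 0]
    print(f"DSC class 1: {dice_coefficient(prediction, ground_truth, 1):.3f}")
    print(f"DSC class 2: {dice_coefficient(prediction, ground_truth, 2):.3f}")
    print(f"mean DSC:    {mean_dice(prediction, ground_truth, [1, 2]):.3f}")
```

Running the script prints 0.800 and 0.667 for the two toy classes and a mean of 0.733. Averaging per-class scores, rather than pooling all foreground pixels, keeps small structures from being swamped by large ones; this is one common design choice for multi-organ benchmarks.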
Acknowledgments
This work is supported by the Guangdong Provincial Key Laboratory of Human Digital Twin (No. 2022B1212010004), and the Hubei Key Laboratory of Intelligent Robot in Wuhan Institute of Technology (No. HBIRL202202 and No. HBIRL202206).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Xu, G., Zhang, X., He, X., Wu, X. (2024). LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14432. Springer, Singapore. https://doi.org/10.1007/978-981-99-8543-2_4
DOI: https://doi.org/10.1007/978-981-99-8543-2_4
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8542-5
Online ISBN: 978-981-99-8543-2
eBook Packages: Computer Science, Computer Science (R0)