Abstract
Accurate segmentation of medical images is essential for clinical decision-making, and deep learning techniques have shown remarkable results in this area. However, existing segmentation models that combine transformers and convolutional neural networks often rely on skip connections in U-shaped networks, which may limit their ability to capture contextual information in medical images. To address this limitation, we propose a coordinated mobile and residual transformer UNet (MRC-TransUNet) that combines the strengths of transformer and UNet architectures. Our approach uses a lightweight MR-ViT to address the semantic gap and a reciprocal attention (RPA) module to compensate for the potential loss of details. To better explore long-range contextual information, we use skip connections only in the first layer and add MR-ViT and RPA modules in the subsequent downsampling layers. We evaluated the proposed method on three medical image segmentation datasets covering breast, brain, and lung images. It outperformed state-of-the-art methods on several evaluation metrics, including the Dice coefficient and Hausdorff distance. These results demonstrate that the proposed method can significantly improve the accuracy of medical image segmentation and has potential for clinical applications.
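The two metrics named above can be illustrated with a small, dependency-free sketch. It is not the authors' evaluation code: it assumes binary masks given as nested lists, computes the Hausdorff distance over all foreground pixels in pixel units (rather than over boundaries or as a percentile variant such as HD95), and treats two empty masks as a perfect Dice match. The function names are hypothetical.

```python
import math


def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    inter = 0
    total = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            total += (1 if p else 0) + (1 if t else 0)
    # Assumption: two empty masks are counted as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def foreground_points(mask):
    """Return (row, col) coordinates of all nonzero pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between the foreground pixel sets."""
    a, b = foreground_points(pred), foreground_points(target)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    pred = [[0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    target = [[0, 1, 1, 1],
              [0, 1, 1, 0],
              [0, 0, 0, 0]]
    print(dice_coefficient(pred, target))    # 8/9, about 0.889
    print(hausdorff_distance(pred, target))  # 1.0
```

A higher Dice coefficient indicates better overlap, while a lower Hausdorff distance indicates that the worst-case disagreement between the predicted and reference regions is smaller. The brute-force distance computation is quadratic in the number of foreground pixels, so it is suitable only for small illustrative masks.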
Graphical Abstract
Illustration of the proposed MRC-TransUNet. Input medical images first undergo an intrinsic downsampling operation, and MR-ViT replaces the original skip-connection structure. The output feature representations at different scales are fused by the RPA module. Finally, an upsampling operation fuses the features and restores them to the resolution of the input image.
Data availability
The data are available from the corresponding author on reasonable request.
Acknowledgements
The authors wish to express their sincere appreciation to Chenzi Zheng (College of Foreign Languages, Nankai University) for her valuable assistance in editing the English language of this article.
Funding
This work was supported in part by the National Natural Science Foundation of China (61972456, 61173032) and the Tianjin Research Innovation Project for Postgraduate Students (2022SKY126).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Z., Wu, H., Zhao, H. et al. A Novel Deep Learning Model for Medical Image Segmentation with Convolutional Neural Network and Transformer. Interdiscip Sci Comput Life Sci 15, 663–677 (2023). https://doi.org/10.1007/s12539-023-00585-9
DOI: https://doi.org/10.1007/s12539-023-00585-9