
From CNN to Transformer: A Review of Medical Image Segmentation Models

  • Published in: Journal of Imaging Informatics in Medicine (2024)

Abstract

Medical image segmentation is an important step in medical image analysis and a crucial prerequisite for efficient disease diagnosis and treatment. Deep learning has become the prevailing approach to image segmentation, with U-Net and its variants the most widely adopted models. Moreover, following the remarkable success of pre-trained models in natural language processing, transformer-based models such as TransUNet have achieved strong performance on multiple medical image segmentation datasets. Recently, the Segment Anything Model (SAM) and its variants have also been applied to medical image segmentation. In this paper, we survey the seven most representative medical image segmentation models of recent years. We analyze the characteristics of these models theoretically and quantitatively evaluate their performance on the Tuberculosis Chest X-rays, Ovarian Tumors, and Liver Segmentation datasets. Finally, we discuss the main challenges and future trends in medical image segmentation. Our work can help researchers in related fields quickly build medical segmentation models tailored to specific anatomical regions.
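
For readers who want to reproduce the kind of quantitative comparison described above, the sketch below computes the Dice coefficient, a standard overlap metric for binary segmentation masks. The excerpt does not state which metrics the authors actually used, so this is an illustrative assumption rather than the paper's evaluation code; the function and example masks are hypothetical.

```python
# Minimal sketch of an overlap metric commonly used to score segmentation
# masks. The Dice coefficient shown here is assumed for illustration; the
# paper's exact evaluation protocol is not given in this excerpt.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity between two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: two 4x4 masks that partially overlap.
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:3] = 1      # predicted region (4 pixels)
target[1:4, 1:3] = 1    # ground-truth region (6 pixels)
print(f"Dice = {dice_coefficient(pred, target):.3f}")  # 2*4 / (4+6) = 0.800
```

In practice the same function would be applied per case (e.g., per chest X-ray or per CT slice) and averaged across a dataset to compare models.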



Funding

This work was supported by the Open Project of the Network and Data Security Key Laboratory of Sichuan Province (NSD2021-6), the Clinical Research and Transformation Fund of Sichuan Provincial People’s Hospital (2021LY24), and the Key Research Project of Science and Technology of Sichuan Province (2022YFS0087, 2023YFS0039).

Author information


Contributions

Wenjian Yao and Mengjuan Liu contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Wenjian Yao, Yao Xie, Wei Liao, and Yuheng Chen. The first draft of the manuscript was written by Wenjian Yao, Jiajun Bai, and Mengjuan Liu, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Mengjuan Liu or Yao Xie.

Ethics declarations

Ethics Approval

This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of Sichuan Provincial People’s Hospital.

Consent to Participate

Informed consent was obtained from all individual participants included in the study.

Consent for Publication

The authors affirm that patients signed informed consent regarding the publication of their data and photographs. The Tuberculosis Chest X-rays and Clinical Liver CT datasets are publicly available. For the Ovarian Tumors dataset, informed consent was obtained from all patients, and the abdominal images were anonymized so that no patient can be identified from them.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yao, W., Bai, J., Liao, W. et al. From CNN to Transformer: A Review of Medical Image Segmentation Models. J Imaging Inform Med (2024). https://doi.org/10.1007/s10278-024-00981-7


  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10278-024-00981-7
