TransDiff: medical image segmentation method based on Swin Transformer with diffusion probabilistic model

Applied Intelligence

Abstract

Medical image segmentation provides a reliable basis for clinical analysis and diagnosis. However, the task is challenging because medical images suffer from low contrast, ambiguous boundaries between organs or lesions and the surrounding tissues, and noise interference. To address these challenges, which are specific to medical images, and to further improve segmentation accuracy and precision, a medical image segmentation model (TransDiff) is proposed that improves model robustness and enriches semantic information. TransDiff comprises three parts: a variational autoencoder (VAE), a diffusion transformer model, and a Swin Transformer. The VAE constructs a latent space in which features can be fully extracted and fused. The diffusion model predicts and removes noise by inferring semantics through the propagation of information between nodes. The Swin Transformer serves as the conditional part and enriches discriminative features. TransDiff inherits the diffusion model's robustness to noise and missing data and the Swin Transformer's feature enrichment, and thus captures semantic information more fully. It performs well on medical datasets covering three image modalities, outperforms existing medical image segmentation methods in segmentation precision and accuracy, and generalizes well. The code and trained models will be publicly available at https://github.com/xiaoxiao1997/TransDiff.
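
The abstract does not give the noise schedule or the network details, so the following is only a rough illustration of what "predicts and removes noise" means in a standard denoising diffusion probabilistic model, not the authors' implementation. The function names, the linear schedule, the 1000 steps, and the toy four-element latent are all assumptions made for this sketch. In TransDiff the noise estimate would come from the diffusion transformer conditioned on Swin Transformer features; here the true noise stands in for that prediction.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t is the running product of (1 - beta_s) for s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(z0, noise, alpha_bar_t):
    """Forward process: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * z + b * e for z, e in zip(z0, noise)]


def remove_noise(zt, predicted_noise, alpha_bar_t):
    """Invert the forward step with a noise estimate to obtain an estimate of z0."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(z - b * e) / a for z, e in zip(zt, predicted_noise)]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

z0 = [0.5, -1.0, 2.0, 0.0]  # toy latent, standing in for a VAE-encoded mask
noise = [rng.gauss(0.0, 1.0) for _ in z0]
t = 499  # an arbitrary intermediate timestep

zt = add_noise(z0, noise, alpha_bars[t])
# A perfect noise prediction recovers the clean latent (up to rounding).
z0_hat = remove_noise(zt, noise, alpha_bars[t])
```

In training, a network would be fitted to predict `noise` from `zt`, `t`, and the conditioning features; the quality of the recovered latent then depends on how accurate that prediction is.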

Data availability

The data used in the paper will be available upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61771220 and 62271226.

Author information

Contributions

Conceptualization: Xiaoxiao Liu; Methodology: Xiaoxiao Liu; Formal analysis and investigation: Xiaoxiao Liu; Writing (draft preparation): Xiaoxiao Liu; Review: Yan Zhao, Shigang Wang; Funding acquisition: Yan Zhao, Jian Wei.

Corresponding author

Correspondence to Yan Zhao.

Ethics declarations

Ethical approval

Not applicable.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, X., Zhao, Y., Wang, S. et al. TransDiff: medical image segmentation method based on Swin Transformer with diffusion probabilistic model. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05496-w

  • DOI: https://doi.org/10.1007/s10489-024-05496-w
