TransDiff: medical image segmentation method based on Swin Transformer with diffusion probabilistic model

Liu, Xiaoxiao; Zhao, Yan; Wang, Shigang; Wei, Jian

doi:10.1007/s10489-024-05496-w

Published: 18 May 2024

Applied Intelligence

Xiaoxiao Liu¹, Yan Zhao (ORCID: orcid.org/0000-0002-5319-0202)¹, Shigang Wang¹ & Jian Wei¹

Abstract

Medical image segmentation can provide a reliable basis for clinical analysis and diagnosis. However, the task is challenging because medical images often suffer from low contrast, ambiguous boundaries between organs or lesions and the surrounding tissue, and noise. To address these challenges, which are characteristic of medical images, and to further improve segmentation accuracy and precision, we propose a medical image segmentation model, TransDiff, designed to improve model robustness and enrich semantic information. TransDiff comprises three parts: a variational autoencoder (VAE), a diffusion transformer model and a Swin Transformer. The VAE constructs a latent space in which features are fully extracted and fused. The diffusion model predicts and removes noise, inferring semantics by propagating information between nodes. The Swin Transformer acts as the conditioning component and enriches discriminative features. TransDiff thus inherits the diffusion model's robustness to noise and missing data and the Swin Transformer's feature enrichment, giving it a stronger understanding of semantic information. It performs well on medical datasets covering three different imaging modalities, outperforms existing medical image segmentation methods in segmentation precision and accuracy, and generalizes well. The code and trained models will be publicly available at https://github.com/xiaoxiao1997/TransDiff.
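The abstract states that the diffusion model "predicts and removes noise" while the Swin Transformer supplies the conditioning. The paper's own equations are not part of this preview, so the following is only a minimal sketch of the standard denoising diffusion probabilistic model (DDPM) formulation of Ho et al. (2020), assuming a linear noise schedule and noise (epsilon) prediction. It omits the VAE, the diffusion transformer and the Swin Transformer entirely, and all names (`linear_betas`, `q_sample`, `p_sample`, `sample`, `eps_model`, `cond`) are hypothetical illustrations, not TransDiff's code.

```python
import math
import random


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    # Linear variance schedule from Ho et al. (2020); an assumption here,
    # not a schedule stated for TransDiff.
    if T < 2:
        raise ValueError("T must be at least 2")
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    # alpha_bar_t = prod_{s <= t} (1 - beta_s)
    abar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abar.append(prod)
    return abar


def q_sample(x0, t, abar, noise):
    # Forward (noising) process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    return [math.sqrt(abar[t]) * x + math.sqrt(1.0 - abar[t]) * e
            for x, e in zip(x0, noise)]


def p_sample(xt, t, eps_hat, betas, abar, rng):
    # One reverse (denoising) step: subtract the predicted noise eps_hat to get
    # the mean, then add Gaussian noise with variance beta_t (none at t = 0).
    beta = betas[t]
    coef = beta / math.sqrt(1.0 - abar[t])
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(eps_model, cond, dim, betas, rng):
    # Start from pure Gaussian noise and denoise for len(betas) steps.
    # eps_model(x_t, t, cond) is a hypothetical conditional noise predictor;
    # cond stands in for image features (e.g. from a Swin Transformer encoder).
    abar = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        x = p_sample(x, t, eps_model(x, t, cond), betas, abar, rng)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_betas(100)
    abar = cumulative_alphas(betas)
    target = [1.0, 0.0, 0.0, 1.0]  # toy stand-in for a (latent) mask

    def oracle_eps(xt, t, cond):
        # Sanity-check predictor that knows the clean target and returns the
        # exact noise contained in x_t; a trained network would replace it.
        return [(x - math.sqrt(abar[t]) * c) / math.sqrt(1.0 - abar[t])
                for x, c in zip(xt, target)]

    out = sample(oracle_eps, None, len(target), betas, rng)
    print(max(abs(o - c) for o, c in zip(out, target)))
```

Because the oracle returns the exact noise, the final step (t = 0) recovers the target up to floating-point error, which checks the schedule arithmetic. In training, `q_sample` would produce noisy inputs and a network conditioned on the image features would be fitted to predict the added noise.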

Fig. 3

Data availability

The data used in the paper will be available upon request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61771220 and 62271226.

Author information

Authors and Affiliations

School of Communication Engineering, Jilin University, Changchun, 130012, China
Xiaoxiao Liu, Yan Zhao, Shigang Wang & Jian Wei

Contributions

Conceptualization: Xiaoxiao Liu; Methodology: Xiaoxiao Liu; Formal analysis and investigation: Xiaoxiao Liu; Writing draft preparation: Xiaoxiao Liu; Review: Yan Zhao, Shigang Wang; Funding acquisition: Yan Zhao, Jian Wei.

Corresponding author

Correspondence to Yan Zhao.

Ethics declarations

Ethical approval

Not applicable.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, X., Zhao, Y., Wang, S. et al. TransDiff: medical image segmentation method based on Swin Transformer with diffusion probabilistic model. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05496-w

Accepted: 29 April 2024
Published: 18 May 2024
DOI: https://doi.org/10.1007/s10489-024-05496-w
