Abstract
Medical image segmentation provides a reliable basis for clinical analysis and diagnosis. The task is challenging, however, because medical images suffer from low contrast, ambiguous boundaries between organs or lesions and the surrounding tissue, and noise interference. To address these challenges, which are particular to medical images, and to further improve segmentation accuracy and precision, we propose TransDiff, a medical image segmentation model designed to improve model robustness and enrich semantic information. TransDiff comprises three parts: a variational autoencoder (VAE), a diffusion transformer model, and a Swin Transformer. The VAE constructs a latent space in which features can be fully extracted and fused. The diffusion model predicts and removes noise by inferring semantics through the propagation of information between nodes. The Swin Transformer serves as the conditional part and enriches discriminative features. TransDiff inherits the diffusion model's robustness to noise and missing data and the Swin Transformer's feature enrichment, and therefore captures semantic information more effectively. It performs well on medical datasets covering three different image modalities, outperforms existing medical image segmentation methods in segmentation precision and accuracy, and generalizes well. The code and trained models will be publicly available at https://github.com/xiaoxiao1997/TransDiff.
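The noise prediction and removal described in the abstract can be illustrated with the standard denoising diffusion forward step and its inversion. The sketch below is not the TransDiff implementation: the linear schedule values, the function names (`linear_betas`, `cumulative_alphas`, `q_sample`, `recover_x0`) and the four-element toy latent are all assumptions made for illustration, and the true noise stands in for the noise that a trained, conditioned denoiser would have to predict.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed values, not taken from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Forward step: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(alpha_bars[t])
    b = math.sqrt(1.0 - alpha_bars[t])
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    return x_t, eps


def recover_x0(x_t, eps_pred, t, alpha_bars):
    """Invert the forward step given an estimate of the added noise."""
    a = math.sqrt(alpha_bars[t])
    b = math.sqrt(1.0 - alpha_bars[t])
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_betas(1000))
latent = [0.5, -1.0, 0.25, 2.0]  # hypothetical stand-in for a VAE latent
noisy, true_eps = q_sample(latent, 500, alpha_bars, rng)
# A trained denoiser would predict eps from (noisy, t, image features);
# here the true noise is passed in to show that the inversion is exact.
restored = recover_x0(noisy, true_eps, 500, alpha_bars)
```

With a perfect noise estimate the clean latent is recovered up to floating-point error, so segmentation quality in such a scheme depends on how well the conditioned network predicts the noise.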
Data availability
The data used in the paper will be available upon request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 61771220 and 62271226.
Author information
Contributions
Conceptualization: Xiaoxiao Liu; Methodology: Xiaoxiao Liu; Formal analysis and investigation: Xiaoxiao Liu; Writing (draft preparation): Xiaoxiao Liu; Review: Yan Zhao, Shigang Wang; Funding acquisition: Yan Zhao, Jian Wei.
Ethics declarations
Ethical approval
Not applicable.
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Liu, X., Zhao, Y., Wang, S. et al. TransDiff: medical image segmentation method based on Swin Transformer with diffusion probabilistic model. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05496-w
DOI: https://doi.org/10.1007/s10489-024-05496-w