Shifted window-based Transformer with multimodal representation for the systematic staging of rectal cancer

  • Original Research Paper
Service Oriented Computing and Applications

Abstract

Systematic staging of rectal cancer aims to determine the degree of tumor invasion and the lymph node metastasis (LNM) status. Artificial intelligence technologies can help physicians make more accurate therapeutic decisions. Current research on rectal cancer segmentation relies primarily on convolutional neural networks, but the local receptive field of the convolution operation makes it hard to capture long-distance dependencies. Moreover, existing LNM diagnosis methods typically require manual extraction of radiomics features from rectal cancer lesions, and the efficacy of these features depends heavily on the specific dataset employed. In this paper, we propose a Transformer-based multi-modal rectal cancer diagnostic framework. The framework uses the hierarchical feature representation of the Swin Transformer to segment tumors accurately and adaptively extracts multi-scale features for LNM diagnosis. Compared with current state-of-the-art models, our model improves the accuracy of tumor segmentation and LNM classification by 3.62% and 4.10%, respectively.
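
The shifted-window idea named in the title can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It only shows, on a toy 4×4 grid of token ids, how a cyclic shift before window partitioning lets tokens from different regular windows share a window. The helper names `cyclic_shift` and `window_partition`, the grid size, and the window size are assumptions chosen for illustration.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` cells, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def window_partition(grid: Grid, window: int) -> List[List[int]]:
    """Split the grid into non-overlapping window x window tiles.

    Each tile is returned flattened in row-major order.
    """
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    tiles = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            tiles.append([
                grid[r][c]
                for r in range(r0, r0 + window)
                for c in range(c0, c0 + window)
            ])
    return tiles


# Toy 4x4 grid whose entries are token ids 0..15 (hypothetical data).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular partition: self-attention would be restricted to each 2x2 tile.
regular = window_partition(grid, 2)

# Shifted partition: shift by half the window size, then partition again.
shifted = window_partition(cyclic_shift(grid, 1), 2)

print(regular[0])  # [0, 1, 4, 5]
print(shifted[0])  # [5, 6, 9, 10], one token from each regular window
```

In the first shifted window, tokens 5, 6, 9 and 10 come from four different regular windows, which is how alternating the two partitions passes information across window borders. The sketch omits the attention computation itself and the masking that a full Swin Transformer applies to windows wrapped around the grid edge.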

Funding

No funding was obtained for this study.

Author information

Contributions

Conception and design were performed by Haoyu Wang and Peihong Li; analysis and interpretation of the data by Haoyu Wang; drafting of the article by Haoyu Wang and Peihong Li; final approval of the article by all the authors.

Corresponding author

Correspondence to Haoyu Wang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, H., Li, P. Shifted window-based Transformer with multimodal representation for the systematic staging of rectal cancer. SOCA (2024). https://doi.org/10.1007/s11761-024-00400-3

  • DOI: https://doi.org/10.1007/s11761-024-00400-3
