Abstract
Systematic staging of rectal cancer aims to determine the degree of tumor invasion and the lymph node metastasis (LNM) status. Artificial intelligence technologies can help physicians make more accurate therapeutic decisions. Current research on rectal cancer segmentation relies primarily on convolutional neural networks. However, because convolution operates on local receptive fields, these networks often fail to capture long-range dependencies effectively. Moreover, existing LNM diagnosis methods typically require radiomics features to be extracted manually from rectal cancer lesions, and the efficacy of these features depends heavily on the specific dataset used. In this paper, we propose a Transformer-based multimodal rectal cancer diagnostic framework. The framework uses the hierarchical feature representation of the Swin Transformer to segment tumors accurately and adaptively extracts multi-scale features for LNM diagnosis. Compared with current state-of-the-art models, our model improves the accuracy of tumor segmentation and LNM classification by 3.62% and 4.10%, respectively.
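The shifted-window mechanism and hierarchical feature representation of the Swin Transformer can be illustrated with a minimal NumPy sketch. It follows the generic Swin design of Liu et al. (2021), not the authors' own implementation: the function names (`window_partition`, `window_reverse`, `window_self_attention`, `swin_block_pair`, `patch_merging`), the 16×16 input with 8 channels, the window size of 4, and the random projection weights are all hypothetical. LayerNorm, the MLP, multi-head projections, relative position bias, and the shifted-window attention mask are omitted for brevity. The sketch shows how attention is restricted to m × m windows, how a cyclic shift lets information cross window borders, and how patch merging halves the spatial resolution while doubling the channels to yield multi-scale features.

```python
import numpy as np

def window_partition(x, m):
    """Split an (H, W, C) feature map into non-overlapping m x m windows.

    Returns shape (num_windows, m*m, C) so attention can run per window.
    """
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be divisible by m"
    x = x.reshape(h // m, m, w // m, m, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (H/m, W/m, m, m, C)
    return x.reshape(-1, m * m, c)

def window_reverse(windows, m, h, w):
    """Inverse of window_partition: reassemble windows into (H, W, C)."""
    c = windows.shape[-1]
    x = windows.reshape(h // m, w // m, m, m, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (H/m, m, W/m, m, C)
    return x.reshape(h, w, c)

def window_self_attention(windows):
    """Single-head scaled dot-product attention inside each window.

    Simplification (assumption): Q = K = V = input tokens, no learned weights.
    """
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ windows

def swin_block_pair(x, m):
    """Regular-window attention followed by shifted-window attention."""
    h, w, _ = x.shape
    # W-MSA: tokens attend only within fixed m x m windows (residual added).
    x = x + window_reverse(window_self_attention(window_partition(x, m)), m, h, w)
    # SW-MSA: cyclically shift by m // 2 so new windows straddle old borders.
    s = m // 2
    shifted = np.roll(x, shift=(-s, -s), axis=(0, 1))
    out = window_reverse(window_self_attention(window_partition(shifted, m)), m, h, w)
    # Undo the shift. The real Swin also masks attention between tokens that
    # the roll wrapped around from opposite edges; that mask is omitted here.
    return x + np.roll(out, shift=(s, s), axis=(0, 1))

def patch_merging(x, proj):
    """Downsample 2x: concatenate each 2x2 neighbourhood (4C), project to proj's width."""
    x0 = x[0::2, 0::2]
    x1 = x[1::2, 0::2]
    x2 = x[0::2, 1::2]
    x3 = x[1::2, 1::2]
    return np.concatenate([x0, x1, x2, x3], axis=-1) @ proj

# Hypothetical usage: three stages produce a feature pyramid.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))   # assumed patch-embedded map, C = 8
features = []
for stage in range(3):
    x = swin_block_pair(x, m=4)
    features.append(x)                  # per-stage multi-scale output
    c = x.shape[-1]
    x = patch_merging(x, rng.standard_normal((4 * c, 2 * c)) / np.sqrt(4 * c))
print([f.shape for f in features])      # → [(16, 16, 8), (8, 8, 16), (4, 4, 32)]
```

For a fixed window size, the cost of windowed attention grows linearly with image area instead of quadratically as in global self-attention, and alternating regular and shifted windows restores interaction across window borders. The per-stage outputs collected in `features` are the kind of multi-scale representation that a segmentation decoder or an LNM classifier could consume.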
Funding
No funding was obtained for this study.
Author information
Contributions
Conception and design were performed by Haoyu Wang and Peihong Li; analysis and interpretation of the data by Haoyu Wang; drafting of the article by Haoyu Wang and Peihong Li; final approval of the article by all the authors.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, H., Li, P. Shifted window-based Transformer with multimodal representation for the systematic staging of rectal cancer. SOCA (2024). https://doi.org/10.1007/s11761-024-00400-3