Abstract
Polyp Segmentation is important in helping doctors diagnose and provide an accurate treatment plan. With the emerging of deep learning technology in the last decade, deep learning models especially Unet and its evolved versions, for medical segmentation task have achieved superior results compared to previous traditional methods. To preserve location information, Unet-based models use connections between feature maps of the same resolution of encoder and decoder. However, using the same resolution connections has two problems: 1) High-resolution feature maps on the encoder side contain low-level information. In contrast, high-resolution feature maps on the decoder side contain high-level information that leads to an imbalance in terms of semantic information when connecting. 2) In medical images, objects such as tumours and cells often have diverse sizes, so to be able to segment objects correctly, the use of context information on a scale of the feature map encoder during the decoding process is not enough, so it is necessary to use context information on full-scale. In this paper, we propose a model called CTDCFormer that uses the PvitV2_B3 model as the backbone encoder to extract global information about the object. In order to exploit the full-scale context information of the encoder, we propose the GCF module using the lightweight attention mechanism between the decoder’s feature map and the encoder’s four feature maps. Our model CTDCFormer achieves superior results compared to other state of the arts, with the Dice scores up to 94.1% on the Kvasir-SEG set, and 94.7% on the CVC-ClinicDB set.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds.) Medical Image Computing and Computer-Assisted Intervention– MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Jha, D., et al.: Kvasir-seg: a segmented polyp dataset. In: International Conference on Multimedia Modeling. Springer, pp. 451–462 (2020)
Jha, D. , et al.: Resunet++: an advanced architecture for medical image segmentation. In: 2019 IEEE International Symposium on Multimedia (ISM), pp. 225–2255. IEEE (2019)
Ibtehaz, N., Rahman, M.S.: Multiresunet: rethinking the u-net architecture for multimodal biomedical image segmentation. Neural Netw. 121, 74–87 (2020)
Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: Unet++: a nested u-net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, pp. 3–11. Springer (2018)
Huang, H.: Unet 3+: a full-scale connected unet for medical image segmentation. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1055–1059. IEEE (2020)
Vaswani, A.: Attention is all you need. Adv. Neural Inf. Process. Syst 30 (2017)
Dosovitskiy, A., et al.: An image is worth 16\(\times \)16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020)
Cao, H.: Swin-unet: Unet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537 (2021)
Wang, W., et al.: Pvt v2: improved baselines with pyramid vision transformer. Comput. Vis. Media 8(3), 415–424 (2022)
Chen, J.: Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
Hille, G., Agrawal, S., Wybranski, C., Pech, M., Surov, A., Saalfeld, S.: Joint liver and hepatic lesion segmentation using a hybrid CNN with transformer layers. arXiv preprint arXiv:2201.10981 (2022)
Jha, D.: Kvasir-seg: a segmented polyp dataset. In: International Conference on Multimedia Modeling. Springer, pp. 451–462 (2020)
Bernal, J., Sánchez, F.J., Fernández-Esparrach, G., Gil, D., Rodríguez, C., Vilariño, F.: WM-dova maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Comput. Med. Imaging Graph. 43, 99–111 (2015)
Tajbakhsh, N., Gurudu, S.R., Liang, J.: Automated polyp detection in colonoscopy videos using shape and context information. IEEE Trans. Med. Imaging 35(2), 630–644 (2015)
Vázquez, D.: A benchmark for endoluminal scene segmentation of colonoscopy images. J. Healthcare Eng. (2017)
Silva, J., Histace, A., Romain, O., Dray, X., Granado, B.: Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer. Int. J. Comput. Assist. Radiol. Surg. 9(2), 283–293 (2014)
Wang, W.: Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 568–578 (2021)
Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 34, 12077–12090 (2021)
Rahman, M.M., Marculescu, R.: Medical image segmentation via cascaded attention decoding. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 6222–6231 (2023)
Wang, J., Huang, Q., Tang, F., Meng, J., Su, J., Song, S.: Stepwise feature fusion: Local guides global. arXiv preprint arXiv:2203.03635 (2022)
Le, T.-K., Tran, T.-T., Pham, V.-T., et al.: Msma-net: a multi-scale multidirectional adaptation network for polyp segmentation. In: 2022 RIVF International Conference on Computing and Communication Technologies (RIVF). IEEE, pp. 629–634 (2022)
Srivastava, A., et al.: MSRF-net: a multi-scale residual fusion network for biomedical image segmentation. IEEE J. Biomed Health Inform. 26(5), 2252–2263 (2021)
Wang, H., Cao, P., Wang, J., Zaiane, O.R.: Uctransnet: rethinking the skip connections in u-net from a channel-wise perspective with transformer. Proc. AAAI Conf. Artif. Intell. 36(3), 2441–2449 (2022)
Fan, D.-P., et al.: Pranet: parallel reverse attention network for polyp segmentation. In: Medical Image Computing and Computer Assisted Intervention-MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part VI 23, pp. 263-273. Springer (2020)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Acknowledgment
This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2021.34.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tran, ND., Nguyen, DQD., Nguyen, NLC., Pham, VT., Tran, TT. (2023). A Multi Context Decoder-based Network with Applications for Polyp Segmentation in Colonoscopy Images. In: Nguyen, N.T., Le-Minh, H., Huynh, CP., Nguyen, QV. (eds) The 12th Conference on Information Technology and Its Applications. CITA 2023. Lecture Notes in Networks and Systems, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-031-36886-8_13
Download citation
DOI: https://doi.org/10.1007/978-3-031-36886-8_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36885-1
Online ISBN: 978-3-031-36886-8
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)