A Multi Context Decoder-based Network with Applications for Polyp Segmentation in Colonoscopy Images

  • Conference paper
The 12th Conference on Information Technology and Its Applications (CITA 2023)

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 734))


Polyp segmentation helps doctors diagnose disease and plan accurate treatment. With the emergence of deep learning over the last decade, deep learning models, especially U-Net and its variants, have achieved results superior to traditional methods on medical segmentation tasks. To preserve location information, U-Net-based models connect encoder and decoder feature maps of the same resolution. However, same-resolution connections have two problems: 1) High-resolution feature maps on the encoder side contain low-level information, whereas high-resolution feature maps on the decoder side contain high-level information, so connecting them creates an imbalance in semantic information. 2) In medical images, objects such as tumours and cells vary widely in size, so context information from a single encoder scale is not enough to segment them correctly during decoding; full-scale context information is needed. In this paper, we propose a model called CTDCFormer that uses the PvitV2_B3 model as the backbone encoder to extract global information about the object. To exploit the encoder's full-scale context information, we propose the GCF module, which applies a lightweight attention mechanism between the decoder's feature map and the encoder's four feature maps. CTDCFormer achieves superior results compared to other state-of-the-art methods, with Dice scores of up to 94.1% on the Kvasir-SEG dataset and 94.7% on the CVC-ClinicDB dataset.
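
For readers unfamiliar with the metric reported above, the Dice score between a predicted mask P and a ground-truth mask G is commonly defined as 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that standard definition only. The function name `dice_score`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not state the exact evaluation protocol (thresholding, per-image versus per-dataset averaging).

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper, not taken from the paper.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as polyp in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 3x3 masks (hypothetical data): 3 predicted pixels, 4 true pixels, 3 shared.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]

print(f"Dice: {dice_score(pred, target):.3f}")  # 2*3 / (3+4) = 0.857
```

A score of 1.0 means the two masks overlap exactly and 0.0 means they share no pixels, so the reported 94.1% and 94.7% correspond to values of 0.941 and 0.947 on this scale.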




This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2021.34.

Author information

Corresponding author

Correspondence to Thi-Thao Tran.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Tran, ND., Nguyen, DQD., Nguyen, NLC., Pham, VT., Tran, TT. (2023). A Multi Context Decoder-based Network with Applications for Polyp Segmentation in Colonoscopy Images. In: Nguyen, N.T., Le-Minh, H., Huynh, CP., Nguyen, QV. (eds) The 12th Conference on Information Technology and Its Applications. CITA 2023. Lecture Notes in Networks and Systems, vol 734. Springer, Cham.
