A Multi Context Decoder-based Network with Applications for Polyp Segmentation in Colonoscopy Images

Tran, Ngoc-Du; Nguyen, Dinh-Quoc-Dai; Nguyen, Ngoc-Linh-Chi; Pham, Van-Truong; Tran, Thi-Thao

doi:10.1007/978-3-031-36886-8_13


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 734)

Included in the following conference series:

Conference on Information Technology and its Applications


Abstract

Polyp segmentation helps doctors diagnose polyps and plan accurate treatment. With the emergence of deep learning over the last decade, deep learning models, especially Unet and its variants, have achieved better results on medical segmentation tasks than traditional methods. To preserve location information, Unet-based models connect encoder and decoder feature maps of the same resolution. However, these same-resolution connections have two problems: 1) High-resolution feature maps on the encoder side contain low-level information, whereas high-resolution feature maps on the decoder side contain high-level information, so connecting them creates an imbalance in semantic information. 2) Objects in medical images, such as tumours and cells, vary widely in size, so context information from a single encoder scale during decoding is not enough to segment them correctly; full-scale context information is needed. In this paper, we propose a model called CTDCFormer that uses PVTv2-B3 as the backbone encoder to extract global information about the object. To exploit the full-scale context information of the encoder, we propose the GCF module, which applies a lightweight attention mechanism between the decoder's feature map and the encoder's four feature maps. CTDCFormer outperforms other state-of-the-art methods, achieving Dice scores of up to 94.1% on the Kvasir-SEG dataset and 94.7% on the CVC-ClinicDB dataset.
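The abstract describes two measurable ideas: letting one decoder feature map draw context from all four encoder stages through a lightweight attention, and reporting accuracy with the Dice score. The NumPy sketch below illustrates both under stated assumptions. It is not the authors' GCF module: the pooled descriptors, dot-product scores, softmax over scales, the hypothetical 1x1 projection matrices, and the helper names `resize_nearest`, `full_scale_context` and `dice_score` are all illustrative choices. The stage shapes (64/128/320/512 channels at strides 4/8/16/32 for a 352x352 input) are assumed PVTv2-style values, not figures taken from the paper.

```python
import numpy as np


def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = x.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return x[:, rows][:, :, cols]


def full_scale_context(decoder_map, encoder_maps, projections):
    """Hypothetical scale-wise attention: one decoder map attends over all encoder stages.

    decoder_map:  (C, H, W) array
    encoder_maps: list of (C_i, H_i, W_i) arrays, one per encoder stage
    projections:  list of (C, C_i) matrices (hypothetical 1x1 projections)
    """
    c, h, w = decoder_map.shape
    # Align every encoder stage to the decoder's resolution and channel width.
    aligned = np.stack([
        np.einsum("dc,chw->dhw", p, resize_nearest(e, h, w))
        for p, e in zip(projections, encoder_maps)
    ])                                                     # (S, C, H, W)
    # Global-average-pooled descriptors keep the attention cheap: one score per scale.
    query = decoder_map.mean(axis=(1, 2))                  # (C,)
    keys = aligned.mean(axis=(2, 3))                       # (S, C)
    scores = keys @ query / np.sqrt(c)                     # (S,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                               # softmax over scales
    # Weighted sum of the aligned encoder maps, added back as residual context.
    context = np.tensordot(weights, aligned, axes=1)       # (C, H, W)
    return decoder_map + context, weights


def dice_score(pred, target, eps=1e-7):
    """Dice = 2 * |P & G| / (|P| + |G|) for binary masks P (prediction) and G (ground truth)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


# Usage with assumed PVTv2-style stage shapes for a 352x352 input.
rng = np.random.default_rng(0)
stages = [(64, 88), (128, 44), (320, 22), (512, 11)]
encoder_maps = [rng.standard_normal((ch, s, s)) for ch, s in stages]
projections = [rng.standard_normal((32, ch)) / np.sqrt(ch) for ch, _ in stages]
decoder_map = rng.standard_normal((32, 44, 44))

fused, weights = full_scale_context(decoder_map, encoder_maps, projections)
print(fused.shape, weights.round(3))
print(dice_score([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))
```

Pooling each stage to a single descriptor keeps the cost at one dot product per scale, which is one plausible reading of "lightweight"; the published GCF module may compute its attention differently.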



Acknowledgment

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2021.34.

Author information

Authors and Affiliations

Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
Ngoc-Du Tran, Dinh-Quoc-Dai Nguyen, Ngoc-Linh-Chi Nguyen, Van-Truong Pham & Thi-Thao Tran


Corresponding author

Correspondence to Thi-Thao Tran.

Editor information

Editors and Affiliations

Wroclaw University of Science and Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Northumbria University, Newcastle, UK
Hoa Le-Minh
The University of Danang – Vietnam-Korea University of Information and Communication Technology, Danang, Vietnam
Cong-Phap Huynh
The University of Danang – Vietnam-Korea University of Information and Communication Technology, Danang, Vietnam
Quang-Vu Nguyen


About this paper

Cite this paper

Tran, ND., Nguyen, DQD., Nguyen, NLC., Pham, VT., Tran, TT. (2023). A Multi Context Decoder-based Network with Applications for Polyp Segmentation in Colonoscopy Images. In: Nguyen, N.T., Le-Minh, H., Huynh, CP., Nguyen, QV. (eds) The 12th Conference on Information Technology and Its Applications. CITA 2023. Lecture Notes in Networks and Systems, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-031-36886-8_13


DOI: https://doi.org/10.1007/978-3-031-36886-8_13
Published: 26 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36885-1
Online ISBN: 978-3-031-36886-8
eBook Packages: Intelligent Technologies and Robotics (R0)
