PASPP Medical Transformer for Medical Image Segmentation

Lai, Hong-Phuc; Tran, Thi-Thao; Pham, Van-Truong

doi:10.1007/978-981-19-6631-6_31

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 551)

Abstract

Medical Transformer (MedT) has recently attracted much attention in medical image segmentation because it can capture the global context of an image and works well even with small datasets. However, MedT has some limitations: a large disparity between the information in the encoder and the decoder, the low input resolution it needs in order to run efficiently, and a limited ability to recognize contextual information at multiple scales. To address these issues, we propose an architecture that adds progressive atrous spatial pyramid pooling (PASPP) to MedT and replaces its AvgPooling layers with pointwise atrous convolution layers to make the pooling operations more robust. We also change the convolution stem of MedT so that the model accepts higher-resolution input at the same computational complexity. The proposed model is evaluated on two medical image segmentation datasets, GlaS and the 2018 Data Science Bowl. Experimental results show that the proposed approach outperforms other state-of-the-art methods.
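
As a rough illustration of the atrous (dilated) convolution idea that spatial pyramid pooling builds on, the sketch below applies one small kernel at several dilation rates to a 1D signal. It is not the authors' implementation: the abstract gives no layer configuration, so the function names, the 1D setting, and the dilation rates (1, 2, 4) are assumptions chosen only to show how a larger dilation widens the span a kernel covers without adding weights.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def atrous_conv1d(signal, weights, dilation=1):
    """'Valid' 1D atrous convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so the same number
    of weights sees a wider stretch of the input.
    """
    span = effective_kernel_size(len(weights), dilation)
    out_len = len(signal) - span + 1
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(weights))
        for i in range(out_len)
    ]


def atrous_pyramid(signal, weights, dilations=(1, 2, 4)):
    """Run the same kernel at several dilation rates.

    The rates are hypothetical; they stand in for the multi-scale
    branches of a pyramid pooling module.
    """
    return {d: atrous_conv1d(signal, weights, d) for d in dilations}


if __name__ == "__main__":
    signal = list(range(10))
    branches = atrous_pyramid(signal, weights=[1, 1, 1])
    for rate, output in branches.items():
        print(rate, effective_kernel_size(3, rate), output)
```

Each branch uses three weights, yet the covered span grows from 3 to 5 to 9 samples as the rate increases, which is how a pyramid of such branches can gather context at multiple scales at a similar parameter cost.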

Acknowledgements

This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2021-PC-005.

Author information

Authors and Affiliations

Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
Hong-Phuc Lai, Thi-Thao Tran & Van-Truong Pham

Corresponding author

Correspondence to Thi-Thao Tran.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering & Information Technology, Jaypee Institute of Information Technology, Noida, India
Mukesh Saraswat
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Chandreyee Chowdhury
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Chintan Kumar Mandal
Data Science Institute, University of Technology Sydney, Sydney, NSW, Australia
Amir H. Gandomi

Cite this paper

Lai, HP., Tran, TT., Pham, VT. (2023). PASPP Medical Transformer for Medical Image Segmentation. In: Saraswat, M., Chowdhury, C., Kumar Mandal, C., Gandomi, A.H. (eds) Proceedings of International Conference on Data Science and Applications. Lecture Notes in Networks and Systems, vol 551. Springer, Singapore. https://doi.org/10.1007/978-981-19-6631-6_31

DOI: https://doi.org/10.1007/978-981-19-6631-6_31
Published: 17 February 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-6630-9
Online ISBN: 978-981-19-6631-6
eBook Packages: Intelligent Technologies and Robotics (R0)
