Abstract
Medical Transformer (MedT) has recently attracted much attention in medical image segmentation because it can capture the global context of an image and works well even with small datasets. However, MedT has several limitations: a large disparity between the information in the encoder and the decoder, the low input resolution it requires to run efficiently, and an inability to recognize contextual information at multiple scales. To address these issues, we propose an architecture that adds progressive atrous spatial pyramid pooling (PASPP) to MedT and replaces the AvgPooling layers in MedT with pointwise atrous convolution layers to make the pooling operations more robust. In addition, we modify the convolution stem of MedT so that the model accepts higher-resolution input at the same computational complexity. The proposed model is evaluated on two medical image segmentation datasets, GlaS and the 2018 Data Science Bowl. Experimental results show that the proposed approach outperforms other state-of-the-art methods.
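The abstract does not give the layer configuration of PASPP, so the following is only a minimal sketch of the general idea behind atrous (dilated) convolution and a pyramid of dilation rates, not the authors' implementation. It is written in one dimension with plain Python for readability; the function names `atrous_conv1d`, `effective_kernel_size`, and `pyramid_responses`, the kernel, and the rates are all illustrative assumptions.

```python
def effective_kernel_size(kernel_size, rate):
    """Extent covered by a kernel whose taps are spaced `rate` samples apart."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D atrous convolution (cross-correlation form).

    Tap i of the kernel reads signal[start + i * rate], so a larger rate
    widens the receptive field without adding weights.
    """
    span = effective_kernel_size(len(kernel), rate)
    return [
        sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def pyramid_responses(signal, kernel, rates):
    """Apply the same kernel at several dilation rates (hypothetical helper).

    This mimics the parallel multi-rate branches of spatial pyramid pooling;
    how the paper combines or orders the branches is not stated in the abstract.
    """
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


if __name__ == "__main__":
    ramp = list(range(10))   # toy input: 0, 1, ..., 9
    diff_kernel = [1, 0, -1]  # illustrative 3-tap kernel
    for rate, response in pyramid_responses(ramp, diff_kernel, (1, 2, 4)).items():
        print(rate, effective_kernel_size(len(diff_kernel), rate), response)
```

With the same three weights, the branch at rate 4 covers nine input samples while the branch at rate 1 covers three, which is the sense in which a pyramid of rates sees context at multiple scales.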
Acknowledgements
This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2021-PC-005.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lai, HP., Tran, TT., Pham, VT. (2023). PASPP Medical Transformer for Medical Image Segmentation. In: Saraswat, M., Chowdhury, C., Kumar Mandal, C., Gandomi, A.H. (eds) Proceedings of International Conference on Data Science and Applications. Lecture Notes in Networks and Systems, vol 551. Springer, Singapore. https://doi.org/10.1007/978-981-19-6631-6_31
DOI: https://doi.org/10.1007/978-981-19-6631-6_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-6630-9
Online ISBN: 978-981-19-6631-6
eBook Packages: Intelligent Technologies and Robotics (R0)