TransDeepLab: Convolution-Free Transformer-Based DeepLab v3+ for Medical Image Segmentation

Azad, Reza; Heidari, Moein; Shariatnia, Moein; Aghdam, Ehsan Khodapanah; Karimijafarbigloo, Sanaz; Adeli, Ehsan; Merhof, Dorit

doi:10.1007/978-3-031-16919-9_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13564)

Included in the following conference series:

International Workshop on PRedictive Intelligence In MEdicine

Abstract

Convolutional neural networks (CNNs) have been the de facto standard in a diverse set of computer vision tasks for many years. In particular, deep neural networks based on seminal architectures such as the U-shaped model with skip connections or atrous convolution with pyramid pooling have been tailored to a wide range of medical image analysis tasks. The main advantage of such architectures is that they are well suited to capturing versatile local features. However, it is generally accepted that CNNs fail to capture long-range dependencies and spatial correlations, because convolution operations have an intrinsically confined receptive field. In contrast, the Transformer, which profits from the global information modeling of its self-attention mechanism, has recently attained remarkable performance in natural language processing and computer vision. Nevertheless, previous studies show that both local and global features are critical for a deep model in dense prediction, such as segmenting complicated structures with disparate shapes and configurations. This paper proposes TransDeepLab, a novel DeepLab-like pure Transformer for medical image segmentation. Specifically, we exploit the hierarchical Swin Transformer with shifted windows to extend DeepLabv3 and model the Atrous Spatial Pyramid Pooling (ASPP) module. To the best of our knowledge, based on a thorough search of the relevant literature, we are the first to model the seminal DeepLab architecture with a pure Transformer-based model. Extensive experiments on various medical image segmentation tasks verify that our approach performs better than or on par with most contemporary works that combine Vision Transformers and CNN-based methods, while significantly reducing model complexity. The code and trained models are publicly available on GitHub.
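The window-based attention that the abstract refers to can be illustrated with a small NumPy sketch. This is not the authors' implementation: it only shows the general idea of splitting a feature map into non-overlapping windows, cyclically shifting the map before partitioning (the "shifted window" step of the Swin Transformer), and running plain softmax attention inside each window. Using several window sizes as a stand-in for the multi-scale pyramid of ASPP is an assumption of this sketch, as are all function names, the toy feature-map size, and the absence of learned projection weights.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size * window_size, C).
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window_size * window_size, c)


def shifted_window_partition(x, window_size):
    """Cyclically shift the map by half a window, then partition it."""
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


def window_self_attention(windows):
    """Single-head softmax attention inside each window (no learned weights)."""
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows


# Toy feature map: 8 x 8 spatial positions, 4 channels (hypothetical sizes).
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 4))

# Assumption of this sketch: larger windows play the role of larger
# receptive fields in a spatial pyramid.
pyramid = {
    ws: window_self_attention(window_partition(feature_map, ws))
    for ws in (2, 4, 8)
}
```

With an 8 x 8 map, a window size of 2 yields 16 windows of 4 tokens each, while a window size of 8 yields a single window covering all 64 positions, so the largest window attends globally and the smaller ones attend locally.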

Author information

Authors and Affiliations

Institute of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany
Reza Azad, Sanaz Karimijafarbigloo & Dorit Merhof
School of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
Moein Heidari
School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
Moein Shariatnia
Department of Electrical Engineering, Shahid Beheshti University, Tehran, Iran
Ehsan Khodapanah Aghdam
Stanford University, Stanford, USA
Ehsan Adeli
Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
Dorit Merhof

Corresponding author

Correspondence to Reza Azad.

Editor information

Editors and Affiliations

Istanbul Technical University, Istanbul, Turkey
Islem Rekik
Stanford University, Stanford, CA, USA
Ehsan Adeli
Daegu Gyeongbuk Institute of Science and Technology, Daegu, Korea (Republic of)
Sang Hyun Park
IBM Research - Africa, Nairobi, Kenya
Celia Cintas

About this paper

Cite this paper

Azad, R. et al. (2022). TransDeepLab: Convolution-Free Transformer-Based DeepLab v3+ for Medical Image Segmentation. In: Rekik, I., Adeli, E., Park, S.H., Cintas, C. (eds) Predictive Intelligence in Medicine. PRIME 2022. Lecture Notes in Computer Science, vol 13564. Springer, Cham. https://doi.org/10.1007/978-3-031-16919-9_9

DOI: https://doi.org/10.1007/978-3-031-16919-9_9
Published: 16 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16918-2
Online ISBN: 978-3-031-16919-9
eBook Packages: Computer Science, Computer Science (R0)
