Abstract
Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, the core component of the Transformer, suffers from quadratic computational complexity with respect to the number of tokens. Many architectures reduce this cost by restricting self-attention to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that offers an alternative perspective: an efficient redesign of the self-attention mechanism itself. Specifically, we reformulate self-attention to capture both spatial and channel relations across the whole feature dimension while remaining computationally efficient. Furthermore, we redesign the skip-connection path by incorporating a cross-attention module to ensure feature reusability and enhance localization power. Our method outperforms state-of-the-art approaches on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-trained weights. The code is publicly available at GitHub.
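To make the complexity claim concrete, below is a minimal PyTorch sketch of the two linear-complexity attention variants the abstract alludes to: a spatial "efficient attention" that avoids the quadratic N x N attention map by normalizing queries and keys separately (computing softmax(Q) @ (softmax(K)^T @ V)), and a channel (transpose) attention whose attention map is C x C and therefore linear in the number of tokens. This is a sketch under stated assumptions, not the authors' released implementation: the module names, head count, and scaling factor are illustrative choices.

import torch
import torch.nn as nn


class EfficientAttention(nn.Module):
    # Spatial attention with linear complexity in the token count N.
    # Instead of softmax(Q K^T) V (an N x N map), it forms a small
    # d x d context matrix first: softmax(Q) @ (softmax(K)^T @ V).
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, C) token sequence
        B, N, C = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        # split heads: each of q, k, v is (B, heads, N, d)
        q, k, v = (t.reshape(B, N, self.heads, -1).transpose(1, 2) for t in qkv)
        q = q.softmax(dim=-1)   # normalize each query over its d channels
        k = k.softmax(dim=-2)   # normalize keys over the N token positions
        context = k.transpose(-2, -1) @ v   # (B, h, d, d): costs O(N d^2)
        out = q @ context                   # (B, h, N, d)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


class ChannelAttention(nn.Module):
    # Transpose ("channel") attention: the softmax runs over a C x C
    # cross-channel map rather than an N x N token map, so the cost
    # again grows only linearly with the number of tokens.
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5  # assumed scaling; the paper may differ
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, C) token sequence
        B, N, C = x.shape
        qkv = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (t.reshape(B, N, self.heads, -1).transpose(1, 2) for t in qkv)
        # attend across channels: (B, h, d, d) map, summed over N tokens
        attn = (q.transpose(-2, -1) @ k * self.scale).softmax(dim=-1)
        out = (attn @ v.transpose(-2, -1)).transpose(-2, -1)  # (B, h, N, d)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


# Usage: combining the two gives a "dual attention" over one token sequence.
x = torch.randn(2, 196, 64)  # e.g. 14 x 14 patch tokens with 64 channels
y = EfficientAttention(64)(x) + ChannelAttention(64)(x)
print(y.shape)  # torch.Size([2, 196, 64])

Note that both modules keep the input and output shapes identical, which is what lets blocks of this kind be stacked in an encoder-decoder and reused on the skip-connection path.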
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Azad, R., Arimond, R., Aghdam, E.K., Kazerouni, A., Merhof, D. (2023). DAE-Former: Dual Attention-Guided Efficient Transformer for Medical Image Segmentation. In: Rekik, I., Adeli, E., Park, S.H., Cintas, C., Zamzmi, G. (eds) Predictive Intelligence in Medicine. PRIME 2023. Lecture Notes in Computer Science, vol 14277. Springer, Cham. https://doi.org/10.1007/978-3-031-46005-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46004-3
Online ISBN: 978-3-031-46005-0