DAE-Former: Dual Attention-Guided Efficient Transformer for Medical Image Segmentation

  • Conference paper
Predictive Intelligence in Medicine (PRIME 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14277)


Abstract

Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, the core component of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens. Many architectures attempt to reduce model complexity by limiting the self-attention mechanism to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that offers an alternative perspective: designing the self-attention mechanism itself to be efficient. More specifically, we reformulate self-attention to capture both spatial and channel relations across the whole feature dimension while remaining computationally efficient. Furthermore, we redesign the skip connection path by including a cross-attention module to ensure feature reusability and enhance localization power. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-trained weights. The code is publicly available at GitHub.
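To make the abstract's two ingredients concrete, the following is a minimal, single-head PyTorch sketch under our own illustrative naming (it is not the authors' released implementation): an efficient spatial attention in the style of Shen et al., which applies softmax to queries and keys separately so the n x n token-attention map is never materialized; a transpose (channel) attention in the spirit of XCiT, whose d x d attention map relates feature channels instead of tokens; and a hypothetical cross-attention skip connection in which decoder features query encoder features.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EfficientSpatialAttention(nn.Module):
        # Linear-complexity spatial attention: softmax(Q) @ (softmax(K)^T @ V)
        # costs O(n * d^2) instead of the O(n^2 * d) of standard attention.
        def __init__(self, dim):
            super().__init__()
            self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x):                        # x: (batch, n_tokens, dim)
            q, k, v = self.to_qkv(x).chunk(3, dim=-1)
            q = q.softmax(dim=-1)                    # normalize each query over features
            k = k.softmax(dim=-2)                    # normalize each key column over tokens
            context = k.transpose(-2, -1) @ v        # (batch, dim, dim) global context
            return self.proj(q @ context)            # (batch, n_tokens, dim)

    class ChannelAttention(nn.Module):
        # Transpose attention: the (dim x dim) map relates channels, so the
        # cost is again linear in the number of tokens.
        def __init__(self, dim):
            super().__init__()
            self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x):                        # x: (batch, n_tokens, dim)
            q, k, v = self.to_qkv(x).chunk(3, dim=-1)
            q = F.normalize(q, dim=-2)               # l2-normalize along the token axis
            k = F.normalize(k, dim=-2)
            attn = (q.transpose(-2, -1) @ k).softmax(dim=-1)   # (batch, dim, dim)
            return self.proj(v @ attn)

    class SkipCrossAttention(nn.Module):
        # Hypothetical skip-path cross-attention: decoder features form the
        # queries, encoder (skip) features form the keys/values, reusing the
        # efficient factorization above so encoder detail is re-injected
        # where the decoder asks for it.
        def __init__(self, dim):
            super().__init__()
            self.to_q = nn.Linear(dim, dim, bias=False)
            self.to_kv = nn.Linear(dim, 2 * dim, bias=False)
            self.proj = nn.Linear(dim, dim)

        def forward(self, dec, enc):                 # both (batch, n_tokens, dim)
            q = self.to_q(dec).softmax(dim=-1)
            k, v = self.to_kv(enc).chunk(2, dim=-1)
            k = k.softmax(dim=-2)
            return self.proj(q @ (k.transpose(-2, -1) @ v))

    # Chaining spatial then channel attention gives one "dual attention" step.
    x = torch.randn(2, 196, 64)                      # 14 x 14 patch tokens, 64 channels
    y = ChannelAttention(64)(EfficientSpatialAttention(64)(x))
    print(y.shape)                                   # torch.Size([2, 196, 64])

All three blocks reduce to matrix products whose cost grows linearly with the token count, which is the efficiency claim made in the abstract; the exact ordering, normalization, and multi-head details of DAE-Former differ and are given in the paper.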



Author information

Corresponding author: Reza Azad.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Azad, R., Arimond, R., Aghdam, E.K., Kazerouni, A., Merhof, D. (2023). DAE-Former: Dual Attention-Guided Efficient Transformer for Medical Image Segmentation. In: Rekik, I., Adeli, E., Park, S.H., Cintas, C., Zamzmi, G. (eds) Predictive Intelligence in Medicine. PRIME 2023. Lecture Notes in Computer Science, vol 14277. Springer, Cham. https://doi.org/10.1007/978-3-031-46005-0_8


  • DOI: https://doi.org/10.1007/978-3-031-46005-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46004-3

  • Online ISBN: 978-3-031-46005-0

  • eBook Packages: Computer Science, Computer Science (R0)
