AMG-Mixer: A Multi-Axis Attention MLP-Mixer Architecture for Biomedical Image Segmentation

  • Conference paper
The 12th Conference on Information Technology and Its Applications (CITA 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 734)

Abstract

Multi-Layer Perceptrons (MLPs) were previously used mainly for image classification. The MLP-Mixer architecture has since shown that MLPs remain effective in other visual tasks. Obtaining superior results, however, requires weights pre-trained on large datasets, and the Cross-Location (Token Mix) operation must be adapted to the specific task at hand. Motivated by this, we propose AMG-Mixer, an MLP-based architecture for image segmentation. Recognizing the importance of positional information, we propose the AxialMBconv Token Mix, which uses Axial Attention. To ease the receptive-field constraints of Axial Attention, we also propose the Multi-scale Multi-axis MLP Gated (MS-MAMG) block, which employs a Multi-Axis MLP. Even without pre-training, AMG-Mixer outperformed state-of-the-art (SOTA) methods on benchmark datasets including GLaS, Data Science Bowl 2018, and Skin Lesion Segmentation ISIC 2018, which confirms its effectiveness in our study. The code is available at https://github.com/quanglets1fvr/amg_mixer
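
The abstract builds on the MLP-Mixer distinction between mixing across spatial locations (Token Mix) and mixing across channels. The sketch below illustrates only that general distinction on a toy input; it is not the AMG-Mixer code, and every name, shape, and weight in it is a hypothetical chosen for illustration.

```python
# Illustrative sketch only: NOT the AMG-Mixer implementation from the paper.
# A "token mix" applies a linear map across spatial locations (tokens);
# a "channel mix" applies a linear map across features at each location.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def token_mix(x, w_tokens):
    """Mix across tokens: (T x T) @ (T x C) -> (T x C)."""
    return matmul(w_tokens, x)

def channel_mix(x, w_channels):
    """Mix across channels: (T x C) @ (C x C) -> (T x C)."""
    return matmul(x, w_channels)

# Hypothetical toy input: 3 tokens (patches), 2 channels each.
x = [[1.0, 0.0],
     [0.0, 2.0],
     [3.0, 1.0]]

# Hypothetical weights: the token mix averages all tokens,
# the channel mix swaps the two channels.
w_tokens = [[1 / 3] * 3 for _ in range(3)]
w_channels = [[0.0, 1.0],
              [1.0, 0.0]]

mixed_tokens = token_mix(x, w_tokens)        # every row becomes the per-channel mean
mixed_channels = channel_mix(x, w_channels)  # each row has its channels swapped
```

In this toy case the token mix makes every location see every other location, while the channel mix never moves information between locations. An axial variant would restrict the cross-location step to one image axis (rows or columns) at a time.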

Acknowledgements

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2021.34.

Author information

Corresponding author

Correspondence to Thi-Thao Tran.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Le, HMQ., Le, TK., Pham, VT., Tran, TT. (2023). AMG-Mixer: A Multi-Axis Attention MLP-Mixer Architecture for Biomedical Image Segmentation. In: Nguyen, N.T., Le-Minh, H., Huynh, CP., Nguyen, QV. (eds) The 12th Conference on Information Technology and Its Applications. CITA 2023. Lecture Notes in Networks and Systems, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-031-36886-8_14
