AMG-Mixer: A Multi-Axis Attention MLP-Mixer Architecture for Biomedical Image Segmentation

Le, Hoang-Minh-Quang; Le, Trung-Kien; Pham, Van-Truong; Tran, Thi-Thao

doi:10.1007/978-3-031-36886-8_14


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 734)

Included in the following conference series:

Conference on Information Technology and its Applications


Abstract

Multi-Layer Perceptrons (MLPs) were previously used mainly for image classification. The MLP-Mixer architecture has since shown that MLPs remain effective in other vision tasks as well. However, achieving strong results requires weights pre-trained on large datasets, and the cross-location (token-mixing) operation must be adapted to the specific task at hand. Motivated by this, we propose AMG-Mixer, an MLP-based architecture for image segmentation. In particular, recognizing the importance of positional information, we propose an AxialMBconv Token Mix that uses Axial Attention. In addition, to relax the receptive-field constraints of Axial Attention, we propose a Multi-scale Multi-axis MLP Gated (MS-MAMG) block that employs a Multi-Axis MLP. AMG-Mixer outperforms state-of-the-art (SOTA) methods on the GLaS, Data Science Bowl 2018, and ISIC 2018 skin lesion segmentation benchmarks, even without pre-training, which confirms its effectiveness. The code is available at https://github.com/quanglets1fvr/amg_mixer
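The abstract relies on two token-mixing ideas: Axial Attention (attending along one spatial axis at a time) and a Multi-Axis MLP (mixing tokens inside local blocks and inside a sparse, dilated grid, as in MAXIM and MaxViT). The sketch below is a minimal pure-Python illustration of both ideas under stated assumptions; it is not the authors' AxialMBconv Token Mix or MS-MAMG block (the linked repository has those). It assumes single-head attention whose queries, keys and values are the input vectors themselves (no learned projections, positional encodings, or gating), and it assumes H and W are divisible by the block size `b` and the grid size `g`. The helper names `self_attention`, `axial_attention`, `block_groups` and `grid_groups` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Coord = Tuple[int, int]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention over a 1-D token sequence.

    Assumption: queries, keys and values are the input vectors themselves
    (no learned projections, no positional terms, no multi-head split).
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax over the sequence
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(d)])
    return out


def axial_attention(x: List[List[Vector]]) -> List[List[Vector]]:
    """Axial attention on an H x W grid x[h][w]: attend along each row
    (width axis), then along each column (height axis)."""
    rows = [self_attention(row) for row in x]
    H, W = len(rows), len(rows[0])
    cols = [self_attention([rows[h][w] for h in range(H)]) for w in range(W)]
    # cols[w][h] holds the result for pixel (h, w); transpose back to x[h][w].
    return [[cols[w][h] for w in range(W)] for h in range(H)]


def block_groups(H: int, W: int, b: int) -> List[List[Coord]]:
    """Multi-axis 'block' split: non-overlapping b x b windows (local mixing).
    Assumes H and W are divisible by b."""
    return [[(i, j) for i in range(bi, bi + b) for j in range(bj, bj + b)]
            for bi in range(0, H, b) for bj in range(0, W, b)]


def grid_groups(H: int, W: int, g: int) -> List[List[Coord]]:
    """Multi-axis 'grid' split: g x g tokens spaced H//g rows and W//g columns
    apart (sparse, image-wide mixing). Assumes H and W are divisible by g."""
    sh, sw = H // g, W // g
    return [[(i * sh + oi, j * sw + oj) for i in range(g) for j in range(g)]
            for oi in range(sh) for oj in range(sw)]
```

For a 4 × 4 map, `block_groups(4, 4, 2)[0]` is `[(0, 0), (0, 1), (1, 0), (1, 1)]` (neighboring pixels), while `grid_groups(4, 4, 2)[0]` is `[(0, 0), (0, 2), (2, 0), (2, 2)]` (pixels spread over the whole map). In a Multi-Axis MLP, a learned MLP would mix the features inside each such group rather than attending. With axial attention, each token interacts with W + H tokens per layer instead of the H·W tokens of full 2-D self-attention.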


References

Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Proceedings Medical Image Computing Computer-Assisted Intervention, pp. 234–241 (2015)
Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J.: UNet++: a nested U-Net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (2018)
Jha, D., Riegler, M., Johansen, D., Halvorsen, P., Johansen, H.: DoubleU-Net: a deep convolutional neural network for medical image segmentation. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 558–564 (2020)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. In: Proceedings of the 9th International Conference on Learning Representations (2021)
Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., Chen, L.: Axial-deeplab: stand-alone axial-attention for panoptic segmentation. In: ECCV, pp. 108–126 (2020)
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10012–10022 (2021)
Tu, Z., et al.: MaxViT: multi-axis vision transformer. In: ECCV (2022)
Tolstikhin, I., et al.: MLP-Mixer: an all-MLP architecture for vision. Adv. Neural Inf. Process. Syst. 34, 24261–24272 (2021)
Chen, J., et al.: TransUNet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
Valanarasu, J.M.J., Patel, V.M.: UNeXt: MLP-based rapid medical image segmentation network. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (2022)
Lai, H.P., Tran, T.T., Pham, V.T.: Axial attention MLP-mixer: a new architecture for image segmentation. In: ICCE (2022)
Tu, Z., et al.: MAXIM: multi-axis MLP for image processing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Yan, Q., et al.: COVID-19 chest CT image segmentation: a deep convolutional neural network solution. arXiv preprint arXiv:2004.10987 (2020)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Tan, M., et al.: MnasNet: platform-aware neural architecture search for mobile. In: CVPR, pp. 2820–2828 (2019)
Cao, H., et al.: Swin-Unet: Unet-like pure transformer for medical image segmentation. In: Computer Vision – ECCV (2022)
Chu, X., et al.: Conditional positional encodings for vision transformers. In: ICLR (2023)
Lv, J., et al.: CM-MLP: cascade multi-scale MLP with axial context relation encoder for edge segmentation of medical image. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1100–1107 (2022)
Valanarasu, J., Oza, P., Hacihaliloglu, I., Patel, V.: Medical transformer: gated axial-attention for medical image segmentation. In: International Conference on Medical Image Computing and Computer Assisted Intervention, pp. 36–46 (2021)
Hou, Q., Jiang, Z., Yuan, L., Cheng, M., Yan, S., Feng, J.: Vision permutator: a permutable MLP-like architecture for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 1328–1334 (2022)
Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: a deep convolutional encoder-decoder architecture for image segmentation. In: PAMI (2017)
Jha, D., et al.: ResUNet++: an advanced architecture for medical image segmentation. In: Proceedings of International Symposium Multimedia, pp. 225–230 (2019)
Codella, N.C., et al.: Skin lesion analysis toward melanoma detection: a challenge at the 2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC). In: Proceedings International Symposium Biomedical Imaging, pp. 168–172 (2018)
Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., Wilson, A.: Averaging weights leads to wider optima and better generalization. arXiv preprint arXiv:1803.05407 (2018)
Rashno, A., et al.: Fully automated segmentation of fluid/cyst regions in optical coherence tomography images with diabetic macular edema using neutrosophic sets and graph algorithms. IEEE Trans. Biomed. Eng. 65, 989–1001 (2017)
Malík, P., Krištofík, Š., Knapová, K.: Instance segmentation model created from three semantic segmentations of mask, boundary and centroid pixels verified on GlaS dataset. In: 2020 15th Conference on Computer Science and Information Systems (FedCSIS), pp. 569–576 (2020)


Acknowledgements

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2021.34.

Author information

Authors and Affiliations

Department of Automation Engineering, School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
Hoang-Minh-Quang Le, Trung-Kien Le, Van-Truong Pham & Thi-Thao Tran


Corresponding author

Correspondence to Thi-Thao Tran.

Editor information

Editors and Affiliations

Wroclaw University of Science and Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Northumbria University, Newcastle, UK
Hoa Le-Minh
The University of Danang – Vietnam-Korea University of Information and Communication Technology, Danang, Vietnam
Cong-Phap Huynh
The University of Danang – Vietnam-Korea University of Information and Communication Technology, Danang, Vietnam
Quang-Vu Nguyen



About this paper

Cite this paper

Le, HMQ., Le, TK., Pham, VT., Tran, TT. (2023). AMG-Mixer: A Multi-Axis Attention MLP-Mixer Architecture for Biomedical Image Segmentation. In: Nguyen, N.T., Le-Minh, H., Huynh, CP., Nguyen, QV. (eds) The 12th Conference on Information Technology and Its Applications. CITA 2023. Lecture Notes in Networks and Systems, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-031-36886-8_14


DOI: https://doi.org/10.1007/978-3-031-36886-8_14
Published: 26 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36885-1
Online ISBN: 978-3-031-36886-8
eBook Packages: Intelligent Technologies and Robotics (R0)
