Abstract
Multi-Layer Perceptrons (MLPs) were previously used mainly for image classification. The MLP-Mixer architecture has since shown that MLPs remain effective for other visual tasks. Obtaining superior results, however, requires weights pre-trained on large datasets, and the Cross-Location (Token Mix) operation must be adapted to the specific task at hand. Motivated by this, we propose AMG-Mixer, an MLP-based architecture for image segmentation. Recognizing the importance of positional information, we propose the AxialMBconv Token Mix, which uses Axial Attention. To reduce the receptive-field constraints of Axial Attention, we further propose the Multi-scale Multi-axis MLP Gated (MS-MAMG) block, which employs a Multi-Axis MLP. Even without pre-training, AMG-Mixer outperformed State-of-the-Art (SOTA) methods on benchmark datasets including GLaS, Data Science Bowl 2018, and Skin Lesion Segmentation ISIC 2018, confirming its effectiveness in our study. The code is available at https://github.com/quanglets1fvr/amg_mixer
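As a rough illustration of the multi-axis mixing idea named in the abstract, the NumPy sketch below applies a shared linear layer along the height axis of a feature map and then along its width axis. It is not taken from the AMG-Mixer code: the function name `axis_mlp`, the tensor sizes, and the random weights are hypothetical and chosen only for illustration, and the sketch omits the gating, normalization, Axial Attention, and multi-scale branches that the paper describes.

```python
import numpy as np


def axis_mlp(x, weight, bias, axis):
    """Mix values along one spatial axis with a single shared linear layer.

    Hypothetical helper for illustration only (not the authors' code).
    x:      feature map of shape (H, W, C)
    weight: square matrix whose size matches x.shape[axis]
    bias:   vector whose length matches x.shape[axis]
    """
    # Move the chosen axis to the end so the matrix product acts on it.
    moved = np.moveaxis(x, axis, -1)
    mixed = moved @ weight + bias
    # Restore the original (H, W, C) layout.
    return np.moveaxis(mixed, -1, axis)


rng = np.random.default_rng(0)
H, W, C = 4, 6, 3  # assumed toy sizes
x = rng.standard_normal((H, W, C))

w_h, b_h = rng.standard_normal((H, H)), np.zeros(H)
w_w, b_w = rng.standard_normal((W, W)), np.zeros(W)

# Mix along height first, then along width; channels are left untouched.
y_h = axis_mlp(x, w_h, b_h, axis=0)
y = axis_mlp(y_h, w_w, b_w, axis=1)
print(y.shape)
```

Each call mixes information only along one axis, so a position's output after both calls depends on every position in the map while the weight matrices stay of size H×H and W×W rather than (HW)×(HW).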
Acknowledgements
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2021.34.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Le, HMQ., Le, TK., Pham, VT., Tran, TT. (2023). AMG-Mixer: A Multi-Axis Attention MLP-Mixer Architecture for Biomedical Image Segmentation. In: Nguyen, N.T., Le-Minh, H., Huynh, CP., Nguyen, QV. (eds) The 12th Conference on Information Technology and Its Applications. CITA 2023. Lecture Notes in Networks and Systems, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-031-36886-8_14
DOI: https://doi.org/10.1007/978-3-031-36886-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36885-1
Online ISBN: 978-3-031-36886-8
eBook Packages: Intelligent Technologies and Robotics (R0)