CoalUMLP: Slice and Dice! A Fast, MLP-Like 3D Medical Image Segmentation Network

Wu, Ruoyu; Wu, Zifan; Hu, Xue; Zhang, Lei

doi:10.1007/978-981-99-7025-4_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14327)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

3D medical image segmentation plays a crucial role in clinical diagnosis. However, handling vast data and intricate structures on Point-of-Care (POC) devices is challenging. Current methods rely on CNN and Transformer models, whose high computational demands and limited real-time capabilities restrict their use in POC settings. Recent studies have explored applying Multilayer Perceptrons (MLPs) to medical image segmentation, but they overlook the significance of local and global image features and of multi-scale contextual information. To overcome these limitations, we propose CoalUMLP, an efficient vision MLP architecture designed specifically for 3D medical image segmentation. CoalUMLP combines the strengths of CNNs, Transformers, and MLPs, and incorporates three key components: the Multi-Scale Axial Permute Encoder (MSAP), the Masked Axial Permute Decoder (MAP), and the Semantic Bridging Connection (SBC). We reframe medical image segmentation as a sequence-to-sequence prediction problem and evaluate our approach on the Medical Segmentation Decathlon (MSD) dataset. CoalUMLP achieves state-of-the-art performance while reducing the parameter count by 32.8% and the computational complexity by 48.5%, all with a compact structure. Compared with previous Transformer-based and CNN-based models, it achieves a superior trade-off between accuracy and efficiency. Our results highlight the potential of CoalUMLP as a promising backbone for real-time medical image applications.
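The sequence-to-sequence framing mentioned in the abstract can be illustrated with a small sketch. This is not the authors' tokenization, which the abstract does not describe; it only assumes non-overlapping cubic patches in raster order, and the function names `volume_to_sequence` and `sequence_to_volume` and the patch size are hypothetical. Plain Python lists are used so the example has no dependencies.

```python
from itertools import product


def volume_to_sequence(volume, patch):
    """Split a D x H x W volume (nested lists) into a sequence of flattened,
    non-overlapping patch x patch x patch tokens, in raster order.
    Assumption: cubic patches; the paper's real scheme may differ."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    if d % patch or h % patch or w % patch:
        raise ValueError("each dimension must be divisible by the patch size")
    tokens = []
    origins = product(range(0, d, patch), range(0, h, patch), range(0, w, patch))
    for z0, y0, x0 in origins:
        # Flatten one patch into a single token of patch**3 voxel values.
        token = [
            volume[z0 + dz][y0 + dy][x0 + dx]
            for dz, dy, dx in product(range(patch), repeat=3)
        ]
        tokens.append(token)
    return tokens


def sequence_to_volume(tokens, shape, patch):
    """Inverse of volume_to_sequence: scatter per-token values (for example
    predicted labels) back onto the D x H x W grid."""
    d, h, w = shape
    volume = [[[None] * w for _ in range(h)] for _ in range(d)]
    origins = product(range(0, d, patch), range(0, h, patch), range(0, w, patch))
    for token, (z0, y0, x0) in zip(tokens, origins):
        offsets = product(range(patch), repeat=3)
        for value, (dz, dy, dx) in zip(token, offsets):
            volume[z0 + dz][y0 + dy][x0 + dx] = value
    return volume


# Toy 4 x 4 x 4 volume whose voxel value encodes its own position.
toy = [[[z * 16 + y * 4 + x for x in range(4)] for y in range(4)] for z in range(4)]
seq = volume_to_sequence(toy, patch=2)      # input sequence of 8 tokens
restored = sequence_to_volume(seq, shape=(4, 4, 4), patch=2)
```

In this view a segmentation model maps the input token sequence to an output sequence of the same length, and the second function places the per-token predictions back on the voxel grid.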

Acknowledgments.

This work was supported by the National Natural Science Foundation of China under Grant No. 52074299.

Author information

Authors and Affiliations

China University of Mining and Technology-Beijing, Beijing, 100083, China
Ruoyu Wu, Zifan Wu, Xue Hu & Lei Zhang

Corresponding author

Correspondence to Lei Zhang.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Fenrong Liu
SEEK Limited, Cremorne, NSW, Australia
Arun Anand Sadanandan
MIMOS Berhad, Kuala Lumpur, Malaysia
Duc Nghia Pham
Universitas Indonesia, Depok, Indonesia
Petrus Mursanto
Tabcorp Holdings Limited, Melbourne, VIC, Australia
Dickson Lukose

About this paper

Cite this paper

Wu, R., Wu, Z., Hu, X., Zhang, L. (2024). CoalUMLP: Slice and Dice! A Fast, MLP-Like 3D Medical Image Segmentation Network. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14327. Springer, Singapore. https://doi.org/10.1007/978-981-99-7025-4_7

DOI: https://doi.org/10.1007/978-981-99-7025-4_7
Published: 10 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7024-7
Online ISBN: 978-981-99-7025-4
eBook Packages: Computer Science, Computer Science (R0)
