Abstract
3D medical image segmentation plays a crucial role in clinical diagnosis. However, handling large data volumes and intricate anatomical structures on Point-of-Care (POC) devices is challenging. Current methods rely on CNN and Transformer models, whose high computational demands and limited real-time capabilities restrict their use at the point of care. Recent studies have applied Multilayer Perceptrons (MLPs) to medical image segmentation, but they overlook the importance of local and global image features and of multi-scale contextual information. To overcome these limitations, we propose CoalUMLP, an efficient vision MLP architecture designed specifically for 3D medical image segmentation. CoalUMLP combines the strengths of CNNs, Transformers, and MLPs through three key components: the Multi-Scale Axial Permute Encoder (MSAP), the Masked Axial Permute Decoder (MAP), and the Semantic Bridging Connection (SBC). We reframe medical image segmentation as a sequence-to-sequence prediction problem and evaluate our approach on the Medical Segmentation Decathlon (MSD) dataset. CoalUMLP achieves state-of-the-art performance while reducing the parameter count by 32.8% and the computational complexity by 48.5%, all with a compact structure. It offers a better trade-off between accuracy and efficiency than previous Transformer-based and CNN-based models, which highlights its potential as a backbone for real-time medical image applications.
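The sequence-to-sequence framing mentioned above can be illustrated with a minimal sketch. It assumes the volume is tokenized into non-overlapping cubic patches, as is common in ViT-style models; the abstract does not state how CoalUMLP builds its sequence, so the function names `volume_to_sequence` and `sequence_to_volume` and the patch size are hypothetical and chosen only for illustration.

```python
import numpy as np


def volume_to_sequence(volume: np.ndarray, patch: int) -> np.ndarray:
    """Split a (D, H, W) volume into non-overlapping patch^3 cubes and
    flatten each cube into one token: result is (num_tokens, patch**3).

    Assumption: cubic, non-overlapping patches (not taken from the paper).
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("each volume dimension must be divisible by patch")
    gd, gh, gw = d // patch, h // patch, w // patch
    # Expose the patch grid: (gd, p, gh, p, gw, p).
    blocks = volume.reshape(gd, patch, gh, patch, gw, patch)
    # Put grid axes first, within-patch axes last: (gd, gh, gw, p, p, p).
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(gd * gh * gw, patch ** 3)


def sequence_to_volume(tokens: np.ndarray, shape: tuple, patch: int) -> np.ndarray:
    """Inverse of volume_to_sequence: rebuild the (D, H, W) volume."""
    d, h, w = shape
    gd, gh, gw = d // patch, h // patch, w // patch
    blocks = tokens.reshape(gd, gh, gw, patch, patch, patch)
    # Interleave grid and within-patch axes again: (gd, p, gh, p, gw, p).
    blocks = blocks.transpose(0, 3, 1, 4, 2, 5)
    return blocks.reshape(d, h, w)


# Toy 4x4x4 volume split into 2x2x2 patches: 8 tokens of 8 voxels each.
volume = np.arange(64).reshape(4, 4, 4)
tokens = volume_to_sequence(volume, patch=2)
restored = sequence_to_volume(tokens, volume.shape, patch=2)
```

In this picture a segmentation network maps the input token sequence to an output sequence of per-voxel labels of the same length, and the inverse reshaping restores the label volume.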
Acknowledgments.
This work was supported by the National Natural Science Foundation of China under Grant No. 52074299.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wu, R., Wu, Z., Hu, X., Zhang, L. (2024). CoalUMLP: Slice and Dice! A Fast, MLP-Like 3D Medical Image Segmentation Network. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14327. Springer, Singapore. https://doi.org/10.1007/978-981-99-7025-4_7
DOI: https://doi.org/10.1007/978-981-99-7025-4_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7024-7
Online ISBN: 978-981-99-7025-4
eBook Packages: Computer Science (R0)