UMixer: A Novel U-shaped Convolutional Mixer for Multi-scale Feature Fusion in Medical Image Segmentation

Su, Yongxin; Huang, Hongbo; Song, Zun; Lin, Lei; Liu, Jinhan

doi:10.1007/978-3-031-20233-9_70

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13628)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

Medical image segmentation plays a critical role in assisting diagnosis and prognosis. Since the transformer was first introduced to the field, network architectures have shifted from ConvNets toward Transformers. However, some redesigned ConvNets in recent work show striking results, outperforming classic, elaborately designed ConvNets and even complicated transformers. Inspired by these works, we introduce large-kernel convolutions to improve the ability of ConvNets to capture long-range dependencies. Combining them with a novel multi-scale feature fusion method, we propose a U-shaped convolutional structure, dubbed UMixer, which effectively integrates shallow spatial information with deep semantic information, and high-resolution detailed information with low-resolution global information. Without any attention mechanism or pre-training on large datasets, UMixer achieves more accurate segmentation results than traditional ConvNets and Transformers on the Synapse dataset. Experiments demonstrate the effectiveness of this multi-scale feature fusion structure and its capability in modeling long-range dependencies.
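
As a rough illustration of why larger kernels help with long-range dependency, the sketch below computes the theoretical receptive field of a stack of convolution layers using the standard recurrence. It is not the UMixer implementation: the function name `receptive_field` is hypothetical, and the 3 and 7 kernel sizes are assumed values chosen for comparison, since the abstract does not state which kernel sizes UMixer uses.

```python
def receptive_field(layers):
    """Theoretical receptive field, along one axis, of stacked conv layers.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Kernel sizes used below are illustrative assumptions, not UMixer's.
    """
    rf = 1      # before any layer, one output position sees one input pixel
    jump = 1    # distance in input pixels between adjacent feature positions
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


three_small = receptive_field([(3, 1)] * 3)   # three stacked 3x3 convolutions
one_large = receptive_field([(7, 1)])         # a single 7x7 convolution
three_large = receptive_field([(7, 1)] * 3)   # three stacked 7x7 convolutions

print(three_small, one_large, three_large)    # prints "7 7 19"
```

Under these assumptions, one 7x7 layer covers the same span as three stacked 3x3 layers, and stacking large kernels widens the span much faster, which is one way a ConvNet can see more distant context without attention.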

Acknowledgments

This work is supported by the Beijing Municipal Education Committee Scientific and Technological Planning Project (KM201811232024) and the Beijing Information Science and Technology University Research Fund (2021XJJ30, 2021XJJ34).

Author information

Authors and Affiliations

Mechanical Electrical Engineering School, Beijing Information Science and Technology University, Beijing, China
Yongxin Su & Zun Song
Computer School, Beijing Information Science and Technology University, Beijing, China
Hongbo Huang & Jinhan Liu
School of Economics and Management, Beijing Information Science and Technology University, Beijing, China
Lei Lin
Institute of Computing Intelligence, Beijing Information Science and Technology University, Beijing, China
Hongbo Huang

Corresponding author

Correspondence to Hongbo Huang.

Editor information

Editors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
Weihong Deng
Tsinghua University, Beijing, China
Jianjiang Feng
Beihang University, Beijing, China
Di Huang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Meina Kan
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Tsinghua University, Beijing, China
Fang Zheng
China Electronics Standardization Institute, Beijing, China
Wenfeng Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhaofeng He

About this paper

Cite this paper

Su, Y., Huang, H., Song, Z., Lin, L., Liu, J. (2022). UMixer: A Novel U-shaped Convolutional Mixer for Multi-scale Feature Fusion in Medical Image Segmentation. In: Deng, W., et al. Biometric Recognition. CCBR 2022. Lecture Notes in Computer Science, vol 13628. Springer, Cham. https://doi.org/10.1007/978-3-031-20233-9_70

DOI: https://doi.org/10.1007/978-3-031-20233-9_70
Published: 03 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20232-2
Online ISBN: 978-3-031-20233-9
eBook Packages: Computer Science, Computer Science (R0)
