Dual encoder network with transformer-CNN for multi-organ segmentation

Hong, Zhifang; Chen, Mingzhi; Hu, Weijie; Yan, Shiyu; Qu, Aiping; Chen, Lingna; Chen, Junxi

doi:10.1007/s11517-022-02723-9

Original Article
Published: 29 December 2022

Volume 61, pages 661–671, (2023)

Medical & Biological Engineering & Computing

Zhifang Hong¹,
Mingzhi Chen²,
Weijie Hu³,
Shiyu Yan¹,
Aiping Qu¹,
Lingna Chen ORCID: orcid.org/0000-0002-9803-3681¹ &
Junxi Chen⁴

Abstract

Medical image segmentation is a critical step in many imaging applications, and automatic segmentation with convolutional neural networks (CNNs) has attracted extensive attention. However, traditional CNN-based methods struggle to capture global, long-range contextual information because convolution is a local operation. Transformers overcome this limitation, and, inspired by their success in computer vision (CV), many researchers have designed transformer-based U-shaped networks for medical image segmentation. Transformer-based approaches, however, do not capture fine-grained details effectively. This paper proposes a dual encoder network with transformer-CNN for multi-organ segmentation. The new segmentation framework takes full advantage of both CNNs and transformers to improve segmentation accuracy: a Swin-transformer encoder extracts global information, and a CNN encoder captures local information. We introduce fusion modules that fuse the convolutional features with the feature sequence produced by the transformer, and the fused features are concatenated through skip connections to effectively smooth the decision boundary. We extensively evaluate our method on the Synapse multi-organ CT dataset and the Automated Cardiac Diagnosis Challenge (ACDC) dataset. The proposed method achieves Dice similarity coefficient (DSC) scores of 80.68% and 91.12% on the Synapse multi-organ CT and ACDC datasets, respectively. Ablation studies on the ACDC dataset demonstrate the effectiveness of the critical components of our method. Our results match the ground-truth boundaries more consistently than existing models, and our approach produces more accurate results on challenging 2D images for multi-organ segmentation. Compared with state-of-the-art methods, the proposed method achieves superior performance in multi-organ segmentation tasks.
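The abstract does not specify how the fusion modules are built. As a minimal sketch, assuming the simplest possible design (reshape the transformer's token sequence back into a spatial grid, concatenate it with the CNN feature map along the channel axis, then apply a 1×1 convolution and ReLU), the data flow can be illustrated in NumPy. The names `tokens_to_map` and `fuse`, the row-major token ordering, and the channel sizes are hypothetical choices for illustration, not the authors' module.

```python
import numpy as np


def tokens_to_map(tokens: np.ndarray, height: int, width: int) -> np.ndarray:
    """Reshape a token sequence (B, H*W, C) into a feature map (B, C, H, W).

    Assumption: tokens were flattened row by row (index = h * width + w).
    """
    b, n, c = tokens.shape
    if n != height * width:
        raise ValueError(f"expected {height * width} tokens, got {n}")
    return tokens.reshape(b, height, width, c).transpose(0, 3, 1, 2)


def fuse(cnn_feat: np.ndarray, tokens: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical fusion block: channel concatenation + 1x1 projection + ReLU.

    cnn_feat: (B, C_cnn, H, W) from the CNN encoder
    tokens:   (B, H*W, C_tr) from the transformer encoder at the same scale
    weight:   (C_out, C_cnn + C_tr), the 1x1 convolution kernel (bias omitted)
    """
    _, _, h, w = cnn_feat.shape
    tr_map = tokens_to_map(tokens, h, w)
    stacked = np.concatenate([cnn_feat, tr_map], axis=1)  # (B, C_cnn + C_tr, H, W)
    # A 1x1 convolution is a linear map over channels applied at every pixel.
    fused = np.einsum("oc,bchw->bohw", weight, stacked)
    return np.maximum(fused, 0.0)  # ReLU


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cnn_feat = rng.standard_normal((2, 64, 14, 14))  # illustrative sizes only
    tokens = rng.standard_normal((2, 14 * 14, 96))
    weight = 0.01 * rng.standard_normal((64, 64 + 96))
    print(fuse(cnn_feat, tokens, weight).shape)  # (2, 64, 14, 14)
```

In a U-shaped decoder, one plausible reading of the skip-connection step described in the abstract is that this fused map is then concatenated with the upsampled decoder features of the same resolution (e.g. `np.concatenate([decoder_feat, fused], axis=1)`) before further convolution; the paper's actual module may differ.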
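The reported scores (80.68% on Synapse, 91.12% on ACDC) are Dice similarity coefficients, which measure the overlap between a predicted mask P and a ground-truth mask G as DSC = 2|P ∩ G| / (|P| + |G|). The NumPy sketch below computes a per-class DSC on 2D label maps and averages it over foreground classes. The function names, treating label 0 as background, and scoring a class absent from both masks as 1.0 are assumptions; the abstract does not describe the authors' evaluation code.

```python
import numpy as np


def dice_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> list[float]:
    """Per-class Dice similarity coefficient for integer label maps.

    Label 0 is treated as background and skipped (assumption). For each
    foreground class c:  DSC_c = 2 * |P_c ∩ G_c| / (|P_c| + |G_c|).
    A class absent from both prediction and ground truth scores 1.0 here;
    this is an assumed convention, and evaluation protocols differ on it.
    """
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    scores = []
    for c in range(1, num_classes):
        p = pred == c
        g = gt == c
        denom = int(p.sum()) + int(g.sum())
        inter = int(np.logical_and(p, g).sum())
        scores.append(1.0 if denom == 0 else 2.0 * inter / denom)
    return scores


def mean_dice(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Average DSC over the foreground classes (papers usually report it in %)."""
    return float(np.mean(dice_per_class(pred, gt, num_classes)))
```

For example, with `pred = np.array([[0, 1, 1], [2, 2, 0]])` and `gt = np.array([[0, 1, 0], [2, 2, 2]])`, `dice_per_class(pred, gt, 3)` gives 2/3 for class 1 and 0.8 for class 2, so `mean_dice` is about 0.733.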

Funding

This study was funded by the National Natural Science Foundation of China (grant numbers 61504055 and 61701218), the Natural Science Foundation of Hunan Province of China (grant numbers 2020JJ4514 and 2020JJ4519), and the Postgraduate Research Innovation Project of Hunan Province of China (grant number CX20200934). This study was also funded by the Hunan provincial base for scientific and technological innovation cooperation.

Author information

Authors and Affiliations

Computer School, University of South China, Hengyang, 421001, China
Zhifang Hong, Shiyu Yan, Aiping Qu & Lingna Chen
College of Mechanical and Vehicle Engineering, Hunan University, Changsha, 410082, China
Mingzhi Chen
School of Economics and Management, Beijing University of Chemical Technology, Beijing, 100029, China
Weijie Hu
Affiliated Nanhua Hospital, University of South China, Hengyang, 421001, China
Junxi Chen

Corresponding author

Correspondence to Lingna Chen.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zhifang Hong and Mingzhi Chen contributed equally to this study.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hong, Z., Chen, M., Hu, W. et al. Dual encoder network with transformer-CNN for multi-organ segmentation. Med Biol Eng Comput 61, 661–671 (2023). https://doi.org/10.1007/s11517-022-02723-9

Received: 10 February 2022
Accepted: 27 November 2022
Published: 29 December 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11517-022-02723-9