End-to-end dynamic residual focal transformer network for multimodal medical image fusion

Zhang, Weihao; Yu, Lei; Wang, Huiqi; Pedrycz, Witold

doi:10.1007/s00521-024-09729-4

Original Article
Published: 17 April 2024

Neural Computing and Applications

Abstract

Multimodal medical image fusion aims to improve the clinical practicability of medical images by integrating complementary information from multiple medical images. In traditional fusion methods, however, fusion rules based on prior knowledge or logic usually do not match the feature representation well, which causes partial information loss. Furthermore, most deep learning-based fusion methods depend on convolutional operations, which focus only on local features and retain limited context information. To address these issues, we propose an end-to-end dynamic residual focal transformer network for multimodal medical image fusion, termed DRFT. The DRFT framework is an end-to-end network that requires no manually designed fusion rules. First, context-gated convolution is used to construct the context dynamic extraction module (CDEM), which extracts key semantic information from multimodal medical images more accurately. Then, a new residual transformer fusion module (RTFM) is designed by incorporating the focal transformer into the residual mechanism, so that it can both extract deep semantic features and adaptively learn the optimal fusion scheme. Finally, a nest architecture is employed to extract multiscale features. In addition, a new objective function consisting of a global detail loss and a fusion enhancement loss is designed to enrich the modal information in the fused image. Notably, unlike the traditional encoder–decoder fusion structure, the proposed network does not require a two-stage training strategy. Extensive experimental results on mainstream datasets show that, compared with state-of-the-art methods, the proposed DRFT delivers better performance in both qualitative and quantitative evaluation.
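
The abstract does not describe the internals of RTFM, so the following is only a minimal sketch of what a residual connection around a learned fusion branch can look like, not the authors' implementation. The names `residual_fuse` and `toy_branch`, the averaging skip path, and the use of nested lists as feature maps are all assumptions made for illustration; in DRFT the learned branch would be a focal transformer.

```python
from typing import Callable, List

FeatureMap = List[List[float]]
Branch = Callable[[FeatureMap, FeatureMap], FeatureMap]


def residual_fuse(feat_a: FeatureMap, feat_b: FeatureMap, branch: Branch) -> FeatureMap:
    """Fuse two equally sized feature maps as skip path + learned residual."""
    # Skip path (assumption): a plain element-wise average of the two inputs.
    skip = [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(feat_a, feat_b)
    ]
    # Residual path: whatever correction the (hypothetical) branch produces.
    residual = branch(feat_a, feat_b)
    # Residual mechanism: the output is the skip path plus the correction.
    return [
        [s + r for s, r in zip(row_s, row_r)]
        for row_s, row_r in zip(skip, residual)
    ]


def toy_branch(feat_a: FeatureMap, feat_b: FeatureMap) -> FeatureMap:
    """Hypothetical stand-in for a trained branch: half the absolute difference.

    Added to the average, this yields the element-wise maximum of the inputs.
    """
    return [
        [abs(a - b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(feat_a, feat_b)
    ]


if __name__ == "__main__":
    print(residual_fuse([[1.0, 4.0]], [[3.0, 2.0]], toy_branch))  # [[3.0, 4.0]]
```

With the toy branch, the average plus the correction reduces to the element-wise maximum, which shows how a fixed fusion rule can be recovered as one special case of a learned residual rather than being designed by hand.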

Data availability

The data are available from the corresponding author on reasonable request.

Acknowledgements

The authors would like to thank the anonymous editors and reviewers for their valuable advice and help. This work was supported by a grant from the National Natural Science Foundation of China [No. 72071019] and a grant from the Natural Science Foundation of Chongqing [No. cstc2021jcyj-msxmX0185].

Author information

Authors and Affiliations

College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China
Weihao Zhang & Lei Yu
College of Mathematics and Statistics, Chongqing University, Chongqing, 401331, China
Huiqi Wang
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, T6G 2V4, Canada
Witold Pedrycz

Corresponding author

Correspondence to Lei Yu.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, W., Yu, L., Wang, H. et al. End-to-end dynamic residual focal transformer network for multimodal medical image fusion. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09729-4

Received: 10 December 2023
Accepted: 25 March 2024
Published: 17 April 2024
DOI: https://doi.org/10.1007/s00521-024-09729-4
