Abstract
Multimodal medical image fusion aims to improve the clinical applicability of medical images by integrating complementary information from multiple medical images. In traditional fusion methods, however, fusion rules based on prior knowledge or logic usually cannot match the feature representation well, which results in partial information loss. Furthermore, most deep learning-based fusion methods depend on convolutional operations, which focus only on local features and retain little context information. To address these issues, we propose a dynamic residual focal transformer network for multimodal medical image fusion, termed DRFT. DRFT is an end-to-end network that requires no manually designed fusion rules. First, context-gated convolution is introduced to construct the context dynamic extraction module (CDEM), which extracts key semantic information from multimodal medical images more accurately. Then, a new residual transformer fusion module (RTFM) is designed by incorporating the focal transformer into the residual mechanism; it not only extracts deep semantic features but also adaptively learns the optimal fusion scheme. Finally, the nest architecture is employed to extract multiscale features. In addition, a new objective function consisting of a global detail loss and a fusion enhancement loss is designed to enrich the modal information in the fused image. Notably, unlike the traditional encoder–decoder fusion structure, the proposed network does not require a two-stage training strategy. Extensive experiments on mainstream datasets show that the proposed DRFT outperforms state-of-the-art methods in both qualitative and quantitative evaluation.
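The abstract names a two-term objective (a global detail loss plus a fusion enhancement loss) but does not define either term. As a rough illustration of how such a two-term fusion objective can be assembled, the sketch below uses generic stand-ins that are assumptions, not the DRFT formulation: a gradient-matching term in place of the detail loss, a maximum-intensity term in place of the enhancement loss, and a hypothetical weight `lam`. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # a grayscale image as rows of pixel values


def gradient_magnitude(img: Image) -> Image:
    """Forward-difference gradient magnitude (L1); zero past the last row/column."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            dy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            out[y][x] = abs(dx) + abs(dy)
    return out


def elementwise_max(a: Image, b: Image) -> Image:
    """Pixel-wise maximum of two images of the same size."""
    return [[max(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of the same size."""
    n = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def fusion_loss(fused: Image, src_a: Image, src_b: Image, lam: float = 1.0) -> float:
    """Two-term objective: detail stand-in + lam * enhancement stand-in.

    ASSUMPTION: both terms are generic placeholders, not the losses
    defined in the DRFT paper.
    """
    # Detail stand-in: fused gradients should follow the stronger source gradient.
    target_grad = elementwise_max(gradient_magnitude(src_a), gradient_magnitude(src_b))
    detail = mean_abs_diff(gradient_magnitude(fused), target_grad)
    # Enhancement stand-in: fused intensities should follow the brighter source pixel.
    enhancement = mean_abs_diff(fused, elementwise_max(src_a, src_b))
    return detail + lam * enhancement


if __name__ == "__main__":
    a = [[0.0, 1.0], [0.0, 1.0]]
    b = [[1.0, 0.0], [0.0, 0.0]]
    print(fusion_loss(elementwise_max(a, b), a, b))
```

In a real training loop the same structure would be written with differentiable tensor operations; the point here is only that the objective is a weighted sum of a structure-preserving term and an intensity-preserving term computed directly from the two source images, with no ground-truth fused image.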
Data availability
The data are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to thank the anonymous editors and reviewers for their valuable advice and help. This work was supported by a grant from the National Natural Science Foundation of China [No. 72071019] and a grant from the Natural Science Foundation of Chongqing [No. cstc2021jcyj-msxmX0185].
Ethics declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, W., Yu, L., Wang, H. et al. End-to-end dynamic residual focal transformer network for multimodal medical image fusion. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09729-4
DOI: https://doi.org/10.1007/s00521-024-09729-4