End-to-end dynamic residual focal transformer network for multimodal medical image fusion

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Multimodal medical image fusion aims to improve the clinical usefulness of medical images by integrating complementary information from multiple modalities. In traditional fusion methods, however, fusion rules based on prior knowledge or hand-crafted logic rarely match the feature representation exactly, which results in partial information loss. Furthermore, most deep learning-based fusion methods depend on convolutional operations, which capture only local features and retain limited contextual information. To address these issues, we propose an end-to-end dynamic residual focal transformer network for multimodal medical image fusion, termed DRFT, which requires no manually designed fusion rules. First, context-gated convolution is introduced to build a context dynamic extraction module (CDEM) that extracts key semantic information more accurately from multimodal medical images. Then, a new residual transformer fusion module (RTFM) is designed by incorporating the focal transformer into a residual mechanism; it not only extracts deep semantic features but also adaptively learns the optimal fusion scheme. Finally, a nest architecture is employed to extract multiscale features. In addition, a new objective function combining a global detail loss and a fusion enhancement loss is designed to enrich the modal information in the fused image. Notably, unlike traditional encoder–decoder fusion structures, the proposed network requires no two-stage training strategy. Extensive experiments on mainstream datasets show that DRFT outperforms state-of-the-art methods in both qualitative and quantitative evaluation.
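The paper defines CDEM, RTFM, and the nest decoder in detail; as a rough orientation only, the following is a minimal PyTorch sketch of the data flow the abstract describes. Everything in it is a simplifying assumption: ContextGatedConv stands in for context-gated convolution with a plain squeeze-and-excitation-style channel gate, ResidualTransformerFusion replaces focal self-attention with standard multi-head self-attention, and the nest multiscale architecture is omitted.

```python
# Minimal sketch of the DRFT pipeline described in the abstract.
# All module internals are simplified placeholders, not the authors' code.
import torch
import torch.nn as nn


class ContextGatedConv(nn.Module):
    """Stand-in for the context-gated convolution in CDEM: a 3x3
    convolution whose input is modulated by a global-context channel gate."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # global spatial context
            nn.Conv2d(in_ch, in_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.conv(x * self.gate(x))     # gate the input by its context


class ResidualTransformerFusion(nn.Module):
    """Stand-in for RTFM: self-attention over the concatenated modality
    features, wrapped in a residual connection."""

    def __init__(self, ch, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.norm = nn.LayerNorm(ch)
        self.proj = nn.Conv2d(2 * ch, ch, 1)   # merge the two modalities

    def forward(self, f1, f2):
        x = self.proj(torch.cat([f1, f2], dim=1))
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        q = self.norm(t)
        t = t + self.attn(q, q, q)[0]          # residual attention update
        return t.transpose(1, 2).reshape(b, c, h, w)


class DRFTSketch(nn.Module):
    """End-to-end fusion: per-modality feature extraction, transformer
    fusion, then reconstruction to a single fused image."""

    def __init__(self, ch=32):
        super().__init__()
        self.cdem1 = ContextGatedConv(1, ch)
        self.cdem2 = ContextGatedConv(1, ch)
        self.rtfm = ResidualTransformerFusion(ch)
        self.decode = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, img1, img2):
        return self.decode(self.rtfm(self.cdem1(img1), self.cdem2(img2)))


mri = torch.rand(1, 1, 64, 64)                 # e.g., a grayscale MRI slice
pet = torch.rand(1, 1, 64, 64)                 # e.g., a co-registered PET slice
fused = DRFTSketch()(mri, pet)
print(fused.shape)                             # torch.Size([1, 1, 64, 64])
```

The abstract only names the two loss terms; their actual formulations appear in the paper. As a generic stand-in under stated assumptions, a detail term matching gradient energy plus an enhancement term pulling the fused image toward the brighter source pixel might look like:

```python
# Hypothetical composite objective; both terms are assumptions, not the
# paper's global detail loss and fusion enhancement loss.
def fusion_loss(fused, img1, img2, alpha=1.0):
    def grad_energy(x):
        # mean absolute finite-difference gradient, a crude detail proxy
        return (x[..., :, 1:] - x[..., :, :-1]).abs().mean() + \
               (x[..., 1:, :] - x[..., :-1, :]).abs().mean()

    detail = (grad_energy(fused)
              - torch.maximum(grad_energy(img1), grad_energy(img2))).abs()
    enhance = ((fused - torch.maximum(img1, img2)) ** 2).mean()
    return detail + alpha * enhance
```

Since the whole pipeline is differentiable, `fusion_loss(fused, mri, pet)` returns a scalar that can be backpropagated end to end, consistent with the abstract's claim that no two-stage training is needed.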

Data availability

The data are available from the corresponding author on reasonable request.

Acknowledgements

The authors would like to thank the anonymous editors and reviewers for their valuable advice and help. This work was supported by a grant from the National Natural Science Foundation of China [No. 72071019] and a grant from the Natural Science Foundation of Chongqing [No. cstc2021jcyj-msxmX0185].

Author information

Corresponding author

Correspondence to Lei Yu.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, W., Yu, L., Wang, H. et al. End-to-end dynamic residual focal transformer network for multimodal medical image fusion. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09729-4
