MsRAN: a multi-scale residual attention network for multi-model image fusion

  • Original Article
  • Medical & Biological Engineering & Computing

Abstract

Fusion is a critical step in image processing tasks. Recently, deep learning networks have been widely applied to information fusion, but a significant limitation of existing image fusion methods is their inability to highlight typical regions of the source images and to retain sufficient useful information. To address this problem, this paper proposes a multi-scale residual attention network (MsRAN) that fully exploits image features. Its generator contains two information refinement networks and one information integration network. The information refinement networks extract features at different scales using convolution kernels of different sizes. The information integration network, with a merging block and an attention block added, prevents the underutilization of information in the intermediate layers and forces the generator to focus on salient regions of the multi-modal source images. Furthermore, during model training, we add an information loss function and adopt a dual adversarial structure, enabling the model to capture more detail. Qualitative and quantitative experiments on publicly available datasets validate that the proposed method provides better visual results than competing methods and retains more detail information.
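The multi-scale feature extraction described above can be illustrated with a short sketch: parallel convolutions with different kernel sizes whose outputs are merged by a 1×1 convolution. This is a minimal PyTorch sketch of the idea only; the channel counts, kernel sizes, and activation are illustrative assumptions and do not reproduce the paper's exact information refinement network.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel convolutions at several scales, merged by a 1x1 convolution."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # three branches with 3x3, 5x5, and 7x7 kernels ("same" padding)
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)]
        )
        self.merge = nn.Conv2d(3 * out_ch, out_ch, kernel_size=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        # extract features at each scale, concatenate, then merge
        feats = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return self.act(self.merge(feats))
```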

Notes

  1. http://www.med.harvard.edu/AANLIB/home.html

  2. https://figshare.com/articles/TNO_Image_Fusion_Dataset/100802

References

  1. Dogra A, Goyal B, Agrawal S (2017) From multi-scale decomposition to non-multi-scale decomposition methods: a comprehensive survey of image fusion techniques and its applications. IEEE Access 5:16040–16067

  2. Ma J, Ma Y, Li C (2019) Infrared and visible image fusion methods and applications: a survey. Information Fusion 45:153–178

  3. Li W, Peng X, Fu J, Wang G, Huang Y, Chao F (2022) A multiscale double-branch residual attention network for anatomical–functional medical image fusion. Comp Biol Med 141:105005

  4. Li Q, Lu L, Li Z, Wu W, Liu Z, Jeon G, Yang X (2019) Coupled GAN with relativistic discriminators for infrared and visible images fusion. IEEE Sensors J 21(6):7458–7467

  5. Li J et al (2019) Poisson reconstruction-based fusion of infrared and visible images via saliency detection. IEEE Access 7:20676–20688

  6. Xiang T, Yan Li, Gao R (2015) A fusion algorithm for infrared and visible images based on adaptive dual-channel unit-linking PCNN in NSCT domain. Infrared Phys Technol 69:53–61

  7. Naidu VPS (2011) Image fusion technique using multi-resolution singular value decomposition. Def Sci J 61(5):479

  8. Zhang Q et al (2018) Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review. Information Fusion 40:57–75

  9. Mou J, Gao W, Song Z (2013) Image fusion based on non-negative matrix factorization and infrared feature extraction. 2013 6th International Congress on Image and Signal Processing (CISP). Vol 2. IEEE

  10. Yang Y et al (2020) Infrared and visible image fusion using visual saliency sparse representation and detail injection model. IEEE Trans Instrum Meas 70:1–15

  11. Singh S, Anand RS (2019) Multimodal medical image sensor fusion model using sparse K-SVD dictionary learning in nonsubsampled shearlet domain. IEEE Trans Instrum Meas 69(2):593–607

  12. Ma J et al (2020) DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans Image Process 29:4980–4995

  13. Liu Y et al (2018) Deep learning for pixel-level image fusion: recent advances and future prospects. Information Fusion 42:158–173

  14. Xu H, Liang P, Yu W, Jiang J, Ma J (2019) Learning a generative model for fusing infrared and visible images via conditional generative adversarial network with dual discriminators. In: IJCAI, pp 3954–3960

  15. Goodfellow I (2016) NIPS 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160

  16. Ma J et al (2019) FusionGAN: a generative adversarial network for infrared and visible image fusion. Information Fusion 48:11–26

  17. Ma J et al (2020) Infrared and visible image fusion via detail preserving adversarial learning. Information Fusion 54:85–98

  18. Xu X (2020) Multifocus image fusion algorithm based on rough set and neural network. IEEE Sensors J 99:1–1

  19. Vlamou E, Papadopoulos B (2019) Fuzzy logic systems and medical applications. AIMS Neuroscience 6(4):266–272

  20. Liu Y et al (2017) A medical image fusion method based on convolutional neural networks. 2017 20th international conference on information fusion (Fusion). IEEE

  21. Li X, Zhang X, Ding M (2019) A sum-modified-Laplacian and sparse representation based multimodal medical image fusion in Laplacian pyramid domain. Med Biol Eng Comput 57(10):2265–2275

  22. Liu S et al (2019) Multi-focus image fusion based on residual network in non-subsampled shearlet domain. IEEE Access 7:152043–152063

  23. Huang J et al (2020) MGMDcGAN: medical image fusion using multi-generator multi-discriminator conditional generative adversarial network. IEEE Access 99:1–1

  24. Chan W et al (2016) Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE

  25. Galassi A, Lippi M, Torroni P (2020) Attention in natural language processing. IEEE Trans Neural Netw Learn Syst 32(10):4291–4308

  26. Xu K et al (2015) Show, attend and tell: neural image caption generation with visual attention. International conference on machine learning. PMLR

  27. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141

  28. Woo S et al (2018) CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV)

  29. Zhao B et al (2017) Diversified visual attention networks for fine-grained object classification. IEEE Trans Multimedia 19(6):1245–1256

  30. Wang F et al (2017) Residual attention network for image classification. Proceedings of the IEEE conference on computer vision and pattern recognition

  31. Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. European conference on computer vision. Springer, Cham

  32. Yan Q et al (2019) Attention-guided network for ghost-free high dynamic range imaging. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

  33. Ganasala P, Kumar V, Prasad A D (2016) Performance evaluation of color models in the fusion of functional and anatomical images. J Med Syst 40(5):122

  34. Roberts JW, Van Aardt JA, Ahmed FB (2008) Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J Appl Remote Sens 2(1):023522

  35. Han Y et al (2013) A new image fusion performance metric based on visual information fidelity. Information Fusion 14(2):127–135

  36. Wang Z et al (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612

  37. Naidu VPS (2014) Hybrid DDCT-PCA based multi sensor image fusion. J Opt 43(1):48–61

  38. Yin M et al (2018) Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain. IEEE Trans Instrum Meas 68(1):49–64

  39. Lewis JJ et al (2007) Pixel-and region-based image fusion with complex wavelets. Information Fusion 8(2):119–130

  40. Li J et al (2020) Multigrained attention network for infrared and visible image fusion. IEEE Trans Instrum Meas 70:1–12


Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61962057, the Autonomous Region Key R&D Project under Grant 2021B01002, and the National Natural Science Foundation of China under Grant U2003208.

Author information

Corresponding author

Correspondence to Long Yu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

1.1 Statistical test for ablation study

In deep learning, a number of metrics can be used to evaluate models and thus assist in model selection. However, when several models achieve similar accuracy, metrics alone are insufficient to assess model performance and must be combined with other methods. In such cases, statistical hypothesis testing is used to further verify the superiority of a model.

Statistical tests are used to evaluate the methods adopted in the ablation experiments of Sections 4.1 and 4.2, which further validates the performance of our model. We apply the methods in Table 1 to image pairs from the Harvard dataset and the TNO dataset. The T test is used to determine whether the mean values of two models differ on the EN metric: when the p value is less than 0.001, the null hypothesis is rejected, i.e., the mean EN values of the two models are significantly different. In contrast, the F test compares the variances of the EN values: when the p value is greater than 0.001, the null hypothesis is not rejected, i.e., the experimental results of the two models are equally stable.
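This comparison can be reproduced with a few lines of SciPy. The sketch below is a minimal illustration, assuming two hypothetical arrays of per-image EN scores (one for each model being compared); the helper name compare_en is ours and is not part of the paper's code.

```python
import numpy as np
from scipy import stats

def compare_en(en_a: np.ndarray, en_b: np.ndarray, alpha: float = 0.001):
    """Compare two models' per-image EN scores with a T test and an F test."""
    # T test on the means: p < alpha -> the mean EN values differ significantly
    _, p_t = stats.ttest_ind(en_a, en_b)

    # F test on the variances: p > alpha -> the results are equally stable
    f_stat = np.var(en_a, ddof=1) / np.var(en_b, ddof=1)
    dfn, dfd = len(en_a) - 1, len(en_b) - 1
    p_f = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))

    return {"p_t": p_t, "means_differ": p_t < alpha,
            "p_f": p_f, "same_stability": p_f > alpha}
```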

Table 2 Statistical hypothesis testing of methods in ablation experiments
Fig. 12
figure 12

Statistical distributions of methods in ablation experiments on the metric of EN

As shown in Table 2, the p values of the T test are all lower than 0.001 and the p values of the F test are all greater than 0.001, indicating that the differences in mean EN are statistically significant and that the results are comparably stable. In addition, as shown in Fig. 12, the proposed method obtains the best EN results on both datasets, which further demonstrates the superiority of our model.

1.2 Ablation study on batch size

The batch size is an important hyperparameter that affects both the training speed and the accuracy of the model. Small batch sizes slow down network convergence, and larger batch sizes accelerate training, but an excessively large batch size harms accuracy and may reduce the generalization ability of the model. To determine the batch size for our network, we first set it to 24 following DDcGAN [12]. As shown in Fig. 13, a batch size of 24 does not retain enough texture information. We then train with batch sizes of 16, 18, 20, and 24 and select the best result among them (a sketch of this sweep is given below). As shown in Figs. 13 and 14, a batch size of 20 yields the highest convergence accuracy and the best experimental results.
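The selection procedure amounts to a small sweep over candidate batch sizes, scored here by mean EN over a test set. The sketch below is an assumption-laden illustration: train_msran and model.fuse are hypothetical helpers standing in for retraining and fusing with the network, and are not part of the paper's code.

```python
import numpy as np

def entropy(img: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (EN) of an 8-bit grayscale image."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def sweep_batch_sizes(train_msran, test_pairs, candidates=(16, 18, 20, 24)):
    """Retrain with each candidate batch size and keep the one with the best mean EN."""
    scores = {}
    for bs in candidates:
        model = train_msran(batch_size=bs)                 # hypothetical retraining helper
        fused = [model.fuse(a, b) for a, b in test_pairs]  # fuse each source-image pair
        scores[bs] = float(np.mean([entropy(f) for f in fused]))
    best = max(scores, key=scores.get)
    return best, scores
```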

Fig. 13
figure 13

Our MsRAN performs batch size ablation experiments on infrared and visible light images. From top to bottom: infrared images, visible light images, method with a batch size of 16 (B_s = 16), method with a batch size of 18 (B_s = 18), our method with a batch size of 20 (B_s = 20), and method with a batch size of 24 (B_s = 24). A typical region is selected and magnified at the bottom

Fig. 14
figure 14

A quantitative comparison of the batch size ablation experiments on 20 image pairs from the TNO dataset

1.3 Ablation study on optimizer

The impact of optimizer selection on network performance is rarely discussed in multi-modal image fusion tasks. Through extensive experiments, we find that the quality of the fusion results is also affected, to some extent, by the choice of optimizer.

Firstly, following Li et al. [40], we use the Adam optimizer in both the generator and the discriminator, but the training time is too long and the fusion results are not favorable. We then assign different optimizers to the generator and discriminator networks, as in DDcGAN [12]. Adam and SGD are employed to update the network parameters because the former adjusts the learning rate automatically and its parameter updates are insensitive to the scale of the gradients, while the latter is cheap to compute and converges quickly. Subsequently, a large number of experiments are conducted with different combinations of Adam and SGD in the generator and discriminator networks. Finally, we conclude that using SGD in the generator and Adam in the discriminator not only achieves the best fusion results but also requires the shortest training time.
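As a minimal sketch of this final configuration, the PyTorch fragment below pairs SGD for the generator with Adam for the two discriminators of the dual adversarial structure. The module and helper names (G, d_a, d_b, the loss functions) and the learning rates are illustrative assumptions, not the paper's released code or hyperparameters.

```python
import torch

def build_optimizers(G, d_a, d_b):
    """SGD for the generator, Adam for both discriminators (illustrative learning rates)."""
    g_opt  = torch.optim.SGD(G.parameters(), lr=1e-3, momentum=0.9)
    da_opt = torch.optim.Adam(d_a.parameters(), lr=1e-4)
    db_opt = torch.optim.Adam(d_b.parameters(), lr=1e-4)
    return g_opt, da_opt, db_opt

def train_step(G, d_a, d_b, g_opt, da_opt, db_opt, g_loss_fn, d_loss_fn, src_a, src_b):
    """One adversarial update: both discriminators first, then the generator."""
    # update the discriminators on real vs. fused images
    fused = G(src_a, src_b).detach()
    d_loss = d_loss_fn(d_a(src_a), d_a(fused)) + d_loss_fn(d_b(src_b), d_b(fused))
    da_opt.zero_grad(); db_opt.zero_grad()
    d_loss.backward()
    da_opt.step(); db_opt.step()

    # update the generator against both discriminators
    fused = G(src_a, src_b)
    g_loss = g_loss_fn(fused, src_a, src_b, d_a(fused), d_b(fused))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```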

As shown in Figs. 15 and 16, we conduct qualitative and quantitative experiments with different optimizer combinations, which confirm the superiority of the proposed combination.

Fig. 15
figure 15

We compare four optimizer combinations qualitatively on four pairs of typical PET and MR images. The typical regions are selected and magnified at the bottom

Fig. 16
figure 16

A quantitative comparison of our proposed optimizer combination with the three other combinations on the Harvard dataset

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, J., Yu, L. & Tian, S. MsRAN: a multi-scale residual attention network for multi-model image fusion. Med Biol Eng Comput 60, 3615–3634 (2022). https://doi.org/10.1007/s11517-022-02690-1
