Abstract
Fusion is a critical step in image processing tasks. Recently, deep learning networks have been widely applied to information fusion. However, a significant limitation of existing image fusion methods is their inability to highlight typical regions of the source images and retain sufficient useful information. To address this problem, this paper proposes a multi-scale residual attention network (MsRAN) to fully exploit image features. Its generator network contains two information refinement networks and one information integration network. The information refinement networks extract features at different scales using convolution kernels of different sizes. The information integration network, with a merging block and an attention block added, prevents the underutilization of information in the intermediate layers and forces the generator to focus on salient regions in multi-modal source images. Furthermore, during model training, we add an information loss function and adopt a dual adversarial structure, enabling the model to capture more details. Qualitative and quantitative experiments on publicly available datasets validate that the proposed method provides better visual results than competing methods and retains more detail information.
References
Dogra A, Goyal B, Agrawal S (2017) From multi-scale decomposition to non-multi-scale decomposition methods: a comprehensive survey of image fusion techniques and its applications. IEEE Access 5:16040–16067
Ma J, Ma Y, Li C (2019) Infrared and visible image fusion methods and applications: a survey. Information Fusion 45:153–178
Li W, Peng X, Fu J, Wang G, Huang Y, Chao F (2022) A multiscale double-branch residual attention network for anatomical–functional medical image fusion. Comp Biol Med 141:105005
Li Q, Lu L, Li Z, Wu W, Liu Z, Jeon G, Yang X (2019) Coupled GAN with relativistic discriminators for infrared and visible images fusion. IEEE Sensors J 21(6):7458–7467
Li J et al (2019) Poisson reconstruction-based fusion of infrared and visible images via saliency detection. IEEE Access 7:20676–20688
Xiang T, Yan L, Gao R (2015) A fusion algorithm for infrared and visible images based on adaptive dual-channel unit-linking PCNN in NSCT domain. Infrared Phys Technol 69:53–61
Naidu VPS (2011) Image fusion technique using multi-resolution singular value decomposition. Def Sci J 61(5):479
Zhang Q et al (2018) Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review. Information Fusion 40:57–75
Mou J, Gao W, Song Z (2013) Image fusion based on non-negative matrix factorization and infrared feature extraction. 2013 6th International Congress on Image and Signal Processing (CISP). Vol 2. IEEE
Yang Y et al (2020) Infrared and visible image fusion using visual saliency sparse representation and detail injection model. IEEE Trans Instrum Meas 70:1–15
Singh S, Anand RS (2019) Multimodal medical image sensor fusion model using sparse K-SVD dictionary learning in nonsubsampled shearlet domain. IEEE Trans Instrum Meas 69(2):593–607
Ma J et al (2020) DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans Image Process 29:4980–4995
Liu Y et al (2018) Deep learning for pixel-level image fusion: recent advances and future prospects. Information Fusion 42:158–173
Xu H, Liang P, Yu W, Jiang J, Ma J (2019) Learning a generative model for fusing infrared and visible images via conditional generative adversarial network with dual discriminators. In: IJCAI, pp 3954–3960
Goodfellow I (2016) Nips 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160
Ma J et al (2019) FusionGAN: a generative adversarial network for infrared and visible image fusion. Information Fusion 48:11–26
Ma J et al (2020) Infrared and visible image fusion via detail preserving adversarial learning. Information Fusion 54:85–98
Xu X (2020) Multifocus image fusion algorithm based on rough set and neural network. IEEE Sensors J 99:1–1
Vlamou E, Papadopoulos B (2019) Fuzzy logic systems and medical applications. AIMS Neuroscience 6(4):266–272
Liu Y et al (2017) A medical image fusion method based on convolutional neural networks. 2017 20th international conference on information fusion (Fusion). IEEE
Li X, Zhang X, Ding M (2019) A sum-modified-Laplacian and sparse representation based multimodal medical image fusion in Laplacian pyramid domain. Med Biol Eng Compu 57(10):2265–2275
Liu S et al (2019) Multi-focus image fusion based on residual network in non-subsampled shearlet domain. IEEE Access 7:152043–152063
Huang J et al (2020) MGMDcGAN: medical image fusion using multi-generator multi-discriminator conditional generative adversarial network. IEEE Access 99:1–1
Chan W et al (2016) Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE
Galassi A, Lippi M, Torroni P (2020) Attention in natural language processing. IEEE Trans Neural Netw Learn Syst 32(10):4291–4308
Xu K et al (2015) Show, attend and tell: neural image caption generation with visual attention. International conference on machine learning. PMLR
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Woo S et al (2018) CBAM: convolutional block attention module. Proceedings of the European conference on computer vision (ECCV)
Zhao B et al (2017) Diversified visual attention networks for fine-grained object classification. IEEE Trans Multimedia 19(6):1245–1256
Wang F et al (2017) Residual attention network for image classification. Proceedings of the IEEE conference on computer vision and pattern recognition
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. European conference on computer vision. Springer, Cham
Yan Q et al (2019) Attention-guided network for ghost-free high dynamic range imaging. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Ganasala P, Kumar V, Prasad A D (2016) Performance evaluation of color models in the fusion of functional and anatomical images. J Med Syst 40(5):122
Roberts JW, Van Aardt JA, Ahmed FB (2008) Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J Appl Remote Sens 2(1):023522
Han Y et al (2013) A new image fusion performance metric based on visual information fidelity. Information Fusion 14(2):127–135
Wang Z et al (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Naidu VPS (2014) Hybrid DDCT-PCA based multi sensor image fusion. J Opt 43(1):48–61
Yin M et al (2018) Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain. IEEE Trans Instrum Meas 68(1):49–64
Lewis JJ et al (2007) Pixel-and region-based image fusion with complex wavelets. Information Fusion 8(2):119–130
Li J et al (2020) Multigrained attention network for infrared and visible image fusion. IEEE Trans Instrum Meas 70:1–12
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 61962057 and U2003208, and by the Autonomous Region Key R&D Project under Grant 2021B01002.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Statistical test for ablation study
In deep learning, a number of metrics can be used to evaluate models and thus assist in model selection. However, when several models achieve similar accuracy, the metrics alone are insufficient to assess performance and must be combined with other methods. In that case, statistical hypothesis testing is used to further verify which model is superior.
Statistical tests are used to evaluate the methods adopted in the ablation experiments in Sections 4.1 and 4.2, which validates the performance of our model. We apply the methods in Table 1 to image pairs from the Harvard dataset and the TNO dataset. A t test is used to compare the mean values of the two models on the EN metric: when the p value is less than 0.001, the null hypothesis is rejected, i.e., the mean EN values of the two models differ significantly. In contrast, an F test is used to compare the variances of the EN values of these samples: when the p value is greater than 0.001, the null hypothesis is accepted, i.e., the experimental results of the two models are equally stable.
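This decision rule can be sketched with SciPy as follows; the EN scores below are synthetic stand-ins for illustration, not the measurements reported in the paper, and the significance level α = 0.001 matches the one used above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical EN scores of two fusion models over the same image pairs
# (illustrative values only, not the paper's actual measurements).
en_model_a = rng.normal(loc=6.8, scale=0.15, size=30)
en_model_b = rng.normal(loc=7.1, scale=0.15, size=30)

# t test: do the two models differ in mean EN?
t_stat, t_p = stats.ttest_ind(en_model_a, en_model_b)

# F test: do the two models differ in the variance of EN?
f_stat = np.var(en_model_a, ddof=1) / np.var(en_model_b, ddof=1)
df = len(en_model_a) - 1
# Two-sided p value for the variance-ratio (F) test
f_p = 2 * min(stats.f.cdf(f_stat, df, df), stats.f.sf(f_stat, df, df))

alpha = 0.001
mean_differs = t_p < alpha    # reject H0 of equal means
equally_stable = f_p > alpha  # accept H0 of equal variances
```

With the two samples drawn from distributions of equal spread but different centers, the t test rejects equal means while the F test retains equal variances, mirroring the pattern reported for Table 2.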
As shown in Table 2, the p values of the t test are all below 0.001 and the p values of the F test are all above 0.001, indicating that our results are statistically significant. In addition, as shown in Fig. 12, our proposed method obtains the best EN on both datasets, which further indicates the superiority of our model.
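For reference, EN in these comparisons is the Shannon entropy of the fused image's gray-level histogram. A minimal NumPy implementation (our own sketch, not the authors' evaluation code) is:

```python
import numpy as np

def entropy_en(img: np.ndarray) -> float:
    """EN metric: Shannon entropy (bits) of the 8-bit gray-level histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins (0 * log 0 is taken as 0)
    return float(-(p * np.log2(p)).sum())

# A uniform random 8-bit image approaches the 8-bit maximum EN of 8 bits,
# while a constant image has EN = 0.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
print(entropy_en(img))                                  # close to 8.0
print(entropy_en(np.zeros((16, 16), dtype=np.uint8)))   # 0.0
```

A higher EN indicates that the fused image carries more information, which is why it serves as the comparison metric in the ablation tests above.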
1.2 Ablation study on batch size
The batch size is an important hyperparameter that affects both training speed and model accuracy. Small batch sizes slow network convergence, while larger ones accelerate training; however, an overly large batch size can harm accuracy and reduce the generalization ability of the model. To determine the batch size for our network, we first set it to 24 following DDcGAN [12]. As shown in Fig. 13, a batch size of 24 does not retain enough texture information. We then set it to 16, 18, 20, and 24, respectively, and select the best result. As shown in Figs. 13 and 14, a batch size of 20 yields the highest convergence accuracy and the best experimental results.
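The selection procedure above amounts to a simple sweep: train at each candidate batch size, score the result, and keep the best. A minimal sketch of that pattern, with a hypothetical `train_and_score` stand-in for "train MsRAN and measure EN on validation pairs" (the scores below are illustrative, not the paper's measurements):

```python
def sweep_batch_size(candidates, train_and_score):
    """Return the candidate batch size with the highest score."""
    best_bs, best_score = None, float("-inf")
    for bs in candidates:
        score = train_and_score(bs)
        if score > best_score:
            best_bs, best_score = bs, score
    return best_bs

# Toy stand-in scores: in the paper's experiments, batch size 20 wins.
fake_scores = {16: 6.91, 18: 6.95, 20: 7.02, 24: 6.88}
best = sweep_batch_size([16, 18, 20, 24], fake_scores.get)
print(best)  # 20
```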
1.3 Ablation study on optimizer
Currently, the impact of optimizer selection on network performance is rarely discussed in multi-modal image fusion tasks. Through extensive experiments, we find that the quality of the fusion results is also affected to some extent by the choice of optimizer.
First, we use the Adam optimizer proposed by Li et al. in both the generator and the discriminator [40], but training takes too long and the fusion results are unfavorable. We then assign different optimizers to the generator and discriminator networks following DDcGAN [12]. Adam and SGD are considered for updating the network parameters: the former automatically adjusts the learning rate and its parameter updates are insensitive to the scale of the gradients, while the latter is cheap to compute and converges quickly. Subsequently, a large number of experiments are conducted with different combinations of Adam and SGD across the generator and discriminator networks. Finally, we conclude that using SGD in the generator and Adam in the discriminator not only achieves the best fusion results but also takes the shortest training time.
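The asymmetric assignment can be illustrated with the two update rules written out in NumPy; this is a sketch of the standard SGD and Adam steps applied to toy parameter vectors, not the authors' TensorFlow training code:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD update (assigned to the generator in this sketch)."""
    return w - lr * grad

def adam_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam update (assigned to the discriminator): the per-parameter step
    adapts to the gradient history, so it is insensitive to gradient scale."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Asymmetric assignment, as in the appendix: SGD updates the generator's
# parameters, Adam updates the discriminator's.
g_w = np.zeros(3)
d_w = np.zeros(3)
d_state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
grad = np.array([0.5, -1.0, 2.0])
g_w = sgd_step(g_w, grad)
d_w = adam_step(d_w, grad, d_state)
```

Note how the first Adam step moves every parameter by roughly the same magnitude (≈ lr) regardless of gradient size, whereas the SGD step scales directly with the gradient; this difference in behavior is what makes the combination worth tuning per sub-network.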
As shown in Figs. 15 and 16, we conduct qualitative and quantitative experiments with different optimizer combinations, confirming the superiority of our proposed combination.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, J., Yu, L. & Tian, S. MsRAN: a multi-scale residual attention network for multi-model image fusion. Med Biol Eng Comput 60, 3615–3634 (2022). https://doi.org/10.1007/s11517-022-02690-1