Abstract
Fusion is a critical step in image processing tasks. Recently, deep learning networks have been widely applied to information fusion. However, a significant limitation of existing image fusion methods is their inability to highlight typical regions of the source images and retain sufficient useful information. To address this problem, this paper proposes a multi-scale residual attention network (MsRAN) to fully exploit image features. Its generator network contains two information refinement networks and one information integration network. The information refinement networks extract features at different scales using convolution kernels of different sizes. The information integration network, with a merging block and an attention block added, prevents the underutilization of information in the intermediate layers and forces the generator to focus on salient regions in multi-modal source images. Furthermore, during model training, we add an information loss function and adopt a dual adversarial structure, enabling the model to capture more details. Qualitative and quantitative experiments on publicly available datasets validate that the proposed method provides better visual results than competing methods and retains more detail information.
References
Dogra A, Goyal B, Agrawal S (2017) From multi-scale decomposition to non-multi-scale decomposition methods: a comprehensive survey of image fusion techniques and its applications. IEEE Access 5:16040–16067
Ma J, Ma Y, Li C (2019) Infrared and visible image fusion methods and applications: a survey. Information Fusion 45:153–178
Li W, Peng X, Fu J, Wang G, Huang Y, Chao F (2022) A multiscale double-branch residual attention network for anatomical–functional medical image fusion. Comp Biol Med 141:105005
Li Q, Lu L, Li Z, Wu W, Liu Z, Jeon G, Yang X (2019) Coupled GAN with relativistic discriminators for infrared and visible images fusion. IEEE Sensors J 21(6):7458–7467
Li J et al (2019) Poisson reconstruction-based fusion of infrared and visible images via saliency detection. IEEE Access 7:20676–20688
Xiang T, Yan L, Gao R (2015) A fusion algorithm for infrared and visible images based on adaptive dual-channel unit-linking PCNN in NSCT domain. Infrared Phys Technol 69:53–61
Naidu VPS (2011) Image fusion technique using multi-resolution singular value decomposition. Def Sci J 61(5):479
Zhang Q et al (2018) Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review. Information Fusion 40:57–75
Mou J, Gao W, Song Z (2013) Image fusion based on non-negative matrix factorization and infrared feature extraction. 2013 6th International Congress on Image and Signal Processing (CISP). Vol 2. IEEE
Yang Y et al (2020) Infrared and visible image fusion using visual saliency sparse representation and detail injection model. IEEE Trans Instrum Meas 70:1–15
Singh S, Anand RS (2019) Multimodal medical image sensor fusion model using sparse K-SVD dictionary learning in nonsubsampled shearlet domain. IEEE Trans Instrum Meas 69(2):593–607
Ma J et al (2020) DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans Image Process 29:4980–4995
Liu Y et al (2018) Deep learning for pixel-level image fusion: recent advances and future prospects. Information Fusion 42:158–173
Xu H, Liang P, Yu W, Jiang J, Ma J (2019) Learning a generative model for fusing infrared and visible images via conditional generative adversarial network with dual discriminators. In: IJCAI, pp 3954–3960
Goodfellow I (2016) Nips 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160
Ma J et al (2019) FusionGAN: a generative adversarial network for infrared and visible image fusion. Information Fusion 48:11–26
Ma J et al (2020) Infrared and visible image fusion via detail preserving adversarial learning. Information Fusion 54:85–98
Xu X (2020) Multifocus image fusion algorithm based on rough set and neural network. IEEE Sensors J 99:1–1
Vlamou E, Papadopoulos B (2019) Fuzzy logic systems and medical applications. AIMS Neuroscience 6(4):266–272
Liu Y et al (2017) A medical image fusion method based on convolutional neural networks. 2017 20th international conference on information fusion (Fusion). IEEE
Li X, Zhang X, Ding M (2019) A sum-modified-Laplacian and sparse representation based multimodal medical image fusion in Laplacian pyramid domain. Med Biol Eng Compu 57(10):2265–2275
Liu S et al (2019) Multi-focus image fusion based on residual network in non-subsampled shearlet domain. IEEE Access 7:152043–152063
Huang J et al (2020) MGMDcGAN: medical image fusion using multi-generator multi-discriminator conditional generative adversarial network. IEEE Access 99:1–1
Chan W et al (2016) Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE
Galassi A, Lippi M, Torroni P (2020) Attention in natural language processing. IEEE Trans Neural Netw Learn Syst 32(10):4291–4308
Xu K et al (2015) Show, attend and tell: neural image caption generation with visual attention. International conference on machine learning. PMLR
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Woo S et al (2018) CBAM: convolutional block attention module. Proceedings of the European conference on computer vision (ECCV)
Zhao B et al (2017) Diversified visual attention networks for fine-grained object classification. IEEE Trans Multimedia 19(6):1245–1256
Wang F et al (2017) Residual attention network for image classification. Proceedings of the IEEE conference on computer vision and pattern recognition
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. European conference on computer vision. Springer, Cham
Yan Q et al (2019) Attention-guided network for ghost-free high dynamic range imaging. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Ganasala P, Kumar V, Prasad A D (2016) Performance evaluation of color models in the fusion of functional and anatomical images. J Med Syst 40(5):122
Roberts JW, Van Aardt JA, Ahmed FB (2008) Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J Appl Remote Sens 2(1):023522
Han Y et al (2013) A new image fusion performance metric based on visual information fidelity. Information Fusion 14(2):127–135
Wang Z et al (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Naidu VPS (2014) Hybrid DDCT-PCA based multi sensor image fusion. J Opt 43(1):48–61
Yin M et al (2018) Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain. IEEE Trans Instrum Meas 68(1):49–64
Lewis JJ et al (2007) Pixel-and region-based image fusion with complex wavelets. Information Fusion 8(2):119–130
Li J et al (2020) Multigrained attention network for infrared and visible image fusion. IEEE Trans Instrum Meas 70:1–12
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants 61962057 and U2003208, and by the Autonomous Region Key R&D Project under Grant 2021B01002.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Statistical test for ablation study
In deep learning, a number of metrics can be used to evaluate models and thus assist in model selection. However, when several models achieve similar accuracy, the metrics alone are insufficient to assess performance and must be combined with other methods. In that case, statistical hypothesis testing is used to further verify which model is superior.
Statistical tests are used to evaluate the methods adopted in the ablation experiments in Sections 4.1 and 4.2, which validates the performance of our model. We apply the methods in Table 1 to image pairs from the Harvard dataset and the TNO dataset. A t test is used to compare the mean values of the two models on the EN metric: when the p value is less than 0.001, the null hypothesis is rejected, i.e., the mean EN values of the two models differ significantly. In contrast, an F test is used to compare the variances of the EN values of these samples: when the p value is greater than 0.001, the null hypothesis is accepted, i.e., the experimental results of the two models are equally stable.
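This decision rule can be sketched with SciPy as follows; the EN scores below are synthetic stand-ins for illustration, not the measurements reported in the paper, and the significance level α = 0.001 matches the one used above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical EN scores of two fusion models over the same image pairs
# (illustrative values only, not the paper's actual measurements).
en_model_a = rng.normal(loc=6.8, scale=0.15, size=30)
en_model_b = rng.normal(loc=7.1, scale=0.15, size=30)

# t test: do the two models differ in mean EN?
t_stat, t_p = stats.ttest_ind(en_model_a, en_model_b)

# F test: do the two models differ in the variance of EN?
f_stat = np.var(en_model_a, ddof=1) / np.var(en_model_b, ddof=1)
df = len(en_model_a) - 1
# Two-sided p value for the variance-ratio (F) test
f_p = 2 * min(stats.f.cdf(f_stat, df, df), stats.f.sf(f_stat, df, df))

alpha = 0.001
mean_differs = t_p < alpha    # reject H0 of equal means
equally_stable = f_p > alpha  # accept H0 of equal variances
```

With the two samples drawn from distributions of equal spread but different centers, the t test rejects equal means while the F test retains equal variances, mirroring the pattern reported for Table 2.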
As shown in Table 2, the p values of the t test are all below 0.001 and the p values of the F test are all above 0.001, indicating that our results are statistically significant. In addition, as shown in Fig. 12, our proposed method obtains the best EN on both datasets, which further indicates the superiority of our model.
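For reference, EN in these comparisons is the Shannon entropy of the fused image's gray-level histogram. A minimal NumPy implementation (our own sketch, not the authors' evaluation code) is:

```python
import numpy as np

def entropy_en(img: np.ndarray) -> float:
    """EN metric: Shannon entropy (bits) of the 8-bit gray-level histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins (0 * log 0 is taken as 0)
    return float(-(p * np.log2(p)).sum())

# A uniform random 8-bit image approaches the 8-bit maximum EN of 8 bits,
# while a constant image has EN = 0.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
print(entropy_en(img))                                  # close to 8.0
print(entropy_en(np.zeros((16, 16), dtype=np.uint8)))   # 0.0
```

A higher EN indicates that the fused image carries more information, which is why it serves as the comparison metric in the ablation tests above.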
1.2 Ablation study on batch size
The batch size is an important hyperparameter that affects both training speed and model accuracy. Small batch sizes slow network convergence, while larger ones accelerate training; however, an overly large batch size can harm accuracy and reduce the generalization ability of the model. To determine the batch size for our network, we first set it to 24 following DDcGAN [12]. As shown in Fig. 13, a batch size of 24 does not retain enough texture information. We then set it to 16, 18, 20, and 24, respectively, and select the best result. As shown in Figs. 13 and 14, a batch size of 20 yields the highest convergence accuracy and the best experimental results.
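The selection procedure above amounts to a simple sweep: train at each candidate batch size, score the result, and keep the best. A minimal sketch of that pattern, with a hypothetical `train_and_score` stand-in for "train MsRAN and measure EN on validation pairs" (the scores below are illustrative, not the paper's measurements):

```python
def sweep_batch_size(candidates, train_and_score):
    """Return the candidate batch size with the highest score."""
    best_bs, best_score = None, float("-inf")
    for bs in candidates:
        score = train_and_score(bs)
        if score > best_score:
            best_bs, best_score = bs, score
    return best_bs

# Toy stand-in scores: in the paper's experiments, batch size 20 wins.
fake_scores = {16: 6.91, 18: 6.95, 20: 7.02, 24: 6.88}
best = sweep_batch_size([16, 18, 20, 24], fake_scores.get)
print(best)  # 20
```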
1.3 Ablation study on optimizer
Currently, the impact of optimizer selection on network performance is rarely discussed in multi-modal image fusion tasks. Through extensive experiments, we find that the quality of the fusion results is also affected to some extent by the choice of optimizer.
First, we use the Adam optimizer proposed by Li et al. in both the generator and the discriminator [40], but training takes too long and the fusion results are unfavorable. We then assign different optimizers to the generator and discriminator networks following DDcGAN [12]. Adam and SGD are considered for updating the network parameters: the former automatically adjusts the learning rate and its parameter updates are insensitive to the scale of the gradients, while the latter is cheap to compute and converges quickly. Subsequently, a large number of experiments are conducted with different combinations of Adam and SGD across the generator and discriminator networks. Finally, we conclude that using SGD in the generator and Adam in the discriminator not only achieves the best fusion results but also takes the shortest training time.
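The asymmetric assignment can be illustrated with the two update rules written out in NumPy; this is a sketch of the standard SGD and Adam steps applied to toy parameter vectors, not the authors' TensorFlow training code:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD update (assigned to the generator in this sketch)."""
    return w - lr * grad

def adam_step(w, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam update (assigned to the discriminator): the per-parameter step
    adapts to the gradient history, so it is insensitive to gradient scale."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Asymmetric assignment, as in the appendix: SGD updates the generator's
# parameters, Adam updates the discriminator's.
g_w = np.zeros(3)
d_w = np.zeros(3)
d_state = {"t": 0, "m": np.zeros(3), "v": np.zeros(3)}
grad = np.array([0.5, -1.0, 2.0])
g_w = sgd_step(g_w, grad)
d_w = adam_step(d_w, grad, d_state)
```

Note how the first Adam step moves every parameter by roughly the same magnitude (≈ lr) regardless of gradient size, whereas the SGD step scales directly with the gradient; this difference in behavior is what makes the combination worth tuning per sub-network.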
As shown in Figs. 15 and 16, we conduct qualitative and quantitative experiments with different optimizer combinations, confirming the superiority of our proposed combination.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, J., Yu, L. & Tian, S. MsRAN: a multi-scale residual attention network for multi-model image fusion. Med Biol Eng Comput 60, 3615–3634 (2022). https://doi.org/10.1007/s11517-022-02690-1