Abstract
Accurately extracting the effective information and profile features of source images remains a difficult problem in infrared and visible image fusion. Existing fusion algorithms lack principled strategies for feature extraction and fusion weight allocation, which can easily lead to information loss or redundancy and degrade the fused result. This paper therefore proposes an attention-based dual U-Net network for the fusion of infrared and visible images. The method integrates the feature information of two structurally different U-Net branches. The inner, nest-connected network extracts and fuses multi-scale infrared and visible features, and an attention mechanism is introduced in the decoding block to strengthen the model’s focus on the salient features of the image and generate rich semantic information. The outer U-Net network extracts deep features of the fused image, enhances its contour features, and improves the retention of spatial features. Comparisons with six existing fusion methods on a public dataset show that the proposed algorithm is superior in both subjective visual quality and objective evaluation.
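To make the idea of attention-guided fusion weight allocation concrete, the following is a minimal sketch and not the network proposed in this paper. It assumes that each source contributes a list of feature channels (plain 2-D lists), that a channel's global average serves as its saliency descriptor, and that a softmax over the two descriptors sets the fusion weights. The function names `channel_mean` and `fuse_with_attention` are hypothetical, and the learned layers a real attention module would contain are omitted.

```python
import math


def channel_mean(channel):
    """Global average pooling over one H x W channel (list of rows)."""
    return sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))


def fuse_with_attention(ir_feats, vis_feats):
    """Fuse infrared and visible feature maps channel by channel.

    Assumption: fusion weights come from a softmax over the two channel
    means; a trained attention block would learn this mapping instead.
    """
    fused = []
    for ir_ch, vis_ch in zip(ir_feats, vis_feats):
        a_ir, a_vis = channel_mean(ir_ch), channel_mean(vis_ch)
        # Numerically stable two-way softmax over the descriptors.
        m = max(a_ir, a_vis)
        e_ir, e_vis = math.exp(a_ir - m), math.exp(a_vis - m)
        w_ir = e_ir / (e_ir + e_vis)
        w_vis = 1.0 - w_ir
        # Weighted sum of corresponding pixels in the two channels.
        fused.append([
            [w_ir * i + w_vis * v for i, v in zip(ir_row, vis_row)]
            for ir_row, vis_row in zip(ir_ch, vis_ch)
        ])
    return fused


if __name__ == "__main__":
    ir = [[[2.0, 2.0], [2.0, 2.0]]]   # one strongly activated infrared channel
    vis = [[[0.0, 0.0], [0.0, 0.0]]]  # the matching visible channel is flat
    print(fuse_with_attention(ir, vis))
```

In this toy case the more strongly activated infrared channel receives the larger weight, so the fused channel stays close to it; when both channels have the same mean, the weights are equal and the result is a plain average.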
Data Availability
Data available on request from the authors.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (62002200, 62202268, 62272281), the Shandong Natural Science Foundation of China (ZR2023MF026), and the Yantai Science and Technology Innovation Development Plan (2022JCYJ031).
Ethics declarations
Conflict of Interests
The authors declare that no potential competing interests exist. There is no undisclosed relationship that may pose a competing interest. There is no undisclosed funding source that may pose a competing interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, X., Hua, Z. & Li, J. Attention based dual UNET network for infrared and visible image fusion. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18196-x
DOI: https://doi.org/10.1007/s11042-024-18196-x