Skip to main content
Log in

Attention based dual UNET network for infrared and visible image fusion

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

How to accurately extract the effective information and profile features of source images has been a difficult problem in the domain of infrared and visible image fusion. The existing fusion algorithms lack scientific decision-making in feature information extraction and fusion weight allocation, which can easily result in the loss of information or redundancy and affect the fusion effect. Therefore, this paper proposes a dual U-Net network based on the attention mechanism to realize the fusion of infrared and visible images. The method integrates the feature information of two different structural U-Net branches. The inner nested connected network extracts and fuses multi-scale infrared and visible features, while the attention mechanism is introduced in the decoding block to enhance the model’s attention to the significant features of the image to generate rich semantic information. The outer U-Net network extracts the depth features of the fused image, enhances the contour features of the fused image, and improves the retention of spatial feature. Comparison with six existing fusion methods on a public dataset shows that the algorithm in this paper is superior in both subjective vision and objective evaluation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

Data Availability

Data available on request from the authors.

References

  1. Hong C, Yu J, Tao D, Wang M (2014) Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval. IEEE Trans Ind Electron 62(6):3742–3751

    Google Scholar 

  2. Yu J, Tao D, Wang M, Rui Y (2014) Learning to rank using user clicks and visual features for image retrieval. IEEE Trans Cybern 45(4):767–779

    Article  Google Scholar 

  3. Yu J, Tan M, Zhang H, Rui Y, Tao D (2019) Hierarchical deep click feature prediction for fine-grained image recognition. IEEE Trans Pattern Anal Mach Intell 44(2):563–578

    Article  Google Scholar 

  4. Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670

    Article  MathSciNet  Google Scholar 

  5. Yu N, Li J, Hua Z (2022) Decolorization algorithm based on contrast pyramid transform fusion. Multimed Tools Appl 1–23

  6. Huang L, Dai S, Huang T, Huang X, Wang H (2021) Infrared small target segmentation with multiscale feature representation. Infrared Phys Technol 116:103755

    Article  Google Scholar 

  7. Qiao W, Yang Z (2020) Forecast the electricity price of us using a wavelet transform-based hybrid model. Energy 193:116704

    Article  Google Scholar 

  8. Zhu Z, Yin H, Chai Y, Li Y, Qi G (2018) A novel multi-modality image fusion method based on image decomposition and sparse representation. Inf Sci 432:516–529

    Article  MathSciNet  Google Scholar 

  9. Zhang Q, Shi T, Wang F, Blum RS, Han J (2018) Robust sparse representation based multi-focus image fusion with dictionary construction and local spatial consistency. Pattern Recognit 83:299–313

    Article  Google Scholar 

  10. Hong C, Yu J, Zhang J, Jin X, Lee KH (2018) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Ind Inf 15(7):3952–3961

    Article  Google Scholar 

  11. Li Z, Li J, Zhang F, Fan L (2023) Cadui: cross-attention-based depth unfolding iteration network for pansharpening remote sensing images. IEEE Trans Geosci Remote Sensing

  12. Hssayni EH, Joudar NE, Ettaouil M (2022a) A deep learning framework for time series classification using normal cloud representation and convolutional neural network optimization. Comput Intell 38(6):2056–2074

    Article  Google Scholar 

  13. Hssayni Eh, Joudar NE, Ettaouil M (2022b) Localization and reduction of redundancy in cnn using l 1-sparsity induction. J Ambient Intell Humaniz Comput 1–13

  14. Liu Y, Chen X, Peng H, Wang Z (2017) Multi-focus image fusion with a deep convolutional neural network. Inf Fusion 36:191–207

    Article  Google Scholar 

  15. Zhang Y, Liu Y, Sun P, Yan H, Zhao X, Zhang L (2020) Ifcnn: a general image fusion framework based on convolutional neural network. Inf Fusion 54:99–118

  16. Zhu Z, Li D, Hu Y, Li J, Liu D, Li J (2021) Indoor scene segmentation algorithm based on full convolutional neural network. Neural Comput Appl 33(14):8261–8273

    Article  Google Scholar 

  17. Guo R, Xj Shen, Xy Dong, Xl Zhang (2020) Multi-focus image fusion based on fully convolutional networks. Front Inf Technol Electron 21(7):1019–1033

    Article  Google Scholar 

  18. Feng Y, Lu H, Bai J, Cao L, Yin H (2020) Fully convolutional network-based infrared and visible image fusion. Multimed Tools Appl 79(21):15001–15014

    Article  Google Scholar 

  19. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 234–241

  20. Xiao B, Xu B, Bi X, Li W (2020) Global-feature encoding u-net (geu-net) for multi-focus image fusion. IEEE Trans Image Process 30:163–175

    Article  Google Scholar 

  21. Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J (2018) Unet++: a nested u-net architecture for medical image segmentation. In: Deep learning in medical image analysis and multimodal learning for clinical decision support, Springer, pp 3–11

  22. Li H, Wu XJ, Durrani T (2020) Nestfuse: an infrared and visible image fusion architecture based on nest connection and spatial/channel attention models. IEEE Trans Instrum Meas 69(12):9645–9656

  23. Su X, Li J, Hua Z (2022) Transformer-based regression network for pansharpening remote sensing images. IEEE Trans Geosci Remote Sens 60:1–23

    Google Scholar 

  24. Chen J, Li X, Luo L, Mei X, Ma J (2020) Infrared and visible image fusion based on target-enhanced multiscale transform decomposition. Inf Sci 508:64–78

    Article  Google Scholar 

  25. Mo Y, Kang X, Duan P, Sun B, Li S (2021) Attribute filter based infrared and visible image fusion. Inf Fusion 75:41–54

    Article  Google Scholar 

  26. Zhan L, Zhuang Y, Huang L (2017) Infrared and visible images fusion method based on discrete wavelet transform. J Comput 28(2):57–71

    Google Scholar 

  27. Aghamaleki JA, Ghorbani A (2023) Image fusion using dual tree discrete wavelet transform and weights optimization. Vis Comput 39(3):1181–1191

    Article  Google Scholar 

  28. Li S, Yin H, Fang L (2013) Remote sensing image fusion via sparse representations over learned dictionaries. IEEE Trans Geosci Remote Sens 51(9):4779–4789

    Article  Google Scholar 

  29. Liu Y, Chen X, Ward RK, Wang ZJ (2016) Image fusion with convolutional sparse representation. IEEE Signal Process Lett 23(12):1882–1886

    Article  Google Scholar 

  30. Ma J, Yu W, Liang P, Li C, Jiang J (2019) Fusiongan: a generative adversarial network for infrared and visible image fusion. Inf Fusion 48:11–26

  31. Li J, Huo H, Li C, Wang R, Feng Q (2020) Attentionfgan: infrared and visible image fusion using attention-based generative adversarial networks. IEEE Trans Multimed 23:1383–1396

  32. Zhang H, Yuan J, Tian X, Ma J (2021) Gan-fm: infrared and visible image fusion using gan with full-scale skip connection and dual markovian discriminators. IEEE Trans Comput Imaging 7:1134–1147

  33. Li H, Wu XJ, Kittler J (2018) Infrared and visible image fusion using a deep learning framework. In: 2018 24th International conference on pattern recognition (ICPR), IEEE, pp 2705–2710

  34. Li H, Wu XJ, Durrani TS (2019) Infrared and visible image fusion with resnet and zero-phase component analysis. Infrared Phys Technol 102:103039

    Article  Google Scholar 

  35. Li H, Wu XJ (2018) Densefuse: a fusion approach to infrared and visible images. IEEE Trans Image Process 28(5):2614–2623

  36. Hou R, Zhou D, Nie R, Liu D, Xiong L, Guo Y, Yu C (2020) Vif-net: an unsupervised framework for infrared and visible image fusion. IEEE Trans Comput Imaging 6:640–651

  37. Liu HI, Chen WL (2021) Re-transformer: a self-attention based model for machine translation. Procedia Comput Sci 189:3–10

    Article  Google Scholar 

  38. Galassi A, Lippi M, Torroni P (2020) Attention in natural language processing. IEEE Trans Neural Netw Learn Syst

  39. Bahdanau D, Chorowski J, Serdyuk D, Brakel P, Bengio Y (2016) End-to-end attention-based large vocabulary speech recognition. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 4945–4949

  40. Wang D, Lai R, Guan J (2021) Target attention deep neural network for infrared image enhancement. Infrared Phys Technol 115:103690

    Article  Google Scholar 

  41. Zhang T, Gong X, Chen CP (2021) Bmt-net: broad multitask transformer network for sentiment analysis. IEEE Trans Cybern

  42. Yang B, Wang L, Wong DF, Shi S, Tu Z (2021) Context-aware self-attention networks for natural language processing. Neurocomputing 458:157–169

    Article  Google Scholar 

  43. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141

  44. Sang H, Zhou Q, Zhao Y (2020) Pcanet: pyramid convolutional attention network for semantic segmentation. Image Vis Comput 103:103997

  45. Cheng J, Tian S, Yu L, Lu H, Lv X (2020) Fully convolutional attention network for biomedical image segmentation. Artif Intell Med 107:101899

    Article  Google Scholar 

  46. Sun J, Darbehani F, Zaidi M, Wang B (2020) Saunet: shape attentive u-net for interpretable medical image segmentation. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 797–806

  47. Ren K, Zhang D, Wan M, Miao X, Gu G, Chen Q (2021) An infrared and visible image fusion method based on improved densenet and mrmr-zca. Infrared Phys Technol 115:103707

    Article  Google Scholar 

  48. Huang H, Lin L, Tong R, Hu H, Zhang Q, Iwamoto Y, Han X, Chen YW, Wu J (2020) Unet 3+: a full-scale connected unet for medical image segmentation. ICASSP 2020–2020 IEEE International Conference on Acoustics. Speech and Signal Processing (ICASSP), IEEE, pp 1055–1059

  49. Zhang J, Jin Y, Xu J, Xu X, Zhang Y (2018) Mdu-net: multi-scale densely connected u-net for biomedical image segmentation. arXiv:1812.00352

  50. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision, Springer, pp 740–755

  51. Joudar NE, Ettaouil M et al (2022) An adaptive drop method for deep neural networks regularization: estimation of dropconnect hyperparameter using generalization gap. Knowl-Based Syst 253:109567

  52. Wang X, Hua Z, Li J (2022) Cross-unet: dual-branch infrared and visible image fusion framework based on cross-convolution and attention mechanism. Vis Comput 1–18

  53. Toet A, et al (2014) Tno image fusion dataset https://figshare.com/articles.TN_Image_Fusion_Dataset/1008029

  54. Wang X, Hua Z, Li J (2022) Paccdu: pyramid attention cross-convolutional dual unet for infrared and visible image fusion. IEEE Trans Instrum Meas 71:1–16

  55. Wang X, Hua Z, Li J (2023) Dbsd: dual branches network using semantic and detail information for infrared and visible image fusion. Infrared Phys Technol 104769

  56. Li H, Wu XJ, Kittler J (2021) Rfn-nest: an end-to-end residual fusion network for infrared and visible images. Inf Fusion 73:72–86

  57. Zhang L, Li H, Zhu R, Du P (2022) An infrared and visible image fusion algorithm based on resnet-152. Multimed Tools Appl 1–11

  58. Li Y, Wang J, Miao Z, Wang J (2020) Unsupervised densely attention network for infrared and visible image fusion. Multimed Tools Appl 79(45):34685–34696

    Article  Google Scholar 

  59. Ma J, Zhou Z, Wang B, Zong H (2017) Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrared Phys Technol 82:8–17

    Article  Google Scholar 

  60. Xu H, Ma J, Jiang J, Guo X, Ling H (2020) U2fusion: a unified unsupervised image fusion network. IEEE Trans Pattern Anal Mach Intell

Download references

Acknowledgements

This research was supported by the National Natural Science Foundation of China (62002200, 62202268, 62272281), Shandong Natural Science Foundation of China (ZR2023MF026), Yantai science and technology innovation development plan(2022JCYJ031).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jinjiang Li.

Ethics declarations

Conflict of Interests

The authors declare that no potentail competing interests exist. There is no an undisclosed relationship thay may pose a competing interest. There is no an undisclosed funding source that may pose a competing interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, X., Hua, Z. & Li, J. Attention based dual UNET network for infrared and visible image fusion. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18196-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11042-024-18196-x

Keywords

Navigation