Multimedia Tools and Applications

, Volume 78, Issue 24, pp 34483–34512 | Cite as

Multifocus image fusion using convolutional neural networks in the discrete wavelet transform domain

  • Zeyu Wang
  • Xiongfei Li
  • Haoran Duan
  • Xiaoli ZhangEmail author
  • Hancheng Wang


In this paper, a novel multifocus image fusion algorithm based on the convolutional neural network (CNN) in the discrete wavelet transform (DWT) domain is proposed. The algorithm combines the advantages of spatial domain- and transform domain-based methods. The CNN is used to amplify features and generate different decision maps for different frequency subbands instead of image blocks or source images. In addition, the CNN, which can be seen as an adaptive fusion rule, replaces the traditional fusion rules. The proposed algorithm includes the following steps: first, we decompose each source image into one low frequency subband and several high frequency subbands using the DWT; second, these frequency subbands are used as input to the CNN to generate weight maps. To obtain a more accurate decision map, it is refined by a series of postprocessing operations, including the sum-modified-Laplacian (SML) and guided filter (GF). According to their decision maps, the frequency subbands are fused; finally, the fused image can be obtained using the inverse DWT. The experimental results show that our algorithm is superior to other algorithms.


Multifocus image fusion Convolutional neural network Discrete wavelet transform 



The work was supported by National Science & Technology Pillar Program of China (Grant NO.2012BAH48F02), National Natural Science Foundation of China (Grant No.61801190), Nature Science Foundation of Jilin Province (Grant No.20180101055JC), Outstanding Young Talent Foundation of Jilin Province (Grant No.20180520029JH), China Postdoctoral Science Foundation (Grant No.2017M611323), Industrial Technology Research and Development Funds of Jilin province (2019C054-3), and the Fundamental Research Funds for the Central Universities, JLU.


  1. 1.
    Acerbi-Junior FW, Clevers JGPW, Schaepman ME (2006) The assessment of multi-sensor image fusion using wavelet transforms for mapping the Brazilian Savanna. Int J Appl Earth Obs Geoinf 8(4):278–288Google Scholar
  2. 2.
    Amin-Naji M, Aghagolzadeh A (2018) Multi-focus image fusion in DCT domain using variance and energy of Laplacian and correlation coefficient for visual sensor networks. Journal of AI and Data Mining 6(2):233–250Google Scholar
  3. 3.
    Anderson CH (1988) Filter-subtract-decimate hierarchical pyramid signal analyzing and synthesizing technique. USGoogle Scholar
  4. 4.
    Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PHS (2016) Fully-Convolutional Siamese Networks for Object Tracking. Computer Vision - Eccv 2016 Workshops. Pt Ii 9914:850–865Google Scholar
  5. 5.
    Burt PJ, Adelson EH, Fischler MA, Firschein O (1987) The Laplacian pyramid as a compact image code, Morgan Kaufmann, San FranciscoGoogle Scholar
  6. 6.
    Du CB, Gao SS (2017) Image segmentation-based multi-focus image fusion through multi-scale convolutional neural network. IEEE Access 5:15750–15761Google Scholar
  7. 7.
    Fan D-P, Gong C, Cao Y, Ren B, Cheng M-M, Borji A Enhanced-alignment measure for binary foreground map evaluation. arXiv:180510421
  8. 8.
    Fan D-P, Cheng M-M, Liu Y, Borji LT (2017) A Structure-measure: A new way to evaluate foreground maps. In: Proceedings of the IEEE international conference on computer vision. pp 4548–4557Google Scholar
  9. 9.
    Fan D-P, Cheng M-M, Liu J-J, Gao S-H, Borji HQ (2018) A salient objects in clutter: Bringing salient object detection to the foreground. In: Proceedings of the European conference on computer vision (ECCV). pp 186–202Google Scholar
  10. 10.
    Fan D-P, Zhang S, Wu Y-H, Cheng M-M, Ren B, Ji R, Rosin PL (2018) Face sketch synthesis style similarity: a new structure co-occurrence texture measure. arXiv:180402975
  11. 11.
    Fan D-P, Wang W, Cheng M-M, Shen J (2019) Shifting more attention to video salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 8554–8564Google Scholar
  12. 12.
    Farfade SS, Saberian M, Li LJ (2015) Multi-view face detection using deep convolutional neural networks. Icmr’15: Proceedings of the 2015 ACM international conference on multimedia retrieval: 643–650Google Scholar
  13. 13.
    Gao Z, Wang D, Xue Y, Xu G, Zhang H, Wang Y (2018) 3D object recognition based on pairwise Multi-view Convolutional Neural Networks. J Vis Commun Image Represent 56:305–315Google Scholar
  14. 14.
    Gao Z, Xuan H -Z, Zhang H, Wan S, Choo K-KR (2019) Adaptive fusion and category-level dictionary learning model for multi-view human action recognition. IEEE Internet of Things JournalGoogle Scholar
  15. 15.
    Guo X P, Nie RC, Cao JD, Zhou DM, Qian WH (2018) Fully Convolutional Network-Based Multifocus Image Fusion. Neural Comput 30(7):1775–1800MathSciNetGoogle Scholar
  16. 16.
    Gutman I, Zhou B (2006) Laplacian energy of a graph. Linear Algebra Appl 414(1):29–37MathSciNetzbMATHGoogle Scholar
  17. 17.
    Hareeta M, Mahendra K, Anurag P (2016) image fusion based on the modified curvelet transform. Smart Trends in Information Technology and Computer Communications. Smartcom 2016(628):111–118Google Scholar
  18. 18.
    He KM, Sun J, Tang XO (2013) Guided image filtering. IEEE Trans Pattern Anal Mach Intell 35(6):1397–1409Google Scholar
  19. 19.
    Holzinger A From machine learning to explainable AI. In: 2018 world symposium on digital intelligence for systems and machines (DISA). IEEE, pp 55–66Google Scholar
  20. 20.
    Hong C, Yu J, Tao D, Wang M (2014) Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval. IEEE Trans Ind Electron 62(6):3742–3751Google Scholar
  21. 21.
    Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670MathSciNetzbMATHGoogle Scholar
  22. 22.
    Hou Q, Cheng M-M, Liu J, Torr PH (2018) Webseg: Learning semantic segmentation from web searches. arXiv:180309859
  23. 23.
    Hu Y-T, Huang J-B, Schwing AG (2018) Unsupervised video object segmentation using motion saliency-guided spatio-temporal propagation. In: Proceedings of the European conference on computer vision (ECCV). pp 786–802Google Scholar
  24. 24.
    Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T Caffe: Convolutional architecture for fast feature embedding. Paper presented at the proceedings of the 22nd ACM international conference on Multimedia, Orlando, Florida, USAGoogle Scholar
  25. 25.
    Jin X, Hou J Y, Nie RC, Yao SW, Zhou DM, Jiang Q, He KJ (2018) A lightweight scheme for multi-focus image fusion. Multimed Tools Appl 77 (18):23501–23527Google Scholar
  26. 26.
    Kong J, Zheng K, Zhang J, Feng X (2008) Multi-focus image fusion using spatial frequency and genetic algorithm. International Journal of Computer Science & Network Security 2:220–224Google Scholar
  27. 27.
    Krizhevsky A, Sutskever I, Hinton G E (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90Google Scholar
  28. 28.
    Lee K, Ji S (2015) Multi-focus image fusion using energy of image gradient and gradual boundary smoothing. Tencon 2015 - 2015 IEEE Region 10 ConferenceGoogle Scholar
  29. 29.
    Lewis JJ, O’Callaghan RJ, Nikolov SG, Bull DR, Canagarajah N (2007) Pixel- and region-based image fusion with complex wavelets. Information Fusion 8(2):119–130Google Scholar
  30. 30.
    Li K, He F, Yu H, Chen X A parallel and robust object tracking approach synthesizing adaptive Bayesian learning and improved incremental subspace learning. Frontiers of Computer Science:1–20Google Scholar
  31. 31.
    Li ST, Yang B (2008) Multifocus image fusion using region segmentation and spatial frequency. Image Vis Comput 26(7):971–979Google Scholar
  32. 32.
    Li ZH, Jing ZL, Liu G, Sun SY, Leung H (2003) Pixel visibility based multifocus image fusion. Proceedings of 2003, International Conference on Neural Networks & Signal Processing, Proceedings, Vols 1 and 2:1050–1053Google Scholar
  33. 33.
    Li S, Kang XD, Hu JW, Yang B (2013) Image matting for fusion of multi-focus images in dynamic scenes. Information Fusion 14(2):147–162Google Scholar
  34. 34.
    Li ST, Kang XD, Hu JW (2013) Image fusion with guided filtering. IEEE Trans Image Process 22(7):2864–2875Google Scholar
  35. 35.
    Li K, He F-z, H-p Y u, Chen X (2017) A correlative classifiers approach based on particle filter and sample set for tracking occluded target. Applied Mathematics-A Journal of Chinese Universities 32(3):294–312MathSciNetGoogle Scholar
  36. 36.
    Liu Z, Blasch E, Xue ZY, Zhao JY, Laganiere R, Wu W (2012) Objective assessment of multiresolution image fusion algorithms for context enhancement in night vision: A comparative study. IEEE Trans Pattern Anal Mach Intell 34(1):94–109Google Scholar
  37. 37.
    Liu Y, Liu SP, Wang ZF (2015) Multi-focus image fusion with dense SIFT. Information Fusion 23:139–155Google Scholar
  38. 38.
    Liu Y, Chen X, Peng H, Wang ZF (2017) Multi-focus image fusion with a deep convolutional neural network. Information Fusion 36:191–207Google Scholar
  39. 39.
    Liu Y, Chen X, Wang ZF, Wang ZJ, Ward RK, Wang XS (2018) Deep learning for pixel-level image fusion: Recent advances and future prospects. Information Fusion 42:158–173Google Scholar
  40. 40.
    Liu Y, Cheng M-M, Bian J, Zhang L, Jiang P-T, Cao Y (2018) Semantic edge detection with diverse deep supervision. arXiv:180402864
  41. 41.
    Liu Y, Fan DP, Nie GY, Zhang X, Cheng MM (2019) DNA: Deeply-supervised nonlinear aggregation for salient object detectionGoogle Scholar
  42. 42.
    Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. 2015 IEEE conference on computer vision and pattern recognition (CVPR):3431-3440Google Scholar
  43. 43.
    Lv X, He F, Yan X, Wu Y, Cheng Y (2019) Integrating selective undo of feature-based modeling operations for real-time collaborative CAD systems. Futur Gener Comput Syst 100:473–497Google Scholar
  44. 44.
    Margolin R, Zelnik-Manor L, Tal A (2014) How to evaluate foreground maps? In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 248–255Google Scholar
  45. 45.
    Nie G-Y, Cheng M-M, Liu Y, Liang Z, Fan D-P, Liu Y, Wang Y (2019) Multi-level context ultra-aggregation for stereo matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 3283–3291Google Scholar
  46. 46.
    Norouzi M, Fleet DJ, Salakhutdinov R (2018) Hamming distance metric learning. Adv Neural Inf Proces Syst 2:1061–1069Google Scholar
  47. 47.
    Pajares G, de la Cruz JM (2004) A wavelet-based image fusion tutorial. Pattern Recogn 37(9):1855–1872Google Scholar
  48. 48.
    Pan Y, He F, Yu H A correlative denoising autoencoder to model social influence for top-n recommender system. Frontiers of Computer ScienceGoogle Scholar
  49. 49.
    Pan Y, He F, Yu H (2019) A novel enhanced collaborative autoencoder with knowledge distillation for top-N recommender systems. Neurocomputing 332:137–148Google Scholar
  50. 50.
    Piella G (2008) New quality measures for image fusion. Astronomische Nachrichten 173(16-17):267–268Google Scholar
  51. 51.
    Qu XB (2009) Matlab image fusion toolbox for sum-modified-laplacian-based multifocus image fusion method in cycle spinning sharp frequency localized contourlet transform. Opt Precis Eng 17(5):1203–1212Google Scholar
  52. 52.
    Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, Lecun Y OverFeat: Integrated recognition, localization and detection using convolutional networks. Eprint ArxivGoogle Scholar
  53. 53.
    Tang H, Xiao B, Li W S, Wang G Y (2018) Pixel convolutional neural network for multi-focus image fusion. Inf Sci 433:125–141MathSciNetGoogle Scholar
  54. 54.
    Wei H, Jing ZL (2007) Evaluation of focus measures in multi-focus image fusion. Pattern Recogn Lett 28(4):493–500Google Scholar
  55. 55.
    Wu Y, He F, Zhang D, Li X (2015) Service-oriented feature-based data exchange for cloud-based design and manufacturing. IEEE Trans Serv Comput 11 (2):341–353Google Scholar
  56. 56.
    Wu Z, Huang Y, Zhang K (2018) Remote sensing image fusion method based on PCA and curvelet transform. J Indian Soc Remote Sens 3:1–9Google Scholar
  57. 57.
    Xiao-Bo QU, Yan JW, Yang GD (2009) Multifocus image fusion method of sharp frequency localized Contourlet transform domain based on sum-modified-Laplacian. Opt Precis Eng 17(5):1203–1212Google Scholar
  58. 58.
    Xu KP, Qin Z, Wang GL, Zhang HD, Huang K, Ye SX (2018) Multi-focus Image Fusion using Fully Convolutional Two-stream Network for Visual Sensors. Ksii T Internet Inf 12(5):2253–2272Google Scholar
  59. 59.
    Yin M, Duan PH, Liu W, Liang XY (2017) A novel infrared and visible image fusion algorithm based on shift-invariant dual-tree complex shearlet transform and sparse representation. Neurocomputing 226:182–191Google Scholar
  60. 60.
    Yu J, Rui Y, Tao D (2014) Click prediction for web image reranking using multimodal sparse coding. IEEE Trans Image Process 23(5):2019–2032MathSciNetzbMATHGoogle Scholar
  61. 61.
    Yu J, Tao D, Wang M, Rui Y (2014) Learning to rank using user clicks and visual features for image retrieval. IEEE Transactions on Cybernetics 45(4):767–779Google Scholar
  62. 62.
    Yu J, Yang X, Gao F, Tao D (2016) Deep multimodal distance metric learning using click constraints for image ranking. IEEE Transactions on Cybernetics 47(12):4014–4024Google Scholar
  63. 63.
    Zhang Q, Guo BL (2009) Multifocus image fusion using the nonsubsampled contourlet transform. Signal Process 89(7):1334–1346zbMATHGoogle Scholar
  64. 64.
    Zhang Q, Wang L, Li H J, Ma ZK (2011) Similarity-based multimodality image fusion with shiftable complex directional pyramid. Pattern Recogn Lett 32 (13):1544–1553Google Scholar
  65. 65.
    Zhao H, Li Q, Feng HJ (2008) Multi-focus color image fusion in the HSI space using the sum-modified-laplacian and a coarse edge map. Image Vis Comput 26 (9):1285–1295Google Scholar
  66. 66.
    Zhang J, Wang M, Lin L, Yang X, Gao J, Rui Y (2017) Saliency detection on light field: A multi-cue approach. ACM Trans Multimed Comput Commun Appl (TOMM) 13(3):32Google Scholar
  67. 67.
    Zhao J -X, Cao Y, Fan D-P, Cheng M-M, Li X-Y, Zhang L (2019) Contrast prior and fluid pyramid integration for RGBD salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)Google Scholar
  68. 68.
    Zhou Z, Li S, Wang B (2014) Multi-scale weighted gradient-based fusion for multi-focus images. Information Fusion 20:60–72Google Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of EducationJilin UniversityChangchunChina
  2. 2.College of SoftwareJilin UniversityChangchunChina
  3. 3.College of Computer Science and TechnologyJilin UniversityChangchunChina
  4. 4.School of ComputingNewcastle UniversityNewcastle upon TyneUK

Personalised recommendations