MFT: Multi-scale Fusion Transformer for Infrared and Visible Image Fusion

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2023

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14259)


Abstract

This paper studies the problem of fusing infrared and visible images to improve the quality of the target image. Traditional image fusion algorithms usually rely on convolutional neural networks (CNNs) for feature extraction and fusion, and can therefore exploit only local information. Some recent approaches combine CNNs and Transformers to capture long-range dependencies, but the global contextual information in the images still cannot be fully exploited. To improve the ability to capture global information, we propose a novel multi-scale fusion transformer (MFT) to fuse infrared and visible images. In the encoder of our MFT, a multi-head pooling attention module extracts both local features and long-range dependencies from the input image. A novel dual-branch fusion module is then designed to simultaneously exploit the global contextual and infrared-visible complementary information during fusion. Experimental results show that the proposed method effectively improves the subjective visual experience of the infrared-visible fused image and outperforms many recent and competitive counterparts on most objective evaluation criteria.
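To make the two components named in the abstract concrete, below is a minimal PyTorch sketch of what a multi-head pooling attention module could look like, in the spirit of multiscale vision transformers: keys and values are spatially pooled before attention, so each query token still mixes image-wide context but over a cheaper, coarser grid. The class name, the depthwise-convolution pooling operator, and all hyper-parameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultiHeadPoolingAttention(nn.Module):
    """Illustrative multi-head pooling attention (assumed form).

    Keys and values are downsampled with a strided depthwise
    convolution before attention, so every query token attends over
    a coarser grid while still aggregating global context.
    """

    def __init__(self, dim: int, num_heads: int = 4, pool_stride: int = 2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        # Assumed pooling operator; average pooling is a common alternative.
        self.pool = nn.Conv2d(dim, dim, kernel_size=3,
                              stride=pool_stride, padding=1, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) tokens of an h x w feature map, with N = h * w.
        b, n, c = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def downsample(t: torch.Tensor) -> torch.Tensor:
            # tokens -> 2-D map -> strided pooling -> shorter token list
            t = t.transpose(1, 2).reshape(b, c, h, w)
            return self.pool(t).flatten(2).transpose(1, 2)

        k, v = downsample(k), downsample(v)

        def heads(t: torch.Tensor) -> torch.Tensor:
            return t.reshape(b, -1, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = heads(q), heads(k), heads(v)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, H, N, N')
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)
```

Likewise, one plausible reading of the dual-branch fusion module, continuing with the imports above, pairs a global branch that exchanges context between the two modalities via cross-attention with a complementary branch that gates between infrared and visible features at each location. Again, this is a hedged sketch under stated assumptions, not the paper's actual design.

```python
class DualBranchFusion(nn.Module):
    """Illustrative dual-branch fusion (assumed form)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.cross_ir = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim * 2, dim), nn.Sigmoid())
        self.merge = nn.Linear(dim * 2, dim)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        # ir, vis: (B, N, C) token sequences from the two encoders.
        # Global branch: each modality queries the other for context.
        g_ir, _ = self.cross_ir(ir, vis, vis)
        g_vis, _ = self.cross_vis(vis, ir, ir)
        global_feat = 0.5 * (g_ir + g_vis)
        # Complementary branch: a learned gate blends the raw features,
        # favouring the more informative modality per location.
        w = self.gate(torch.cat([ir, vis], dim=-1))
        comp_feat = w * ir + (1.0 - w) * vis
        return self.merge(torch.cat([global_feat, comp_feat], dim=-1))
```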



Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 62262026 and 62276195), the project of Jiangxi Education Department (No. GJJ211111), and the Fundamental Research Funds for the Central Universities (No. 2042023kf1033).

Author information


Correspondence to Xin Zhou.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, C.M., Yuan, C., Luo, Y., Zhou, X. (2023). MFT: Multi-scale Fusion Transformer for Infrared and Visible Image Fusion. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2023. Lecture Notes in Computer Science, vol. 14259. Springer, Cham. https://doi.org/10.1007/978-3-031-44223-0_39

  • DOI: https://doi.org/10.1007/978-3-031-44223-0_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44222-3

  • Online ISBN: 978-3-031-44223-0

  • eBook Packages: Computer Science (R0)
