MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion

  • Conference paper
  • First Online:
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14425)


Abstract

Infrared and visible image fusion aims to generate high-quality fused images that combine the thermal radiation information of infrared images with the texture information of visible images. Most deep learning-based methods simply stack Transformer or convolution blocks and, once the fused features have been generated, cannot reintegrate source-image information that was missed during the fusion stage. In this work, we develop a cross-attention-based macro framework, named Modality-Guided Transformer (MGT), that reintroduces detailed information from the two input images, drawn from multiple feature-extraction layers, into the initially fused image. For efficiency, MGT also introduces shared attention and multi-scale windows to reduce the computational cost of attention. Experimental results show that the proposed MGT outperforms state-of-the-art methods, especially in preserving salient infrared targets and texture details. Our code is publicly available at https://github.com/TaoYing-Zhang/MGT.
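The abstract's central mechanism, cross-attention in which the fused features query the original infrared and visible features so that missed detail can be reintroduced, can be sketched compactly. The following is a minimal, hypothetical PyTorch sketch, not the authors' released code: the module name `ModalityGuidedCrossAttention`, the shapes, and the hyperparameters are illustrative assumptions; a single attention module is reused for both modalities in the spirit of the paper's shared attention; and plain global attention stands in for the paper's multi-scale window attention.

```python
# Hypothetical sketch of modality-guided cross-attention (not the authors'
# implementation). Fused features act as queries; infrared and visible
# features supply keys/values, so modality-specific detail missed during
# fusion can be reinjected into the fused representation.
import torch
import torch.nn as nn


class ModalityGuidedCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # One attention module is shared across both modalities, loosely
        # mirroring the paper's shared attention for lower cost. Global
        # attention is used here for brevity; the paper instead restricts
        # attention to multi-scale local windows.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fused, infrared, visible):
        # All inputs: (B, N, C) token sequences taken from corresponding
        # feature-extraction layers of the source images.
        ir_detail, _ = self.attn(query=fused, key=infrared, value=infrared)
        vis_detail, _ = self.attn(query=fused, key=visible, value=visible)
        # Residual update reintroduces detail from both modalities.
        return self.norm(fused + ir_detail + vis_detail)


if __name__ == "__main__":
    B, N, C = 2, 64, 96  # batch, tokens, channels (illustrative sizes)
    block = ModalityGuidedCrossAttention(C)
    out = block(torch.randn(B, N, C), torch.randn(B, N, C), torch.randn(B, N, C))
    print(out.shape)  # torch.Size([2, 64, 96])
```

In the full framework, a block of this kind would be applied after the initial fusion at several feature-extraction depths; the repository linked above is the authoritative reference for the actual implementation.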


Acknowledgement

This work was supported by the National Natural Science Foundation of China (62331006, 62171038, and 62088101) and the Fundamental Research Funds for the Central Universities.

Author information


Corresponding author

Correspondence to Xiaoyong Wang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, T., Li, H., Liu, Q., Wang, X., Fu, Y. (2024). MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_26

  • DOI: https://doi.org/10.1007/978-981-99-8429-9_26

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8428-2

  • Online ISBN: 978-981-99-8429-9

  • eBook Packages: Computer Science, Computer Science (R0)
