MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion

  • Conference paper
  • First Online:
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14425)


Abstract

Infrared and visible image fusion aims to generate high-quality fused images that combine the thermal radiation information of infrared images with the texture information of visible images. Most deep learning-based methods simply stack Transformer or convolution blocks and, once the fused features have been generated, cannot reintegrate source-image information that was missed during the fusion stage. In this work, we develop a cross-attention-based macro framework, named Modality-Guided Transformer (MGT), that reintroduces detailed information from the two input images, drawn from multiple feature-extraction layers, into the initially fused image. For efficiency, MGT also introduces shared attention and multi-scale windows to reduce the computational cost of attention. Experimental results show that the proposed MGT outperforms state-of-the-art methods, especially in preserving salient infrared targets and texture details. Our code is publicly available at https://github.com/TaoYing-Zhang/MGT.
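The abstract's central mechanism, cross-attention in which the fused features query the original infrared and visible features so that missed detail can be reintroduced, can be sketched compactly. The following is a minimal, hypothetical PyTorch sketch, not the authors' released code: the module name `ModalityGuidedCrossAttention`, the shapes, and the hyperparameters are illustrative assumptions; a single attention module is reused for both modalities in the spirit of the paper's shared attention; and plain global attention stands in for the paper's multi-scale window attention.

```python
# Hypothetical sketch of modality-guided cross-attention (not the authors'
# implementation). Fused features act as queries; infrared and visible
# features supply keys/values, so modality-specific detail missed during
# fusion can be reinjected into the fused representation.
import torch
import torch.nn as nn


class ModalityGuidedCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # One attention module is shared across both modalities, loosely
        # mirroring the paper's shared attention for lower cost. Global
        # attention is used here for brevity; the paper instead restricts
        # attention to multi-scale local windows.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fused, infrared, visible):
        # All inputs: (B, N, C) token sequences taken from corresponding
        # feature-extraction layers of the source images.
        ir_detail, _ = self.attn(query=fused, key=infrared, value=infrared)
        vis_detail, _ = self.attn(query=fused, key=visible, value=visible)
        # Residual update reintroduces detail from both modalities.
        return self.norm(fused + ir_detail + vis_detail)


if __name__ == "__main__":
    B, N, C = 2, 64, 96  # batch, tokens, channels (illustrative sizes)
    block = ModalityGuidedCrossAttention(C)
    out = block(torch.randn(B, N, C), torch.randn(B, N, C), torch.randn(B, N, C))
    print(out.shape)  # torch.Size([2, 64, 96])
```

In the full framework, a block of this kind would be applied after the initial fusion at several feature-extraction depths; the repository linked above is the authoritative reference for the actual implementation.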


Acknowledgement

This work was supported by the National Natural Science Foundation of China (62331006, 62171038, and 62088101) and the Fundamental Research Funds for the Central Universities.

Author information


Corresponding author

Correspondence to Xiaoyong Wang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, T., Li, H., Liu, Q., Wang, X., Fu, Y. (2024). MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_26

  • DOI: https://doi.org/10.1007/978-981-99-8429-9_26

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8428-2

  • Online ISBN: 978-981-99-8429-9

  • eBook Packages: Computer Science, Computer Science (R0)
