Abstract
Infrared and visible image fusion aims to generate high-quality fused images that combine the thermal radiation information of infrared images with the texture information of visible images. Most deep learning-based methods simply stack Transformer or convolution blocks and, once the fused features have been generated, make no further effort to reintegrate source-image information that the fusion stage may have missed. In this work, we develop a cross-attention-based macro framework, named Modality-Guided Transformer (MGT), that reintroduces detailed information from the two source images, drawn from multiple feature-extraction layers, into the initially fused result. For efficiency, MGT also adopts shared attention and multi-scale windows to reduce the computational cost of attention. Experimental results show that the proposed MGT outperforms state-of-the-art methods, especially in preserving salient targets and infrared texture details. Our code is publicly available at https://github.com/TaoYing-Zhang/MGT.
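To make the cross-attention idea concrete, below is a minimal PyTorch sketch, assuming the fused features serve as queries while features from one source modality (infrared or visible) serve as keys and values, so that missed source detail can be reintroduced into the fused representation. The module name, tensor shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption, not the paper's code): fused tokens query
# source-modality tokens via cross-attention, with a residual connection
# that reinjects source detail into the fused features.
import torch
import torch.nn as nn

class ModalityGuidedCrossAttention(nn.Module):  # hypothetical name
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, fused: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # fused:  (B, N, C) tokens of the initially fused features (queries)
        # source: (B, N, C) tokens from the infrared or visible branch (keys/values)
        q = self.norm_q(fused)
        kv = self.norm_kv(source)
        out, _ = self.attn(q, kv, kv)
        return fused + out  # residual: reintroduce source detail into the fusion

# Toy usage: reinject infrared and then visible detail into the fused tokens.
fused = torch.randn(1, 64, 32)
ir, vis = torch.randn(1, 64, 32), torch.randn(1, 64, 32)
block = ModalityGuidedCrossAttention(dim=32)
fused = block(block(fused, ir), vis)
```

Applying the block once per source modality, and at several feature-extraction depths, would mirror the multi-layer guidance described in the abstract; sharing attention weights and varying window sizes across layers are natural ways to cut the attention cost, as the paper's shared-attention and multi-scale-window components suggest.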
Acknowledgement
This work was supported by the National Natural Science Foundation of China (62331006, 62171038, and 62088101), and the Fundamental Research Funds for the Central Universities.