Abstract
Although significant progress has been made in infrared and visible image fusion in recent years, existing methods typically assume that the source images have been rigorously registered or aligned before fusion. However, the modality gap between infrared and visible images makes strict automatic alignment difficult, degrading the quality of the subsequent fusion procedure. To address this problem, this paper proposes a deep learning framework for fusing misaligned infrared and visible images, aiming to free the fusion algorithm from strict registration. Technically, we design a convolutional neural network (CNN)-Transformer Hierarchical Interactive Embedding (CTHIE) module, which combines the respective advantages of CNNs and Transformers, to extract features from the source images. In addition, by characterizing the correlation between the features extracted from the misaligned source images, a Dynamic Re-aggregation Feature Representation (DRFR) module is devised to align the features with a self-attention-based feature re-aggregation scheme. Finally, to effectively exploit features at different levels of the network, a Fully Perceptual Forward Fusion (FPFF) module based on the interactive transmission of multi-modal features is introduced to fuse the features and reconstruct the fused image. Experimental results on both synthetic and real-world data demonstrate the effectiveness of the proposed method, verifying the feasibility of directly fusing infrared and visible images without strict registration.
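The core idea behind the DRFR module, aligning features by attention-weighted re-aggregation rather than explicit geometric warping, can be illustrated with a minimal scaled dot-product cross-attention sketch. This is a generic stand-in for intuition only, not the paper's actual module; the function and variable names (`reaggregate`, `f_ir`, `f_vis`) are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reaggregate(f_ir, f_vis):
    """Re-aggregate misaligned visible features onto the infrared
    feature grid.

    f_ir  : (N_ir, d)  infrared features, one row per spatial location
    f_vis : (N_vis, d) visible features from a (possibly shifted) grid

    Each infrared location queries all visible locations; the attention
    weights softly select the visible features that best correlate with
    it, so no explicit pixel-level registration is required.
    """
    d = f_ir.shape[-1]
    attn = softmax(f_ir @ f_vis.T / np.sqrt(d))  # (N_ir, N_vis), rows sum to 1
    return attn @ f_vis                          # (N_ir, d) aligned features
```

The output has one re-aggregated visible feature per infrared location, so the two modalities can be fused position-by-position even when the inputs were not strictly registered.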
Data Availability
The datasets for this study can be found in the VOT2020-RGBT dataset https://www.votchallenge.net/vot2020/dataset.html, the KAIST dataset https://github.com/SoonminHwang/rgbt-ped-detection/blob/master/data/README.md, and the CVC-14 dataset http://adas.cvc.uab.es/elektra/enigma-portfolio/cvc-14-visible-fir-day-night-pedestrian-sequence-dataset/.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 62161015, 62176081, and U23A20294) and the Yunnan Fundamental Research Projects (No. 202301AV070004).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Communicated by Ondra Chum.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, H., Liu, J., Zhang, Y. et al. A Deep Learning Framework for Infrared and Visible Image Fusion Without Strict Registration. Int J Comput Vis 132, 1625–1644 (2024). https://doi.org/10.1007/s11263-023-01948-x