Abstract
Infrared and visible image fusion (IVIF) is a widely used technique in instrument-related fields. It aims to extract contrast information from the infrared image and texture details from the visible image, and to combine these two kinds of information into a single image. Most auto-encoder-based methods train the network on natural images, such as MS-COCO, and test the model on IVIF datasets. Such methods suffer from domain shift and cannot generalize well in real-world scenarios. To this end, we propose a self-supervised test-time training (TTT) approach that facilitates learning a better fusion result. Specifically, a new self-supervised loss is developed to evaluate the quality of the fusion result. This loss function directs the network to improve fusion quality by optimizing the model parameters with a small number of iterations at test time. Besides, instead of manually designing fusion strategies, we leverage a fusion adapter to automatically learn fusion rules. Experimental comparisons on two public IVIF datasets validate that the proposed method outperforms existing methods both subjectively and objectively.
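The core idea of the abstract — optimizing model parameters on the test pair itself with a self-supervised quality loss — can be illustrated with a minimal sketch. Everything below is a hypothetical toy: a single fusion weight stands in for the network, and the loss (keep infrared intensity, keep visible-image texture) is an assumption chosen for illustration, not the paper's actual loss or fusion adapter.

```python
import numpy as np

def ttt_fuse(ir, vis, steps=50, lr=5.0):
    """Toy test-time training: learn one fusion weight per image pair
    by minimising a self-supervised quality loss on the test inputs."""
    theta = 2.0  # logit of the fusion weight (deliberately off-optimum)
    losses = []
    g_ir = np.gradient(ir)    # texture proxy for the infrared image
    g_vis = np.gradient(vis)  # texture proxy for the visible image
    for _ in range(steps):
        a = 1.0 / (1.0 + np.exp(-theta))        # fusion weight in (0, 1)
        fused = a * ir + (1.0 - a) * vis
        # self-supervised loss: preserve IR intensity and visible texture
        l_int = np.mean((fused - ir) ** 2)
        l_tex = sum(np.mean((a * gi + (1 - a) * gv - gv) ** 2)
                    for gi, gv in zip(g_ir, g_vis))
        losses.append(l_int + l_tex)
        # analytic gradient of the loss w.r.t. the fusion weight a
        dl_da = np.mean(2 * (fused - ir) * (ir - vis))
        dl_da += sum(np.mean(2 * (a * gi + (1 - a) * gv - gv) * (gi - gv))
                     for gi, gv in zip(g_ir, g_vis))
        theta -= lr * dl_da * a * (1.0 - a)     # chain rule through sigmoid
    a = 1.0 / (1.0 + np.exp(-theta))
    return a * ir + (1.0 - a) * vis, losses

rng = np.random.default_rng(0)
ir = rng.random((32, 32))   # stand-in infrared image
vis = rng.random((32, 32))  # stand-in visible image
fused, losses = ttt_fuse(ir, vis)
```

In the paper, the per-pair weight would be replaced by the parameters of an auto-encoder plus the fusion adapter, updated for a few iterations per test pair; the sketch only shows why a self-supervised loss makes such test-time updates possible without ground-truth fused images.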
References
Aslantas, V., Bendes, E.: A new image quality metric for image fusion: the sum of the correlations of differences. AEU-Int. J. Electron. Commun. 69(12), 1890–1896 (2015)
Das, S., Zhang, Y.: Color night vision for navigation and surveillance. Transp. Res. Rec. 1708(1), 40–46 (2000)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Gandelsman, Y., Sun, Y., Chen, X., Efros, A.: Test-time training with masked autoencoders. Adv. Neural. Inf. Process. Syst. 35, 29374–29385 (2022)
Gao, Y., Ma, S., Liu, J.: DCDR-GAN: a densely connected disentangled representation generative adversarial network for infrared and visible image fusion. IEEE Trans. Circ. Syst. Video Technol. (2022)
Li, H., Wu, X.J.: DenseFuse: a fusion approach to infrared and visible images. IEEE Trans. Image Process. 28(5), 2614–2623 (2018)
Li, H., Wu, X.J., Durrani, T.: NestFuse: an infrared and visible image fusion architecture based on nest connection and spatial/channel attention models. IEEE Trans. Instrum. Meas. 69(12), 9645–9656 (2020)
Li, H., Wu, X.J., Kittler, J.: Infrared and visible image fusion using a deep learning framework. In: 2018 24th International Conference On Pattern Recognition (ICPR), pp. 2705–2710. IEEE (2018)
Li, H., Wu, X.J., Kittler, J.: MDLatLRR: a novel decomposition method for infrared and visible image fusion. IEEE Trans. Image Process. 29, 4733–4746 (2020)
Li, Q., et al.: A multilevel hybrid transmission network for infrared and visible image fusion. IEEE Trans. Instrum. Meas. 71, 1–14 (2022)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Lin, X., Zhou, G., Tu, X., Huang, Y., Ding, X.: Two-level consistency metric for infrared and visible image fusion. IEEE Trans. Instrum. Meas. 71, 1–13 (2022)
Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y.: Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 171–184 (2012)
Liu, G., Lin, Z., Yu, Y.: Robust subspace segmentation by low-rank representation. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 663–670 (2010)
Liu, H., Wu, Z., Li, L., Salehkalaibar, S., Chen, J., Wang, K.: Towards multi-domain single image dehazing via test-time training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5831–5840 (2022)
Ma, J., Chen, C., Li, C., Huang, J.: Infrared and visible image fusion via gradient transfer and total variation minimization. Inform. Fusion 31, 100–109 (2016)
Ma, J., Xu, H., Jiang, J., Mei, X., Zhang, X.P.: DDcGAN: a dual-discriminator conditional generative adversarial network for multi-resolution image fusion. IEEE Trans. Image Process. 29, 4980–4995 (2020)
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11) (2008)
Piella, G.: A general framework for multiresolution image fusion: from pixels to regions. Inform. Fusion 4(4), 259–280 (2003)
Roberts, J.W., Van Aardt, J.A., Ahmed, F.B.: Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J. Appl. Remote Sens. 2(1), 023522 (2008)
Sheikh, H.R., Bovik, A.C.: Image information and visual quality. IEEE Trans. Image Process. 15(2), 430–444 (2006)
Sun, Y., Wang, X., Liu, Z., Miller, J., Efros, A., Hardt, M.: Test-time training with self-supervision for generalization under distribution shifts. In: International Conference on Machine Learning, pp. 9229–9248. PMLR (2020)
Tang, W., He, F., Liu, Y.: YDTR: infrared and visible image fusion via Y-shape dynamic transformer. IEEE Trans. Multimedia (2022)
Toet, A.: The TNO multiband image data collection. Data Brief 15, 249–251 (2017)
Vishwakarma, A.: Image fusion using adjustable non-subsampled shearlet transform. IEEE Trans. Instrum. Meas. 68(9), 3367–3378 (2018)
Wang, Z., Wu, Y., Wang, J., Xu, J., Shao, W.: Res2Fusion: infrared and visible image fusion based on dense Res2Net and double nonlocal attention models. IEEE Trans. Instrum. Meas. 71, 1–12 (2022)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2003, vol. 2, pp. 1398–1402. IEEE (2003)
Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2008)
Xu, H., Ma, J., Jiang, J., Guo, X., Ling, H.: U2Fusion: a unified unsupervised image fusion network. IEEE Trans. Pattern Anal. Mach. Intell. 44(1), 502–518 (2020)
Zhang, Q., Fu, Y., Li, H., Zou, J.: Dictionary learning method for joint sparse representation-based image fusion. Opt. Eng. 52(5), 057006–057006 (2013)
Zhang, X., Demiris, Y.: Visible and infrared image fusion using deep learning. IEEE Trans. Pattern Anal. Mach. Intell. (2023)
Zhang, X., Ye, P., Xiao, G.: VIFB: a visible and infrared image fusion benchmark. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 104–105 (2020)
Acknowledgements
The work was supported in part by the National Natural Science Foundation of China under Grant 82172033, U19B2031, 61971369, 52105126, 82272071, 62271430, and the Fundamental Research Funds for the Central Universities 20720230104.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zheng, G., Fu, Z., Lin, X., Chu, X., Huang, Y., Ding, X. (2024). Infrared and Visible Image Fusion via Test-Time Training. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14434. Springer, Singapore. https://doi.org/10.1007/978-981-99-8549-4_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8548-7
Online ISBN: 978-981-99-8549-4