Abstract
Deep networks have recently attracted great attention in infrared and visible image fusion (IVIF). Nevertheless, most existing methods cannot handle slight misalignment between the source images and suffer from high computational and memory costs. This paper tackles these two critical issues, rarely touched in the community, by developing a recurrent correction network for robust and efficient fusion, namely ReCoNet. Concretely, we design a deformation module to explicitly compensate for geometric distortions and an attention mechanism to mitigate ghosting-like artifacts. Meanwhile, the network is built on a parallel dilated convolutional layer and runs in a recurrent fashion, significantly reducing both spatial and computational complexity. ReCoNet can effectively and efficiently alleviate both the structural distortions and the textural artifacts caused by slight misalignment. Extensive experiments on two public datasets demonstrate the superior accuracy and efficiency of ReCoNet against state-of-the-art IVIF methods. In particular, we obtain a \(16\%\) relative improvement in correlation coefficient (CC) on datasets with misalignment and boost efficiency by \(86\%\). The source code is available at https://github.com/dlut-dimt/reconet.
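To make the recurrent design concrete, below is a minimal PyTorch sketch of the two ideas the abstract highlights: parallel dilated convolutions shared across recurrent correction steps, and a spatial attention map that down-weights ghosting-prone regions. This is an illustrative sketch under stated assumptions, not the authors' released implementation: the deformation module for geometric compensation is omitted, and all module names, channel widths, and the number of recurrent steps are hypothetical.

```python
# Illustrative sketch of a recurrent fusion loop with parallel dilated
# convolutions and a spatial attention mask. Hyperparameters and module
# names are assumptions, not the code released at the GitHub link above.
import torch
import torch.nn as nn


class ParallelDilatedBlock(nn.Module):
    """Extract multi-scale features with parallel dilated convolutions."""

    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return torch.relu(self.fuse(feats))


class RecurrentFusion(nn.Module):
    """Refine a fused image over a few recurrent correction steps."""

    def __init__(self, steps: int = 3, ch: int = 16):
        super().__init__()
        self.steps = steps
        # One shared block is reused at every step, keeping the model small.
        self.block = ParallelDilatedBlock(in_ch=3, out_ch=ch)
        self.attn = nn.Conv2d(ch, 1, 3, padding=1)  # spatial attention map
        self.out = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, ir, vis):
        fused = 0.5 * (ir + vis)                    # naive initial fusion
        for _ in range(self.steps):
            feats = self.block(torch.cat([ir, vis, fused], dim=1))
            mask = torch.sigmoid(self.attn(feats))  # suppress artifact regions
            fused = fused + mask * self.out(feats)  # residual correction
        return fused.clamp(0, 1)


if __name__ == "__main__":
    ir = torch.rand(1, 1, 128, 128)   # infrared image (single channel)
    vis = torch.rand(1, 1, 128, 128)  # visible image (here grayscale)
    print(RecurrentFusion()(ir, vis).shape)  # torch.Size([1, 1, 128, 128])
```

Sharing one block across all recurrent steps is what keeps the parameter count, and hence the spatial footprint, small: effective depth comes from recurrence rather than from stacking new layers.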
Acknowledgments
This work is partially supported by the National Key R&D Program of China (2020YFB1313503), the National Natural Science Foundation of China (Nos. 61922019, 61906029 and 62027826), and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Huang, Z., Liu, J., Fan, X., Liu, R., Zhong, W., Luo, Z. (2022). ReCoNet: Recurrent Correction Network for Fast and Efficient Multi-modality Image Fusion. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. Lecture Notes in Computer Science, vol. 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_31