Abstract
Deep networks have recently attracted great attention in infrared and visible image fusion (IVIF). Nevertheless, most existing methods cannot handle slight misalignment between the source images and suffer from high computational and memory costs. This paper tackles these two critical issues, rarely touched in the community, by developing a recurrent correction network for robust and efficient fusion, namely ReCoNet. Concretely, we design a deformation module to explicitly compensate for geometric distortions and an attention mechanism to mitigate ghosting-like artifacts. Meanwhile, the network is built on a parallel dilated convolutional layer and runs in a recurrent fashion, significantly reducing both spatial and computational complexity. ReCoNet can effectively and efficiently alleviate both the structural distortions and the textural artifacts caused by slight misalignment. Extensive experiments on two public datasets demonstrate the superior accuracy and efficiency of ReCoNet against state-of-the-art IVIF methods. In particular, we obtain a \(16\%\) relative improvement in correlation coefficient (CC) on datasets with misalignment and boost efficiency by \(86\%\). The source code is available at https://github.com/dlut-dimt/reconet.
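To make the recurrent design concrete, below is a minimal PyTorch sketch of the two ideas the abstract highlights: parallel dilated convolutions shared across recurrent correction steps, and a spatial attention map that down-weights ghosting-prone regions. This is an illustrative sketch under stated assumptions, not the authors' released implementation: the deformation module for geometric compensation is omitted, and all module names, channel widths, and the number of recurrent steps are hypothetical.

```python
# Illustrative sketch of a recurrent fusion loop with parallel dilated
# convolutions and a spatial attention mask. Hyperparameters and module
# names are assumptions, not the code released at the GitHub link above.
import torch
import torch.nn as nn


class ParallelDilatedBlock(nn.Module):
    """Extract multi-scale features with parallel dilated convolutions."""

    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return torch.relu(self.fuse(feats))


class RecurrentFusion(nn.Module):
    """Refine a fused image over a few recurrent correction steps."""

    def __init__(self, steps: int = 3, ch: int = 16):
        super().__init__()
        self.steps = steps
        # One shared block is reused at every step, keeping the model small.
        self.block = ParallelDilatedBlock(in_ch=3, out_ch=ch)
        self.attn = nn.Conv2d(ch, 1, 3, padding=1)  # spatial attention map
        self.out = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, ir, vis):
        fused = 0.5 * (ir + vis)                    # naive initial fusion
        for _ in range(self.steps):
            feats = self.block(torch.cat([ir, vis, fused], dim=1))
            mask = torch.sigmoid(self.attn(feats))  # suppress artifact regions
            fused = fused + mask * self.out(feats)  # residual correction
        return fused.clamp(0, 1)


if __name__ == "__main__":
    ir = torch.rand(1, 1, 128, 128)   # infrared image (single channel)
    vis = torch.rand(1, 1, 128, 128)  # visible image (here grayscale)
    print(RecurrentFusion()(ir, vis).shape)  # torch.Size([1, 1, 128, 128])
```

Sharing one block across all recurrent steps is what keeps the parameter count, and hence the spatial footprint, small: effective depth comes from recurrence rather than from stacking new layers.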
Acknowledgments
This work is partially supported by the National Key R&D Program of China (2020YFB1313503), the National Natural Science Foundation of China (Nos. 61922019, 61906029 and 62027826), and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Huang, Z., Liu, J., Fan, X., Liu, R., Zhong, W., Luo, Z. (2022). ReCoNet: Recurrent Correction Network for Fast and Efficient Multi-modality Image Fusion. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. Lecture Notes in Computer Science, vol. 13678. Springer, Cham. https://doi.org/10.1007/978-3-031-19797-0_31