Abstract
Sensor fusion can significantly improve the performance of many computer vision tasks. However, traditional fusion approaches either are not data-driven, and thus can neither exploit prior knowledge nor discover regularities in a given dataset, or are restricted to a single application. We overcome this shortcoming by presenting a novel deep hierarchical variational autoencoder called FusionVAE that can serve as a basis for many fusion tasks. Our approach is able to generate diverse image samples that are conditioned on multiple noisy, occluded, or only partially visible input images. We derive and optimize a variational lower bound for the conditional log-likelihood of FusionVAE. In order to assess the fusion capabilities of our model thoroughly, we created three novel datasets for image fusion based on popular computer vision datasets. In our experiments, we show that FusionVAE learns a representation of aggregated information that is relevant to fusion tasks. The results demonstrate that our approach outperforms traditional methods significantly. Furthermore, we present the advantages and disadvantages of different design choices.
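The variational lower bound mentioned in the abstract is not reproduced on this page. For orientation only, a standard conditional ELBO of the kind such a bound generalizes (in the style of conditional VAEs) can be written as follows, where X denotes the set of input images, y the target image, and z the latent variable; this is a generic sketch, not the paper's exact derivation:

```latex
\log p_\theta(y \mid X) \;\ge\;
\mathbb{E}_{q_\phi(z \mid y, X)}\!\left[\, \log p_\theta(y \mid z, X) \,\right]
\;-\; D_{\mathrm{KL}}\!\left( q_\phi(z \mid y, X) \,\Vert\, p_\theta(z \mid X) \right)
```

In a hierarchical VAE such as FusionVAE, z is replaced by a hierarchy of latent groups, with one KL term per group; the conditioning on X is what allows sampling fused outputs from multiple degraded inputs.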
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Duffhauss, F., Vien, N.A., Ziesche, H., Neumann, G. (2022). FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_39
Print ISBN: 978-3-031-19841-0
Online ISBN: 978-3-031-19842-7
eBook Packages: Computer Science (R0)