Abstract
In many real-world settings, the external environment is perceived through multimodal information such as visual, radar, and lidar data. This naturally motivates us to exploit interactions across modalities and to integrate information from multiple sources using the limited labels of a multimodal dataset, i.e., as a semi-supervised task. A challenging issue in multimodal semi-supervised learning is the complicated correlations among pairwise modalities. In this paper, we propose a hypergraph variational autoencoder (HVAE) that mines high-order interactions in multimodal data and introduces extra prior knowledge for inferring a fused multimodal representation. On the one hand, the hypergraph structure can represent high-order data correlations in multimodal scenes; on the other hand, a prior distribution is introduced through mask-based variational inference to enhance the multimodal characterization. Moreover, the variational lower bound is leveraged to support semi-supervised learning. We conduct experiments on the semi-supervised visual object recognition task, and extensive results on two datasets demonstrate the superiority of our method over existing baselines.
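To make the notion of "high-order correlation" concrete: unlike an ordinary graph edge, a hyperedge can join any number of nodes, and propagation over the incidence matrix mixes all members of a hyperedge at once. The sketch below shows a single hypergraph convolution layer in the common HGNN form, X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ. It is only an illustration of the hypergraph machinery the abstract refers to, not the authors' HVAE model; all variable names are our own.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_w=None):
    """One hypergraph convolution layer (HGNN-style propagation).

    X      : (n_nodes, d_in) node features
    H      : (n_nodes, n_edges) incidence matrix, H[v, e] = 1 if node v
             belongs to hyperedge e
    Theta  : (d_in, d_out) learnable weight matrix
    edge_w : optional per-hyperedge weights (defaults to all ones)
    """
    w = edge_w if edge_w is not None else np.ones(H.shape[1])
    dv = H @ w              # weighted node degrees
    de = H.sum(axis=0)      # hyperedge degrees (nodes per hyperedge)
    # H W De^{-1} H^T : nodes sharing a hyperedge exchange information
    A = (H * w / de) @ H.T
    # symmetric node-degree normalization Dv^{-1/2} ... Dv^{-1/2}
    A = A / np.sqrt(np.outer(dv, dv))
    return A @ X @ Theta

# Toy example: 4 nodes, 2 hyperedges of 3 nodes each ({0,1,2} and {1,2,3}).
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
X = np.eye(4)       # one-hot features so the propagation matrix is visible
Theta = np.eye(4)   # identity weights, for inspection only
out = hypergraph_conv(X, H, Theta)
```

With identity features and weights, `out` is just the normalized propagation matrix: nodes 0 and 3 never share a hyperedge, so `out[0, 3]` is zero, while nodes within a hyperedge receive nonzero mass from each other in one step.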
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Liu, J., Du, X., Li, Y., Hu, W. (2022). Hypergraph Variational Autoencoder for Multimodal Semi-supervised Representation Learning. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15936-7
Online ISBN: 978-3-031-15937-4
eBook Packages: Computer Science (R0)