
Hypergraph Variational Autoencoder for Multimodal Semi-supervised Representation Learning

  • Conference paper in Artificial Neural Networks and Machine Learning – ICANN 2022 (ICANN 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13532)

Abstract

In many real-world settings, the external environment is perceived through multimodal information such as visual, radar, and lidar data. This naturally motivates us to exploit interactions within and across modalities and to integrate information from multiple sources using the limited labels of a multimodal dataset, i.e., as a semi-supervised task. A challenging issue in multimodal semi-supervised learning is modeling the complicated correlations that go beyond pairwise modalities. In this paper, we propose a hypergraph variational autoencoder (HVAE) that mines high-order interactions in multimodal data and introduces extra prior knowledge for inferring a fused multimodal representation. On the one hand, the hypergraph structure can represent high-order data correlations in multimodal scenes. On the other hand, a prior distribution is introduced via mask-based variational inference to enhance the multimodal characterization. Moreover, the variational lower bound is leveraged to support semi-supervised learning. We conduct experiments on a semi-supervised visual object recognition task, and extensive results on two datasets demonstrate the superiority of our method over existing baselines.
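The abstract's claim that a hypergraph can encode correlations beyond pairwise modalities rests on hypergraph convolution, where a hyperedge connects any number of nodes at once. The paper does not give its layer equations here, so the following is only an illustrative NumPy sketch of a standard HGNN-style convolution, X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ (with identity hyperedge weights W); the shapes and toy data are assumptions, not the authors' implementation:

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One HGNN-style hypergraph convolution layer.

    X:     (n_nodes, in_dim)  node features
    H:     (n_nodes, n_edges) incidence matrix, H[v, e] = 1 iff node v is in hyperedge e
    Theta: (in_dim, out_dim)  learnable weights
    """
    Dv = np.diag(H.sum(axis=1))                 # vertex degree matrix
    De = np.diag(H.sum(axis=0))                 # hyperedge degree matrix
    Dv_inv_sqrt = np.linalg.inv(np.sqrt(Dv))
    # Normalized propagation: each hyperedge averages its members' features,
    # then scatters the result back to every member node.
    A = Dv_inv_sqrt @ H @ np.linalg.inv(De) @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta

# Toy example: 4 nodes, 2 hyperedges, each joining 3 nodes at once
# (e.g. features of the same object observed through different modalities).
rng = np.random.default_rng(0)
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
X = rng.standard_normal((4, 8))       # node features
Theta = rng.standard_normal((8, 4))   # weights
out = hypergraph_conv(X, H, Theta)
print(out.shape)                      # (4, 4)
```

Because a single hyperedge column of H can contain any number of ones, one propagation step mixes information among all modalities of an object simultaneously, which an ordinary (pairwise-edge) graph convolution cannot do in one hop.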



Author information

Correspondence to Weidong Hu.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, J., Du, X., Li, Y., Hu, W. (2022). Hypergraph Variational Autoencoder for Multimodal Semi-supervised Representation Learning. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_33


  • DOI: https://doi.org/10.1007/978-3-031-15937-4_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15936-7

  • Online ISBN: 978-3-031-15937-4

  • eBook Packages: Computer Science, Computer Science (R0)
