Abstract
Multi-modal recommendation aims to leverage multi-modal information to mine users' latent preferences. Existing multi-modal recommendation approaches primarily exploit graph structures and multi-modal information to model the graph signals derived from user-item interactions, overlooking the underlying sequential information. Furthermore, by treating items solely as coarse-grained entities, they disregard the latent relationships among items within each modality, impeding the effective extraction of latent user preferences. To address these limitations, we propose a novel approach called Multi-modal Graph and Sequence Fusion Learning Architecture for Recommendation (MMGCF). In MMGCF, we first construct dynamic item-item graphs to enhance item features and capture relationships within each modality. We then design a self-attention network that fuses multi-modal features according to the mutual influence between modalities. Finally, in addition to regular graph convolution, we devise a sequence-aware learning layer that preserves and captures sequential information, enabling the model to learn user preferences from a sequential perspective. Extensive experiments on three real-world datasets demonstrate the superiority of our method over various state-of-the-art baselines.
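The fusion step described above can be illustrated with a minimal sketch: for each item, its per-modality feature vectors (e.g., visual and textual) are treated as a short token sequence, and scaled dot-product self-attention weights each modality by its influence on the others before pooling. This is an illustrative reconstruction, not the paper's exact network; the function name `fuse_modalities`, the mean pooling, and the dimensions are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_modalities(modal_feats):
    """Fuse per-modality item features via scaled dot-product self-attention.

    modal_feats: array of shape (num_modalities, num_items, dim).
    Attention is computed across the modality axis for each item.
    """
    m, n, d = modal_feats.shape
    # Regroup so each item's modality vectors form a token sequence: (n, m, d).
    x = modal_feats.transpose(1, 0, 2)
    # Pairwise modality affinities per item, scaled by sqrt(dim): (n, m, m).
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    # Attention-weighted modality features, then mean-pool to one vector per item.
    fused = attn @ x                     # (n, m, d)
    return fused.mean(axis=1)            # (n, d)

rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 5, 8))  # e.g., visual + textual features for 5 items
fused = fuse_modalities(feats)
```

In a trained model the queries, keys, and values would come from learned projections rather than the raw features; the sketch keeps only the attention mechanism itself.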
This work was supported in part by the Joint Funds of the National Natural Science Foundation of China (Grant No. U22A2036) and the Key-Area Research and Development Program of Guangdong Province (2020B0101360001).
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, Z., Wu, X., Yang, H., He, H., Tai, Y., Zhang, W. (2024). Multi-modal Graph and Sequence Fusion Learning for Recommendation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_29
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9