Abstract
Multi-modal recommendation aims to leverage multi-modal information to mine users' latent preferences. Existing multi-modal recommendation approaches primarily exploit graph structures and multi-modal information to model the graph signals derived from user-item interactions, overlooking the underlying sequential information. Furthermore, by treating items solely as coarse-grained entities, they disregard the latent relationships among items within each modality, impeding the effective extraction of latent user preferences. To address these limitations, we propose a novel approach called Multi-modal Graph and Sequence Fusion Learning Architecture for Recommendation (MMGCF). In MMGCF, we first construct dynamic item-item graphs to enhance item features and capture relationships within each modality. We then design a self-attention network that fuses multi-modal features according to the mutual influence between modalities. Finally, in addition to regular graph convolution, we devise a sequence-aware learning layer that preserves and captures sequential information, enabling the model to learn user preferences from a sequential perspective. Extensive experiments on three real-world datasets demonstrate the superiority of our method over various state-of-the-art baselines.
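The fusion step described above can be illustrated with a minimal sketch: for each item, its per-modality feature vectors (e.g., visual and textual) are treated as a short token sequence, and scaled dot-product self-attention weights each modality by its influence on the others before pooling. This is an illustrative reconstruction, not the paper's exact network; the function name `fuse_modalities`, the mean pooling, and the dimensions are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_modalities(modal_feats):
    """Fuse per-modality item features via scaled dot-product self-attention.

    modal_feats: array of shape (num_modalities, num_items, dim).
    Attention is computed across the modality axis for each item.
    """
    m, n, d = modal_feats.shape
    # Regroup so each item's modality vectors form a token sequence: (n, m, d).
    x = modal_feats.transpose(1, 0, 2)
    # Pairwise modality affinities per item, scaled by sqrt(dim): (n, m, m).
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    # Attention-weighted modality features, then mean-pool to one vector per item.
    fused = attn @ x                     # (n, m, d)
    return fused.mean(axis=1)            # (n, d)

rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 5, 8))  # e.g., visual + textual features for 5 items
fused = fuse_modalities(feats)
```

In a trained model the queries, keys, and values would come from learned projections rather than the raw features; the sketch keeps only the attention mechanism itself.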
This work was supported in part by the Joint Funds of the National Natural Science Foundation of China (Grant No. U22A2036) and the Key-Area Research and Development Program of Guangdong Province (2020B0101360001).
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Wang, Z., Wu, X., Yang, H., He, H., Tai, Y., Zhang, W. (2024). Multi-modal Graph and Sequence Fusion Learning for Recommendation. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_29
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9