Abstract
Multi-modal recommendation using multi-modal features (e.g., image and text features) has received significant attention and has been shown to have more effective recommendation. However, there are currently the following problems with multi-modal recommendation: (1) Multi-modal recommendation often handle individual modes’ raw data directly, leading to noise affecting the model’s effectiveness and the failure to explore interconnections between modes; (2) Different users have different preferences. It’s impractical to treat all modalities equally, as this could interfere with the model’s ability to make recommendation. To address the above problems, this paper proposes a Multi-modal recommendation model with cross-modal correction (CMC-MMR). Firstly, in order to reduce the effect of noise in the raw data and to take full advantage of the relationships between modes, we designed a cross-modal correction module to denoise and correct the modes using a cross-modal correction mechanism; Secondly, the similarity between the same modalities of each item is used as a benchmark to build item-item graphs for each modality, and user-item graphs with degree-sensitive pruning strategies are also built to mine higher-order information; Finally, we designed a self-supervised task to adaptively mine user preferences for modality. We conducted comparative experiments with eleven baseline models on four real-world datasets. The experimental results show that CMC-MMR improves 6.202%, 4.975% , 6.054% and 11.368% on average on the four datasets, respectively, demonstrates the effectiveness of CMC-MMR.
Similar content being viewed by others
Data Availability
No datasets were generated or analysed during the current study.
Code availability
The code of our paper is temporarily not available.
Notes
Datasets are available at http://jmcauley.ucsd.edu/data/amazon/links.html
References
Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749. https://doi.org/10.1109/TKDE.2005.99
Braun, G., Fillottrani, P. R., & Keet, C. M. (2023). A framework for interoperability between models with hybrid tools. Journal of Intelligent Information Systems, 60(2), 437–462. https://doi.org/10.1007/s10844-022-00731-7
Chen, F., Wang, J., Wei, Y., et al. (2022). Breaking isolation: Multimodal graph fusion for multimedia recommendation by edge-wise modulation. In: Proceedings of the 30th ACM International Conference on Multimedia. Association for Computing Machinery, New York, NY, USA, MM ’22, pp 385–394. https://doi.org/10.1145/3503161.3548399
Chen, J., Hr, Fang, & Saad, Y. (2009). Fast approximate knn graph construction for high dimensional data via recursive lanczos bisection. Journal of Machine Learning Research, 10, 1989–2012.
Chen, M., Wei, Z., Huang, Z., et al. (2020). Simple and deep graph convolutional networks. In: III HD, Singh A (eds) Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 119. PMLR, pp 1725–1735
Ding, S., Lin, D., & Zhou, X. (2021). Graph convolutional reinforcement learning for dependent task allocation in edge computing. In: 2021 IEEE International Conference on Agents (ICA), pp. 25–30. https://doi.org/10.1109/ICA54137.2021.00011
Guarascio, M., Minici, M., Pisani, F. S., et al. (2024). Movie tag prediction: An extreme multi-label multi-modal transformer-based solution with explanation. Journal of Intelligent Information Systems. https://doi.org/10.1007/s10844-023-00836-7
He, R., & McAuley, J. (2016). Vbpr: Visual bayesian personalized ranking from implicit feedback. Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 30, No. 1)
He, X., Deng, K., Wang, X., et al. (2020). Lightgcn: Simplifying and powering graph convolution network for recommendation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, USA, SIGIR ’20, pp. 639–648. https://doi.org/10.1145/3397271.3401063
Hjelm, R. D., Fedorov, A., Lavoie-Marchildon, S., et al. (2019). Learning deep representations by mutual information estimation and maximization
Kemertas, M., Pishdad, L., Derpanis, K. G., et al. (2020). Rankmi: A mutual information maximizing ranking loss. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37. https://doi.org/10.1109/MC.2009.263
Lei, S., Huanhuan, Y., Pengpeng, Z., et al. (2023). Improving graph collaborative filtering with multimodal-side-information-enriched contrastive learning. Journal of Intelligent Information Systems. https://doi.org/10.1007/s10844-023-00807-y
Liu, S., Chen, Z., Liu, H., et al. (2019). User-video co-attention network for personalized micro-video recommendation. In: The World Wide Web Conference. Association for Computing Machinery, New York, NY, USA, WWW ’19, pp. 3020–3026. https://doi.org/10.1145/3308558.3313513
Luo, D., Cheng, W., Yu, W., et al. (2021). Learning to drop: Robust graph neural network via topological denoising. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, New York, NY, USA, WSDM ’21, pp 779–787. https://doi.org/10.1145/3437963.3441734
van der Maaten, L., & Hinton, G. E. (2008). Visualizing data using t-sne. Journal of Machine Learning Research, 9, 2579–2605.
Ni, J., Li, J., & McAuley, J. (2019). Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In: K. Inui, J. Jiang, V. Ng, et al. (Eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp 188–197, https://doi.org/10.18653/v1/D19-1018
Paszke, A., Gross, S., Massa, F., et al. (2019). Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, et al. (Eds.), Advances in Neural Information Processing Systems. (Vol. 32). Curran Associates Inc.
Reimers, N., & Gurevych, I. (2019). Sentence-bert: Sentence embeddings using siamese bert-networks
Rendle, S., Freudenthaler, C., Gantner, Z., et al (2009). Bpr: Bayesian personalized ranking from implicit feedback. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, Arlington, Virginia, USA, UAI ’09, pp. 452–461
Rong, Y., Huang, W., Xu, T., et al. (2020). Dropedge: Towards deep graph convolutional networks on node classification. In: International Conference on Learning Representations
Smith, B., & Linden, G. (2017). Two decades of recommender systems at amazon.com. IEEE Internet Computing, 21(3), 12–18. https://doi.org/10.1109/MIC.2017.72
Tao, Z., Liu, X., Xia, Y., et al. (2023). Self-supervised learning for multimedia recommendation. IEEE Transactions on Multimedia, 25, 5107–5116. https://doi.org/10.1109/TMM.2022.3187556
Terrell, G. R., & Scott, D. W. (1992). Variable kernel density estimation. The Annals of Statistics, 20(3), 1236–1265.
Wang, C., Yu, Y., Ma, W., et al. (2022). Towards representation alignment and uniformity in collaborative filtering. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA, KDD ’22, pp 1816–1825. https://doi.org/10.1145/3534678.3539253
Wang, Q., Wei, Y., Yin, J., et al. (2023). Dualgnn: Dual graph neural network for multimedia recommendation. IEEE Transactions on Multimedia, 25, 1074–1084. https://doi.org/10.1109/TMM.2021.3138298
Wang, W., Feng, F., He, X., et al. (2021). Denoising implicit feedback for recommendation. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, New York, NY, USA, WSDM ’21, pp 373–381. https://doi.org/10.1145/3437963.3441800
Wei, Y., Wang, X., Nie, L., et al. (2019). Mmgcn: Multi-modal graph convolution network for personalized recommendation of micro-video. In: Proceedings of the 27th ACM International Conference on Multimedia. Association for Computing Machinery, New York, NY, USA, MM ’19, pp. 1437–1445. https://doi.org/10.1145/3343031.3351034
Wei, Y., Wang, X., Nie, L., et al. (2020). Graph-refined convolutional network for multimedia recommendation with implicit feedback. In: Proceedings of the 28th ACM International Conference on Multimedia. Association for Computing Machinery, New York, NY, USA, MM ’20, pp. 3541–3549. https://doi.org/10.1145/3394171.3413556
Weston, J., Yee, H., & Weiss, R. J. (2013). Learning to rank recommendations with the k-order statistic loss. In: Proceedings of the 7th ACM Conference on Recommender Systems. Association for Computing Machinery, New York, NY, USA, RecSys ’13, pp 245–248. https://doi.org/10.1145/2507157.2507210
Yu, P., Tan, Z., Lu, G., et al. (2023). Multi-view graph convolutional network for multimedia recommendation. In: Proceedings of the 31st ACM International Conference on Multimedia. Association for Computing Machinery, New York, NY, USA, MM ’23, pp 6576–6585. https://doi.org/10.1145/3581783.3613915
Zhang, F., Yuan, N. J., Lian, D., et al. (2016). Collaborative knowledge base embedding for recommender systems. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, USA, KDD ’16, pp 353–362. https://doi.org/10.1145/2939672.2939673
Zhang, J., Zhu, Y., Liu, Q., et al. (2021). Mining latent structures for multimedia recommendation. In: Proceedings of the 29th ACM International Conference on Multimedia. Association for Computing Machinery, New York, NY, USA, MM ’21, pp. 3872–3880. https://doi.org/10.1145/3474085.3475259
Zhang, J., Zhu, Y., Liu, Q., et al. (2023). Latent structure mining with contrastive modality fusion for multimedia recommendation. IEEE Transactions on Knowledge and Data Engineering, 35(9), 9154–9167. https://doi.org/10.1109/TKDE.2022.3221949
Zhou, J., Cui, G., Hu, S., et al. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57–81. https://doi.org/10.1016/j.aiopen.2021.01.001
Zhou, X. (2023). Mmrec: Simplifying multimodal recommendation
Zhou, X., & Shen, Z. (2023). A tale of two graphs: Freezing and denoising graph structures for multimodal recommendation. In: Proceedings of the 31st ACM International Conference on Multimedia. Association for Computing Machinery, New York, NY, USA, MM ’23, pp. 935–943. https://doi.org/10.1145/3581783.3611943
Zhou, X., Zhou, H., Liu, Y., et al. (2023). Bootstrap latent representations for multi-modal recommendation. In: Proceedings of the ACM Web Conference 2023. Association for Computing Machinery, New York, NY, USA, WWW ’23, pp. 845–854. https://doi.org/10.1145/3543507.3583251
Funding
This work was supported in part by the National Science and Technology Support Program of China (No.61672264).
Author information
Authors and Affiliations
Contributions
The research results of this manuscript come from our joint collaborative research.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
Ethics approval was not required for this research.
Consent to participate
No one participated in the study of the manuscript.
Consent for publication
Written informed consent for publication was obtained from all participants.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Xia, H. & Liu, Y. CMC-MMR: multi-modal recommendation model with cross-modal correction. J Intell Inf Syst (2024). https://doi.org/10.1007/s10844-024-00848-x
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10844-024-00848-x