CMC-MMR: multi-modal recommendation model with cross-modal correction

Wang, YuBin; Xia, HongBin; Liu, Yuan

doi:10.1007/s10844-024-00848-x

Research
Published: 20 February 2024


Journal of Intelligent Information Systems

YuBin Wang¹, HongBin Xia¹,² & Yuan Liu¹,²


Abstract

Multi-modal recommendation, which uses multi-modal features (e.g., image and text features), has received significant attention and has been shown to improve recommendation effectiveness. However, current multi-modal recommendation faces two problems: (1) multi-modal recommendation models often process the raw data of each modality directly, so noise degrades the model's effectiveness and the interconnections between modalities remain unexplored; (2) users differ in their preferences, so treating all modalities equally is impractical and can impair the model's recommendations. To address these problems, this paper proposes a multi-modal recommendation model with cross-modal correction (CMC-MMR). First, to reduce the effect of noise in the raw data and to take full advantage of the relationships between modalities, we design a cross-modal correction module that denoises and corrects each modality using a cross-modal correction mechanism. Second, we use the similarity between items within each modality as the basis for building an item-item graph per modality, and we build user-item graphs with a degree-sensitive pruning strategy to mine higher-order information. Finally, we design a self-supervised task that adaptively mines users' preferences for each modality. We compare CMC-MMR with eleven baseline models on four real-world datasets. The experimental results show that CMC-MMR achieves average improvements of 6.202%, 4.975%, 6.054%, and 11.368% on the four datasets, respectively, demonstrating its effectiveness.
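The abstract names two graph-construction steps but does not give their exact form. The following is a minimal sketch under stated assumptions, not the authors' implementation: a per-modality item-item k-nearest-neighbour graph built from cosine similarity of raw features, and degree-sensitive pruning of user-item edges. The top-k selection, the D^{-1/2} A D^{-1/2} normalisation, and the retention weight 1/(sqrt(d_u) * sqrt(d_i)) are assumptions borrowed from common practice in graph-based multi-modal recommenders (the pruning weight follows FREEDOM by Zhou and Shen, 2023). The function names `knn_item_graph` and `degree_sensitive_prune` are hypothetical, and NumPy is an assumed dependency.

```python
import numpy as np


def knn_item_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: normalised kNN item-item graph for ONE modality.

    features: (num_items, dim) raw features of that modality (e.g. image or text).
    k: neighbours kept per item; must be smaller than num_items.
    Returns a dense (num_items, num_items) matrix D^{-1/2} A D^{-1/2}.
    """
    # Cosine similarity between every pair of items in this modality.
    unit = features / np.clip(np.linalg.norm(features, axis=1, keepdims=True), 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # an item is never its own neighbour
    # Assumption: keep the k most similar items per row as unweighted edges.
    topk = np.argpartition(-sim, kth=k - 1, axis=1)[:, :k]
    adj = np.zeros(sim.shape)
    adj[np.repeat(np.arange(sim.shape[0]), k), topk.ravel()] = 1.0
    # Normalise with the row-degree matrix D (every row has exactly k edges here).
    inv_sqrt = adj.sum(axis=1) ** -0.5
    return inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def degree_sensitive_prune(edges: np.ndarray, num_users: int, num_items: int,
                           keep_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: sample a subset of user-item edges.

    edges: (num_edges, 2) integer array of (user_id, item_id) interactions.
    Assumption: an edge is kept with probability proportional to
    1 / (sqrt(d_u) * sqrt(d_i)), so edges touching popular nodes are pruned first.
    """
    user_deg = np.bincount(edges[:, 0], minlength=num_users)
    item_deg = np.bincount(edges[:, 1], minlength=num_items)
    weight = 1.0 / (np.sqrt(user_deg[edges[:, 0]]) * np.sqrt(item_deg[edges[:, 1]]))
    prob = weight / weight.sum()
    num_keep = int(len(edges) * keep_ratio)
    # Sample distinct edge indices without replacement, weighted by prob.
    kept = rng.choice(len(edges), size=num_keep, replace=False, p=prob)
    return edges[np.sort(kept)]


# Toy usage with random "image" features and six interactions.
rng = np.random.default_rng(0)
item_graph = knn_item_graph(rng.normal(size=(6, 4)), k=2)
edges = np.array([[0, 0], [0, 1], [1, 1], [2, 3], [2, 4], [2, 5]])
pruned = degree_sensitive_prune(edges, num_users=3, num_items=6, keep_ratio=0.5, rng=rng)
print(item_graph.shape, pruned.shape)  # (6, 6) (3, 2)
```

Because every item keeps exactly k neighbours in this sketch, each row of the normalised item graph sums to 1. Resampling the pruned edge set at each training epoch is one common choice; the abstract does not say how CMC-MMR schedules pruning.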


Similar content being viewed by others

Multi-modal Graph and Sequence Fusion Learning for Recommendation

Automatic recommendation system based on hybrid filtering algorithm (Article, 23 July 2021)

Robust zero-shot discrete hashing with noisy labels for cross-modal retrieval (Article, 13 April 2024)

Data Availability

No datasets were generated or analysed during the current study.

Code availability

The code for this paper is not yet publicly available.

Notes

Datasets are available at http://jmcauley.ucsd.edu/data/amazon/links.html
https://github.com/enoche/MMRec


Funding

This work was supported in part by the National Science and Technology Support Program of China (No. 61672264).

Author information

Authors and Affiliations

School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
YuBin Wang, HongBin Xia & Yuan Liu
Jiangsu Key Laboratory of Media Design and Software Technology, Wuxi, Jiangsu, 214122, China
HongBin Xia & Yuan Liu


Contributions

The research reported in this manuscript is the result of the authors' joint collaboration.

Corresponding author

Correspondence to HongBin Xia.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval

Ethics approval was not required for this research.

Consent to participate

No human participants were involved in this study.

Consent for publication

Written informed consent for publication was obtained from all participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wang, Y., Xia, H. & Liu, Y. CMC-MMR: multi-modal recommendation model with cross-modal correction. J Intell Inf Syst (2024). https://doi.org/10.1007/s10844-024-00848-x


Received: 24 November 2023
Revised: 04 February 2024
Accepted: 05 February 2024
Published: 20 February 2024
DOI: https://doi.org/10.1007/s10844-024-00848-x
