Abstract
Attention mechanisms have proven useful for sequential recommendation. The traditional multi-head self-attention architecture can exploit the entire user sequence and adaptively weigh previously consumed items when recommending the next item. However, the coupling between the number of heads and the size of each head in the multi-head attention model gives rise to a low-rank bottleneck, which limits expressive power and hurts recommendation performance. In this paper, we propose a variant of self-attention called mixed distribution of multi-head attention for sequence recommendation (MMSRec). Instead of increasing the embedding size, which is the currently dominant way of addressing the low-rank bottleneck, MMSRec constructs a mixed distribution as a weighted average of multiple simple distributions. Extensive experiments on four real-world datasets show that MMSRec significantly outperforms state-of-the-art algorithms. Empirical evidence also shows that stacking multiple low-rank distributions effectively improves the performance of the recommendation model.
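To make the idea of a weighted average of simple distributions concrete, the following is a minimal sketch in plain Python, not the MMSRec implementation. It builds several attention distributions from small heads and mixes them with softmax-normalized weights. The toy sizes, the fixed `mixing_logits`, and all function names are assumptions made for illustration; the abstract does not specify how the mixture weights are obtained, and causal masking and value projections are omitted.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_distribution(queries, keys):
    """Row-wise softmax of scaled dot products: one simple distribution per query."""
    scale = math.sqrt(len(queries[0]))
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def mix_distributions(distributions, mixing_logits):
    """Weighted average of several attention matrices (weights sum to 1)."""
    weights = softmax(mixing_logits)
    n_rows = len(distributions[0])
    n_cols = len(distributions[0][0])
    return [
        [
            sum(w * dist[i][j] for w, dist in zip(weights, distributions))
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned query/key projections."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, head_size, n_components = 6, 2, 3  # assumed toy sizes

# Each component uses a small head, so its pre-softmax score matrix
# has rank at most head_size.
components = [
    attention_distribution(
        random_matrix(rng, seq_len, head_size),
        random_matrix(rng, seq_len, head_size),
    )
    for _ in range(n_components)
]

# Fixed logits are an assumption; a real model would presumably learn them.
mixed = mix_distributions(components, mixing_logits=[0.0, 0.5, -0.5])
row_sums = [sum(row) for row in mixed]
```

Because the mixture weights are non-negative and sum to one, every row of `mixed` is still a valid probability distribution over sequence positions, so it could be used in place of a single head's attention weights without changing the embedding size.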
Acknowledgements
This work is supported by the Natural Science Foundation of Chongqing (No. cstc2019jcyj-msxmX0544), the Science and Technology Research Program of Chongqing Municipal Education Commission (Nos. KJZD-K202101105, KJQN202001136), and the National Natural Science Foundation of China (No. 61702063).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhang, Y., Liu, X. Leveraging mixed distribution of multi-head attention for sequential recommendation. Appl Intell 53, 454–469 (2023). https://doi.org/10.1007/s10489-022-03520-5
DOI: https://doi.org/10.1007/s10489-022-03520-5