Abstract
Recommendation systems, which predict relevant and appealing items for users on web platforms, often rely on static user interests, resulting in limited interactivity and adaptability. Reinforcement Learning (RL), while providing a dynamic and adaptive alternative, brings its own challenges in this context. Interpreting the behavior of an RL agent within a recommendation system is complex due to factors such as vast and continuously evolving state and action spaces, non-stationary user preferences, and implicit, delayed rewards tied to long-term user satisfaction.
Addressing the inherent complexities of applying RL in recommendation systems, we propose a framework that includes innovative metrics and a synthetic environment. The metrics aim to assess the real-time adaptability of an RL agent to dynamic user preferences. We apply this framework to LastFM datasets to interpret metric outcomes and test hypotheses regarding MDP setups and algorithm choices by adjusting dataset parameters within the synthetic environment. This approach illustrates potential applications of our framework, while highlighting the necessity for further research in this area.
Acknowledgments
The authors extend their heartfelt gratitude to Evgeny Frolov and Alexey Skrynnyk for their insightful feedback and guidance on key focus areas for this research.
Appendices
Classic Recommendation Algorithms
Recommendation systems have been extensively studied, and several classic algorithms have emerged as popular approaches for generating recommendations. In this section, we provide a brief overview of some of these algorithms, including matrix factorization and other notable examples.
1.1 Algorithms
Matrix factorization is a widely used technique for collaborative filtering in recommendation systems. The basic idea is to decompose the user-item interaction matrix into two lower-dimensional matrices, representing latent factors for users and items. The interaction between users and items can then be approximated by the product of these latent factors. Singular Value Decomposition (SVD) and Alternating Least Squares (ALS) are common methods for performing matrix factorization. The objective function for matrix factorization can be written as:
\[ \min_{U, V} \sum_{(i,j) \in \Omega} \left( R_{ij} - U_i^\top V_j \right)^2 + \lambda \left( \Vert U_i \Vert ^2 + \Vert V_j \Vert ^2 \right), \]
where \(R_{ij}\) is the observed interaction between user i and item j, \(U_i\) and \(V_j\) are the latent factor vectors for user i and item j, respectively, \(\Omega \) is the set of observed user-item interactions, and \(\lambda \) is a regularization parameter to prevent overfitting.
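As an illustrative sketch (not the implementation used in this work), the objective above can be minimized by stochastic gradient descent over the observed entries. The toy matrix, hyperparameters, and the helper name `matrix_factorization` are our own choices for exposition:

```python
import numpy as np

def matrix_factorization(R, mask, k=2, lam=0.02, lr=0.01, epochs=2000, seed=0):
    """Fit R ~ U @ V.T over the observed entries (mask) via SGD on the
    regularized squared-error objective."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]
            ui = U[i].copy()  # keep the pre-update value for the V step
            U[i] += lr * (err * V[j] - lam * ui)
            V[j] += lr * (err * ui - lam * V[j])
    return U, V

# Toy interaction matrix; zeros denote unobserved user-item pairs.
R = np.array([[5., 3., 0.],
              [4., 0., 1.],
              [0., 1., 5.]])
mask = R > 0
U, V = matrix_factorization(R, mask)
R_hat = U @ V.T  # predicted scores, including the unobserved cells
```

The unobserved cells of `R_hat` are the model's recommendations: they are filled in purely from the learned latent factors.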
Besides matrix factorization, other classic recommendation algorithms include:
- User-based Collaborative Filtering: This approach finds users who are similar to the target user and recommends items that these similar users have liked or interacted with. The similarity between users can be computed using metrics such as Pearson correlation or cosine similarity.
- Item-based Collaborative Filtering: Instead of focusing on user similarity, this method computes the similarity between items and recommends items that are similar to those the target user has liked or interacted with.
- Content-based Filtering: This approach utilizes features of items and user profiles to generate recommendations, assuming that users are more likely to be interested in items similar to those they have interacted with previously.
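The two collaborative-filtering variants above differ only in which axis of the interaction matrix the similarity is computed over. A minimal sketch (toy data and the `cosine_sim` helper are our own, for illustration):

```python
import numpy as np

def cosine_sim(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    X = M / norms
    return X @ X.T

# Rows are users, columns are items; zeros are unobserved.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [1., 0., 0., 4.]])

user_sim = cosine_sim(R)    # user-based CF: similarity between users
item_sim = cosine_sim(R.T)  # item-based CF: similarity between items

# Score items for user 0 as a similarity-weighted sum of other users' ratings.
u = 0
weights = user_sim[u].copy()
weights[u] = 0.0  # exclude the target user from the neighborhood
scores = weights @ R / weights.sum()
```

Items the target user has not yet interacted with are then ranked by `scores` to produce recommendations.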
1.2 Evaluation Metrics for Recommendation Algorithms
Several evaluation metrics are commonly used to assess the performance of recommendation algorithms. Below we briefly describe, with equations, the metrics used in our work.
Normalized Discounted Cumulative Gain (NDCG), used for measuring the effectiveness of ranking algorithms, takes into account the position of relevant items in the ranked list. First, DCG is calculated:
\[ \text {DCG@}k = \sum _{i=1}^{k} \frac{\text {rel}_i}{\log _2(i+1)}, \]
where k is the number of top recommendations considered, and \(\text {rel}_i\) is the relevance score of the item at position i in the ranked list. The DCG value is then normalized by the ideal DCG (IDCG), which represents the highest possible DCG value:
\[ \text {NDCG@}k = \frac{\text {DCG@}k}{\text {IDCG@}k}. \]
Coverage measures the fraction of distinct recommended items relative to the total number of items:
\[ \text {Coverage} = \frac{|\{\text {items recommended}\}|}{|\{\text {all items}\}|}. \]
High coverage indicates that the algorithm can recommend a diverse set of items, while low coverage implies that it is limited to a narrow subset. Hit Rate calculates the proportion of relevant recommendations out of the total recommendations provided:
\[ \text {HitRate} = \frac{\#\text {hits}}{\#\text {recommendations}}, \]
where a “hit” occurs when a recommended item is considered relevant or of interest to the user. A higher hit rate indicates better performance.
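The three metrics above translate directly into code. A compact sketch (function names and the toy inputs are our own, for illustration):

```python
import numpy as np

def dcg_at_k(rels, k):
    """DCG@k for a list of relevance scores in ranked order."""
    rels = np.asarray(rels, dtype=float)[:k]
    return float(np.sum(rels / np.log2(np.arange(2, rels.size + 2))))

def ndcg_at_k(rels, k):
    """DCG normalized by the ideal DCG (relevances sorted descending)."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def coverage(recommended_lists, n_items):
    """Fraction of the catalog that appears in any recommendation list."""
    distinct = {i for recs in recommended_lists for i in recs}
    return len(distinct) / n_items

def hit_rate(recommended_lists, relevant_sets):
    """Fraction of recommendation lists containing at least one relevant item."""
    hits = sum(1 for recs, rel in zip(recommended_lists, relevant_sets)
               if set(recs) & rel)
    return hits / len(recommended_lists)

ndcg = ndcg_at_k([3, 2, 3, 0, 1], k=5)
cov = coverage([[0, 1], [1, 2]], n_items=5)      # {0, 1, 2} out of 5 items
hr = hit_rate([[0, 1], [2, 3]], [{1}, {9}])      # one of two lists has a hit
```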
Experiments in Dynamic Environment
Experiments in a highly dynamic environment for agents trained on optimal and sub-optimal data show the following. Preference Correlation is the best metric for evaluating lower-quality datasets with negative reviews, as it assesses the agent’s understanding of user needs; however, inflated values can occur when the dataset contains many positive ratings and the agent’s coverage is low. For higher-quality datasets with positive responses, the I-HitRate metric correlates with online evaluations but is sensitive to the agent’s coverage.
Reinforcement Learning as an MDP Problem
Reinforcement learning (RL) addresses the problem of learning optimal behaviors by interacting with an environment. A fundamental concept in RL is the Markov Decision Process (MDP), which models the decision-making problem as a tuple \((S, A, P, R, \gamma )\). In this framework, S represents the state space, A is the action space, P is the state transition probability function, R denotes the reward function, and \(\gamma \) is the discount factor. By formulating the recommendation problem as an MDP, RL algorithms can learn to make decisions that optimize long-term rewards. The MDP framework provides a solid foundation for designing and evaluating RL agents in recommendation systems, allowing for the development of more adaptive and effective algorithms.
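To make the tuple \((S, A, P, R, \gamma )\) concrete, the sketch below solves a randomly generated tabular MDP by value iteration on Q, using the Bellman optimality update \(Q(s,a) = R(s,a) + \gamma \sum _{s'} P(s'|s,a) \max _{a'} Q(s',a')\). The random toy MDP is our own construction for illustration, not the recommendation MDP used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9

# Toy MDP: P[s, a] is the next-state distribution, R[s, a] the expected reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # (S, A, S)
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # (S, A)

# Value iteration on Q; the Bellman operator is a gamma-contraction,
# so repeated application converges to the optimal Q.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    Q = R + gamma * P @ Q.max(axis=1)

policy = Q.argmax(axis=1)  # greedy policy: best action in each state
```

In a recommendation setting, states would encode the user's interaction history, actions the candidate items, and the reward the (possibly delayed) user feedback; offline RL algorithms then estimate such a Q-function from logged interactions instead of a known \(P\) and \(R\).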
Comparing Algorithm Behavior in a Highly Dynamic Environment
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Volovikova, Z., Kuderov, P., Panov, A.I. (2024). Interpreting Decision Process in Offline Reinforcement Learning for Interactive Recommendation Systems. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1963. Springer, Singapore. https://doi.org/10.1007/978-981-99-8138-0_22
Print ISBN: 978-981-99-8137-3
Online ISBN: 978-981-99-8138-0