Abstract
Reinforcement learning algorithms play an important role today and have been applied in many domains. For example, the personalized recommendation problem can be modelled as a contextual multi-armed bandit problem in reinforcement learning. In this paper, we propose a contextual bandit algorithm based on Contexts and the Chosen Number of Arm with Minimal Estimation, Con-CNAME for short. The algorithm's continuous exploration and use of context help address the cold-start problem in recommender systems. Furthermore, Con-CNAME can still make recommendations in emergency situations where contexts suddenly become unavailable. In the experimental evaluation, we discuss reference ranges for the key parameters and the stability of Con-CNAME in detail, and we compare its performance with that of several classic bandit algorithms. The experimental results show that Con-CNAME outperforms these algorithms.
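To make the contextual-bandit framing concrete, the following is a minimal sketch of a generic contextual epsilon-greedy recommender. It is not the Con-CNAME algorithm, whose details are not given in the abstract. The class name `ContextualEpsilonGreedy`, the parameter `epsilon`, and the rule of falling back to context-free estimates when the context is `None` are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedy:
    """Generic contextual epsilon-greedy bandit (illustrative; NOT Con-CNAME)."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon          # hypothetical exploration rate
        self.rng = random.Random(seed)
        # Per-context statistics: context -> per-arm pull counts and mean rewards.
        self.ctx_counts = defaultdict(lambda: [0] * n_arms)
        self.ctx_means = defaultdict(lambda: [0.0] * n_arms)
        # Context-free statistics, used when no context is available.
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def _estimates(self, context):
        # Assumed fallback rule: without a context, use the global estimates.
        if context is None:
            return self.counts, self.means
        return self.ctx_counts[context], self.ctx_means[context]

    def select(self, context=None):
        counts, means = self._estimates(context)
        # Try every arm at least once (helps with new users or items).
        untried = [a for a in range(self.n_arms) if counts[a] == 0]
        if untried:
            return self.rng.choice(untried)
        # Keep exploring with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: means[a])

    def update(self, arm, reward, context=None):
        # Always update the context-free running mean.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        # Update the per-context running mean when a context was observed.
        if context is not None:
            counts = self.ctx_counts[context]
            means = self.ctx_means[context]
            counts[arm] += 1
            means[arm] += (reward - means[arm]) / counts[arm]
```

In use, each recommendation round would call `select(context)` to pick an item (arm), observe a reward such as a click, and then call `update(arm, reward, context)`. Passing `context=None` mimics the situation where contexts are unavailable, so the sketch falls back to its context-free estimates.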
This work is supported in part by the National Key Research and Development Program of China (2016YFC0800805).
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, X., Zhou, Q., He, T., Liang, B. (2018). Con-CNAME: A Contextual Multi-armed Bandit Algorithm for Personalized Recommendations. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. ICANN 2018. Lecture Notes in Computer Science, vol 11140. Springer, Cham. https://doi.org/10.1007/978-3-030-01421-6_32
DOI: https://doi.org/10.1007/978-3-030-01421-6_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01420-9
Online ISBN: 978-3-030-01421-6
eBook Packages: Computer Science (R0)