Collaborative Thompson Sampling

Zhu, Zhenyu; Huang, Liusheng; Xu, Hongli

doi:10.1007/978-3-030-12981-1_2

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 268)

Included in the following conference series:

International Conference on Collaborative Computing: Networking, Applications and Worksharing

Abstract

Thompson sampling is one of the most effective strategies for balancing the exploration-exploitation trade-off. It has been applied in a variety of domains with remarkable success. Thompson sampling makes decisions in a noisy but stationary environment by accumulating uncertain information over time to improve prediction accuracy. In highly dynamic domains, however, the environment undergoes frequent and unpredictable changes, and decisions in such an environment should rely on current information. Standard Thompson sampling may therefore perform poorly in these domains. Here we present a collaborative Thompson sampling algorithm that applies the exploration-exploitation strategy to highly dynamic settings. The algorithm takes collaborative effects into account by dynamically clustering users into groups; the feedback of all users in the same group is used to estimate the expected reward in the current context and find the optimal choice. Incorporating collaborative effects into Thompson sampling allows the algorithm to capture real-time changes in the environment and adjust its decision-making strategy accordingly. We compare our algorithm with standard Thompson sampling algorithms on two real-world datasets. Our algorithm shows accelerated convergence and improved prediction performance in collaborative environments. We also provide a regret analysis of our algorithm on a non-contextual model.
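
The idea of pooling feedback within a user group can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a non-contextual Beta-Bernoulli model, a uniform Beta(1, 1) prior, and a fixed, caller-supplied group assignment in place of dynamic clustering. The class name `GroupedThompsonSampler`, the `simulate` helper, the user and group labels, and the reward rates are all hypothetical.

```python
import random
from collections import defaultdict


class GroupedThompsonSampler:
    """Beta-Bernoulli Thompson sampling with feedback pooled per user group.

    Hypothetical illustration: group membership is given by the caller,
    whereas the paper clusters users dynamically.
    """

    def __init__(self, n_arms, seed=None):
        self.n_arms = n_arms
        self.rng = random.Random(seed)
        # counts[group][arm] = [successes, failures], shared by the whole group
        self.counts = defaultdict(lambda: [[0, 0] for _ in range(n_arms)])

    def select_arm(self, group):
        # Draw one sample per arm from Beta(1 + successes, 1 + failures)
        # and play the arm with the largest sample.
        samples = [
            self.rng.betavariate(1 + s, 1 + f) for s, f in self.counts[group]
        ]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, group, arm, reward):
        # reward is binary: 1 counts as a success, 0 as a failure
        self.counts[group][arm][0 if reward else 1] += 1

    def posterior_mean(self, group, arm):
        s, f = self.counts[group][arm]
        return (1 + s) / (2 + s + f)


def simulate(n_rounds=2000, seed=0):
    """Run the sampler on made-up reward rates; returns (sampler, pulls)."""
    rng = random.Random(seed)
    true_rates = [0.2, 0.5, 0.8]  # assumed values, not from the paper
    user_group = {"u1": "g", "u2": "g", "u3": "g"}  # assumed fixed clustering
    sampler = GroupedThompsonSampler(len(true_rates), seed=seed)
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        user = rng.choice(sorted(user_group))
        group = user_group[user]
        arm = sampler.select_arm(group)
        reward = 1 if rng.random() < true_rates[arm] else 0
        sampler.update(group, arm, reward)  # every group member benefits
        pulls[arm] += 1
    return sampler, pulls


if __name__ == "__main__":
    sampler, pulls = simulate()
    print("pulls per arm:", pulls)
```

Because all three simulated users share one posterior, each user's feedback sharpens the estimate used for the others, which is the collaborative effect the abstract describes. A dynamic variant would reassign `user_group` as behaviour changes.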

Acknowledgments

This paper is supported by the National Science Foundation of China under Grant 61472385 and Grant U1709217.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, Anhui, China
Zhenyu Zhu, Liusheng Huang & Hongli Xu

Corresponding author

Correspondence to Liusheng Huang.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
University of West London, London, UK
Xinheng Wang
Hangzhou Dianzi University, Hangzhou Shi, Zhejiang, China
Yuyu Yin
London South Bank University, London, UK
Muddesar Iqbal

Cite this paper

Zhu, Z., Huang, L., Xu, H. (2019). Collaborative Thompson Sampling. In: Gao, H., Wang, X., Yin, Y., Iqbal, M. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 268. Springer, Cham. https://doi.org/10.1007/978-3-030-12981-1_2

DOI: https://doi.org/10.1007/978-3-030-12981-1_2
Published: 07 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12980-4
Online ISBN: 978-3-030-12981-1
eBook Packages: Computer Science, Computer Science (R0)
