Collaborative Thompson Sampling

Published: 10 February 2020

Volume 25, pages 1351–1363, (2020)
Mobile Networks and Applications

301 Accesses
2 Citations

Abstract

Thompson sampling is one of the most effective strategies for balancing the exploration-exploitation trade-off, and it has been applied with remarkable success in a variety of domains. Thompson sampling makes decisions in a noisy but stationary environment by accumulating uncertain information over time to improve prediction accuracy. In highly dynamic domains, however, the environment undergoes frequent and unpredictable changes, so decisions should rely on current information, and standard Thompson sampling may perform poorly. Here we present collaborative Thompson sampling, which applies the exploration-exploitation strategy to highly dynamic settings. The algorithm takes collaborative effects into account by dynamically clustering users into groups; the feedback of all users in the same group helps to estimate the expected reward in the current context and thus to find the optimal choice. Incorporating collaborative effects into Thompson sampling allows the algorithm to capture real-time changes in the environment and adjust its decision-making strategy accordingly. We compare our algorithm with standard Thompson sampling algorithms on two real-world datasets. Our algorithm shows accelerated convergence and improved prediction performance in collaborative environments. We also provide regret analyses of our algorithm in both contextual and non-contextual settings.
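
The idea of pooling feedback within a user group can be illustrated with a minimal sketch. This is not the algorithm from the article: it assumes a non-contextual Bernoulli bandit with Beta(1, 1) priors, and it assumes the user groups are fixed and known in advance, whereas the abstract describes dynamic clustering. The names `thompson_select`, `pooled_counts`, and `run` are hypothetical and chosen only for this illustration.

```python
import random


def thompson_select(successes, failures, rng):
    """Return the arm whose Beta(1 + s, 1 + f) posterior sample is largest."""
    samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def pooled_counts(group, user_successes, user_failures, n_arms):
    """Sum per-arm feedback over all users in a group (assumed pooling rule)."""
    s = [sum(user_successes[u][a] for u in group) for a in range(n_arms)]
    f = [sum(user_failures[u][a] for u in group) for a in range(n_arms)]
    return s, f


def run(true_rates, groups, n_rounds, seed=0):
    """Simulate group-pooled Thompson sampling on Bernoulli rewards.

    true_rates[g][a] is the reward probability of arm a for group g.
    groups is a list of lists of user ids; the grouping is fixed here,
    which is a simplification of the dynamic clustering in the abstract.
    Returns the fraction of rounds in which the best arm was chosen.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates[0])
    users = [u for g in groups for u in g]
    group_of = {u: gi for gi, g in enumerate(groups) for u in g}
    succ = {u: [0] * n_arms for u in users}
    fail = {u: [0] * n_arms for u in users}
    best_picks = 0
    for _ in range(n_rounds):
        u = rng.choice(users)  # a random user arrives
        gi = group_of[u]
        # The posterior for this user is built from the whole group's feedback.
        s, f = pooled_counts(groups[gi], succ, fail, n_arms)
        arm = thompson_select(s, f, rng)
        if rng.random() < true_rates[gi][arm]:
            succ[u][arm] += 1
        else:
            fail[u][arm] += 1
        best = max(range(n_arms), key=true_rates[gi].__getitem__)
        best_picks += arm == best
    return best_picks / n_rounds


if __name__ == "__main__":
    rates = [[0.9, 0.1], [0.1, 0.9]]  # two groups with opposite preferences
    print(run(rates, groups=[[0, 1, 2], [3, 4, 5]], n_rounds=2000))
```

Because every user in a group draws on the same pooled counts, a user who has given little feedback still benefits from what the rest of the group has revealed. In this sketch that is the only source of faster convergence compared with keeping a separate posterior per user.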

Acknowledgments

This paper is supported by the National Science Foundation of China under Grant 61472385 and Grant U1709217.

Author information

Authors and Affiliations

University of Science and Technology of China, Suzhou, China
Zhenyu Zhu, Liusheng Huang & Hongli Xu

Corresponding author

Correspondence to Liusheng Huang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, Z., Huang, L. & Xu, H. Collaborative Thompson Sampling. Mobile Netw Appl 25, 1351–1363 (2020). https://doi.org/10.1007/s11036-019-01453-x

Issue Date: August 2020
DOI: https://doi.org/10.1007/s11036-019-01453-x
