Abstract
The Multi-armed Bandit (MAB) framework has been applied successfully in many fields. In recent years, the use of active approaches to tackle the nonstationary MAB setting, i.e., algorithms capable of detecting changes in the environment and automatically reconfiguring in response to them, has widened the range of applications of MAB techniques. However, such approaches do not reuse information in settings where the same environment conditions recur over time. This paper presents a framework that integrates past information in the abruptly changing nonstationary setting, allowing active MAB approaches to recover quickly from changes. The proposed framework builds on well-known break-point prediction methods to correctly identify the instant the environment changed in the past, and on a definition of recurring concepts tailored to the MAB setting, so that information from recurring MAB states can be reused when appropriate. We show that this framework does not change the order of the regret suffered by the active approaches commonly used in the bandit field. Finally, we provide an extensive experimental analysis on both synthetic and real-world data, showing the improvement provided by our framework.
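The combination sketched in the abstract, an actively adapting bandit that archives and reuses statistics from recurring environment states, can be illustrated with a minimal, hypothetical UCB1 variant. This is not the paper's algorithm: the class name, the mean-matching rule, and the `match_tol` parameter are all assumptions of this sketch.

```python
import math

class ReusingUCB:
    """Sketch: a UCB1-style bandit that, when an external change detector
    fires, archives its per-arm statistics as a "concept" and warm-starts
    from an archived concept if the post-change environment looks familiar."""

    def __init__(self, n_arms, match_tol=0.1):
        self.n_arms = n_arms
        self.match_tol = match_tol       # tolerance when matching concepts
        self.counts = [0] * n_arms       # pulls per arm in the current phase
        self.means = [0.0] * n_arms      # empirical mean reward per arm
        self.t = 0                       # total pulls in the current phase
        self.archive = []                # past concepts: (means, counts)

    def select(self):
        # Pull each arm once, then follow the UCB1 index.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        ucb = [self.means[a] + math.sqrt(2 * math.log(self.t) / self.counts[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

    def on_change_detected(self, post_change_means):
        """Archive the pre-change concept; reuse an archived concept whose
        arm means all lie within match_tol of rough post-change estimates,
        otherwise restart from scratch. Returns True iff history was reused."""
        self.archive.append((self.means[:], self.counts[:]))
        for means, counts in self.archive[:-1]:
            if all(abs(m - p) <= self.match_tol
                   for m, p in zip(means, post_change_means)):
                self.means, self.counts = means[:], counts[:]
                self.t = sum(counts)
                return True
        self.means = [0.0] * self.n_arms
        self.counts = [0] * self.n_arms
        self.t = 0
        return False
```

When a previously seen phase recurs, the warm start skips the cold exploration that a plain restart-on-detection strategy would pay again; otherwise the sketch degrades to the usual restart behavior.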
Keywords
- Multi-armed bandit
- Non-stationary MAB
- Break-point prediction
- Recurring concepts
Notes
1. The extension to other finite-support distributions is straightforward, and the theoretical results provided here remain valid.
2. Since we consider Bernoulli rewards, having the same expected value also implies having the same distribution. This definition can easily be generalized to other distributions by requiring that the distribution repeats over different phases.
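For the Bernoulli rewards of the second note, the break-point prediction step mentioned in the abstract can be sketched with a minimal Hinkley-style CUSUM estimator. This assumes the pre-change mean `p0` is known, a simplification that the methods the paper relies on do not need:

```python
def bernoulli_breakpoint(rewards, p0):
    """Crude break-point estimate for a Bernoulli stream: return the index k
    maximizing the magnitude of the cumulative sum of (reward - p0), i.e.
    the point where the empirical mean starts drifting away from p0."""
    best_k, best_stat, cum = 0, 0.0, 0.0
    for k, r in enumerate(rewards, start=1):
        cum += r - p0
        if abs(cum) > best_stat:
            best_stat, best_k = abs(cum), k
    return best_k
```

For example, on a stream of 20 zeros followed by 20 ones with `p0 = 0.5`, the cumulative sum drifts down during the pre-change segment and is pulled back afterwards, so the statistic peaks at the last pre-change index, 20. In practice the statistic is evaluated only up to the detection time, which keeps the post-change window short and the peak close to the true change.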
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Re, G., Chiusano, F., Trovò, F., Carrera, D., Boracchi, G., Restelli, M. (2021). Exploiting History Data for Nonstationary Multi-armed Bandit. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science(), vol 12975. Springer, Cham. https://doi.org/10.1007/978-3-030-86486-6_4
Print ISBN: 978-3-030-86485-9
Online ISBN: 978-3-030-86486-6
Published in cooperation with http://www.ecmlpkdd.org/