
Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards

  • Conference paper

Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14658)


Abstract

We study the non-stationary stochastic multi-armed bandit (MAB) problem, which models real-world sequential decision-making under changing conditions. We first give a thorough analysis of state-of-the-art algorithms in dynamically changing environments. To address the limitations of existing methods, we propose the Concept Drift Adaptive Bandit (CDAB) framework, which captures and predicts potential future concept-drift patterns in the reward distribution, allowing better adaptation in non-stationary environments. We conduct extensive numerical experiments evaluating CDAB against both stationary and non-stationary state-of-the-art baselines, on artificial datasets as well as real-world data under different types of changing environments. The results show that CDAB exhibits strong empirical performance, outperforming existing methods in every variant tested.
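The abstract names the CDAB framework but does not specify, at this level of detail, how future rewards are predicted. As a rough illustration of the general idea (choosing arms from a forecast of near-future reward rather than a stationary running mean), here is a minimal Python sketch. The sliding window, the linear-trend forecaster, the UCB-style exploration bonus, and the sinusoidally drifting demo environment are all assumptions made for illustration; this is not the authors' CDAB implementation.

```python
import numpy as np

# Minimal sketch of a drift-aware bandit that forecasts each arm's
# next reward instead of trusting a stationary sample mean.
# All modeling choices below are illustrative assumptions, not CDAB.

rng = np.random.default_rng(0)
n_arms, horizon, window = 3, 2000, 100
history = [[] for _ in range(n_arms)]  # observed rewards per arm

def predicted_reward(rewards, t):
    """Extrapolate the next reward via a linear trend over a sliding window."""
    recent = rewards[-window:]
    if len(recent) < 2:
        return float("inf")  # force initial exploration of each arm
    x = np.arange(len(recent))
    slope, intercept = np.polyfit(x, recent, 1)
    forecast = intercept + slope * len(recent)        # one step ahead
    bonus = np.sqrt(2 * np.log(t + 1) / len(recent))  # UCB-style bonus
    return forecast + bonus

def true_mean(arm, t):
    """Slowly drifting reward means (sinusoidal drift, demo only)."""
    return 0.5 + 0.4 * np.sin(2 * np.pi * (t / 1000) + arm)

total = 0.0
for t in range(horizon):
    arm = int(np.argmax([predicted_reward(h, t) for h in history]))
    reward = rng.normal(true_mean(arm, t), 0.1)
    history[arm].append(reward)
    total += reward
print(f"average reward: {total / horizon:.3f}")
```

Any one-step forecaster could stand in for the linear trend (an exponential smoother, a Bayesian change-point model, etc.); the essential departure from a stationary bandit is that the arm index is chosen from a predicted future reward.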



Author information


Correspondence to Yun-Da Tsai.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2123 KB)


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Tsai, Y.-D., Lin, S.-D. (2024). Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards. In: Wang, Z., Tan, C.W. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol. 14658. Springer, Singapore. https://doi.org/10.1007/978-981-97-2650-9_13

Download citation

  • DOI: https://doi.org/10.1007/978-981-97-2650-9_13


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2649-3

  • Online ISBN: 978-981-97-2650-9

  • eBook Packages: Computer Science; Computer Science (R0)
