
Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards

  • Conference paper

Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14658)


Abstract

We study the non-stationary stochastic multi-armed bandit (MAB) problem, which models real-world sequential decision-making under changing conditions. We first give a thorough analysis of state-of-the-art algorithms in dynamically changing environments. To address the limitations of existing methods, we propose the Concept Drift Adaptive Bandit (CDAB) framework, which captures and predicts potential future concept-drift patterns in the reward distribution, allowing better adaptation in non-stationary environments. We conduct extensive numerical experiments evaluating CDAB against both stationary and non-stationary state-of-the-art baselines, on artificial datasets as well as real-world data under different types of changing environments. The results show that CDAB exhibits strong empirical performance, outperforming existing methods in every variant tested.
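The abstract names the CDAB framework but does not specify, at this level of detail, how future rewards are predicted. As a rough illustration of the general idea (choosing arms from a forecast of near-future reward rather than a stationary running mean), here is a minimal Python sketch. The sliding window, the linear-trend forecaster, the UCB-style exploration bonus, and the sinusoidally drifting demo environment are all assumptions made for illustration; this is not the authors' CDAB implementation.

```python
import numpy as np

# Minimal sketch of a drift-aware bandit that forecasts each arm's
# next reward instead of trusting a stationary sample mean.
# All modeling choices below are illustrative assumptions, not CDAB.

rng = np.random.default_rng(0)
n_arms, horizon, window = 3, 2000, 100
history = [[] for _ in range(n_arms)]  # observed rewards per arm

def predicted_reward(rewards, t):
    """Extrapolate the next reward via a linear trend over a sliding window."""
    recent = rewards[-window:]
    if len(recent) < 2:
        return float("inf")  # force initial exploration of each arm
    x = np.arange(len(recent))
    slope, intercept = np.polyfit(x, recent, 1)
    forecast = intercept + slope * len(recent)        # one step ahead
    bonus = np.sqrt(2 * np.log(t + 1) / len(recent))  # UCB-style bonus
    return forecast + bonus

def true_mean(arm, t):
    """Slowly drifting reward means (sinusoidal drift, demo only)."""
    return 0.5 + 0.4 * np.sin(2 * np.pi * (t / 1000) + arm)

total = 0.0
for t in range(horizon):
    arm = int(np.argmax([predicted_reward(h, t) for h in history]))
    reward = rng.normal(true_mean(arm, t), 0.1)
    history[arm].append(reward)
    total += reward
print(f"average reward: {total / horizon:.3f}")
```

Any one-step forecaster could stand in for the linear trend (an exponential smoother, a Bayesian change-point model, etc.); the essential departure from a stationary bandit is that the arm index is chosen from a predicted future reward.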



Author information


Correspondence to Yun-Da Tsai.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2123 KB)


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Tsai, Y.-D., Lin, S.-D. (2024). Handling Concept Drift in Non-stationary Bandit Through Predicting Future Rewards. In: Wang, Z., Tan, C.W. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol. 14658. Springer, Singapore. https://doi.org/10.1007/978-981-97-2650-9_13

Download citation

  • DOI: https://doi.org/10.1007/978-981-97-2650-9_13


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-2649-3

  • Online ISBN: 978-981-97-2650-9

  • eBook Packages: Computer Science; Computer Science (R0)
