Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters

  • Ole-Christoffer Granmo
  • Stian Berg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6098)


The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus one must balance exploiting existing knowledge about the arms against obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change in the reward distributions may progressively degrade the performance of any fixed strategy.

Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel solution scheme for bandit problems with non-stationary normally distributed rewards. The scheme is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling Kalman filters, and on random sampling from these posteriors. Furthermore, it is able to track the better actions, thus supporting non-stationary bandit problems.
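The abstract does not give the update equations, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes a standard scalar Kalman filter per arm with a random-walk model of the reward mean, and it assumes the observation noise variance (`var_ob`) and transition noise variance (`var_tr`) are known. The names `KalmanArm`, `play`, `var_ob`, and `var_tr` are hypothetical.

```python
import random


class KalmanArm:
    """Gaussian belief N(mean, var) about one arm's current expected reward."""

    def __init__(self, mean=0.0, var=1e4):
        # A large initial variance is an assumed, weakly informative prior.
        self.mean = mean
        self.var = var

    def sample(self, rng):
        # Random sampling from the current posterior over the arm's mean.
        return rng.gauss(self.mean, self.var ** 0.5)

    def drift(self, var_tr):
        # Arm not pulled: the mean may have moved, so uncertainty grows.
        self.var += var_tr

    def update(self, reward, var_ob, var_tr):
        # Arm pulled: scalar Kalman predict step followed by correction.
        prior_var = self.var + var_tr
        gain = prior_var / (prior_var + var_ob)
        self.mean += gain * (reward - self.mean)
        self.var = (1.0 - gain) * prior_var


def play(true_means, steps, var_ob=1.0, var_tr=0.01, seed=0):
    """Simulate the sampling scheme on arms with fixed true means."""
    rng = random.Random(seed)
    arms = [KalmanArm() for _ in true_means]
    pulls = [0] * len(arms)
    for _ in range(steps):
        # Draw one value per sibling filter and pull the arm with the largest.
        draws = [arm.sample(rng) for arm in arms]
        chosen = max(range(len(arms)), key=draws.__getitem__)
        reward = rng.gauss(true_means[chosen], var_ob ** 0.5)
        for i, arm in enumerate(arms):
            if i == chosen:
                arm.update(reward, var_ob, var_tr)
            else:
                arm.drift(var_tr)
        pulls[chosen] += 1
    return arms, pulls
```

In this sketch, the variance of an unpulled arm grows at every step, so its samples eventually become large enough for it to be tried again. This is one plausible reading of how such a scheme can track the better actions when rewards drift.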

Extensive experiments demonstrate that our scheme outperforms recently proposed bandit playing algorithms, not only in non-stationary environments but also in stationary ones. Furthermore, our scheme is robust to inexact parameter settings. We thus believe that our methodology opens avenues for obtaining improved novel solutions.


Keywords: Bandit Problems, Kalman Filter, Bayesian Learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ole-Christoffer Granmo (1)
  • Stian Berg (1)
  1. Department of ICT, University of Agder, Grimstad, Norway
