Online Sparse Bandit for Card Games

  • David L. St-Pierre
  • Quentin Louveaux
  • Olivier Teytaud
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7168)

Abstract

Finding an approximation of a Nash equilibrium in matrix games is an important topic whose relevance reaches beyond matrix games themselves. A bandit algorithm commonly used to approximate a Nash equilibrium is EXP3 [3]. However, the solution to many problems is often sparse, yet EXP3 inherently fails to exploit this property. To the best of the authors' knowledge, the only existing approach to this issue is the offline truncation proposed in [9]. In this paper, we propose a variation of EXP3 that exploits the sparsity of the solution by dynamically removing arms; the resulting algorithm empirically outperforms previous versions. We apply the resulting algorithm to an MCTS program for the Urban Rivals card game.
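The abstract's idea can be illustrated with a minimal EXP3 sketch. The names (`exp3_sparse`, `reward_fn`, `prune_below`) and the pruning rule below are illustrative assumptions: the abstract does not specify the paper's actual arm-removal criterion, so this sketch simply drops arms whose sampling probability falls under a threshold after a burn-in period.

```python
import math
import random

def exp3_sparse(n_arms, reward_fn, n_rounds, gamma=0.1, prune_below=None):
    """EXP3 with a hypothetical dynamic arm-removal heuristic.

    reward_fn(arm) must return a reward in [0, 1]. When prune_below is set,
    arms whose sampling probability falls under that threshold are dropped
    after a burn-in period -- a stand-in for dynamic arm removal; the exact
    criterion used in the paper is not given in the abstract.
    """
    active = list(range(n_arms))
    weights = {a: 1.0 for a in active}
    for t in range(1, n_rounds + 1):
        k = len(active)
        total = sum(weights[a] for a in active)
        # Mix exponential weights with uniform exploration over active arms.
        probs = {a: (1.0 - gamma) * weights[a] / total + gamma / k
                 for a in active}
        arm = random.choices(active, weights=[probs[a] for a in active])[0]
        reward = reward_fn(arm)
        # Importance-weighted update: only the pulled arm's weight changes.
        weights[arm] *= math.exp(gamma * reward / (probs[arm] * k))
        # Rescale to avoid floating-point overflow (ratios are unchanged).
        top = max(weights[a] for a in active)
        for a in active:
            weights[a] /= top
        # Hypothetical sparsity step: remove near-zero-probability arms.
        if prune_below is not None and t > 10 * n_arms:
            kept = [a for a in active if probs[a] >= prune_below]
            if kept:
                active = kept
    total = sum(weights[a] for a in active)
    return {a: weights[a] / total for a in active}
```

In the matrix-game setting, each arm corresponds to a pure strategy and the returned weight distribution over the surviving arms approximates a mixed strategy; pruning concentrates computation on the (sparse) support of the equilibrium.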

Keywords

Nash Equilibrium, Urban Rivals, Matrix Game, Card Game, Bandit Problem


References

  1. Audibert, J.-Y., Bubeck, S.: Minimax policies for adversarial and stochastic bandits. In: 22nd Annual Conference on Learning Theory (COLT), Montreal (June 2009)
  2. Audibert, J.-Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research (October 2010)
  3. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)
  4. Auger, D.: Multiple Tree for Partially Observable Monte-Carlo Tree Search. In: Di Chio, C., Cagnoni, S., Cotta, C., Ebner, M., Ekárt, A., Esparcia-Alcázar, A.I., Merelo, J.J., Neri, F., Preuss, M., Richter, H., Togelius, J., Yannakakis, G.N. (eds.) EvoApplications 2011, Part I. LNCS, vol. 6624, pp. 53–62. Springer, Heidelberg (2011)
  5. Grigoriadis, M.D., Khachiyan, L.G.: A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters 18(2), 53–58 (1995)
  6. Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6(1), 4–22 (1985)
  7. Lanctot, M., Waugh, K., Zinkevich, M., Bowling, M.: Monte Carlo Sampling for Regret Minimization in Extensive Games. Advances in Neural Information Processing Systems 22, 1078–1086 (2009)
  8. Ponsen, M., Lanctot, M., de Jong, S.: MCRNR: Fast Computing of Restricted Nash Responses by Means of Sampling. In: Proceedings of Interactive Decision Theory and Game Theory Workshop, AAAI 2010 (2010)
  9. Teytaud, O., Flory, S.: Upper Confidence Trees with Short Term Partial Information. In: Di Chio, C., Cagnoni, S., Cotta, C., Ebner, M., Ekárt, A., Esparcia-Alcázar, A.I., Merelo, J.J., Neri, F., Preuss, M., Richter, H., Togelius, J., Yannakakis, G.N. (eds.) EvoApplications 2011, Part I. LNCS, vol. 6624, pp. 153–162. Springer, Heidelberg (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • David L. St-Pierre (1)
  • Quentin Louveaux (1)
  • Olivier Teytaud (2, 3)

  1. Department of Electrical Engineering and Computer Science, Faculty of Engineering, Liège University, Belgium
  2. TAO (Inria, Lri, Univ. Paris-Sud, UMR CNRS 8623), France
  3. OASE Lab., National University of Tainan, Taiwan
