Machine Learning, Volume 74, Issue 2, pp 159–189

Effective short-term opponent exploitation in simplified poker

Abstract

Uncertainty in poker stems from two key sources: the shuffled deck and an adversary whose strategy is unknown. One approach to playing poker is to find a pessimistic game-theoretic solution (i.e., a Nash equilibrium), but human players have idiosyncratic weaknesses that can be exploited if some model or counter-strategy can be learned by observing their play. However, games against humans last for at most a few hundred hands, so learning must be very fast to be useful. We explore two approaches to opponent modelling in the context of Kuhn poker, a small game for which game-theoretic solutions are known: parameter estimation and expert algorithms. Experiments demonstrate that, even in this small game, convergence to maximally exploitive solutions in a small number of hands is impractical, but that good (e.g., better than Nash) performance can be achieved in as few as 50 hands. Finally, we show that amongst a set of strategies with equal game-theoretic value, in particular the set of Nash equilibrium strategies, some are preferable because they speed learning of the opponent’s strategy by exploring it more effectively.
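
The abstract names parameter estimation as one route to modelling an opponent from only a handful of observed hands. The sketch below is a rough illustration of that general idea only, not the authors' model or their parameterization of Kuhn poker: it maintains a Beta posterior over a single hypothetical opponent parameter (a bluff frequency) and uses its mean in a simple pot-odds calling rule. The class and function names, the uniform prior, and the pot and bet sizes are assumptions made for this example.

    # Minimal illustrative sketch (Python), not the article's method:
    # Bayesian estimation of one opponent parameter from few observations,
    # then an exploitive decision based on the posterior mean.
    from dataclasses import dataclass


    @dataclass
    class BetaEstimate:
        """Beta(a, b) posterior over a Bernoulli parameter, e.g. how often
        the opponent bets with a losing hand."""
        a: float = 1.0  # pseudo-count of observed bluffs (uniform prior)
        b: float = 1.0  # pseudo-count of observed non-bluffs

        def update(self, bluffed: bool) -> None:
            # One observation per hand that reaches a showdown.
            if bluffed:
                self.a += 1.0
            else:
                self.b += 1.0

        @property
        def mean(self) -> float:
            return self.a / (self.a + self.b)


    def should_call(estimate: BetaEstimate, pot: float = 3.0, bet: float = 1.0) -> bool:
        """Call when the estimated bluff frequency exceeds the pot odds
        bet / (pot + bet), the break-even point for a bluff-catching call."""
        return estimate.mean > bet / (pot + bet)


    # Usage: even a few dozen observed hands move the estimate well away
    # from the prior, which is the short-term regime the article studies.
    estimate = BetaEstimate()
    for bluffed in [True, False, False, True, True, False, True, False, False, True]:
        estimate.update(bluffed)
    print(f"estimated bluff frequency: {estimate.mean:.2f}, call? {should_call(estimate)}")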

Keywords

Game-playing · Opponent modelling · Experts · Bayesian · Poker

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Finnegan Southey (1)
  • Bret Hoehn (2)
  • Robert C. Holte (2)

  1. Google, Mountain View, USA
  2. Dept. of Computing Science, University of Alberta, Edmonton, Canada
