Monte-Carlo Tree Search in Poker Using Expected Reward Distributions

  • Guy Van den Broeck
  • Kurt Driessens
  • Jan Ramon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5828)


Abstract

We investigate the use of Monte-Carlo Tree Search (MCTS) in computer Poker, specifically No-Limit Texas Hold'em. The hidden information in Poker results in so-called miximax game trees, where opponent decision nodes have to be modeled as chance nodes. The probability distribution in these nodes is given by an opponent model that predicts the actions of the opponents. We propose a modification of the standard MCTS selection and backpropagation strategies that explicitly models and exploits the uncertainty of sampled expected values. The new strategies are evaluated as part of a complete Poker bot that is, to the best of our knowledge, the first opponent-exploiting no-limit Texas Hold'em bot that can play at a reasonable level in games of more than two players.
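The abstract's core idea can be illustrated with a small sketch: each tree node tracks not just a sampled mean reward but also the uncertainty of that estimate, selection at decision nodes adds an uncertainty bonus, and opponent nodes are sampled as chance nodes from an opponent model. This is a minimal illustration under assumed names (`Node`, `select_decision`, `sample_opponent`, the bonus weight `c`), not the paper's exact selection or backpropagation rule.

```python
import math
import random

class Node:
    """A tree node tracking the distribution of sampled expected rewards."""
    def __init__(self):
        self.n = 0        # number of backpropagated samples
        self.mean = 0.0   # running mean reward (Welford's algorithm)
        self.m2 = 0.0     # running sum of squared deviations

    def update(self, reward):
        # Online (Welford) update of mean and variance during backpropagation.
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    def std_error(self):
        # Standard error of the estimated expected value; infinite when
        # fewer than two samples exist, forcing exploration of such nodes.
        if self.n < 2:
            return float('inf')
        sample_var = self.m2 / (self.n - 1)
        return math.sqrt(sample_var) / math.sqrt(self.n)

def select_decision(children, c=1.0):
    # At our own decision node: pick the child maximizing an optimistic
    # bound (mean + c * standard error), i.e. exploit the uncertainty of
    # the sampled expected value rather than only visit counts.
    return max(children, key=lambda ch: ch.mean + c * ch.std_error())

def sample_opponent(children, probs):
    # At an opponent decision node, treated as a chance node: sample an
    # action according to the opponent model's predicted distribution.
    return random.choices(children, weights=probs, k=1)[0]
```

In a full bot these pieces would sit inside the usual MCTS loop (select down the tree, expand, simulate, backpropagate the reward through `update`); the sketch only shows how an uncertainty-aware selection rule and opponent-model chance nodes fit together.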


Keywords: Decision Node · Game Tree · Opponent Model · Human Player · Chance Node





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Guy Van den Broeck¹
  • Kurt Driessens¹
  • Jan Ramon¹

  1. Department of Computer Science, Katholieke Universiteit Leuven, Belgium
