A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold’em Poker

  • Fredrik A. Dahl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2167)


We point out that value-based reinforcement learning, such as TD- and Q-learning, is not applicable to games of imperfect information. We give a reinforcement learning algorithm for two-player poker based on gradient search in the agents’ parameter spaces. The two competing agents experiment with different strategies and simultaneously shift their probability distributions towards more successful actions. The algorithm is a special case of the lagging anchor algorithm, to appear in the journal Machine Learning. We test the algorithm on a simplified, yet non-trivial, version of two-player Hold’em poker, with good results.
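
The idea of two agents simultaneously shifting their strategies by gradient steps can be illustrated on a game far smaller than poker. The Python sketch below is a toy under stated assumptions, not the paper's implementation: it uses matching pennies instead of Hold'em, exact expected-payoff gradients instead of sampled play, and one plausible form of an anchor update in which each strategy is pulled towards a slowly moving average of itself. The function name and the parameters `lr` and `eta` are hypothetical.

```python
def clip01(x):
    """Keep a probability inside [0, 1]."""
    return min(1.0, max(0.0, x))


def lagging_anchor_matching_pennies(steps=5000, lr=0.01, eta=1.0,
                                    p0=0.7, q0=0.4):
    """Toy anchored gradient dynamics for matching pennies.

    p: probability that player 1 (the maximizer) plays heads.
    q: probability that player 2 (the minimizer) plays heads.
    Expected payoff to player 1 is (2p - 1) * (2q - 1).
    The anchor form used here is an assumption made for illustration.
    """
    p, q = p0, q0
    p_anchor, q_anchor = p0, q0
    for _ in range(steps):
        # Exact gradients of the expected payoff.
        grad_p = 2.0 * (2.0 * q - 1.0)
        grad_q = 2.0 * (2.0 * p - 1.0)
        # Player 1 ascends, player 2 descends; both are pulled to their anchors.
        new_p = clip01(p + lr * grad_p + lr * eta * (p_anchor - p))
        new_q = clip01(q - lr * grad_q + lr * eta * (q_anchor - q))
        # The anchors lag behind, drifting slowly towards the current strategies.
        p_anchor += lr * eta * (p - p_anchor)
        q_anchor += lr * eta * (q - q_anchor)
        p, q = new_p, new_q
    return p, q


if __name__ == "__main__":
    print(lagging_anchor_matching_pennies())
```

With the default settings both probabilities settle close to 0.5, the equilibrium of matching pennies. Setting `eta=0.0` removes the anchor term, and the strategies then tend to circle the equilibrium rather than settle on it.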


Keywords (added by machine, not by the authors): Reinforcement Learning; Imperfect Information; Game State; Matrix Game; Reinforcement Learning Algorithm


  1. Dahl, F.A.: The lagging anchor algorithm. Reinforcement learning in two-player zero-sum games with imperfect information. Machine Learning (to appear).
  2. Owen, G.: Game Theory. 3rd ed. Academic Press, San Diego (1995).
  3. Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3 (1988) 9–44.
  4. Watkins, C.J.C.H.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, UK (1989).
  5. Szepesvari, C., Littman, M.L.: A unified analysis of value-function-based reinforcement learning algorithms. Neural Computation 11 (1999) 2017–2060.
  6. Tesauro, G.J.: Practical issues in temporal difference learning. Machine Learning 8 (1992) 257–277.
  7. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th International Conference on Machine Learning, Morgan Kaufmann, New Brunswick (1994) 157–163.
  8. Dahl, F.A., Halck, O.M.: Minimax TD-learning with neural nets in a Markov game. In: Lopez de Mantaras, R., Plaza, E. (eds.): ECML 2000. Proceedings of the 11th European Conference on Machine Learning. Lecture Notes in Computer Science Vol. 1810, Springer-Verlag, Berlin-Heidelberg-New York (2000) 117–128.
  9. Koller, D., Megiddo, N., von Stengel, B.: Efficient computation of equilibria for extensive two-person games. Games and Economic Behavior 14 (1996) 247–259.
  10. Luce, R.D., Raiffa, H.: Games and Decisions. Wiley, New York (1957).
  11. Koller, D., Pfeffer, A.: Representations and solutions for game-theoretic problems. Artificial Intelligence 94 (1997) 167–215.
  12. Schaeffer, J., Billings, D., Peña, L., Szafron, D.: Learning to play strong poker. In: Fürnkranz, J., Kubat, M. (eds.): Proceedings of the ICML-99-Workshop on Machine Learning in Game Playing, Jozef Stefan Institute, Ljubljana (1999).
  13. Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge, Massachusetts (1995).
  14. Selten, R.: Anticipatory learning in two-person games. In: Selten, R. (ed.): Game Equilibrium Models, Vol. I: Evolution and Game Dynamics. Springer-Verlag, Berlin (1991).
  15. Halck, O.M., Dahl, F.A.: On classification of games and evaluation of players — with some sweeping generalizations about the literature. In: Fürnkranz, J., Kubat, M. (eds.): Proceedings of the ICML-99-Workshop on Machine Learning in Game Playing, Jozef Stefan Institute, Ljubljana (1999).

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Fredrik A. Dahl, Norwegian Defence Research Establishment (FFI), Kjeller, Norway
