A Reinforcement Learning Algorithm Applied to Simplified Two-Player Texas Hold’em Poker
We point out that value-based reinforcement learning, such as TD- and Q-learning, is not applicable to games of imperfect information. We give a reinforcement learning algorithm for two-player poker based on gradient search in the agents' parameter spaces. The two competing agents experiment with different strategies and simultaneously shift their probability distributions towards more successful actions. The algorithm is a special case of the lagging anchor algorithm, to appear in the journal Machine Learning. We test the algorithm on a simplified, yet non-trivial, version of two-player Hold'em poker, with good results.
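The abstract describes the method only at a high level. The Python below is a minimal sketch that assumes one reading of the lagging anchor idea. Each strategy parameter takes a gradient step and is also pulled toward a slowly moving "anchor" copy of itself, and the anchor in turn drifts toward the parameter. The sketch runs simultaneous gradient play on a hypothetical one-card betting game, not the paper's simplified Hold'em. To stay deterministic, it uses exact expected-payoff gradients instead of the sampled experimentation the abstract describes. The game, `payoff`, `train`, `alpha`, `eta`, and the exact update form are all illustrative assumptions.

```python
"""Lagging-anchor-style gradient play on a hypothetical one-card poker toy game.

Assumption: the update rule below is an illustrative reading of the lagging
anchor idea, not the paper's exact algorithm, and the game is a made-up toy.
"""


def payoff(bh: float, bl: float, c: float) -> float:
    """Expected payoff to player 1 (player 2 receives the negative).

    Toy game: each player antes 1; player 1 is dealt a High or Low card with
    probability 1/2, then bets 1 more or checks. Player 2 sees only the action.
    After a check there is a showdown for the antes (+/-1). After a bet,
    player 2 calls (showdown for +/-2) or folds (player 1 wins +1).
    bh, bl: player 1's bet probability holding High / Low.
    c: player 2's call probability after a bet.
    """
    high = bh * (c * 2 + (1 - c) * 1) + (1 - bh) * 1
    low = bl * (c * -2 + (1 - c) * 1) + (1 - bl) * -1
    return 0.5 * high + 0.5 * low


def grads(bh: float, bl: float, c: float) -> tuple:
    # payoff is linear in each probability separately, so each partial
    # derivative equals the payoff difference between setting it to 1 and to 0.
    d_bh = payoff(1, bl, c) - payoff(0, bl, c)
    d_bl = payoff(bh, 1, c) - payoff(bh, 0, c)
    d_c = payoff(bh, bl, 1) - payoff(bh, bl, 0)
    return d_bh, d_bl, d_c


def clip01(x: float) -> float:
    # Keep each parameter a valid probability.
    return min(1.0, max(0.0, x))


def train(steps: int = 3000, alpha: float = 0.05, eta: float = 1.0) -> list:
    # Parameters: [bh, bl] for player 1 and [c] for player 2, each with an anchor.
    params = [0.5, 0.5, 0.5]
    anchors = list(params)
    # Player 1 ascends the payoff; player 2 descends it (zero-sum).
    signs = [1.0, 1.0, -1.0]
    for _ in range(steps):
        g = grads(*params)
        # Gradient step plus a pull toward the anchor, both from the old values.
        new_params = [
            clip01(p + alpha * s * gi + alpha * eta * (a - p))
            for p, a, s, gi in zip(params, anchors, signs, g)
        ]
        # Anchors lag behind: they drift toward the (old) current parameters.
        anchors = [a + alpha * eta * (p - a) for a, p in zip(anchors, params)]
        params = new_params
    return params


bh, bl, c = train()
print(f"bet|High={bh:.3f} bluff|Low={bl:.3f} call={c:.3f} value={payoff(bh, bl, c):.3f}")
```

Solving this toy game by hand gives an equilibrium where player 1 always bets a High card and bluffs with a Low card one third of the time. Player 2 calls two thirds of the time, and player 1's expected payoff is 1/3. The printed values should approach this equilibrium. Setting `eta=0` removes the anchor and leaves plain simultaneous gradient play, which in this game orbits or spirals away from the equilibrium rather than settling on it.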
- 1. Dahl, F.A.: The lagging anchor algorithm. Reinforcement learning in two-player zero-sum games with imperfect information. Machine Learning (to appear).
- 2. Owen, G.: Game Theory. 3rd ed. Academic Press, San Diego (1995).
- 3. Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3 (1988) 9–44.
- 4. Watkins, C.J.C.H.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, UK (1989).
- 7. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th International Conference on Machine Learning. Morgan Kaufmann, New Brunswick (1994) 157–163.
- 8. Dahl, F.A., Halck, O.M.: Minimax TD-learning with neural nets in a Markov game. In: Lopez de Mantaras, R., Plaza, E. (eds.): ECML 2000. Proceedings of the 11th European Conference on Machine Learning. Lecture Notes in Computer Science, Vol. 1810. Springer-Verlag, Berlin Heidelberg New York (2000) 117–128.
- 12. Schaeffer, J., Billings, D., Peña, L., Szafron, D.: Learning to play strong poker. In: Fürnkranz, J., Kubat, M. (eds.): Proceedings of the ICML-99 Workshop on Machine Learning in Game Playing. Jozef Stefan Institute, Ljubljana (1999).
- 14. Selten, R.: Anticipatory learning in two-person games. In: Selten, R. (ed.): Game Equilibrium Models, Vol. I: Evolution and Game Dynamics. Springer-Verlag, Berlin (1991).
- 15. Halck, O.M., Dahl, F.A.: On classification of games and evaluation of players, with some sweeping generalizations about the literature. In: Fürnkranz, J., Kubat, M. (eds.): Proceedings of the ICML-99 Workshop on Machine Learning in Game Playing. Jozef Stefan Institute, Ljubljana (1999).