Improving Temporal Difference Learning Performance in Backgammon Variants
Conference paper
Abstract
Palamedes is an ongoing project for building expert-level bots that play backgammon variants. Like all successful modern backgammon programs, it is based on neural networks trained using temporal difference learning. This paper improves upon the training method used in our previous approach for Plakoto and Fevga, two backgammon variants popular in Greece and neighboring countries. We show that the proposed methods result in both faster learning and better performance. We also present insights into the feature selection in our experiments that can be useful for temporal difference learning in other games as well.
Keywords
Game Sequence, Benchmark Program, Open Source Program, Temporal Difference Learn, Training Game
Copyright information
© Springer-Verlag Berlin Heidelberg 2012