Comparison of TDLeaf(λ) and TD(λ) Learning in Game Playing Domain
In this paper we compare the results of applying the TD(λ) and TDLeaf(λ) algorithms to the game of give-away checkers. Experiments show comparable overall performance of the two algorithms, although TDLeaf(λ) appears less vulnerable to weight overfitting. Additional experiments tested three learning strategies used in self-play. The best performance was achieved when the weights were modified only after non-positive game outcomes, and when the training procedure focused on stronger opponents. TD-learning results are also compared with a pseudo-evolutionary training method.
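As background for the comparison above, the TDLeaf(λ) weight update can be sketched as follows. This is an illustrative, minimal version assuming a linear evaluation function: `leaf_values[t]` is the evaluation of the principal-variation leaf reached from position t, and `leaf_grads[t]` its gradient with respect to the weights (in plain TD(λ) the values and gradients would be taken at the root positions themselves). Function and parameter names are hypothetical, not from the paper.

```python
from typing import List

def tdleaf_update(weights: List[float],
                  leaf_values: List[float],
                  leaf_grads: List[List[float]],
                  alpha: float = 0.1,
                  lam: float = 0.7) -> List[float]:
    """One TDLeaf(lambda)-style weight update over a finished game.

    leaf_values has one entry per position (length n+1);
    leaf_grads has one gradient vector per update step (length n).
    """
    n = len(leaf_grads)
    # Temporal differences between successive principal-variation leaves.
    d = [leaf_values[t + 1] - leaf_values[t] for t in range(n)]
    new_w = list(weights)
    for t in range(n):
        # Lambda-discounted sum of future temporal differences.
        discounted = sum((lam ** (j - t)) * d[j] for j in range(t, n))
        for k in range(len(weights)):
            new_w[k] += alpha * leaf_grads[t][k] * discounted
    return new_w
```

With λ = 0 this reduces to a one-step TD update; with λ = 1 each step is credited with the full remaining change in evaluation, e.g. the final game outcome minus the current leaf value.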