Machine Learning, Volume 40, Issue 3, pp 243–263

Learning to Play Chess Using Temporal Differences

  • Jonathan Baxter
  • Andrew Tridgell
  • Lex Weaver

Abstract

In this paper we present TDLEAF(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with game-tree search. We present some experiments in which our chess program “KnightCap” used TDLEAF(λ) to learn its evaluation function while playing on Internet chess servers. The main success we report is that KnightCap improved from a 1650 rating to a 2150 rating in just 308 games and 3 days of play. As a reference, a rating of 1650 corresponds to about level B human play (on a scale from E (1000) to A (1800)), while 2150 is human master level. We discuss some of the reasons for this success, principal among them being the use of on-line play rather than self-play. We also investigate whether TDLEAF(λ) can yield better results in the domain of backgammon, where TD(λ) has previously yielded striking success.

Keywords: temporal difference learning, neural network, TDLEAF, chess, backgammon
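
The abstract does not spell out the update rule, so the following is only a minimal sketch of how a TD(λ)-style weight update could be applied to the leaf positions returned by game-tree search. It assumes a linear evaluation function (so the gradient with respect to the weights is simply the feature vector) and treats the game outcome as the value of the final position. The names `evaluate`, `tdleaf_update`, `leaf_features`, `alpha` and `lam` are hypothetical and chosen for illustration; this is not KnightCap's code.

```python
from typing import List, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation (an assumption): dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def tdleaf_update(
    weights: Sequence[float],
    leaf_features: Sequence[Sequence[float]],
    outcome: float,
    alpha: float = 0.1,
    lam: float = 0.7,
) -> List[float]:
    """Return updated weights after one game.

    leaf_features[t] is the feature vector of the principal-variation leaf
    found by searching from the t-th position of the game (hypothetical input
    format). The value of the last position is replaced by the game outcome.
    """
    values = [evaluate(weights, f) for f in leaf_features]
    values[-1] = outcome

    # Temporal differences between successive leaf evaluations.
    diffs = [values[t + 1] - values[t] for t in range(len(values) - 1)]

    new_weights = list(weights)
    discounted = 0.0
    # Walk backwards so that discounted == sum_{j >= t} lam**(j - t) * diffs[j].
    for t in range(len(diffs) - 1, -1, -1):
        discounted = diffs[t] + lam * discounted
        for i, f in enumerate(leaf_features[t]):
            # For a linear evaluation the gradient w.r.t. weight i is feature i.
            new_weights[i] += alpha * f * discounted
    return new_weights


if __name__ == "__main__":
    # Three positions, two features, game won (outcome = 1.0).
    print(tdleaf_update([0.0, 0.0], [[1, 0], [0, 1], [0, 0]], 1.0, lam=0.5))
```

Setting `lam=0` in this sketch credits only the position immediately before each change in evaluation, while values closer to 1 spread credit further back through the game.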

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Jonathan Baxter (1)
  • Andrew Tridgell (2)
  • Lex Weaver (3)
  1. Department of Systems Engineering, Australian National University, Australia
  2. Department of Computer Science, Australian National University, Australia
  3. Department of Computer Science, Australian National University, Australia