First Results from Using Temporal Difference Learning in Shogi
Abstract
This paper presents first results from applying temporal difference learning [1] to shogi. We report on experiments to determine whether sensible values for shogi pieces can be obtained in the same manner as for Western chess pieces [2]. Learning relies entirely on randomised self-play, without access to any form of expert knowledge. The piece values are used in a simple search program that chooses shogi moves from a shallow lookahead, using the piece values to evaluate the leaves, with a random tie-break at the top level. Temporal difference learning adjusts the piece values over the course of a series of games. The method succeeds in learning values that perform well in matches against hand-crafted values.
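As a rough illustration of how temporal difference learning can adjust piece values over a game, the following is a minimal sketch of a TD(lambda) update in the style of [1] for a linear material evaluation. The abstract does not give the details, so everything here is an assumption made for illustration: the sigmoid squashing in `predict`, the feature encoding (piece-count differences from one side's point of view), the function names, and the values of `alpha` and `lam`.

```python
import math


def predict(weights, features):
    """Squash the weighted material balance into a value in (0, 1).

    Assumption: a sigmoid maps the material score to a win-probability-like
    prediction; the abstract does not specify the squashing function.
    """
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def td_lambda_update(weights, positions, outcome, alpha=0.05, lam=0.95):
    """Return updated piece weights after one game (input list is not mutated).

    positions: one feature vector per position in the game, each holding
               hypothetical piece-count differences (own minus opponent).
    outcome:   1.0 for a win, 0.0 for a loss, 0.5 for a draw.
    alpha, lam: illustrative learning rate and trace decay, not from the paper.
    """
    n = len(weights)
    preds = [predict(weights, x) for x in positions]
    # Each prediction is trained towards the next one; the last towards the result.
    targets = preds[1:] + [outcome]

    trace = [0.0] * n      # decaying sum of past prediction gradients
    delta_w = [0.0] * n    # accumulated weight change for the whole game
    for t, x in enumerate(positions):
        p = preds[t]
        grad = [p * (1.0 - p) * f for f in x]  # gradient of sigmoid(w . x) w.r.t. w
        trace = [lam * e + g for e, g in zip(trace, grad)]
        td_error = targets[t] - p
        for i in range(n):
            delta_w[i] += alpha * td_error * trace[i]

    return [w + d for w, d in zip(weights, delta_w)]


if __name__ == "__main__":
    # Two hypothetical piece types, three positions, game won by the learner.
    start = [0.0, 0.0]
    game = [[1, 0], [1, 1], [2, 1]]
    print(td_lambda_update(start, game, outcome=1.0))
```

In this sketch the weights are updated once per game from the stored predictions. Updating after every move is an equally plausible design, and the abstract does not say which was used.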
Keywords
Learning Shogi Temporal Difference Minimax Search Gameplaying
References
- 1. Sutton, R.S.: Learning to Predict by the Methods of Temporal Differences. Machine Learning 3 (1988) 9–44
- 2. Beal, D.F. and Smith, M.C.: Learning Piece Values Using Temporal Differences. International Computer Chess Association Journal, Vol. 20, No. 3 (1997) 147–151
- 3. Levinson, R. and Snyder, R.: Adaptive Pattern Oriented Chess. Proceedings of AAAI-91, Morgan Kaufmann (1991) 601–605
- 4. Christensen, J. and Korf, R.: A Unified Theory of Heuristic Evaluation Functions and its Application to Learning. AAAI-86, Morgan Kaufmann (1986) 148–152
- 5. Baxter, J., Tridgell, A. and Weaver, L.: KnightCap: A chess program that learns by combining TD(lambda) with game-tree search. In: Machine Learning, Proceedings of the Fifteenth International Conference (ICML ’98), Madison (1998) 28–36
- 6. Fairbairn, J.: Shogi for Beginners. Ishi Press International (1989)
- 7. Leggett, T.: Shogi: Japan’s Game of Strategy. Charles E. Tuttle Company [Reprinted in 1993, first published in 1966]
- 8. Matsubara, H., Iida, H. and Grimbergen, R.: Natural Developments in Game Research: From Chess to Shogi to Go. International Computer Chess Association Journal, Vol. 19, No. 2 (1996) 103–112
- 9. Tesauro, G.: Practical Issues in Temporal Difference Learning. Machine Learning 8 (1992) 257–277
- 10. Tesauro, G.: TD-Gammon, a Self-Teaching Backgammon Program, achieves Master Level Play. Neural Computation, Vol. 6, No. 2 (1994) 215–219
- 11. Marsland, T.A.: Computer Chess and Search. In: Shapiro, S. (ed.) Encyclopaedia of Artificial Intelligence. 2nd edn. J. Wiley & Sons (1992)
- 12. Beal, D.F.: Experiments with the Null Move. In: Beal, D.F. (ed.) Advances in Computer Chess 5. Elsevier Science Publishers (1989) 65–79
- 13. Donninger, C.: Null Move and Deep Search: Selective Search Heuristics for Obtuse Chess Programs. International Computer Chess Association Journal, Vol. 16, No. 3 (1993) 137–143
- 14. Mutz, M.: Gnu Shogi v1.2p03. Available from many sources, including ftp://ftp.unipassau. de/pub/local/shogi (1994)
- 15. Yamashita, H.: YSS: About the Data Structures and the Algorithm. Published on the WWW at http://plaza15.mbn.or.jp/~yss (1997)