Chess Neighborhoods, Function Combination, and Reinforcement Learning
 Robert Levinson,
 Ryan Weber
Abstract
Over the years, various research projects have attempted to develop a chess program that learns to play well given little prior knowledge beyond the rules of the game. Early on it was recognized that the key would be to adequately represent the relationships between the pieces and to evaluate the strengths or weaknesses of such relationships. Accordingly, several representations have been developed, including a graph-based model. In this paper we extend the work on graph representation to a precise type of graph that we call a piece or square neighborhood. Specifically, a chessboard is represented as 64 neighborhoods, one for each square. Each neighborhood has a center and 16 satellites corresponding to the pieces that are immediately close on the 4 diagonals, 2 ranks, 2 files, and 8 knight moves related to the square. Games are played and training values for boards are developed using temporal difference learning, as in other reinforcement learning systems. We then use a 2-layer regression network to learn. At the lower level the values (expected probability of winning) of the neighborhoods are learned, and at the top level they are combined based on their product and entropy. We report on relevant experiments, including a learning experience on the Internet Chess Club (ICC) from which we can estimate a rating for the new program. The level of chess play achieved in a few days of training is comparable to a few months of work on previous systems such as Morph, which is described as “one of the best from-scratch game learning systems, perhaps the best” [22].
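The neighborhood representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The board encoding (a dictionary from `(file, rank)` pairs to piece letters), the function names, and the ordering of the satellites are all assumptions made for illustration. The sketch also assumes that "immediately close" means the first piece met when walking outward from the center along each of the eight ray directions, and that a knight satellite is whatever occupies the corresponding knight-move square; empty or off-board satellites are recorded as `None`.

```python
from typing import Dict, List, Optional, Tuple

Square = Tuple[int, int]   # (file, rank), each in 0..7 (hypothetical encoding)
Board = Dict[Square, str]  # occupied squares only, e.g. {(4, 0): "K"}

# Eight ray directions: 4 diagonals, 2 along the rank, 2 along the file.
RAYS: List[Square] = [(1, 1), (1, -1), (-1, 1), (-1, -1),
                      (1, 0), (-1, 0), (0, 1), (0, -1)]
# Eight knight-move offsets.
KNIGHT_JUMPS: List[Square] = [(1, 2), (2, 1), (2, -1), (1, -2),
                              (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def on_board(sq: Square) -> bool:
    """True if the square lies on the 8x8 board."""
    return 0 <= sq[0] < 8 and 0 <= sq[1] < 8


def first_piece_on_ray(board: Board, start: Square, step: Square) -> Optional[str]:
    """Walk from start in direction step; return the first piece met, else None."""
    f, r = start[0] + step[0], start[1] + step[1]
    while on_board((f, r)):
        if (f, r) in board:
            return board[(f, r)]
        f, r = f + step[0], r + step[1]
    return None


def neighborhood(board: Board, center: Square) -> Tuple[Optional[str], List[Optional[str]]]:
    """Return (piece on the center square, list of 16 satellites)."""
    ray_sats = [first_piece_on_ray(board, center, d) for d in RAYS]
    # Off-board knight squares are never keys of the board, so .get gives None.
    knight_sats = [board.get((center[0] + df, center[1] + dr))
                   for df, dr in KNIGHT_JUMPS]
    return board.get(center), ray_sats + knight_sats


def all_neighborhoods(board: Board) -> Dict[Square, Tuple[Optional[str], List[Optional[str]]]]:
    """One neighborhood per square: 64 in total."""
    return {(f, r): neighborhood(board, (f, r))
            for f in range(8) for r in range(8)}
```

For example, with a white king on e1, a black king on e8, and a white knight on d3, `neighborhood({(4, 0): "K", (4, 7): "k", (3, 2): "N"}, (4, 0))` has the king as its center, the black king as the satellite up the e-file, and the knight as one of the knight-move satellites. How the learned values of these neighborhoods are then combined is not specified in enough detail in the abstract to sketch here.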
 Allen, J., Hamilton, E., Levinson, R., Herik, H.J., Uiterwijk, J.W.H. (1997) New Advances in Adaptive Pattern-Oriented Chess. Advances in Computer Chess. Universiteit Maastricht, The Netherlands, pp. 312233
 Baxter, J., Tridgell, A., Weaver, L. (1998) A chess program that learns by combining TD(λ) with game tree search. Proceedings of the 15th International Conference on Machine Learning (ICML-98). Morgan Kaufmann, Madison, WI, pp. 28–36
 Ballard, D. H. An Introduction to Natural Computation. Cambridge: MIT Press.
 Beal, D. F., Smith, M.C. (1994) Random Evaluation in Chess. ICCA Journal 17: pp. 3–9
 Beal, D. F., & Smith, M.C. Learning Piece Values Using Temporal Differences. Journal of The International Computer Chess Association, September 1997.
 Beal, D. F., Smith, M.C. First results from using temporal difference learning in Shogi. In: Herik, H. J., Iida, H. eds. (1998) Proceedings of the First International Conference on Computers and Games (CG98). Springer-Verlag, Tsukuba, Japan, pp. 114
 Bishop, Christopher M. Neural Networks for Pattern Recognition, Oxford Univ. Press, 1998. ISBN 0198538642.
 Bradtke, S. J., Barto, A. G. (1996) Linear least-squares algorithms for temporal difference learning. Machine Learning 22: pp. 33–57
 Christensen, J. and Korf, R. (1986). A unified theory of heuristic evaluation functions and its applications to learning. Proceedings of AAAI-86 (pp. 148–152).
 Fürnkranz, J. (1996) Machine learning in computer chess: The next generation. International Computer Chess Association Journal 19: pp. 147–160
 Gherrity, M. A Game-Learning Machine. Ph.D. thesis. University of California, San Diego. San Diego, CA. 1993.
 Helmbold, D. P., Kivinen, J., Warmuth, M. K. (1996) Worst-case loss bounds for sigmoided linear neurons. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA
 Herik, H.J. van den. A New Research Scope. International Computer Chess Association Journal 21(4), 1998.
 Kivinen, J., Warmuth, M. K. (1998) Additive versus exponentiated gradient updates for linear prediction. Information and Computation 2: pp. 285318
 Levinson, R. A., and Snyder, R. (1991). Adaptive pattern-oriented chess. In L. Birnbaum and G. Collins (Eds.), Proceedings of the 8th International Workshop on Machine Learning, pp. 85–89, Morgan Kaufmann.
 Levinson, R. A., Snyder, R. (1993) Distance: Towards the Unification of Chess Knowledge. International Computer Chess Association Journal 16: pp. 123–136
 Levinson, R. A., and Weber, R. J. (2000). “Pattern-level Temporal Difference Learning, Data Fusion, and Chess”. In SPIE’s 14th Annual Conference on Aerospace/Defense Sensing and Controls: Sensor Fusion: Architectures, Algorithms, and Applications IV.
 Littlestone, N., Long, P.M., Warmuth, M. K. (1995) Online learning of linear functions. Journal of Computational Complexity 5: pp. 1–23
 Pearl, J. (1984) Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Reading, Massachusetts
 Pellen, Luke. Neural net chess program Octavius: http://home.seol.net.au/luke/Octavius (1999).
 Samuel, A. (1959) Some studies in machine learning using the game of checkers. IBM J. of Research and Development 3: pp. 210–229
 Scott, J. Machine Learning in Games: the Morph Project, Swarthmore College, Swarthmore, PA. http://forum.swarthmore.edu/~jay/learngame/projects/morph.html.
 Slate, D.J. (1987) A chess program that uses its transposition table to learn from experience. International Computer Chess Association Journal 10: pp. 59–71
 Sutton, R. S. (1988) Learning to predict by the methods of temporal differences. Machine Learning 3: pp. 9–44
 Sutton, R. S., Barto, A.G. (1998) Reinforcement Learning: An Introduction. MIT Press, Cambridge
 Tesauro, G. Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol 38, No 3, March 1995.
 Tesauro, G. (1992) Practical Issues in Temporal Difference Learning. Machine Learning 8: pp. 257–278
 Thrun, S., 1995. Learning to Play the Game of Chess. In Advances in Neural Information Processing Systems (NIPS) 7, G. Tesauro, D. Touretzky, and T. Leen (eds.), MIT Press.
 Widrow, B., Stearns, S. (1985) Adaptive Signal Processing. Prentice Hall, Englewood Cliffs, NJ
 Book Title
 Computers and Games
 Book Subtitle
 Second International Conference, CG 2000 Hamamatsu, Japan, October 26–28, 2000 Revised Papers
 Book Part
 Part 2
 Pages
 pp. 133–150
 Copyright
 2001
 DOI
 10.1007/3-540-45579-5_9
 Print ISBN
 9783540430803
 Online ISBN
 9783540455790
 Series Title
 Lecture Notes in Computer Science
 Series Volume
 2063
 Series ISSN
 0302-9743
 Publisher
 Springer Berlin Heidelberg
 Copyright Holder
 Springer-Verlag Berlin Heidelberg
 Keywords

 linear regression
 value function approximation
 temporal difference learning
 reinforcement learning
 computer chess
 exponentiated gradient
 gradient descent
 multilayer neural nets
 Editors

 Tony Marsland (4)
 Ian Frank (5)
 Editor Affiliations

 4. Department of Computer Science, University of Alberta
 5. Future University-Hakodate
 Authors

 Robert Levinson (6)
 Ryan Weber (6)
 Author Affiliations

 6. University of California Santa Cruz, Santa Cruz, CA, 95064, USA