Learning Time Allocation Using Neural Networks
Abstract
The strength of a game-playing program depends mainly on the adequacy of its evaluation function and the efficacy of its search algorithm. This paper investigates how temporal difference (TD) learning and genetic algorithms can be used to improve various decisions made during game-tree search. Existing TD algorithms are not directly suitable for learning search decisions. We therefore propose a modified update rule that uses the TD error of the evaluation function to shorten the lag between two rewards. Genetic algorithms, by contrast, can be applied directly to learn search decisions. For our experiments we selected the problem of time allocation from the set of search decisions: on each move the player decides on a search depth, constrained by the amount of time left. As a testing ground we used the game of Lines of Action, which has roughly the same complexity as Othello. From the results we conclude that both the TD and the genetic approach lead to good results when compared with existing time-allocation techniques. Finally, we briefly discuss the issues that can emerge when the algorithms are applied to more complex search decisions.
Keywords
temporal difference learning, genetic algorithms, search decisions, time allocation, Lines of Action
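To make the time-allocation decision described in the abstract concrete, the following is a minimal sketch of a fixed, hand-written baseline: it splits the remaining time evenly over the expected remaining moves and picks the deepest search depth whose estimated cost fits that budget. This is not the learned TD or genetic policy from the paper. The function name, the parameters, and the cost model (branching factor raised to the depth, times a per-node cost) are all assumptions made for illustration.

```python
def choose_search_depth(time_left, moves_left, branching_factor,
                        seconds_per_node, max_depth=12):
    """Pick the deepest depth whose estimated cost fits the per-move budget.

    Hypothetical baseline, not the paper's method. Assumes search cost
    grows as seconds_per_node * branching_factor ** depth.
    """
    # Even split of the remaining clock over the expected remaining moves.
    budget = time_left / max(moves_left, 1)

    depth = 1  # always search at least one ply
    while depth < max_depth:
        # Estimated cost of going one ply deeper than the current depth.
        estimated = seconds_per_node * branching_factor ** (depth + 1)
        if estimated > budget:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    # 60 s left, 30 moves expected, illustrative branching factor and node cost.
    print(choose_search_depth(60.0, 30, 10, 1e-5))
```

A learned allocator would replace the even split and the fixed cost model with a policy tuned from game outcomes; the sketch only shows the shape of the decision (time left in, search depth out).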