
Learning Time Allocation Using Neural Networks

  • Levente Kocsis
  • Jos Uiterwijk
  • Jaap van den Herik
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2063)

Abstract

The strength of a game-playing program is mainly based on the adequacy of the evaluation function and the efficacy of the search algorithm. This paper investigates how temporal difference learning and genetic algorithms can be used to improve various decisions made during game-tree search. Existing TD algorithms are not directly suitable for learning search decisions; we therefore propose a modified update rule that uses the TD error of the evaluation function to shorten the lag between two rewards. Genetic algorithms can be applied directly to learn search decisions. For our experiments we selected the problem of time allocation from the set of search decisions: on each move the player decides on a certain search depth, constrained by the amount of time left. As a testing ground we used the game of Lines of Action, which has roughly the same complexity as Othello. From the results we conclude that both the TD and the genetic approach lead to good results when compared to existing time-allocation techniques. Finally, we briefly discuss the issues that may emerge when the algorithms are applied to more complex search decisions.
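The abstract leaves the algorithmic details to the paper itself. Purely as a hedged sketch of the general idea, the fragment below shows one way a small network could score candidate search depths from a few game features and be adjusted with the TD error of the evaluation function. The feature set, depth options, and every name in the code are illustrative assumptions, not the authors' design.

```python
import numpy as np

# Illustrative sketch only (not the authors' implementation): a tiny neural
# network maps hand-picked game features to a score per candidate search depth,
# the allocator picks the best depth that still fits the remaining time, and
# the weights are nudged using the TD error of the evaluation function.
# The feature set, depth options, and all names below are assumptions.

rng = np.random.default_rng(0)

N_FEATURES = 3          # e.g. fraction of time left, move number, evaluation swing
DEPTHS = [4, 6, 8, 10]  # hypothetical search depths the allocator may choose from

W1 = rng.normal(scale=0.1, size=(8, N_FEATURES))   # input -> hidden
W2 = rng.normal(scale=0.1, size=(len(DEPTHS), 8))  # hidden -> one score per depth

def depth_scores(features):
    """Forward pass: tanh hidden layer, linear output score per depth."""
    h = np.tanh(W1 @ features)
    return W2 @ h, h

def choose_depth(features, time_left, seconds_per_ply=1.0):
    """Return (depth, index) of the best-scoring depth that fits the time left."""
    scores, _ = depth_scores(features)
    for i in np.argsort(scores)[::-1]:
        if DEPTHS[i] * seconds_per_ply <= time_left:
            return DEPTHS[i], int(i)
    return DEPTHS[0], 0  # fall back to the shallowest search

def td_update(features, chosen_idx, td_error, lr=0.01):
    """Move the score of the chosen depth in the direction of the TD error of
    the evaluation function; reusing this signal is the paper's idea, but this
    particular gradient step is only a sketch."""
    global W1, W2
    _, h = depth_scores(features)
    W1 += lr * td_error * np.outer(W2[chosen_idx] * (1.0 - h ** 2), features)
    W2[chosen_idx] += lr * td_error * h

# usage sketch:
# features = np.array([0.6, 12.0, 0.1])
# depth, idx = choose_depth(features, time_left=30.0)
# ... search to `depth`, observe td_error from the evaluation function ...
# td_update(features, idx, td_error=0.05)
```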

Keywords

Temporal difference learning · Genetic algorithms · Search decisions · Time allocation · Lines of Action



Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Levente Kocsis (1)
  • Jos Uiterwijk (1)
  • Jaap van den Herik (1)

  1. Department of Computer Science, Institute for Knowledge and Agent Technology, Universiteit Maastricht, Maastricht, The Netherlands
