Skip to main content

Learning Time Allocation Using Neural Networks

Part of the Lecture Notes in Computer Science book series (LNCS,volume 2063)

Abstract

The strength of a game-playing program is mainly based on the adequacy of the evaluation function and the efficacy of the search algorithm. This paper investigates how temporal difference learning and genetic algorithms can be used to improve various decisions made during game-tree search. The existent TD algorithms are not directly suitable for learning search decisions. Therefore we propose a modified update rule that uses the TD error of the evaluation function to shorten the lag between two rewards. The genetic algorithms can be applied directly to learn search decisions. For our experiments we selected the problem of time allocation from the set of search decisions. On each move the player can decide on a certain search depth, being constrained by the amount of time left. As testing ground, we used the game of Lines of Action, which has roughly the same complexity as Othello. From the results we conclude that both the TD and the genetic approach lead to good results when compared to the existent time-allocation techniques. Finally, a brief discussion of the issues that can emerge when the algorithms are applied to more complex search decisions is given.

keywords

  • temporal difference learning
  • genetic algorithms
  • search decisions
  • time allocation
  • lines of Action

Acknowledgements

The authors thank the referees for their constructive comments and suggestions for improvements.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/3-540-45579-5_11
  • Chapter length: 16 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   99.00
Price excludes VAT (USA)
  • ISBN: 978-3-540-45579-0
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   129.00
Price excludes VAT (USA)

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. T.S. Anantharaman. Evaluation tuning for computer chess: Linear discriminant methods. ICCA Journal, 20(4):224–242, 1997.

    Google Scholar 

  2. E.B. Baum and W.D. Smith. A bayesian approach to relevance in game playing. Artificial Intelligence, 97(1-2):195–242, 1997.

    MATH  CrossRef  MathSciNet  Google Scholar 

  3. J. Baxter, A. Tridgell, and L. Weaver. Experiments in parameter learning using temporal differences. ICCA Journal, 21(2):84–99, 1998.

    Google Scholar 

  4. D.F. Beal and M.C. Smith. Learning piece values using temporal difference learning. ICCA Journal, 20(3):147–151, 1997.

    Google Scholar 

  5. D.F. Beal and M.C. Smith. Temporal difference learning for heuristic search and game playing. Information Sciences, 122(1):3–21, 2000.

    CrossRef  Google Scholar 

  6. Y. Björnsson and T. Marsland. Learning search control in adversary games. In H.J. van den Herik and B. Monien, editors, Proceedings of the Advances in Computer Games 9 Conference, 2000.

    Google Scholar 

  7. M. Buro. Experiments with Multi-ProbCut and a new high-quality evaluation function for Othello. In H. J. van den Herik and H. Iida, editors, Games in AI Research. 1999.

    Google Scholar 

  8. D. Carmel and S. Markovitch. Incorporating opponent models into adversary search. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), pages 120–125, 1996.

    Google Scholar 

  9. K. Chellapilla and D.B. Fogel. Co-evolving checkers playing programs using only win, lose, or draw. In Proceedings of SPIE’s AeroSense’99: Applications and Science of Computational Intelligence II, 1999.

    Google Scholar 

  10. D.E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA, 1989.

    MATH  Google Scholar 

  11. D. Harada and S. Russell. Extended abstract: Learning search strategies. In AAAI Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information, 1999.

    Google Scholar 

  12. E.A. Heinz. Adaptive null-move pruning. ICCA Journal, 22(3):123–132, 1999.

    Google Scholar 

  13. R.M. Hyatt. Using time wisely. ICCA Journal, 7(1):4–9, 1984.

    Google Scholar 

  14. H. Iida, J.W.H.M. Uiterwijk, H.J. van den Herik, and I.S. Herschberg. Potential applications of opponent-model search. Part 1: The domain of applicability. ICCA Journal, 16(4):201–208, 1993.

    Google Scholar 

  15. T. Jaakkola, S. Singh, and M. Jordan. Reinforcement learning algorithm for partially observable markov problems. In Advances in Neural Information Processing Systems 7, pages 345–352, 1994.

    Google Scholar 

  16. L.P. Kaelbling, M.L. Littman, and A.R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.

    MATH  CrossRef  MathSciNet  Google Scholar 

  17. H. Kimura and S. Kobayashi. An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function. In Proceedings of the 15th International Conference on Machine Learning, pages 278–286, 1998.

    Google Scholar 

  18. V.R. Konda and V.S. Borkar. Actor-critic type learning algorithms for markov decision processes. SIAM Journal of Control and Optimisation, 38(1):94–133, 1999.

    MATH  CrossRef  MathSciNet  Google Scholar 

  19. V.R. Konda and J.N. Tsitsiklis. Actor-critic algorithms. In Advances in Neural Information Processing Systems 12, 2000.

    Google Scholar 

  20. B.C. Kuszmaul. The StarTech massively parallel chess program. ICCA Journal, 18(1):3–19, 1995.

    Google Scholar 

  21. C. Leiserson. Using the Cilk multithreaded programming language to implement a multiprocessor chess program. In H.J. van den Herik and B. Monien, editors, Proceedings of the Advances in Computer Games 9 Conference, 2000.

    Google Scholar 

  22. S. Markovitch and Y. Sella. Learning of resource allocation strategies for game playing. Computational Intelligence, 12(1):88–105, 1996.

    CrossRef  Google Scholar 

  23. T.M. Mitchell, R.M. Keller, and S. Kedar-Cabelli. Explanation-based generalization: A unifying view. Machine Learning, 1(1):47–80, 1986.

    Google Scholar 

  24. D.E. Moriarty and R. Miikkulainen. Hierarchical evolution of neural networks. In Proceedings of the 1998 IEEE Conference on Evolutionary Computation, pages 428–433, 1998.

    Google Scholar 

  25. D.E. Moriarty, A.C. Schultz, and J.J. Grefenstette. Evolutionary algorithms for reinforcement learning. Journal of Artificial Intelligence Research, 11:241–276, 1999.

    MATH  Google Scholar 

  26. N. Richards, D. Moriarty, and R. Miikkulainen. Evolving neural networks to play Go. Applied Intelligence, 8:85–96, 1998.

    CrossRef  Google Scholar 

  27. S. Russell and E.H. Wefald. Do the Right Thing: Studies in Limited Rationality. MIT Press, 1991.

    Google Scholar 

  28. A.L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3):211–229, 1959.

    MathSciNet  CrossRef  Google Scholar 

  29. J. Schaeffer and A. Plaat. Kasparov versusDEEPBLUE: The rematch. ICCA Journal, 20(2):95–101, 1997.

    Google Scholar 

  30. Y. Seirawan. The Kasparov-DEEP BLUE games. ICCA Journal, 20(2):102–125, 1997.

    Google Scholar 

  31. R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

    Google Scholar 

  32. R.S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057–1063, 2000.

    Google Scholar 

  33. G.J. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.

    MATH  Google Scholar 

  34. S. Thrun. Learning to play the game of chess. In Advances in Neural Information Processing Systems 7, pages 1069–1076, 1995.

    Google Scholar 

  35. M.H.M. Winands. Analysis and implementation of Lines of Action. Master’s thesis, Department of Computer Science, Universiteit Maastricht, 2000.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kocsis, L., Uiterwijk, J., van den Herik, J. (2001). Learning Time Allocation Using Neural Networks. In: Marsland, T., Frank, I. (eds) Computers and Games. CG 2000. Lecture Notes in Computer Science, vol 2063. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45579-5_11

Download citation

  • DOI: https://doi.org/10.1007/3-540-45579-5_11

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43080-3

  • Online ISBN: 978-3-540-45579-0

  • eBook Packages: Springer Book Archive