Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search

  • Rémi Coulom
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4630)

Abstract

A Monte-Carlo evaluation estimates a position by averaging the outcomes of several random continuations. The method can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework for combining tree search with Monte-Carlo evaluation that does not separate a min-max phase from a Monte-Carlo phase. Instead of backing up the min-max value close to the root and the average value at some depth, a more general backup operator is defined that progressively changes from averaging to min-max as the number of simulations grows. This approach provides fine-grained control of tree growth, at the level of individual simulations, and allows efficient selectivity. The resulting algorithm was implemented in a 9×9 Go-playing program, Crazy Stone, that won the 10th KGS computer-Go tournament.
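The central idea of the abstract, a backup operator that interpolates between averaging and min-max as the number of simulations grows, can be sketched as follows. This is an illustrative toy, not the paper's actual operator: the function name, the linear blending schedule, and the `n_switch` parameter are assumptions made purely for illustration.

```python
def progressive_backup(child_values, n, n_switch=64):
    """Back up a node value by blending the mean and the max of its
    children's values. With few simulations (small n) the estimate is
    dominated by the average, which is statistically robust; as n grows,
    the weight shifts toward the max, approaching a min-max backup.
    The linear schedule and n_switch threshold are illustrative choices."""
    mean = sum(child_values) / len(child_values)
    best = max(child_values)
    w = min(1.0, n / n_switch)  # w = 0: pure averaging; w = 1: pure max
    return (1.0 - w) * mean + w * best
```

For example, with child values 0.2 and 0.8, the backed-up value moves from 0.5 (the mean) at n = 0 toward 0.8 (the max) as n approaches `n_switch`, mimicking the gradual transition from Monte-Carlo averaging to min-max described above.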



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Rémi Coulom
    CNRS-LIFL, INRIA-SequeL, Université Charles de Gaulle, Lille, France
