Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search

  • Guillaume Chaslot
  • Christophe Fiter
  • Jean-Baptiste Hoock
  • Arpad Rimmel
  • Olivier Teytaud
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6048)

Abstract

We present a new exploration term that is more efficient than classical UCT-like exploration terms. It efficiently combines expert rules, patterns extracted from datasets, All-Moves-As-First values, and classical online values. Because this improved bandit formula does not resolve several important situations in computer Go (semeais, nakade), we present three other important improvements that are central to the recent progress of our program MoGo:
  • We show an expert-based improvement of Monte-Carlo simulations for nakade situations; we also emphasize some limitations of this modification.

  • We show a technique that preserves diversity in the Monte-Carlo simulations, which greatly improves the results on 19x19 boards.

  • Whereas the UCB-based exploration term is not efficient in MoGo, we show a new exploration term which is highly efficient in MoGo.
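
For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of a classical UCB1-style exploration term, together with one simple way of blending an online value with an All-Moves-As-First (AMAF) value. It is not MoGo's formula, which the abstract does not state. The function names, the constant c, the weight schedule k / (k + visits), and the sample statistics are all assumptions made for illustration.

```python
import math


def ucb1_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """Classical UCB1 score: empirical mean plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited moves are tried first
    mean = wins / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def blended_value(online_mean, amaf_mean, visits, k=100.0):
    """Hypothetical blend (assumption, not the paper's formula):
    rely on the AMAF estimate while visits are few, then shift
    toward the online mean as visits accumulate."""
    beta = k / (k + visits)
    return beta * amaf_mean + (1.0 - beta) * online_mean


def select_move(stats, parent_visits):
    """Pick the move with the highest UCB1 score.

    stats maps a move label to a (wins, visits) pair.
    """
    return max(
        stats,
        key=lambda m: ucb1_score(stats[m][0], stats[m][1], parent_visits),
    )


if __name__ == "__main__":
    # Made-up statistics for two candidate moves.
    stats = {"a": (6, 10), "b": (1, 2)}
    print(select_move(stats, parent_visits=12))
    print(blended_value(online_mean=0.4, amaf_mean=0.8, visits=100))
```

In this sketch the less-visited move "b" is selected even though its mean is lower, because its exploration bonus is larger; this is the behaviour that the abstract reports as inefficient in MoGo.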

MoGo recently won a game with handicap 7 against a 9-dan professional player, Zhou JunXun, winner of the LG Cup 2007, and a game with handicap 6 against a 1-dan professional player, Li-Chen Chien.

Keywords

Online Learning, Capture Move, Legal Move, Approach Move, Empty Location
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Guillaume Chaslot (1)
  • Christophe Fiter (2)
  • Jean-Baptiste Hoock (2)
  • Arpad Rimmel (2)
  • Olivier Teytaud (2)

  1. Games and AI Group, MICC, Faculty of Humanities and Sciences, Universiteit Maastricht, Maastricht, The Netherlands
  2. TAO (Inria), LRI, UMR 8623 (CNRS - Univ. Paris-Sud), Orsay, France
