Accelerated UCT and Its Application to Two-Player Games

  • Junichi Hashimoto
  • Akihiro Kishimoto
  • Kazuki Yoshizoe
  • Kokolo Ikeda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7168)

Abstract

Monte-Carlo Tree Search (MCTS) is a successful approach for improving the performance of game-playing programs. This paper presents the Accelerated UCT algorithm, which overcomes a weakness of MCTS caused by deceptive structures that often appear in game-tree search. The algorithm uses a new backup operator that assigns higher weights to recently visited actions and lower weights to actions that have not been visited for a long time. Results in Othello, Havannah, and Go show that Accelerated UCT is not only more effective than previous approaches but also improves the strength of Fuego, one of the best computer Go programs.
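
The abstract describes the backup operator only informally, so the Python sketch below illustrates one plausible reading of the idea: a recency-weighted backup in which the weight of the visited child grows while the weights of its siblings decay. The decay factor LAMBDA, the recency field, and the exact weighting rule are illustrative assumptions, not the authors' formulas; tree expansion and the simulation (playout) policy are omitted.

    import math

    # Minimal sketch of a UCT node with a recency-weighted ("accelerated") backup.
    # Assumptions not taken from the abstract: LAMBDA, the recency field, and the
    # weighting rule are illustrative, not the paper's definitions.

    LAMBDA = 0.99  # hypothetical decay factor in (0, 1)

    class Node:
        def __init__(self):
            self.children = {}    # action -> Node (expansion not shown)
            self.visits = 0
            self.value_sum = 0.0  # plain Monte-Carlo backup statistic
            self.recency = 0.0    # recency weight maintained by the backup

        def ucb_score(self, child, c=1.4):
            # Standard UCB1 selection rule: mean value plus exploration bias term.
            if child.visits == 0:
                return float("inf")
            exploit = child.value_sum / child.visits
            explore = c * math.sqrt(math.log(self.visits) / child.visits)
            return exploit + explore

        def backup(self, action, reward):
            # Recency-weighted backup: every child's weight decays, then the
            # weight of the child actually visited is refreshed.
            for child in self.children.values():
                child.recency *= LAMBDA
            visited = self.children[action]
            visited.recency += 1.0
            visited.visits += 1
            visited.value_sum += reward
            self.visits += 1

        def accelerated_value(self):
            # Value estimate that weights children by recency rather than raw
            # visit counts, so lines explored long ago (possibly deceptive ones)
            # contribute less to the backed-up value.
            total_w = sum(c.recency for c in self.children.values())
            if total_w == 0:
                return self.value_sum / max(self.visits, 1)
            return sum(c.recency * (c.value_sum / max(c.visits, 1))
                       for c in self.children.values()) / total_w

In this reading, plain UCT corresponds to LAMBDA = 1 (no decay), while smaller values of LAMBDA let the backed-up value track the most recently explored continuations more closely.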

Keywords

Tree Search, Bias Term, Winning Percentage, Multiarmed Bandit Problem, Promising Move

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Junichi Hashimoto 1, 2
  • Akihiro Kishimoto 3
  • Kazuki Yoshizoe 4
  • Kokolo Ikeda 1
  1. Japan Advanced Institute of Science and Technology, Japan
  2. Tilburg center for Cognition and Communication, The Netherlands
  3. Tokyo Institute of Technology and Japan Science and Technology Agency, Japan
  4. The University of Tokyo, Japan
