Computational Experiments with the RAVE Heuristic

  • David Tom
  • Martin Müller
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6515)

Abstract

The Monte-Carlo tree search algorithm Upper Confidence bounds applied to Trees (UCT) has become extremely popular in computer games research. The Rapid Action Value Estimation (RAVE) heuristic is a strong estimator that often improves the performance of UCT-based algorithms. However, there are situations in which RAVE misleads the search while pure UCT search can find the correct solution. Two games, the simple abstract game Sum of Switches (SOS) and the game of Go, are used to study the behavior of the RAVE heuristic. In SOS, RAVE updates are manipulated to mimic game situations in which RAVE misleads the search. Such false RAVE updates are used to create RAVE overestimates and underestimates. A study of the distributions of mean and RAVE values reveals large differences between Go and SOS. While the RAVE-max update rule is able to correct extreme cases of RAVE underestimation, it is not effective in settings closer to practice, nor in Go.
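
To make the interplay between mean and RAVE values concrete, here is a minimal sketch of how a RAVE-enhanced UCT program might blend the two estimates when choosing a move. It assumes the hand-tuned schedule beta = sqrt(k / (3n + k)) described by Gelly and Silver, with an illustrative equivalence parameter k = 1000, and it omits the UCB exploration bonus for simplicity. The names `MoveStats`, `rave_beta`, `blended_value` and `select` are hypothetical; they are not taken from Fuego or from the authors' SOS program, and the statistics in the usage example are invented solely for illustration. The sketch shows how an inflated RAVE value, of the kind that false RAVE updates produce, can pull selection toward a weaker move while visit counts are low, and how the mean takes over as n grows.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveStats:
    """Per-move statistics at a tree node (hypothetical names, for illustration)."""
    n: int = 0              # UCT visits of this move
    mean: float = 0.0       # average simulation result after playing this move
    rave_n: int = 0         # number of RAVE (all-moves-as-first) updates
    rave_mean: float = 0.0  # average result of simulations in which the move was played later


def rave_beta(n: int, k: float) -> float:
    """Assumed schedule beta = sqrt(k / (3n + k)): 1 at n = 0, tending to 0 as n grows."""
    return math.sqrt(k / (3 * n + k))


def blended_value(s: MoveStats, k: float = 1000.0) -> float:
    """Blend the estimates: (1 - beta) * mean + beta * rave_mean."""
    if s.n == 0 and s.rave_n == 0:
        return float("inf")  # completely unexplored move: try it first
    if s.rave_n == 0:
        return s.mean        # no RAVE information yet: fall back to the mean
    beta = rave_beta(s.n, k)
    return (1.0 - beta) * s.mean + beta * s.rave_mean


def select(children: dict[str, MoveStats], k: float = 1000.0) -> str:
    """Greedy selection on the blended value (no exploration term in this sketch)."""
    return max(children, key=lambda move: blended_value(children[move], k))


if __name__ == "__main__":
    # Invented numbers: "bad" has a lower true mean but an overestimated RAVE value.
    early = {
        "good": MoveStats(n=50, mean=0.6, rave_n=500, rave_mean=0.6),
        "bad": MoveStats(n=50, mean=0.4, rave_n=500, rave_mean=0.9),
    }
    late = {
        "good": MoveStats(n=5000, mean=0.6, rave_n=50000, rave_mean=0.6),
        "bad": MoveStats(n=5000, mean=0.4, rave_n=50000, rave_mean=0.9),
    }
    print(select(early))  # the RAVE overestimate dominates at low n
    print(select(late))   # beta has shrunk, so the mean dominates
```

With these made-up statistics, beta is about 0.93 at n = 50, so the overestimated move is preferred; at n = 5000 beta has fallen to 0.25 and the move with the better mean wins. Real programs differ in how beta is scheduled and whether an exploration bonus is kept, so this is only an illustration of the mechanism, not of any specific engine.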

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • David Tom (1)
  • Martin Müller (1)
  1. Department of Computing Science, University of Alberta, Edmonton, Canada