Bandit Based Monte-Carlo Planning

  • Levente Kocsis
  • Csaba Szepesvári
Conference paper

DOI: 10.1007/11871842_29

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)
Cite this paper as:
Kocsis L., Szepesvári C. (2006) Bandit Based Monte-Carlo Planning. In: Fürnkranz J., Scheffer T., Spiliopoulou M. (eds) Machine Learning: ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg

Abstract

For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. For finite-horizon or discounted MDPs, the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains UCT is significantly more efficient than its alternatives.
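
The bandit rule underlying UCT is UCB1: at each visited state, choose the action that maximizes the mean sampled return plus an exploration bonus proportional to sqrt(ln N / n_a), where N is the visit count of the state and n_a that of the chosen action. As a rough illustration of the idea, here is a minimal Python sketch; the generative-model interface (step, actions), the hashable-state assumption, and the constant c = sqrt(2) are assumptions made for this sketch rather than details fixed by the paper, whose analysis tunes the exploration term more carefully to obtain its finite-sample bounds.

```python
import math
import random

class Node:
    """Per-state statistics for UCT (sketch, not the paper's exact algorithm)."""
    def __init__(self):
        self.n = 0        # total visits to this state
        self.count = {}   # per-action visit counts
        self.value = {}   # per-action mean of sampled returns

def ucb1_action(node, acts, c):
    # UCB1 index: mean return plus exploration bonus c * sqrt(ln N / n_a).
    def score(a):
        na = node.count.get(a, 0)
        if na == 0:
            return float("inf")   # try every action at least once
        return node.value[a] + c * math.sqrt(math.log(node.n) / na)
    return max(acts, key=score)

def rollout(state, step, actions, depth, gamma):
    # Default policy: uniformly random actions until horizon or terminal state.
    g, disc = 0.0, 1.0
    for _ in range(depth):
        state, r, done = step(state, random.choice(actions(state)))
        g += disc * r
        disc *= gamma
        if done:
            break
    return g

def simulate(state, tree, step, actions, depth, c, gamma):
    # One simulated episode: UCB1 selection inside the tree, a random
    # rollout at new leaves, and backup of the sampled return on the path.
    if depth == 0:
        return 0.0
    if state not in tree:
        tree[state] = Node()                 # expand a new leaf
        return rollout(state, step, actions, depth, gamma)
    node = tree[state]
    a = ucb1_action(node, actions(state), c)
    s2, r, done = step(state, a)
    g = r if done else r + gamma * simulate(
        s2, tree, step, actions, depth - 1, c, gamma)
    node.n += 1                              # incremental mean update
    na = node.count.get(a, 0) + 1
    node.count[a] = na
    node.value[a] = node.value.get(a, 0.0) + (g - node.value.get(a, 0.0)) / na
    return g

def uct(root, step, actions, depth=50, episodes=10000, c=math.sqrt(2), gamma=1.0):
    # Run simulated episodes from the root, then recommend the action
    # with the highest estimated mean return.
    tree = {}
    for _ in range(episodes):
        simulate(root, tree, step, actions, depth, c, gamma)
    root_node = tree[root]
    return max(root_node.value, key=root_node.value.get)
```

A full implementation would also have to handle stochastic transitions, for example by keying statistics on sampled successor nodes or on (state, depth) pairs; this sketch simply memoizes per-state statistics.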

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Levente Kocsis (1)
  • Csaba Szepesvári (1)

  1. Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary
