Rollout sampling approximate policy iteration
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme that addresses the core sampling problem in evaluating a policy through simulation as a multi-armed bandit machine. The resulting algorithm offers performance comparable to that of the previous algorithm, but with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car.
Keywords: Reinforcement learning · Approximate policy iteration · Rollouts · Bandit problems · Classification · Sample complexity
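The abstract's central idea, treating rollout allocation during policy evaluation as a multi-armed bandit problem, can be illustrated with a small sketch. The snippet below uses a standard UCB1 allocation rule to spend a fixed rollout budget over the actions at a single state, so that seemingly better actions receive more rollouts; the function names and the `rollout(a)` interface are illustrative assumptions, not the paper's exact algorithm.

```python
import math

def ucb1_rollout_allocation(actions, rollout, budget, c=2.0):
    """Allocate a fixed rollout budget over actions at one state using UCB1.

    `rollout(a)` is assumed to return a sampled return (float) for action a.
    Illustrative sketch only; the paper's variants differ in detail.
    """
    counts = {a: 0 for a in actions}
    sums = {a: 0.0 for a in actions}
    # Initialise: one rollout per action.
    for a in actions:
        sums[a] += rollout(a)
        counts[a] = 1
    for t in range(len(actions), budget):
        # Pull the action with the highest upper confidence bound:
        # empirical mean plus an exploration bonus that shrinks with counts.
        a = max(actions, key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(t) / counts[a]))
        sums[a] += rollout(a)
        counts[a] += 1
    # Return the empirically best action and its mean sampled return.
    best = max(actions, key=lambda a: sums[a] / counts[a])
    return best, sums[best] / counts[best]
```

In an approximate policy iteration loop of this kind, the action returned for each sampled state would become a training label for the classifier that represents the improved policy.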
Open Access: This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.