Abstract
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which addresses the core sampling problem of evaluating a policy through simulation as a multi-armed bandit problem. The resulting algorithm offers performance comparable to that of the previous algorithm, but with significantly less computational effort. An order of magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car.
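The bandit view of rollout allocation mentioned in the abstract can be illustrated with a UCB1-style rule (Auer et al., 2002): rather than giving every candidate action the same number of rollouts, sampling effort is concentrated on the actions whose estimated return is still plausibly the highest. The sketch below is a minimal illustration under that assumption; the function names, parameters, and the simulated `reward_fn` are illustrative, not the paper's actual algorithm.

```python
import math
import random

def ucb1_allocate(num_actions, total_rollouts, reward_fn):
    """Allocate a budget of rollouts among candidate actions at a state
    using the UCB1 index. reward_fn(a) returns one sampled rollout return
    for action a (a stand-in for simulating the policy after taking a)."""
    counts = [0] * num_actions
    sums = [0.0] * num_actions
    # Sample each action once to initialise the empirical estimates.
    for a in range(num_actions):
        sums[a] += reward_fn(a)
        counts[a] = 1
    for t in range(num_actions, total_rollouts):
        # Pick the action maximising the upper confidence bound
        # (empirical mean plus an exploration bonus).
        ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
               for a in range(num_actions)]
        a = max(range(num_actions), key=lambda i: ucb[i])
        sums[a] += reward_fn(a)
        counts[a] += 1
    # Return the empirically best action and how rollouts were spent.
    best = max(range(num_actions), key=lambda a: sums[a] / counts[a])
    return best, counts

# Toy example: three candidate actions with mean rollout returns
# 0.2, 0.5, 0.8 and a little noise; most rollouts should go to action 2.
random.seed(0)
means = [0.2, 0.5, 0.8]
best, counts = ucb1_allocate(3, 300, lambda a: random.gauss(means[a], 0.1))
```

The point of the bandit formulation is that clearly dominated actions receive few rollouts, so a comparable action ranking is obtained with far fewer simulation calls than uniform sampling.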
References
Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89–129. doi:10.1007/s10994-007-5038-2.
Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3), 235–256.
Dimitrakakis, C., & Lagoudakis, M. (2008). Algorithms and bounds for sampling-based approximate policy iteration. (To be presented at the 8th European Workshop on Reinforcement Learning).
Even-Dar, E., Mannor, S., & Mansour, Y. (2006). Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research, 7, 1079–1105. ISSN 1533-7928.
Fern, A., Yoon, S., & Givan, R. (2004). Approximate policy iteration with a policy language bias. Advances in Neural Information Processing Systems, 16(3).
Fern, A., Yoon, S., & Givan, R. (2006). Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research, 25, 75–118.
Howard, R. A. (1960). Dynamic programming and Markov processes. Cambridge: MIT Press.
Kocsis, L., & Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In Proceedings of the European conference on machine learning.
Lagoudakis, M. G. (2003). Efficient approximate policy iteration methods for sequential decision making in reinforcement learning. PhD thesis, Department of Computer Science, Duke University.
Lagoudakis, M. G., & Parr, R. (2003a). Least-squares policy iteration. Journal of Machine Learning Research, 4(6), 1107–1149.
Lagoudakis, M. G. & Parr, R. (2003b). Reinforcement learning as classification: Leveraging modern classifiers. In Proceedings of the 20th international conference on machine learning (ICML) (pp. 424–431). Washington, DC, USA.
Langford, J., & Zadrozny, B. (2005). Relating reinforcement learning performance to classification performance. In Proceedings of the 22nd international conference on machine learning (ICML) (pp. 473–480). Bonn, Germany. ISBN 1-59593-180-5. doi:10.1145/1102351.1102411.
Rexakis, I., & Lagoudakis, M. (2008). Classifier-based policy representation. (To be presented at the 8th European Workshop on Reinforcement Learning).
Riedmiller, M. (2005). Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In 16th European conference on machine learning (pp. 317–328).
Sutton, R., & Barto, A. (1998). Reinforcement learning: an introduction. Cambridge: MIT Press.
Wang, H. O., Tanaka, K., & Griffin, M. F. (1996). An approach to fuzzy control of nonlinear systems: Stability and design issues. IEEE Transactions on Fuzzy Systems, 4(1), 14–23.
Additional information
Editors: Walter Daelemans, Bart Goethals, Katharina Morik.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Dimitrakakis, C., Lagoudakis, M.G. Rollout sampling approximate policy iteration. Mach Learn 72, 157–171 (2008). https://doi.org/10.1007/s10994-008-5069-3