Abstract
Reinforcement Learning (RL) is analyzed here as a tool for control-system optimization. State and action spaces are assumed to be continuous; time is assumed to be discrete, though the discretization may be arbitrarily fine. It is shown that stationary policies, applied by most RL methods, are ill-suited to control applications, since under fine time discretization they cannot assure bounded variance of policy-gradient estimators. As a remedy to this difficulty, we propose the use of piecewise non-Markov policies. Policies of this type can be optimized by means of most RL algorithms, namely those based on the likelihood ratio.
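The variance problem the abstract refers to can be illustrated with a minimal numerical sketch (not from the paper; all names, the scalar plant, and the quadratic cost below are illustrative assumptions). A likelihood-ratio (REINFORCE-style) gradient estimator for a stationary Gaussian policy sums one score term per time step; as the step `dt` shrinks, the number of steps over a fixed horizon grows like 1/dt, and so does the estimator's variance:

```python
import numpy as np

def reinforce_grad(theta, dt, sigma=0.5, T=1.0, seed=0):
    """One likelihood-ratio (REINFORCE) gradient estimate for a
    stationary linear-Gaussian policy a ~ N(theta*x, sigma^2) on the
    scalar plant x_{k+1} = x_k + a_k*dt with cost -x^2*dt per step."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x, ret, score = 1.0, 0.0, 0.0
    for _ in range(n_steps):
        a = theta * x + sigma * rng.standard_normal()
        # d/dtheta log pi(a|x) for a Gaussian policy with mean theta*x
        score += (a - theta * x) * x / sigma**2
        x += a * dt
        ret += -x**2 * dt
    return score * ret  # likelihood-ratio gradient estimate

def grad_variance(dt, n_est=2000):
    """Empirical variance of the estimator across independent episodes."""
    return np.var([reinforce_grad(-1.0, dt, seed=s) for s in range(n_est)])

# Refining the discretization tenfold inflates the estimator variance
# roughly tenfold: the score accumulates T/dt noisy terms.
v_coarse = grad_variance(0.1)   # 10 steps per episode
v_fine = grad_variance(0.01)    # 100 steps per episode
```

The comparison of `v_coarse` and `v_fine` shows the effect numerically: the returns barely change as `dt` shrinks, but the gradient estimates become far noisier, which is the motivation for replacing stationary policies with the piecewise non-Markov policies proposed in the paper.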
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Wawrzyński, P. (2007). Reinforcement Learning in Fine Time Discretization. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2007. Lecture Notes in Computer Science, vol 4431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71618-1_52
Print ISBN: 978-3-540-71589-4
Online ISBN: 978-3-540-71618-1