Abstract
Reinforcement Learning (RL) is analyzed here as a tool for control-system optimization. State and action spaces are assumed to be continuous; time is assumed to be discrete, though the discretization may be arbitrarily fine. It is shown that stationary policies, applied by most RL methods, are ill-suited to control applications, since under fine time discretization they cannot assure bounded variance of policy-gradient estimators. As a remedy to this difficulty, we propose the use of piecewise non-Markov policies. Policies of this type can be optimized by means of most RL algorithms, namely those based on the likelihood ratio.
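The variance problem the abstract refers to can be illustrated with a minimal numerical sketch (not from the paper; all names, the scalar plant, and the quadratic cost below are illustrative assumptions). A likelihood-ratio (REINFORCE-style) gradient estimator for a stationary Gaussian policy sums one score term per time step; as the step `dt` shrinks, the number of steps over a fixed horizon grows like 1/dt, and so does the estimator's variance:

```python
import numpy as np

def reinforce_grad(theta, dt, sigma=0.5, T=1.0, seed=0):
    """One likelihood-ratio (REINFORCE) gradient estimate for a
    stationary linear-Gaussian policy a ~ N(theta*x, sigma^2) on the
    scalar plant x_{k+1} = x_k + a_k*dt with cost -x^2*dt per step."""
    rng = np.random.default_rng(seed)
    n_steps = int(T / dt)
    x, ret, score = 1.0, 0.0, 0.0
    for _ in range(n_steps):
        a = theta * x + sigma * rng.standard_normal()
        # d/dtheta log pi(a|x) for a Gaussian policy with mean theta*x
        score += (a - theta * x) * x / sigma**2
        x += a * dt
        ret += -x**2 * dt
    return score * ret  # likelihood-ratio gradient estimate

def grad_variance(dt, n_est=2000):
    """Empirical variance of the estimator across independent episodes."""
    return np.var([reinforce_grad(-1.0, dt, seed=s) for s in range(n_est)])

# Refining the discretization tenfold inflates the estimator variance
# roughly tenfold: the score accumulates T/dt noisy terms.
v_coarse = grad_variance(0.1)   # 10 steps per episode
v_fine = grad_variance(0.01)    # 100 steps per episode
```

The comparison of `v_coarse` and `v_fine` shows the effect numerically: the returns barely change as `dt` shrinks, but the gradient estimates become far noisier, which is the motivation for replacing stationary policies with the piecewise non-Markov policies proposed in the paper.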
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Wawrzyński, P. (2007). Reinforcement Learning in Fine Time Discretization. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2007. Lecture Notes in Computer Science, vol 4431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71618-1_52
Print ISBN: 978-3-540-71589-4
Online ISBN: 978-3-540-71618-1