Abstract
The most interesting challenge for a reinforcement learning agent is to learn online in an unknown, large discrete or continuous stochastic model. The agent must not only trade off exploration and exploitation, but also find a good set of basis functions to approximate the value function. We extend offline kernel-based LSPI (least squares policy iteration) to online learning. Online kernel-based LSPI combines features of offline kernel-based LSPI and online LSPI: it uses the knowledge gradient policy as an exploration policy to trade off exploration and exploitation, and the approximate linear dependency (ALD) based kernel sparsification method to select basis functions automatically. We compare online kernel-based LSPI with online LSPI on five discrete Markov decision problems, where online kernel-based LSPI outperforms online LSPI in terms of optimal-policy performance.
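The ALD-based kernel sparsification mentioned in the abstract admits a short sketch: a new sample is added to the basis dictionary only if it is not approximately linearly dependent, in feature space, on the samples already kept. The following is a minimal illustration under assumed choices (a Gaussian kernel, a hand-picked threshold `nu`, and a small ridge term for numerical stability), not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Illustrative kernel choice; the method works with any Mercer kernel.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def ald_dictionary(samples, nu=0.1, sigma=1.0):
    """Greedily build a dictionary of samples whose feature-space images
    are (approximately) linearly independent of those already kept."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        # Gram matrix of the current dictionary and kernel vector to x.
        K = np.array([[gaussian_kernel(a, b, sigma) for b in dictionary]
                      for a in dictionary])
        k = np.array([gaussian_kernel(d, x, sigma) for d in dictionary])
        # Best least-squares reconstruction of phi(x) from the dictionary.
        c = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
        # ALD residual: how much of phi(x) the dictionary cannot express.
        delta = gaussian_kernel(x, x, sigma) - k @ c
        if delta > nu:
            dictionary.append(x)  # x contributes a new basis function
    return dictionary
```

With this test, near-duplicate states are filtered out while genuinely new states enlarge the basis, which is what lets the method select basis functions automatically as data arrives online.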
Notes
- 1. In order not to overload the notation, we omit the time step t when it does not cause confusion.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yahyaa, S., Manderick, B. (2015). Knowledge Gradient for Online Reinforcement Learning. In: Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Agents and Artificial Intelligence. ICAART 2014. Lecture Notes in Computer Science(), vol 8946. Springer, Cham. https://doi.org/10.1007/978-3-319-25210-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25209-4
Online ISBN: 978-3-319-25210-0
eBook Packages: Computer Science (R0)