Abstract
The most interesting challenge for a reinforcement learning agent is to learn online in an unknown, large discrete or continuous stochastic model. The agent must not only trade off exploration and exploitation, but also find a good set of basis functions to approximate the value function. We extend offline kernel-based LSPI (least squares policy iteration) to online learning. Online kernel-based LSPI combines features of offline kernel-based LSPI and online LSPI: it uses the knowledge gradient policy as an exploration policy to trade off exploration and exploitation, and the approximate linear dependency (ALD) based kernel sparsification method to select basis functions automatically. We compare online kernel-based LSPI with online LSPI on five discrete Markov decision problems, where online kernel-based LSPI outperforms online LSPI in terms of optimal-policy performance.
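The ALD-based kernel sparsification mentioned in the abstract admits a short sketch: a new sample is added to the basis dictionary only if it is not approximately linearly dependent, in feature space, on the samples already kept. The following is a minimal illustration under assumed choices (a Gaussian kernel, a hand-picked threshold `nu`, and a small ridge term for numerical stability), not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Illustrative kernel choice; the method works with any Mercer kernel.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def ald_dictionary(samples, nu=0.1, sigma=1.0):
    """Greedily build a dictionary of samples whose feature-space images
    are (approximately) linearly independent of those already kept."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        # Gram matrix of the current dictionary and kernel vector to x.
        K = np.array([[gaussian_kernel(a, b, sigma) for b in dictionary]
                      for a in dictionary])
        k = np.array([gaussian_kernel(d, x, sigma) for d in dictionary])
        # Best least-squares reconstruction of phi(x) from the dictionary.
        c = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
        # ALD residual: how much of phi(x) the dictionary cannot express.
        delta = gaussian_kernel(x, x, sigma) - k @ c
        if delta > nu:
            dictionary.append(x)  # x contributes a new basis function
    return dictionary
```

With this test, near-duplicate states are filtered out while genuinely new states enlarge the basis, which is what lets the method select basis functions automatically as data arrives online.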
Notes
- 1. In order not to overload the notation, we omit the time step t when it does not cause confusion.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yahyaa, S., Manderick, B. (2015). Knowledge Gradient for Online Reinforcement Learning. In: Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Agents and Artificial Intelligence. ICAART 2014. Lecture Notes in Computer Science(), vol 8946. Springer, Cham. https://doi.org/10.1007/978-3-319-25210-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25209-4
Online ISBN: 978-3-319-25210-0
eBook Packages: Computer Science (R0)