Definition
Gaussian process reinforcement learning generically refers to a class of reinforcement learning (RL) algorithms that use Gaussian processes (GPs) to model and learn some aspect of the problem.
Such methods may be divided roughly into two groups:
1. Model-based methods: Here, GPs are used to learn the transition and reward model of the Markov decision process (MDP) underlying the RL problem. The estimated MDP model is then used to compute an approximate solution to the true MDP.
2. Model-free methods: Here, no explicit representation of the MDP is maintained. Rather, GPs are used to learn the MDP's value function, state-action value function, or some other quantity that may be used to solve the MDP.
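Both groups rest on the same primitive, GP regression. As a rough illustration of the model-based idea (group 1), the sketch below, which is not taken from the entry and uses a hypothetical one-dimensional MDP of my own choosing, fits a zero-mean GP with a squared-exponential kernel to observed (state, action, next-state) transitions and queries the posterior mean as a learned transition model. The same machinery applied to states and observed returns would give a simple model-free value estimate in the spirit of group 2.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # Squared-exponential covariance between the rows of A (n x d) and B (m x d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior_mean(X_train, y_train, X_test, noise_var=1e-2):
    # Standard GP regression posterior mean: k(X*, X) (K + sigma^2 I)^{-1} y.
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    return Ks @ np.linalg.solve(K, y_train)

# Hypothetical 1-D MDP (illustration only): s' = 0.9*s + a + noise.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(50, 1))
A = rng.uniform(-0.5, 0.5, size=(50, 1))
X = np.hstack([S, A])                       # inputs: (state, action) pairs
y = 0.9 * S[:, 0] + A[:, 0] + 0.05 * rng.standard_normal(50)

# Learned transition model: predict the next state for a query (state, action).
pred = gp_posterior_mean(X, y, np.array([[0.5, 0.2]]))
```

With 50 noisy transitions the posterior mean lands close to the true next state (0.9 * 0.5 + 0.2 = 0.65); such a learned model could then be fed to a planner to approximate the MDP's solution.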
This entry is concerned with the latter class of methods, as these constitute the majority of published research in this area.
Motivation and Background
Reinforcement learning is a class of learning problems concerned with achieving long-term goals in unfamiliar, uncertain,...
Copyright information
© 2011 Springer Science+Business Media, LLC
Cite this entry
Engel, Y. (2011). Gaussian Process Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_325
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering