Definition
Gaussian process reinforcement learning generically refers to a class of reinforcement learning (RL) algorithms that use Gaussian processes (GPs) to model and learn some aspect of the problem.
Such methods may be divided roughly into two groups:
1. Model-based methods: Here, GPs are used to learn the transition and reward model of the Markov decision process (MDP) underlying the RL problem. The estimated MDP model is then used to compute an approximate solution to the true MDP.
2. Model-free methods: Here, no explicit representation of the MDP is maintained. Rather, GPs are used to learn the MDP's value function, state-action value function, or some other quantity that may be used to solve the MDP.
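Both groups rest on the same primitive, GP regression. As a rough illustration of the model-based idea (group 1), the sketch below, which is not taken from the entry and uses a hypothetical one-dimensional MDP of my own choosing, fits a zero-mean GP with a squared-exponential kernel to observed (state, action, next-state) transitions and queries the posterior mean as a learned transition model. The same machinery applied to states and observed returns would give a simple model-free value estimate in the spirit of group 2.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # Squared-exponential covariance between the rows of A (n x d) and B (m x d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior_mean(X_train, y_train, X_test, noise_var=1e-2):
    # Standard GP regression posterior mean: k(X*, X) (K + sigma^2 I)^{-1} y.
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    return Ks @ np.linalg.solve(K, y_train)

# Hypothetical 1-D MDP (illustration only): s' = 0.9*s + a + noise.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(50, 1))
A = rng.uniform(-0.5, 0.5, size=(50, 1))
X = np.hstack([S, A])                       # inputs: (state, action) pairs
y = 0.9 * S[:, 0] + A[:, 0] + 0.05 * rng.standard_normal(50)

# Learned transition model: predict the next state for a query (state, action).
pred = gp_posterior_mean(X, y, np.array([[0.5, 0.2]]))
```

With 50 noisy transitions the posterior mean lands close to the true next state (0.9 * 0.5 + 0.2 = 0.65); such a learned model could then be fed to a planner to approximate the MDP's solution.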
This entry is concerned with the latter class of methods, as these constitute the majority of published research in this area.
Motivation and Background
Reinforcement learning is a class of learning problems concerned with achieving long-term goals in unfamiliar, uncertain,...
Copyright information
© 2011 Springer Science+Business Media, LLC
Cite this entry
Engel, Y. (2011). Gaussian Process Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_325
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering