Gaussian Process Reinforcement Learning

Reference work entry, Encyclopedia of Machine Learning

Definition

Gaussian process reinforcement learning generically refers to a class of reinforcement learning (RL) algorithms that use Gaussian processes (GPs) to model and learn some aspect of the problem.

Such methods may be divided roughly into two groups:

  1. Model-based methods: Here, GPs are used to learn the transition and reward models of the Markov decision process (MDP) underlying the RL problem. The estimated MDP model is then used to compute an approximate solution to the true MDP.

  2. Model-free methods: Here, no explicit representation of the MDP is maintained. Rather, GPs are used to learn the MDP's value function, its state-action value function, or some other quantity that may be used to solve the MDP.

This entry is concerned with the latter class of methods, as these constitute the majority of published research in this area; a minimal sketch of the model-free idea appears below.
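
The following is a minimal sketch, not the algorithm of any particular paper, of how a GP can be used to estimate a value function from a single episode. The temporal-difference relation r_t = V(x_t) - gamma * V(x_{t+1}) + noise is linear in the unknown value function, so placing a GP prior on V and assuming, purely for simplicity, i.i.d. Gaussian reward noise gives a closed-form posterior by standard GP regression. All names (rbf_kernel, gp_value_posterior, the toy episode) are illustrative; the published GPTD algorithms of Engel, Mannor, and Meir use a correlated noise model and recursive, sparsified updates rather than the batch computation shown here.

import numpy as np

def rbf_kernel(xs1, xs2, lengthscale=1.0, signal_var=1.0):
    # Squared-exponential prior covariance between two sets of scalar states.
    d = xs1[:, None] - xs2[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_value_posterior(states, rewards, gamma=0.95, noise_var=0.1):
    # Posterior mean of V at the visited states, given one episode.
    # Observation model: r_t = V(x_t) - gamma * V(x_{t+1}) + noise,
    # which is linear in V, so ordinary GP regression applies.
    T = len(rewards)                    # number of transitions
    K = rbf_kernel(states, states)      # prior covariance of V at the states
    H = np.zeros((T, T + 1))            # linear observation operator: r = H v + noise
    for t in range(T):
        H[t, t], H[t, t + 1] = 1.0, -gamma
    G = H @ K @ H.T + noise_var * np.eye(T)
    alpha = np.linalg.solve(G, rewards)
    return K @ H.T @ alpha              # E[V | rewards] at x_0, ..., x_T

# Usage on a fabricated one-dimensional episode of six states / five transitions.
rng = np.random.default_rng(0)
states = np.cumsum(rng.normal(size=6))  # x_0, ..., x_5
rewards = -np.abs(states[1:])           # reward observed on each transition
print(gp_value_posterior(states, rewards))

Under these simplifying assumptions the returned vector is just the standard Gaussian conditioning formula K H^T (H K H^T + sigma^2 I)^{-1} r applied to the linear observation operator H.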

Motivation and Background

Reinforcement learning is a class of learning problems concerned with achieving long-term goals in unfamiliar, uncertain,...

References

  • Bellman, R. E. (1956). A problem in the sequential design of experiments. Sankhya, 16, 221–229.

    MathSciNet  MATH  Google Scholar 

  • Bellman, R. E. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.

    MATH  Google Scholar 

  • Bertsekas, D. P. (1995). Dynamic programming and optimal control. Belmont, MA: Athena Scientific.

    MATH  Google Scholar 

  • Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont, MA: Athena Scientific.

    MATH  Google Scholar 

  • Boyan, J. A. (1999). Least-squares temporal difference learning. In Proceedings of the 16th international conference on machine learning (pp. 49–56). San Francisco: Morgan Kaufmann.

    Google Scholar 

  • Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22, 33–57.

    MATH  Google Scholar 

  • Dearden, R., Friedman, N., & Andre, D. (1999). Model based Bayesian exploration. In Proceedings of the fifteenth conference on uncertainty in artificial intelligence (pp. 150–159). San Francisco: Morgan Kaufmann.

    Google Scholar 

  • Dearden, R., Friedman, N., & Russell, S. (1998). Bayesian Q-learning. In Proceedings of the fifteenth national conference on artificial intelligence (pp. 761–768). Menlo Park, CA: AAAI Press.

    Google Scholar 

  • Duff, M. (2002). Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes. PhD thesis, University of Massachusetts, Amherst.

    Google Scholar 

  • Engel, Y. (2005). Algorithms and representations for reinforcement learning. PhD thesis, The Hebrew University of Jerusalem.

    Google Scholar 

  • Engel, Y., Mannor, S., & Meir, R. (2003). Bayes meets Bellman: The Gaussian process approach to temporal difference learning. In Proceedings of the 20th international conference on machine learning. San Francisco: Morgan Kaufmann.

    Google Scholar 

  • Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In Proceedings of the 22nd international conference on machine learning.

    Google Scholar 

  • Engel, Y., Szabo, P., & Volkinshtein, D. (2005). Learning to control an Octopus arm with Gaussian process temporal difference methods. Technical report, Technion Institute of Technology. www.cs.ualberta.ca/~yaki/reports/octopus.pdf.

  • Ghavamzadeh, M., & Engel, Y. (2007). Bayesian actor-critic algorithms. In Z. Ghahramani (Ed.), 24th international conference on machine learning. Corvallis, OR: Omnipress.

    Google Scholar 

  • Howard, R. (1960). Dynamic programming and Markov processes. Cambridge, MA: MIT Press.

    MATH  Google Scholar 

  • Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99–134.

    Article  MathSciNet  MATH  Google Scholar 

  • Kushner, H. J., & Yin, C. J. (1997). Stochastic approximation algorithms and applications. Berlin: Springer.

    MATH  Google Scholar 

  • Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th international conference on machine learning (ICML-94) (pp. 157–163). New Brunswick, NJ: Morgan Kaufmann.

    Google Scholar 

  • Mannor, S., Simester, D., Sun, P., & Tsitsiklis, J. N. (2004). Bias and variance in value function estimation. In Proceedings of the 21st international conference on machine learning.

    Google Scholar 

  • Poupart, P., Vlassis, N. A., Hoey, J., & Regan, K. (2006). An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of the twenty-third international conference on machine learning (pp. 697–704). Pittsburgh, PA.

    Google Scholar 

  • Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.

    MATH  Google Scholar 

  • Rummery, G., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical report CUED/F-INFENG/TR 166, Cambridge University Engineering Department.

    Google Scholar 

  • Strens, M. (2000). A Bayesian framework for reinforcement learning. In Proceedings of the 17th international conference on machine learning (pp. 943–950). San Francisco: Morgan Kaufmann.

    Google Scholar 

  • Sutton, R. S. (1984). Temporal credit assignment in reinforcement learning. PhD thesis, University of Massachusetts, Amherst.

    Google Scholar 

  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.

    Google Scholar 

  • Tsitsiklis, J. N., & Van Roy, B. (1996). An analysis of temporal-difference learning with function approximation. Technical report LIDS-P-2322, Cambridge, MA: MIT Press.

    Google Scholar 

  • Wang, T., Lizotte, D., Bowling, M., & Schuurmans, D. (2005). Bayesian sparse sampling for on-line reward optimization. In Proceedings of the 22nd international conference on machine learning (pp. 956–963). New York: ACM Press.

    Google Scholar 

  • Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis, King’s College, Cambridge, UK.

    Google Scholar 

Download references

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Engel, Y. (2011). Gaussian Process Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_325
