Definition
Bayesian reinforcement learning refers to reinforcement learning modeled as a Bayesian learning problem (see Bayesian Methods). More specifically, following Bayesian learning theory, reinforcement learning is performed by computing a posterior distribution on the unknowns (e.g., any combination of the transition probabilities, reward probabilities, value function, value gradient, or policy) based on the evidence received (e.g., history of past state–action pairs).
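As a minimal sketch of the posterior computation described above, consider the case where the only unknowns are the transition probabilities of a small discrete problem. The code assumes an independent Dirichlet prior for each state–action pair and evidence in the form of (state, action, next state) triples. These modeling choices, along with the names `DirichletTransitionModel`, `prior_count`, and the toy history, are illustrative assumptions and are not taken from the entry itself.

```python
from collections import defaultdict


class DirichletTransitionModel:
    """Posterior over the transition probabilities of a discrete process.

    Assumption for illustration: each (state, action) pair has its own
    symmetric Dirichlet prior with concentration `prior_count` per next state.
    """

    def __init__(self, n_states, prior_count=1.0):
        self.n_states = n_states
        self.prior_count = prior_count
        # Observed transition counts, keyed by (state, action).
        self.counts = defaultdict(lambda: [0.0] * n_states)

    def update(self, state, action, next_state):
        # A Dirichlet prior is conjugate to categorical observations,
        # so the Bayesian update reduces to incrementing a count.
        self.counts[(state, action)][next_state] += 1.0

    def posterior_mean(self, state, action):
        # Posterior Dirichlet parameters are prior counts plus observed counts;
        # the posterior mean is their normalized value.
        alphas = [self.prior_count + c for c in self.counts[(state, action)]]
        total = sum(alphas)
        return [a / total for a in alphas]


# Hypothetical evidence: three transitions observed from state 0 under action 0.
model = DirichletTransitionModel(n_states=3)
history = [(0, 0, 1), (0, 0, 1), (0, 0, 2)]
for s, a, s_next in history:
    model.update(s, a, s_next)

posterior = model.posterior_mean(0, 0)
print(posterior)
```

With a uniform prior of one pseudo-count per next state, the posterior parameters for state 0 and action 0 become (1, 3, 2), so the posterior mean assigns probability 3/6 to next state 1. The same idea applies to the other unknowns listed above (rewards, value function, policy), although those generally call for different prior families.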
Motivation and Background
Bayesian reinforcement learning can be traced back to the 1950s and 1960s, to the work of Bellman (1961), Fel’Dbaum (1965), and several of Howard’s students (Martin 1967). Shortly after Markov decision processes were formalized, these researchers (and several others) in Operations Research considered the problem of controlling a Markov process with uncertain transition and...
Recommended Reading
Bellman R (1961) Adaptive control processes: a guided tour. Princeton University Press, Princeton
Chalkiadakis G, Boutilier C (2003) Coordination in multi-agent reinforcement learning: a Bayesian approach. In: International joint conference on autonomous agents and multiagent systems (AAMAS), Melbourne, pp 709–716
Chalkiadakis G, Boutilier C (2004) Bayesian reinforcement learning for coalition formation under uncertainty. In: International joint conference on autonomous agents and multiagent systems (AAMAS), New York, pp 1090–1097
Dearden R, Friedman N, Russell SJ (1998) Bayesian Q-learning. In: National conference on artificial intelligence (AAAI), Madison, pp 761–768
DeGroot MH (1970) Optimal statistical decisions. McGraw-Hill, New York
Duff M (2002) Optimal learning: computational procedures for Bayes-adaptive Markov decision processes. PhD thesis, University of Massachusetts, Amherst
Engel Y, Mannor S, Meir R (2005) Reinforcement learning with Gaussian processes. In: International conference on machine learning (ICML), Bonn
Fel’Dbaum A (1965) Optimal control systems. Academic, New York
Ghavamzadeh M, Engel Y (2006) Bayesian policy gradient algorithms. In: Advances in neural information processing systems (NIPS), Vancouver, pp 457–464
Gmytrasiewicz P, Doshi P (2005) A framework for sequential planning in multi-agent settings. J Artif Intell Res (JAIR) 24:49–79
Martin JJ (1967) Bayesian decision problems and Markov chains. Wiley, New York
Poupart P, Vlassis N (2008) Model-based Bayesian reinforcement learning in partially observable domains. In: International symposium on artificial intelligence and mathematics (ISAIM), Beijing
Poupart P, Vlassis N, Hoey J, Regan K (2006) An analytic solution to discrete Bayesian reinforcement learning. In: International conference on machine learning (ICML), Pittsburgh, pp 697–704
Puterman ML (1994) Markov decision processes. Wiley, New York
Ross S, Chaib-Draa B, Pineau J (2007) Bayes-adaptive POMDPs. In: Advances in neural information processing systems (NIPS), Vancouver
Ross S, Chaib-Draa B, Pineau J (2008) Bayesian reinforcement learning in continuous POMDPs with application to robot navigation. In: IEEE international conference on robotics and automation (ICRA), Pasadena, pp 2845–2851
Sutton RS, Barto AG (1998) Reinforcement learning. MIT Press, Cambridge, MA
© 2017 Springer Science+Business Media New York
About this entry
Cite this entry
Poupart, P. (2017). Bayesian Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_929
DOI: https://doi.org/10.1007/978-1-4899-7687-1_929
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering