Bayesian Reinforcement Learning

Poupart, Pascal

doi:10.1007/978-1-4899-7687-1_929

Synonyms

Adaptive control processes; Bayes adaptive Markov decision processes; Dual control; Optimal learning

Definition

Bayesian reinforcement learning refers to reinforcement learning modeled as a Bayesian learning problem (see Bayesian Methods). More specifically, following Bayesian learning theory, reinforcement learning is performed by computing a posterior distribution on the unknowns (e.g., any combination of the transition probabilities, reward probabilities, value function, value gradient, or policy) based on the evidence received (e.g., history of past state–action pairs).
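
As an illustration of computing a posterior over one kind of unknown, the sketch below updates a belief about transition probabilities from an observed history. It is a minimal example under stated assumptions, not the method of any particular paper cited here: the states are discrete, each state-action pair gets an independent symmetric Dirichlet prior, and the history is given as (state, action, next state) triples. The function name `posterior_transition_mean` and the parameter `prior_count` are hypothetical names chosen for this example.

```python
def posterior_transition_mean(history, n_states, prior_count=1.0):
    """Posterior mean of P(next_state | state, action).

    Assumes an independent symmetric Dirichlet prior for every
    (state, action) pair; prior_count is the pseudo-count per outcome.
    history is an iterable of (state, action, next_state) triples.
    """
    counts = {}
    for s, a, s_next in history:
        # Start each row at the prior pseudo-counts, then add the evidence.
        row = counts.setdefault((s, a), [prior_count] * n_states)
        row[s_next] += 1.0
    # The Dirichlet posterior mean is the normalized vector of counts.
    return {sa: [c / sum(row) for c in row] for sa, row in counts.items()}


# Hypothetical evidence: three transitions from state 0, one from state 1.
history = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 0, 1)]
posterior = posterior_transition_mean(history, n_states=2)
```

Only state-action pairs that appear in the history are returned; for an unvisited pair the posterior equals the prior, which is uniform under this assumption.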

Motivation and Background

Bayesian reinforcement learning can be traced back to the 1950s and 1960s in the work of Bellman (1961), Fel’Dbaum (1965), and several of Howard’s students (Martin 1967). Shortly after Markov decision processes were formalized, the above researchers (and several others) in Operations Research considered the problem of controlling a Markov process with uncertain transition and...


Recommended Reading

Bellman R (1961) Adaptive control processes: a guided tour. Princeton University Press, Princeton
Chalkiadakis G, Boutilier C (2003) Coordination in multi-agent reinforcement learning: a Bayesian approach. In: International joint conference on autonomous agents and multiagent systems (AAMAS), Melbourne, pp 709–716
Chalkiadakis G, Boutilier C (2004) Bayesian reinforcement learning for coalition formation under uncertainty. In: International joint conference on autonomous agents and multiagent systems (AAMAS), New York, pp 1090–1097
Dearden R, Friedman N, Russell SJ (1998) Bayesian Q-learning. In: National conference on artificial intelligence (AAAI), Madison, pp 761–768
DeGroot MH (1970) Optimal statistical decisions. McGraw-Hill, New York
Duff M (2002) Optimal learning: computational procedures for Bayes-adaptive Markov decision processes. PhD thesis, University of Massachusetts, Amherst
Engel Y, Mannor S, Meir R (2005) Reinforcement learning with Gaussian processes. In: International conference on machine learning (ICML), Bonn
Fel’Dbaum A (1965) Optimal control systems. Academic, New York
Ghavamzadeh M, Engel Y (2006) Bayesian policy gradient algorithms. In: Advances in neural information processing systems (NIPS), Vancouver, pp 457–464
Gmytrasiewicz P, Doshi P (2005) A framework for sequential planning in multi-agent settings. J Artif Intell Res (JAIR) 24:49–79
Martin JJ (1967) Bayesian decision problems and Markov chains. Wiley, New York
Poupart P, Vlassis N (2008) Model-based Bayesian reinforcement learning in partially observable domains. In: International symposium on artificial intelligence and mathematics (ISAIM), Beijing
Poupart P, Vlassis N, Hoey J, Regan K (2006) An analytic solution to discrete Bayesian reinforcement learning. In: International conference on machine learning (ICML), Pittsburgh, pp 697–704
Puterman ML (1994) Markov decision processes. Wiley, New York
Ross S, Chaib-Draa B, Pineau J (2007) Bayes-adaptive POMDPs. In: Advances in neural information processing systems (NIPS), Vancouver
Ross S, Chaib-Draa B, Pineau J (2008) Bayesian reinforcement learning in continuous POMDPs with application to robot navigation. In: IEEE international conference on robotics and automation (ICRA), Pasadena, pp 2845–2851
Sutton RS, Barto AG (1998) Reinforcement learning. MIT Press, Cambridge, MA

Author information

Authors and Affiliations

University of Waterloo, Waterloo, ON, Canada
Pascal Poupart

Editor information

Editors and Affiliations

The University of New South Wales, Sydney, NSW, Australia
Claude Sammut
Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
Geoffrey I. Webb

Cite this entry

Poupart, P. (2017). Bayesian Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_929

DOI: https://doi.org/10.1007/978-1-4899-7687-1_929
Published: 14 April 2017
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
