Bayesian Reinforcement Learning

Encyclopedia of Machine Learning and Data Mining

Synonyms

Adaptive control processes; Bayes adaptive Markov decision processes; Dual control; Optimal learning

Definition

Bayesian reinforcement learning refers to reinforcement learning modeled as a Bayesian learning problem (see Bayesian Methods). More specifically, following Bayesian learning theory, reinforcement learning is performed by computing a posterior distribution on the unknowns (e.g., any combination of the transition probabilities, reward probabilities, value function, value gradient, or policy) based on the evidence received (e.g., history of past state–action pairs).
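As a minimal sketch of what "computing a posterior distribution on the unknowns" can look like, consider the case where the only unknowns are the transition probabilities of a small discrete problem. The example below assumes an independent Dirichlet prior for each state–action pair, a common conjugate choice that the definition above does not prescribe. The function names, the pseudo-count representation, and the toy history are all hypothetical and chosen only for illustration.

```python
def dirichlet_posterior(prior_counts, transitions):
    """Return posterior Dirichlet pseudo-counts after observing transitions.

    prior_counts: dict mapping (state, action) -> list of prior pseudo-counts,
                  one entry per possible next state (assumed Dirichlet prior).
    transitions:  iterable of (state, action, next_state) triples, where
                  next_state is an integer index into the count list.
    """
    # Copy the prior so the caller's dictionary is left unchanged.
    posterior = {sa: list(counts) for sa, counts in prior_counts.items()}
    for state, action, next_state in transitions:
        # Conjugate update: each observed transition adds one to its count.
        posterior[(state, action)][next_state] += 1.0
    return posterior


def posterior_mean(counts):
    """Posterior mean transition probabilities for one state-action pair."""
    total = sum(counts)
    return [c / total for c in counts]


# Toy problem (hypothetical): 2 states, 2 actions, uniform Dirichlet(1, 1) prior.
n_states, n_actions = 2, 2
prior = {(s, a): [1.0] * n_states
         for s in range(n_states) for a in range(n_actions)}

# Evidence: a short history of (state, action, next_state) observations.
history = [(0, 0, 1), (0, 0, 1), (0, 0, 0), (1, 1, 0)]

post = dirichlet_posterior(prior, history)
mean_00 = posterior_mean(post[(0, 0)])
print(post[(0, 0)], mean_00)
```

Under these assumptions the posterior remains a Dirichlet distribution, so the evidence is summarized entirely by the updated counts. Posteriors over rewards, value functions, or policies would need different models and are not covered by this sketch.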

Motivation and Background

Bayesian reinforcement learning can be traced back to the 1950s and 1960s in the work of Bellman (1961), Fel’Dbaum (1965), and several of Howard’s students (Martin 1967). Shortly after Markov decision processes were formalized, these researchers (and several others) in Operations Research considered the problem of controlling a Markov process with uncertain transition and...

Recommended Reading

  • Bellman R (1961) Adaptive control processes: a guided tour. Princeton University Press, Princeton

  • Chalkiadakis G, Boutilier C (2003) Coordination in multi-agent reinforcement learning: a Bayesian approach. In: International joint conference on autonomous agents and multiagent systems (AAMAS), Melbourne, pp 709–716

  • Chalkiadakis G, Boutilier C (2004) Bayesian reinforcement learning for coalition formation under uncertainty. In: International joint conference on autonomous agents and multiagent systems (AAMAS), New York, pp 1090–1097

  • Dearden R, Friedman N, Russell SJ (1998) Bayesian Q-learning. In: National conference on artificial intelligence (AAAI), Madison, pp 761–768

  • DeGroot MH (1970) Optimal statistical decisions. McGraw-Hill, New York

  • Duff M (2002) Optimal learning: computational procedures for Bayes-adaptive Markov decision processes. PhD thesis, University of Massachusetts, Amherst

  • Engel Y, Mannor S, Meir R (2005) Reinforcement learning with Gaussian processes. In: International conference on machine learning (ICML), Bonn

  • Fel’Dbaum A (1965) Optimal control systems. Academic, New York

  • Ghavamzadeh M, Engel Y (2006) Bayesian policy gradient algorithms. In: Advances in neural information processing systems (NIPS), Vancouver, pp 457–464

  • Gmytrasiewicz P, Doshi P (2005) A framework for sequential planning in multi-agent settings. J Artif Intell Res (JAIR) 24:49–79

  • Martin JJ (1967) Bayesian decision problems and Markov chains. Wiley, New York

  • Poupart P, Vlassis N (2008) Model-based Bayesian reinforcement learning in partially observable domains. In: International symposium on artificial intelligence and mathematics (ISAIM), Beijing

  • Poupart P, Vlassis N, Hoey J, Regan K (2006) An analytic solution to discrete Bayesian reinforcement learning. In: International conference on machine learning (ICML), Pittsburgh, pp 697–704

  • Puterman ML (1994) Markov decision processes. Wiley, New York

  • Ross S, Chaib-Draa B, Pineau J (2007) Bayes-adaptive POMDPs. In: Advances in neural information processing systems (NIPS), Vancouver

  • Ross S, Chaib-Draa B, Pineau J (2008) Bayesian reinforcement learning in continuous POMDPs with application to robot navigation. In: IEEE international conference on robotics and automation (ICRA), Pasadena, pp 2845–2851

  • Sutton RS, Barto AG (1998) Reinforcement learning. MIT Press, Cambridge, MA

© 2017 Springer Science+Business Media New York

Cite this entry

Poupart, P. (2017). Bayesian Reinforcement Learning. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_929
