
Reinforcement Learning

Encyclopedia of Algorithms (living reference work entry)

Keywords

Neuro-dynamic programming

Years and Authors of Summarized Original Work

  • 1992; Watkins

Problem Definition

Many sequential decision problems, ranging from dynamic resource allocation to robotics, can be formulated as stochastic control problems and solved by methods of reinforcement learning. Reinforcement learning (a.k.a. neuro-dynamic programming) has therefore become one of the major approaches to tackling real-life decision problems.

In reinforcement learning, an agent wanders in an unknown environment and tries to maximize its long-term return by performing actions and receiving rewards. The most popular mathematical models for reinforcement learning problems are the Markov decision process (MDP) and its generalization, the partially observable MDP (POMDP). In contrast to supervised learning, in reinforcement learning the agent learns through interaction with the environment and thus influences the “future.” One of the challenges that arises in such cases is the...
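The summarized work is Watkins' Q-learning, in which the agent learns an action-value function Q(s, a) directly from sampled transitions, without a model of the environment. A minimal tabular sketch follows; the `ChainEnv` toy environment and its `reset`/`step` interface are illustrative assumptions for this example, not part of the original entry or any particular library.

```python
import random

class ChainEnv:
    """Toy 3-state chain (illustrative only): start in state 0;
    reaching state 2 ends the episode with reward 1."""
    actions = [-1, +1]  # step left / step right

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(max(self.s + a, 0), 2)
        done = (self.s == 2)
        return self.s, (1.0 if done else 0.0), done

def q_learning(env, episodes=300, alpha=0.1, gamma=0.99, epsilon=0.2):
    """Tabular Q-learning: after each transition (s, a, r, s'), apply
    Q(s, a) += alpha * (r + gamma * max_b Q(s', b) - Q(s, a))."""
    Q = {}
    q = lambda s, a: Q.get((s, a), 0.0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration: mostly greedy, sometimes random
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda b: q(s, b))
            s2, r, done = env.step(a)
            # bootstrap from the best next action; no bootstrap at terminal states
            target = r if done else r + gamma * max(q(s2, b) for b in env.actions)
            Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
            s = s2
    return Q

random.seed(0)
Q = q_learning(ChainEnv())
# After training, the greedy policy at state 0 moves right, toward the goal.
```

Under standard stochastic-approximation conditions (every state-action pair visited infinitely often, suitably decaying step sizes), this update converges to the optimal action-value function; the learning-rate analysis in reference 4 below quantifies how the choice of alpha affects convergence speed.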


Recommended Reading

  1. Allender E, Arora S, Kearns M, Moore C, Russell A (2002) Note on the representational incompatibility of function approximation and factored dynamics. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15. MIT, Cambridge

  2. Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Belmont

  3. Brafman R, Tennenholtz M (2002) R-max – a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213–231

  4. Even-Dar E, Mansour Y (2003) Learning rates for Q-learning. J Mach Learn Res 5:1–25

  5. Guestrin C, Koller D, Parr R, Venkataraman S (2003) Efficient solution algorithms for factored MDPs. J Artif Intell Res 19:399–468

  6. Kakade S (2003) On the sample complexity of reinforcement learning. Ph.D. thesis, University College London

  7. Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn 49(2–3):209–232

  8. Lusena C, Goldsmith J, Mundhenk M (2001) Nonapproximability results for partially observable Markov decision processes. J Artif Intell Res 14:83–103

  9. Ng AY, Coates A, Diel M, Ganapathi V, Schulte J, Tse B, Berger E, Liang E (2006) Inverted autonomous helicopter flight via reinforcement learning. In: International symposium on experimental robotics. Springer tracts in advanced robotics 21. Springer, Berlin/New York

  10. Papadimitriou CH, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math Oper Res 12(3):441–450

  11. Puterman M (1994) Markov decision processes. Wiley-Interscience, New York

  12. Sutton R (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44

  13. Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT, Cambridge

  14. Tesauro GJ (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput 6:215–219

  15. Tsitsiklis JN, Van Roy B (1996) Feature-based methods for large scale dynamic programming. Mach Learn 22:59–94

  16. Watkins C (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University

  17. Watkins C, Dayan P (1992) Q-learning. Mach Learn 8(3/4):279–292


Author information

Correspondence to Eyal Even-Dar.


Copyright information

© 2015 Springer Science+Business Media New York


Cite this entry

Even-Dar, E. (2015). Reinforcement Learning. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27848-8_341-2

  • Online ISBN: 978-3-642-27848-8