
Reinforcement Learning

Encyclopedia of Algorithms (living reference work entry)

Keywords

Neuro-dynamic programming

Years and Authors of Summarized Original Work

  • 1992; Watkins

Problem Definition

Many sequential decision problems, ranging from dynamic resource allocation to robotics, can be formulated as stochastic control problems and solved by methods of reinforcement learning. Reinforcement learning (a.k.a. neuro-dynamic programming) has therefore become one of the major approaches to tackling real-life decision problems.

In reinforcement learning, an agent wanders in an unknown environment and tries to maximize its long-term return by performing actions and receiving rewards. The most popular mathematical models for reinforcement learning problems are the Markov decision process (MDP) and its generalization, the partially observable MDP (POMDP). In contrast to supervised learning, in reinforcement learning the agent learns through interaction with the environment and thus influences the “future.” One of the challenges that arises in such cases is the...
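The summarized work is Watkins' Q-learning, in which the agent learns an action-value function Q(s, a) directly from sampled transitions, without a model of the environment. A minimal tabular sketch follows; the `ChainEnv` toy environment and its `reset`/`step` interface are illustrative assumptions for this example, not part of the original entry or any particular library.

```python
import random

class ChainEnv:
    """Toy 3-state chain (illustrative only): start in state 0;
    reaching state 2 ends the episode with reward 1."""
    actions = [-1, +1]  # step left / step right

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = min(max(self.s + a, 0), 2)
        done = (self.s == 2)
        return self.s, (1.0 if done else 0.0), done

def q_learning(env, episodes=300, alpha=0.1, gamma=0.99, epsilon=0.2):
    """Tabular Q-learning: after each transition (s, a, r, s'), apply
    Q(s, a) += alpha * (r + gamma * max_b Q(s', b) - Q(s, a))."""
    Q = {}
    q = lambda s, a: Q.get((s, a), 0.0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration: mostly greedy, sometimes random
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda b: q(s, b))
            s2, r, done = env.step(a)
            # bootstrap from the best next action; no bootstrap at terminal states
            target = r if done else r + gamma * max(q(s2, b) for b in env.actions)
            Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
            s = s2
    return Q

random.seed(0)
Q = q_learning(ChainEnv())
# After training, the greedy policy at state 0 moves right, toward the goal.
```

Under standard stochastic-approximation conditions (every state-action pair visited infinitely often, suitably decaying step sizes), this update converges to the optimal action-value function; the learning-rate analysis in reference 4 below quantifies how the choice of alpha affects convergence speed.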


Recommended Reading

  1. Allender E, Arora S, Kearns M, Moore C, Russell A (2002) Note on the representational incompatibility of function approximation and factored dynamics. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15. MIT, Cambridge

  2. Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena Scientific, Belmont

  3. Brafman R, Tennenholtz M (2002) R-max – a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213–231

  4. Even-Dar E, Mansour Y (2003) Learning rates for Q-learning. J Mach Learn Res 5:1–25

  5. Guestrin C, Koller D, Parr R, Venkataraman S (2003) Efficient solution algorithms for factored MDPs. J Artif Intell Res 19:399–468

  6. Kakade S (2003) On the sample complexity of reinforcement learning. Ph.D. thesis, University College London

  7. Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn 49(2–3):209–232

  8. Lusena C, Goldsmith J, Mundhenk M (2001) Nonapproximability results for partially observable Markov decision processes. J Artif Intell Res 14:83–103

  9. Ng AY, Coates A, Diel M, Ganapathi V, Schulte J, Tse B, Berger E, Liang E (2006) Inverted autonomous helicopter flight via reinforcement learning. In: International symposium on experimental robotics. Springer tracts in advanced robotics 21. Springer, Berlin/New York

  10. Papadimitriou CH, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math Oper Res 12(3):441–450

  11. Puterman M (1994) Markov decision processes. Wiley-Interscience, New York

  12. Sutton R (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44

  13. Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT, Cambridge

  14. Tesauro GJ (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput 6:215–219

  15. Tsitsiklis JN, Van Roy B (1996) Feature-based methods for large scale dynamic programming. Mach Learn 22:59–94

  16. Watkins C (1989) Learning from delayed rewards. Ph.D. thesis, Cambridge University

  17. Watkins C, Dayan P (1992) Q-learning. Mach Learn 8(3/4):279–292


Author information

Correspondence to Eyal Even-Dar.


Copyright information

© 2015 Springer Science+Business Media New York


Cite this entry

Even-Dar, E. (2015). Reinforcement Learning. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27848-8_341-2

  • Online ISBN: 978-3-642-27848-8