Abstract
These lecture notes are intended to give a tutorial introduction to the formulation and analysis of reinforcement learning problems. In these problems, an agent chooses actions to take in some environment, aiming to maximize a reward function. Many control, scheduling, planning and game-playing tasks can be formulated in this way, as problems of controlling a Markov decision process. We review the classical dynamic programming approaches to finding optimal controllers. For large state spaces, these techniques are impractical. We review methods based on approximate value functions, estimated via simulation. In particular, we discuss the motivation for (and shortcomings of) the TD(λ) algorithm.
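As a rough illustration of the TD(λ) algorithm mentioned in the abstract, the following is a minimal sketch of tabular TD(λ) policy evaluation with accumulating eligibility traces on a toy random-walk MDP. The environment, step size, and the function name td_lambda are our own illustrative choices under stated assumptions, not taken from the chapter itself.

import numpy as np

def td_lambda(n_states, episodes, alpha=0.1, gamma=1.0, lam=0.9, seed=0):
    """Estimate the state-value function of a fixed (uniform random) policy
    on a 1-D random walk whose end states are terminal."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)            # value estimates for each state
    for _ in range(episodes):
        z = np.zeros(n_states)        # eligibility traces, reset each episode
        s = n_states // 2             # start in the middle state
        while 0 < s < n_states - 1:   # states 0 and n_states-1 are terminal
            s_next = s + rng.choice([-1, 1])
            r = 1.0 if s_next == n_states - 1 else 0.0
            v_next = 0.0 if s_next in (0, n_states - 1) else V[s_next]
            delta = r + gamma * v_next - V[s]   # TD error
            z *= gamma * lam                    # decay all traces
            z[s] += 1.0                         # accumulate trace for current state
            V += alpha * delta * z              # update all recently visited states
            s = s_next
    return V

if __name__ == "__main__":
    # For a 7-state walk (5 non-terminal states), the estimates approach 1/6 ... 5/6.
    print(td_lambda(n_states=7, episodes=5000))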
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bartlett, P.L. (2003). An Introduction to Reinforcement Learning Theory: Value Function Methods. In: Mendelson, S., Smola, A.J. (eds) Advanced Lectures on Machine Learning. Lecture Notes in Computer Science, vol 2600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36434-X_5
DOI: https://doi.org/10.1007/3-540-36434-X_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00529-2
Online ISBN: 978-3-540-36434-4
eBook Packages: Springer Book Archive