An Introduction to Reinforcement Learning Theory: Value Function Methods

  • Peter L. Bartlett
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2600)


These lecture notes are intended to give a tutorial introduction to the formulation and analysis of reinforcement learning problems. In these problems, an agent chooses actions to take in some environment, aiming to maximize a reward function. Many control, scheduling, planning and game-playing tasks can be formulated in this way, as problems of controlling a Markov decision process. We review the classical dynamic programming approaches to finding optimal controllers. For large state spaces, these techniques are impractical. We review methods based on approximate value functions, estimated via simulation. In particular, we discuss the motivation for (and shortcomings of) the TD(λ) algorithm.
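The TD(λ) algorithm mentioned in the abstract can be illustrated with a small tabular sketch. The following is an illustrative implementation (not taken from the notes) of TD(λ) with accumulating eligibility traces, evaluating the uniform random policy on the classic five-state random walk; the environment, step sizes, and episode count are all assumptions chosen for the example. Under this policy the true value of non-terminal state s is s/6.

```python
import random

def td_lambda_random_walk(episodes=5000, lam=0.8, alpha=0.05, seed=0):
    """Tabular TD(lambda) policy evaluation on the 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal. Stepping off the
    right end gives reward 1, every other transition gives reward 0, and
    the discount factor is 1 (episodic task).
    """
    rng = random.Random(seed)
    V = [0.0] * 7                      # value estimates; terminals stay 0
    for _ in range(episodes):
        e = [0.0] * 7                  # eligibility traces, reset per episode
        s = 3                          # start in the middle state
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            delta = r + V[s2] - V[s]   # TD error (V[terminal] is fixed at 0)
            e[s] += 1.0                # accumulating trace for current state
            for i in range(1, 6):      # broadcast the TD error along traces
                V[i] += alpha * delta * e[i]
                e[i] *= lam            # decay traces (gamma * lambda, gamma = 1)
            s = s2
    return V[1:6]
```

Running this gives estimates close to the true values [1/6, 2/6, 3/6, 4/6, 5/6]; with λ = 1 the update reduces to a Monte Carlo estimate, while λ = 0 gives one-step TD.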


Keywords: Optimal Policy, Markov Decision Process, Transition Probability Matrix, Reward Function, Bellman Equation





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Peter L. Bartlett
  1. Barnhill Technologies, USA
