
Towards Finite-Sample Convergence of Direct Reinforcement Learning

  • Shiau Hong Lim
  • Gerald DeJong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)

Abstract

While direct, model-free reinforcement learning often performs better than model-based approaches in practice, only the latter currently come with theoretical guarantees of finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning and show that standard Q-learning with optimistic initial values and a constant learning rate is admissible. Admissibility justifies a greedy exploration strategy that we believe performs very well in practice, and it is a theoretically significant step toward deriving finite-sample convergence guarantees for direct reinforcement learning. We present empirical evidence supporting this claim.
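
The algorithm referred to in the abstract is ordinary tabular Q-learning in which every Q-value is initialized to an optimistic constant, updates use a fixed learning rate, and actions are always chosen greedily with respect to the current estimates. The listing below is a minimal sketch of that combination; the toy ChainMDP environment, the interface it exposes (num_states, num_actions, reset, step), and all parameter values are illustrative assumptions, not the authors' experimental setup.

    import numpy as np

    class ChainMDP:
        """A tiny deterministic chain MDP used only to exercise the sketch.
        (Illustrative assumption; not the environment used in the paper.)"""
        def __init__(self, length=6):
            self.num_states, self.num_actions = length, 2
            self._s = 0
        def reset(self):
            self._s = 0
            return self._s
        def step(self, action):
            # action 1 moves right toward the goal; action 0 resets to the start
            self._s = min(self._s + 1, self.num_states - 1) if action == 1 else 0
            done = self._s == self.num_states - 1
            reward = 1.0 if done else 0.0
            return self._s, reward, done

    def q_learning_optimistic(env, episodes=200, alpha=0.1, gamma=0.95,
                              optimistic_value=10.0, max_steps=100):
        """Tabular Q-learning with optimistic initial values, a constant
        learning rate alpha, and purely greedy action selection."""
        Q = np.full((env.num_states, env.num_actions), optimistic_value)
        for _ in range(episodes):
            s = env.reset()
            for _ in range(max_steps):
                a = int(np.argmax(Q[s]))                 # greedy; optimism drives exploration
                s_next, r, done = env.step(a)
                target = r if done else r + gamma * np.max(Q[s_next])
                Q[s, a] += alpha * (target - Q[s, a])    # constant step size, no decay
                s = s_next
                if done:
                    break
        return Q

    if __name__ == "__main__":
        print(q_learning_optimistic(ChainMDP()))

Because all Q-values start above any attainable return, the greedy policy systematically tries under-explored actions until their estimates fall toward their true values, which is the exploration mechanism the admissibility argument relies on.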


Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Shiau Hong Lim (1)
  • Gerald DeJong (1)
  1. Dept. of Computer Science, University of Illinois, Urbana-Champaign
