Learning Rates for Q-Learning

  • Eyal Even-Dar
  • Yishay Mansour
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2111)


In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one which is 1/t^ω at time t where ω ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1 - γ), where γ is the discount factor. In contrast, we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1 - γ). In addition, we show a simple example that proves that this exponential behavior is inherent for a linear learning rate.
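The contrast between the two learning rates can be illustrated with a minimal sketch. The toy problem below (one state, one action, zero reward, so the optimal value is 0) and the names `learning_rate` and `steps_to_converge` are assumptions chosen for illustration; they are not taken from the paper, and the sketch is not claimed to be the paper's own example.

```python
def learning_rate(t: int, omega: float) -> float:
    """Step size 1 / t**omega; omega = 1 gives the linear rate 1/t."""
    return 1.0 / t ** omega


def steps_to_converge(omega, gamma, epsilon=0.1, q0=1.0, max_steps=10**6):
    """Run Q-learning on a hypothetical one-state, one-action, zero-reward MDP.

    The optimal value is 0, so |Q_t| is the error at time t. Returns the
    number of updates needed to reach |Q_t| <= epsilon, or None if that
    does not happen within max_steps updates.
    """
    q = q0
    for t in range(1, max_steps + 1):
        alpha = learning_rate(t, omega)
        # Q-learning update: Q <- (1 - alpha) * Q + alpha * (r + gamma * max_a Q),
        # here with r = 0 and a single action, so max_a Q is just Q.
        q = (1.0 - alpha) * q + alpha * (0.0 + gamma * q)
        if abs(q) <= epsilon:
            return t
    return None


if __name__ == "__main__":
    gamma = 0.9
    print("polynomial (omega = 0.6):", steps_to_converge(0.6, gamma))
    print("linear     (omega = 1.0):", steps_to_converge(1.0, gamma))
```

In this toy setting the error after t updates with the linear rate is the product of (1 - (1 - γ)/s) for s = 1, ..., t, which decays only like t^-(1 - γ). Reaching error ε therefore takes on the order of (1/ε)^(1/(1 - γ)) updates, so the linear run is expected to hit `max_steps` and return `None` when γ is close to 1.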





Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Eyal Even-Dar (1)
  • Yishay Mansour (1)
  1. School of Computer Science, Tel-Aviv University, Israel
