Learning Rates for Q-Learning
In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one which is 1/t^ω at time t where ω ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1 − γ), where γ is the discount factor. In contrast, we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1 − γ). In addition, we give a simple example showing that this exponential behavior is inherent for a linear learning rate.
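The two learning-rate schedules compared in the abstract can be illustrated with a minimal sketch of tabular Q-learning. Everything below is an assumption for illustration: the toy two-state MDP, its reward structure, and all parameter values are made up and are not the paper's construction; only the update rule and the schedule α_t = 1/t^ω (with ω = 1 giving the linear rate) come from the abstract.

```python
import random

def q_learning(omega, steps=2000, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical 2-state, 2-action MDP.

    The learning rate for the t-th update of a state-action pair is
    alpha_t = 1 / t**omega: omega in (1/2, 1) is a polynomial rate,
    omega = 1.0 is the linear rate 1/t discussed in the abstract.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    visits = {sa: 0 for sa in Q}  # per-pair update counts t
    s = 0
    for _ in range(steps):
        a = rng.choice((0, 1))                     # uniform exploration
        r = 1.0 if (s == 1 and a == 1) else 0.0    # toy reward structure
        s2 = a                                     # action sets the next state
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)] ** omega      # the learning-rate schedule
        target = r + gamma * max(Q[(s2, b)] for b in (0, 1))
        Q[(s, a)] += alpha * (target - Q[(s, a)])  # standard Q-learning update
        s = s2
    return Q

# Compare a polynomial rate (omega = 0.6) against the linear rate (omega = 1.0):
Q_poly = q_learning(omega=0.6)
Q_lin = q_learning(omega=1.0)
```

Since rewards are in [0, 1], all Q-values are bounded by 1/(1 − γ) = 10, and the pair (1, 1), which collects the only reward, should dominate its alternatives under either schedule; the paper's point is about how fast each schedule approaches such limits as a function of 1/(1 − γ).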