Robustness Analysis of SARSA(λ): Different Models of Reward and Initialisation
This paper examines the robustness of SARSA(λ), a reinforcement learning algorithm with eligibility traces, under different models of reward and different initialisations of the Q-table. Most empirical analyses of eligibility traces in the literature have focused on the step-penalty reward. We analyse two general types of reward (final-goal and step-penalty rewards) and show that learning with long traces, i.e., with high values of λ, can lead to suboptimal solutions in some situations. The problems that arise are identified and discussed. Specifically, the results show that SARSA(λ) is sensitive to the choice of reward model and initialisation, and that in some cases its asymptotic performance is significantly reduced.
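For readers unfamiliar with the algorithm under study, the following is a minimal sketch of tabular SARSA(λ) with accumulating eligibility traces, including the two ingredients the abstract varies: the reward model (final-goal vs. step-penalty) and the Q-table initialisation. The corridor environment, parameter values, and function names are illustrative assumptions, not the paper's experimental setup.

```python
import random

def sarsa_lambda(n_states, n_actions, step, episodes=200,
                 alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1,
                 q_init=0.0, seed=0):
    """Tabular SARSA(lambda) with accumulating eligibility traces.

    `step(s, a)` must return (next_state, reward, done).
    `q_init` controls the initialisation of the Q-table,
    one of the factors whose influence the paper studies.
    """
    rng = random.Random(seed)
    Q = [[q_init] * n_actions for _ in range(n_states)]

    def policy(s):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        return max(range(n_actions), key=Q[s].__getitem__)

    for _ in range(episodes):
        # eligibility traces are reset at the start of each episode
        e = [[0.0] * n_actions for _ in range(n_states)]
        s, a = 0, policy(0)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = policy(s2)
            target = r if done else r + gamma * Q[s2][a2]
            delta = target - Q[s][a]
            e[s][a] += 1.0  # accumulating trace
            for si in range(n_states):
                for ai in range(n_actions):
                    Q[si][ai] += alpha * delta * e[si][ai]
                    e[si][ai] *= gamma * lam  # trace decay
            s, a = s2, a2
    return Q

def make_corridor(step_penalty):
    """1-D corridor: states 0..4, goal at state 4; action 0 = left, 1 = right.

    With step_penalty=True the agent receives -1 per step (step-penalty
    reward); otherwise it receives +1 only on reaching the goal
    (final-goal reward).
    """
    def step(s, a):
        s2 = min(4, s + 1) if a == 1 else max(0, s - 1)
        done = s2 == 4
        r = -1.0 if step_penalty else (1.0 if done else 0.0)
        return s2, r, done
    return step
```

A short usage example, comparing the two reward models on the same task:

```python
Q_penalty = sarsa_lambda(5, 2, make_corridor(True), seed=1)
Q_goal = sarsa_lambda(5, 2, make_corridor(False), seed=1)
# Under both reward models, the greedy action in the starting
# state should be "right" (towards the goal).
```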
Keywords: reinforcement learning, temporal-difference learning, SARSA, Q-learning, eligibility traces