Sparse Kernel-SARSA(λ) with an Eligibility Trace

  • Matthew Robards
  • Peter Sunehag
  • Scott Sanner
  • Bhaskara Marthi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)

Abstract

We introduce the first online kernelized version of SARSA(λ) that permits sparsification for arbitrary λ with 0 ≤ λ ≤ 1; this is made possible by a novel kernelization of the eligibility trace, which is maintained separately from the kernelized value function. This separation is crucial for preserving the functional structure of the eligibility trace when using the sparse kernel projection techniques that are essential for memory efficiency and capacity control. The result is a simple and practical Kernel-SARSA(λ) algorithm for general 0 ≤ λ ≤ 1 that is memory-efficient compared to standard SARSA(λ) (with various basis functions) across a range of domains, including a real robotics task running on a Willow Garage PR2 robot.
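The abstract describes the mechanism only at a high level; the following is a minimal Python sketch of how an online kernel SARSA(λ) update with a separately maintained, kernelized eligibility trace might look. The Gaussian kernel, the crude closeness-based sparsification rule, and all names and parameters (KernelSarsaLambda, bandwidth, novelty_tol, etc.) are illustrative assumptions, not the authors' algorithm, which uses principled sparse kernel projection.

```python
# Sketch: online kernel SARSA(lambda) where both the value function Q and the
# eligibility trace z are kernel expansions over a shared dictionary of stored
# state-action points, with the trace kept separate from the Q weights.
# All details below are illustrative assumptions, not the paper's method.
import numpy as np

class KernelSarsaLambda:
    def __init__(self, gamma=0.99, lam=0.9, alpha=0.1, bandwidth=1.0, novelty_tol=0.1):
        self.gamma, self.lam, self.alpha = gamma, lam, alpha
        self.bandwidth = bandwidth
        self.novelty_tol = novelty_tol
        self.dictionary = []   # stored state-action feature vectors
        self.q_weights = []    # kernel weights defining Q
        self.z_weights = []    # kernel weights defining the eligibility trace

    def _kernel(self, x, y):
        # Gaussian (RBF) kernel on concatenated state-action features.
        d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return np.exp(-np.dot(d, d) / (2.0 * self.bandwidth ** 2))

    def q_value(self, x):
        # Q(x) as a weighted sum of kernel evaluations at dictionary points.
        return sum(w * self._kernel(c, x) for c, w in zip(self.dictionary, self.q_weights))

    def update(self, x, reward, x_next, done):
        """One SARSA(lambda) step for the transition x -> x_next with the given reward."""
        target = reward + (0.0 if done else self.gamma * self.q_value(x_next))
        td_error = target - self.q_value(x)

        # Decay the kernelized trace, then add the current point to it.
        self.z_weights = [self.gamma * self.lam * w for w in self.z_weights]
        self._add_point(x, trace_weight=1.0)

        # TD update along the trace: Q <- Q + alpha * delta * z.
        self.q_weights = [qw + self.alpha * td_error * zw
                          for qw, zw in zip(self.q_weights, self.z_weights)]

    def _add_point(self, x, trace_weight):
        # Crude sparsification stand-in: if x is nearly identical to an existing
        # dictionary point, fold the trace weight into that point instead of
        # growing the dictionary.
        for i, c in enumerate(self.dictionary):
            if self._kernel(c, x) > 1.0 - self.novelty_tol:
                self.z_weights[i] += trace_weight
                return
        self.dictionary.append(np.asarray(x, dtype=float))
        self.q_weights.append(0.0)
        self.z_weights.append(trace_weight)
```

A dictionary point here is a concatenated state-action feature vector; in practice the dictionary would be bounded with a proper sparse projection step rather than the simple closeness test above.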

Keywords

Function Approximation, Reinforcement Learning, Markov Decision Process, Reproducing Kernel Hilbert Space, Robot Navigation

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Matthew Robards (1, 2)
  • Peter Sunehag (2)
  • Scott Sanner (1, 2)
  • Bhaskara Marthi (3)
  1. National ICT Australia, Canberra, Australia
  2. Research School of Computer Science, Australian National University, Canberra, Australia
  3. Willow Garage, Inc., Menlo Park, USA
