
Autonomous Robots, Volume 25, Issue 3, pp 287–304

Geodesic Gaussian kernels for value function approximation

  • Masashi Sugiyama
  • Hirotaka Hachiya
  • Christopher Towell
  • Sethu Vijayakumar

Abstract

The least-squares policy iteration approach works efficiently for value function approximation, provided that appropriate basis functions are given. Because of its smoothness, the Gaussian kernel is a popular and useful choice of basis function. However, it cannot capture the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the non-linear manifold structure induced by the Markov decision process. The usefulness of the proposed method is successfully demonstrated in simulated robot-arm control and Khepera robot navigation.
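The central idea is to replace the Euclidean distance inside an ordinary Gaussian kernel with the shortest-path (geodesic) distance on the state-transition graph of the MDP. The snippet below is a minimal illustrative sketch of that idea, not the authors' implementation: the function name, the SciPy-based Dijkstra routine, and the toy chain graph are assumptions chosen for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra


def geodesic_gaussian_kernels(adjacency, centers, sigma):
    """Gaussian kernels evaluated on shortest-path (geodesic) distances
    over a state-transition graph instead of Euclidean distances.
    (Illustrative sketch only; names and graph encoding are assumptions.)"""
    graph = csr_matrix(adjacency)
    # Shortest-path distance from each kernel centre to every state
    # (Dijkstra's algorithm on the weighted, undirected graph).
    dist = dijkstra(graph, directed=False, indices=centers)   # shape (k, n_states)
    # Same bell shape as an ordinary Gaussian kernel, but the distance
    # follows the graph, so kernels do not "leak" through walls or obstacles.
    return np.exp(-dist.T ** 2 / (2.0 * sigma ** 2))          # shape (n_states, k)


# Toy usage: a 5-state graph with two components {0, 1} and {3, 4};
# state 2 is isolated, mimicking a wall in a maze.
A = np.zeros((5, 5))
for i, j in [(0, 1), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
Phi = geodesic_gaussian_kernels(A, centers=[0, 4], sigma=1.0)
print(Phi)  # activations across the "wall" are exactly zero (infinite distance)
```

Because unreachable states have infinite geodesic distance, the resulting basis functions vanish across obstacles, giving the discontinuity-respecting behaviour that the Euclidean Gaussian kernel lacks.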

Keywords

Reinforcement learning · Value function approximation · Markov decision process · Least-squares policy iteration · Gaussian kernel



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Masashi Sugiyama 1, 2
  • Hirotaka Hachiya 1
  • Christopher Towell 2
  • Sethu Vijayakumar 2
  1. Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
  2. School of Informatics, University of Edinburgh, Edinburgh EH9, UK
