Geodesic Gaussian kernels for value function approximation

Sugiyama, Masashi; Hachiya, Hirotaka; Towell, Christopher; Vijayakumar, Sethu

doi:10.1007/s10514-008-9095-6

Published: 09 July 2008

Autonomous Robots, volume 25, pages 287–304 (2008)

Masashi Sugiyama¹,², Hirotaka Hachiya¹, Christopher Towell² & Sethu Vijayakumar²

Abstract

The least-squares policy iteration approach works efficiently in value function approximation, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it cannot represent the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploits the non-linear manifold structure induced by the Markov decision process. We demonstrate the usefulness of the proposed method in simulated robot arm control and Khepera robot navigation.
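To make the contrast between the two kinds of basis function concrete, here is a minimal sketch in Python. It is not the authors' implementation, and this page does not reproduce the paper's construction. It assumes that the state space is a small grid world, that the geodesic distance is the shortest-path length on a graph of reachable states (computed with Dijkstra's algorithm), and that the kernel has the usual Gaussian form with the Euclidean distance replaced by that path length. The function names `grid_graph`, `shortest_path_lengths`, `geodesic_gaussian`, and `ordinary_gaussian` are hypothetical.

```python
import heapq
import math


def grid_graph(rows, cols, walls):
    """Adjacency list of a 4-connected grid world; cells in `walls` are not states."""
    graph = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) in walls:
                continue
            neighbours = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (r + dr, c + dc)
                inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                if inside and nxt not in walls:
                    neighbours.append((nxt, 1.0))  # unit cost per move (assumption)
            graph[(r, c)] = neighbours
    return graph


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm: shortest-path length from `source` to each reachable state."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, weight in graph[node]:
            candidate = d + weight
            if candidate < dist.get(nxt, math.inf):
                dist[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return dist


def geodesic_gaussian(graph, center, sigma):
    """Assumed kernel form: exp(-SP(s, center)^2 / (2 sigma^2)), SP = shortest path."""
    dist = shortest_path_lengths(graph, center)
    return {
        s: math.exp(-dist[s] ** 2 / (2.0 * sigma ** 2)) if s in dist else 0.0
        for s in graph
    }


def ordinary_gaussian(graph, center, sigma):
    """Ordinary Gaussian kernel on the Euclidean distance between grid coordinates."""
    return {
        s: math.exp(
            -((s[0] - center[0]) ** 2 + (s[1] - center[1]) ** 2) / (2.0 * sigma ** 2)
        )
        for s in graph
    }


if __name__ == "__main__":
    # A 5x5 room split by a wall in column 2, with a single gap in the bottom row.
    walls = {(0, 2), (1, 2), (2, 2), (3, 2)}
    graph = grid_graph(5, 5, walls)
    center, query = (0, 1), (0, 3)  # two cells on opposite sides of the wall
    ggk = geodesic_gaussian(graph, center, sigma=2.0)
    ogk = ordinary_gaussian(graph, center, sigma=2.0)
    print("ordinary Gaussian kernel at query:", ogk[query])
    print("geodesic Gaussian kernel at query:", ggk[query])
```

In this toy layout the two cells are two units apart in Euclidean distance but ten moves apart along the graph, so the ordinary kernel leaks across the wall while the geodesic kernel decays along the path an agent would actually have to take. That is the sense in which a graph-based kernel can follow a discontinuity that a smooth Euclidean kernel cannot.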

References

Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Clarendon.
Chung, F. R. K. (1997). Spectral graph theory. Providence: Am. Math. Soc.
Coifman, R., & Maggioni, M. (2006). Diffusion wavelets. Applied and Computational Harmonic Analysis, 21, 53–94.
Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: SIAM.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269–271.
Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In Proceedings of international conference on machine learning, Bonn, Germany.
Fredman, M. L., & Tarjan, R. E. (1987). Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34, 569–615.
Girosi, F., Jones, M., & Poggio, T. (1995). Regularization theory and neural networks architectures. Neural Computation, 7, 219–269.
Goldberg, A. V., & Harrelson, C. (2005). Computing the shortest path: A* search meets graph theory. In 16th annual ACM-SIAM symposium on discrete algorithms, Vancouver, Canada (pp. 156–165).
Hachiya, H., Akiyama, T., Sugiyama, M., & Peters, J. (2008). Adaptive importance sampling with automatic model selection in value function approximation. In Proceedings of the twenty-third AAAI conference on artificial intelligence (AAAI-08), Chicago, USA (pp. 1351–1356).
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer.
Kolter, J. Z., & Ng, A. Y. (2007). Learning omnidirectional path following using dimensionality reduction. In Proceedings of robotics: science and systems.
Lagoudakis, M. G., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107–1149.
Mahadevan, S. (2005). Proto-value functions: Developmental reinforcement learning. In Proceedings of international conference on machine learning, Bonn, Germany.
Mahadevan, S., & Maggioni, M. (2006). Value function approximation with diffusion wavelets and Laplacian eigenfunctions. In Advances in neural information processing systems (Vol. 18, pp. 843–850). Cambridge: MIT Press.
Morimoto, J., & Doya, K. (2007). Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems, 36, 37–51.
Osentoski, S., & Mahadevan, S. (2007). Learning state-action basis functions for hierarchical MDPs. In Proceedings of the 24th international conference on machine learning.
Precup, D., Sutton, R. S., & Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the seventeenth international conference on machine learning (pp. 759–766). San Mateo: Morgan Kaufmann.
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge: MIT Press.
Sugiyama, M., Hachiya, H., Towell, C., & Vijayakumar, S. (2007). Value function approximation on non-linear manifolds for robot motor control. In Proceedings of 2007 IEEE international conference on robotics and automation (ICRA2007) (pp. 1733–1740).
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
Vapnik, V. N. (1998). Statistical learning theory. New York: Wiley.
Vesanto, J., Himberg, J., Alhoniemi, E., & Parhankangas, J. (2000). SOM toolbox for Matlab 5 (Technical Report A57). Helsinki University of Technology.
Vijayakumar, S., D’Souza, A., Shibata, T., Conradt, J., & Schaal, S. (2002). Statistical learning for humanoid robots. Autonomous Robots, 12, 55–69.

Author information

Authors and Affiliations

Department of Computer Science, Tokyo Institute of Technology, 2-12-1, O-okayama, Meguro-ku, Tokyo, 152-8552, Japan
Masashi Sugiyama & Hirotaka Hachiya
School of Informatics, University of Edinburgh, The King’s Buildings, Mayfield Road, Edinburgh EH9 3JZ, UK
Masashi Sugiyama, Christopher Towell & Sethu Vijayakumar

Corresponding author

Correspondence to Masashi Sugiyama.

Additional information

The current paper is a complete version of our earlier manuscript (Sugiyama et al. 2007). The major differences are that we include more technical details of the proposed method in Sect. 3, a discussion of the relation to related methods in Sect. 4, and an application to map building in Sect. 6. A demo movie of the proposed method applied to simulated robot arm control and Khepera robot navigation is available from http://sugiyama-www.cs.titech.ac.jp/~sugi/2008/GGKvsOGK.wmv.

About this article

Cite this article

Sugiyama, M., Hachiya, H., Towell, C. et al. Geodesic Gaussian kernels for value function approximation. Auton Robot 25, 287–304 (2008). https://doi.org/10.1007/s10514-008-9095-6

Received: 06 July 2007
Accepted: 06 June 2008
Published: 09 July 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s10514-008-9095-6
