Abstract
The least-squares policy iteration approach approximates value functions efficiently, given appropriate basis functions. Because of its smoothness, the Gaussian kernel is a popular and useful choice of basis function. However, it does not allow for the discontinuities that typically arise in real-world reinforcement learning tasks. In this paper, we propose a new basis function based on geodesic Gaussian kernels, which exploit the non-linear manifold structure induced by the Markov decision process. The usefulness of the proposed method is demonstrated in simulated robot arm control and Khepera robot navigation.
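To illustrate the idea behind a geodesic Gaussian kernel, the following minimal sketch compares it with an ordinary Gaussian kernel on a small grid world. It assumes that the geodesic kernel takes the form exp(-d²/(2σ²)), where d is the shortest-path length on a graph over the states, computed here with Dijkstra's algorithm. The grid layout, the unit edge weights, and all function names are hypothetical choices made for illustration, not the setup used in the experiments.

```python
import heapq
import math


def grid_graph(width, height, walls):
    """Build a 4-connected grid graph (hypothetical toy MDP), skipping wall cells."""
    adjacency = {}
    for x in range(width):
        for y in range(height):
            if (x, y) in walls:
                continue
            neighbours = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n = (x + dx, y + dy)
                if 0 <= n[0] < width and 0 <= n[1] < height and n not in walls:
                    neighbours.append((n, 1.0))  # unit edge weight (assumption)
            adjacency[(x, y)] = neighbours
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Dijkstra's algorithm: distance from source to every reachable state."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_gaussian_kernel(adjacency, center, sigma):
    """Gaussian of the shortest-path distance to the kernel center."""
    dist = shortest_path_lengths(adjacency, center)
    return {s: math.exp(-d * d / (2.0 * sigma ** 2)) for s, d in dist.items()}


def ordinary_gaussian_kernel(states, center, sigma):
    """Gaussian of the Euclidean distance to the kernel center."""
    return {s: math.exp(-math.dist(s, center) ** 2 / (2.0 * sigma ** 2))
            for s in states}


# A 5x5 grid with a wall in column x = 2 that leaves a gap only at y = 4.
walls = {(2, 0), (2, 1), (2, 2), (2, 3)}
graph = grid_graph(5, 5, walls)
center = (1, 0)
sigma = 2.0

geodesic_dist = shortest_path_lengths(graph, center)
ggk = geodesic_gaussian_kernel(graph, center, sigma)
ogk = ordinary_gaussian_kernel(graph.keys(), center, sigma)

# (3, 0) is close to the center in Euclidean terms but far along the graph,
# so the geodesic kernel assigns it a much smaller value.
print(geodesic_dist[(3, 0)], ggk[(3, 0)], ogk[(3, 0)])
```

In this toy example the ordinary kernel leaks across the wall, whereas the geodesic kernel decays along the path a state transition would actually have to follow. This is the kind of discontinuity across obstacles that a Euclidean Gaussian basis cannot express.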
References
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Clarendon.
Chung, F. R. K. (1997). Spectral graph theory. Providence: American Mathematical Society.
Coifman, R., & Maggioni, M. (2006). Diffusion wavelets. Applied and Computational Harmonic Analysis, 21, 53–94.
Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: SIAM.
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1, 269–271.
Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In Proceedings of international conference on machine learning, Bonn, Germany.
Fredman, M. L., & Tarjan, R. E. (1987). Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34, 569–615.
Girosi, F., Jones, M., & Poggio, T. (1995). Regularization theory and neural networks architectures. Neural Computation, 7, 219–269.
Goldberg, A. V., & Harrelson, C. (2005). Computing the shortest path: A* search meets graph theory. In 16th annual ACM-SIAM symposium on discrete algorithms, Vancouver, Canada (pp. 156–165).
Hachiya, H., Akiyama, T., Sugiyama, M., & Peters, J. (2008). Adaptive importance sampling with automatic model selection in value function approximation. In Proceedings of the twenty-third AAAI conference on artificial intelligence (AAAI-08), Chicago, USA (pp. 1351–1356).
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer.
Kolter, J. Z., & Ng, A. Y. (2007). Learning omnidirectional path following using dimensionality reduction. In Proceedings of robotics: science and systems.
Lagoudakis, M. G., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107–1149.
Mahadevan, S. (2005). Proto-value functions: Developmental reinforcement learning. In Proceedings of international conference on machine learning, Bonn, Germany.
Mahadevan, S., & Maggioni, M. (2006). Value function approximation with diffusion wavelets and Laplacian eigenfunctions. In Advances in neural information processing systems (Vol. 18, pp. 843–850). Cambridge: MIT Press.
Morimoto, J., & Doya, K. (2001). Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems, 36, 37–51.
Osentoski, S., & Mahadevan, S. (2007). Learning state-action basis functions for hierarchical MDPs. In Proceedings of the 24th international conference on machine learning.
Precup, D., Sutton, R. S., & Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the seventeenth international conference on machine learning (pp. 759–766). San Mateo: Morgan Kaufmann.
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge: MIT Press.
Sugiyama, M., Hachiya, H., Towell, C., & Vijayakumar, S. (2007). Value function approximation on non-linear manifolds for robot motor control. In Proceedings of 2007 IEEE international conference on robotics and automation (ICRA2007) (pp. 1733–1740).
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
Vapnik, V. N. (1998). Statistical learning theory. New York: Wiley.
Vesanto, J., Himberg, J., Alhoniemi, E., & Parhankangas, J. (2000). SOM toolbox for Matlab 5 (Technical Report A57). Helsinki University of Technology.
Vijayakumar, S., D’Souza, A., Shibata, T., Conradt, J., & Schaal, S. (2002). Statistical learning for humanoid robots. Autonomous Robots, 12, 55–69.
Additional information
The current paper is a complete version of our earlier manuscript (Sugiyama et al. 2007). The major differences are that we included more technical details of the proposed method in Sect. 3, discussions on the relation to related methods in Sect. 4, and the application to map building in Sect. 6. A demo movie of the proposed method applied in simulated robot arm control and Khepera robot navigation is available from http://sugiyama-www.cs.titech.ac.jp/~sugi/2008/GGKvsOGK.wmv.
Cite this article
Sugiyama, M., Hachiya, H., Towell, C. et al. Geodesic Gaussian kernels for value function approximation. Auton Robot 25, 287–304 (2008). https://doi.org/10.1007/s10514-008-9095-6
DOI: https://doi.org/10.1007/s10514-008-9095-6