An Online Kernel-Based Clustering Approach for Value Function Approximation

  • Nikolaos Tziortziotis
  • Konstantinos Blekas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7297)

Abstract

Value function approximation is a critical task in solving Markov decision processes and in accurately modeling reinforcement learning agents. A significant issue is how to construct efficient feature spaces from samples collected from the environment in order to obtain an optimal policy. This study addresses the challenge by proposing an online kernel-based clustering approach that builds appropriate basis functions during the learning process. The method uses a kernel function capable of handling state-action pairs as they are sequentially generated by the agent. At each time step, the procedure either adds a new cluster or adjusts the winning cluster’s parameters. The value function is treated as a linear combination of the constructed basis functions, whose weights are optimized in a temporal-difference framework to minimize the Bellman approximation error. The proposed method is evaluated in a number of well-known simulated environments.
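
The loop outlined in the abstract (add a cluster or adjust the winner, then update linear weights from the temporal-difference error) can be illustrated with a minimal sketch. It is not the authors' algorithm: the Gaussian kernel, the `novelty_threshold` test, the running-mean center update, the plain TD(0) rule, and all names (`OnlineKernelClusters`, `td0_step`, `bandwidth`) are assumptions made only for illustration.

```python
import numpy as np


class OnlineKernelClusters:
    """Hypothetical online clustering of state-action vectors (illustrative only)."""

    def __init__(self, bandwidth=1.0, novelty_threshold=0.5):
        self.bandwidth = bandwidth                  # assumed Gaussian kernel width
        self.novelty_threshold = novelty_threshold  # assumed rule for adding clusters
        self.centers = []                           # one center per basis function
        self.counts = []                            # samples assigned to each cluster

    def kernel(self, x, c):
        d = x - c
        return float(np.exp(-0.5 * np.dot(d, d) / self.bandwidth ** 2))

    def features(self, x):
        """Basis-function activations of x, one per existing cluster."""
        x = np.asarray(x, dtype=float)
        return np.array([self.kernel(x, c) for c in self.centers])

    def update(self, x):
        """Adjust the winning cluster, or add a new one if x is too dissimilar."""
        x = np.asarray(x, dtype=float)
        if self.centers:
            sims = self.features(x)
            winner = int(np.argmax(sims))
            if sims[winner] >= self.novelty_threshold:
                # Running-mean update of the winning center (an assumption).
                self.counts[winner] += 1
                self.centers[winner] += (x - self.centers[winner]) / self.counts[winner]
                return winner, False
        self.centers.append(x.copy())
        self.counts.append(1)
        return len(self.centers) - 1, True


def td0_step(clusters, w, x, reward, x_next, alpha=0.1, gamma=0.95, terminal=False):
    """One TD(0) update of linear weights over the current basis functions."""
    _, added = clusters.update(x)
    if added:
        w = np.append(w, 0.0)  # a new basis function starts with zero weight
    phi = clusters.features(x)
    q = phi @ w
    q_next = 0.0 if terminal else clusters.features(x_next) @ w
    delta = reward + gamma * q_next - q  # temporal-difference error
    return w + alpha * delta * phi, delta
```

Here `x` and `x_next` stand for encoded state-action pairs, and the weight vector starts as `np.zeros(0)` and grows by one entry whenever a cluster is added. The keywords mention eligibility traces, which this sketch omits for brevity.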

Keywords

Function Approximation, Markov Decision Process, Policy Iteration, Stochastic Gradient Descent, Eligibility Trace
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nikolaos Tziortziotis (1)
  • Konstantinos Blekas (1)
  1. Department of Computer Science, University of Ioannina, Ioannina, Greece
