Reordering Sparsification of Kernel Machines in Approximate Policy Iteration

  • Chunming Liu
  • Jinze Song
  • Xin Xu
  • Pengcheng Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5552)


Approximate policy iteration (API), which includes least-squares policy iteration (LSPI) and its kernelized version (KLSPI), has received increasing attention due to its good convergence and generalization abilities in solving difficult reinforcement learning problems. However, the sparsification of feature vectors, especially kernel-based features, greatly influences the performance of API methods. In this paper, a novel reordering sparsification method is proposed for sparsifying kernel machines in API. The method adopts a greedy strategy that adds the sample with the maximal squared approximation error to the kernel dictionary, so that the samples are reordered to improve the performance of kernel sparsification. Experimental results on the learning control of an inverted pendulum verify that, with the proposed algorithm, the kernel dictionary is smaller than that of the previous sequential sparsification algorithm with the same level of sparsity, and the performance of the control policies learned by KLSPI can also be improved.
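The greedy selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the error measure, so the sketch assumes the squared approximation error of the approximate linear dependence (ALD) test commonly used for kernel sparsification, a Gaussian (RBF) kernel, and a stopping threshold. The names `rbf_kernel`, `greedy_sparsify`, `mu`, and `width` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, width=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B (assumed kernel)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))

def greedy_sparsify(X, mu=1e-3, width=1.0):
    """Return indices of dictionary samples chosen greedily by maximal ALD error.

    Assumption: the error of sample x given dictionary D is
        delta(x) = k(x, x) - k_D(x)^T K_DD^{-1} k_D(x),
    and selection stops once the largest error is at most mu.
    """
    K = rbf_kernel(X, X, width)
    dictionary = [0]  # assumption: seed the dictionary with the first sample
    while len(dictionary) < len(X):
        K_dd = K[np.ix_(dictionary, dictionary)]
        K_dx = K[dictionary, :]
        # Small jitter keeps the solve stable when dictionary samples are close.
        coeffs = np.linalg.solve(K_dd + 1e-10 * np.eye(len(dictionary)), K_dx)
        errors = np.diag(K) - np.sum(K_dx * coeffs, axis=0)
        errors[dictionary] = 0.0  # samples already in the dictionary are excluded
        best = int(np.argmax(errors))
        if errors[best] <= mu:
            break  # every remaining sample is already well approximated
        dictionary.append(best)  # greedy step: add the worst-approximated sample
    return dictionary
```

In contrast, a sequential scheme would visit samples in their original order and add each one whose error exceeds the threshold; the greedy variant effectively reorders the samples by always taking the current worst-approximated one. Recomputing all errors at every step is the simplest formulation and costs more than an incremental update would.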


Keywords: Reinforcement learning, Approximate policy iteration, Sparsification





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Chunming Liu (1)
  • Jinze Song (1)
  • Xin Xu (1)
  • Pengcheng Zhang (1)
  1. Institute of Automation, National University of Defense Technology, Changsha, China
