Value Function Approximation through Sparse Bayesian Modeling

  • Nikolaos Tziortziotis
  • Konstantinos Blekas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7188)


In this study we present a sparse Bayesian framework for value function approximation. The proposed method is based on the on-line construction of a dictionary of states collected as the agent explores the environment. A linear regression model is fitted to the observed partial discounted returns of these dictionary states, where we employ the Relevance Vector Machine (RVM) and exploit the enhanced modeling capability afforded by its embedded sparsity properties. To speed up the optimization procedure and handle large-scale problems, an incremental strategy is adopted. Experiments conducted on both simulated and real environments yielded promising results in comparison with another Bayesian approach that uses Gaussian processes.
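The regression step described above can be sketched as follows. This is an illustrative toy, not the paper's algorithm: scikit-learn's `ARDRegression` (an RVM-style sparse Bayesian linear model) stands in for the authors' incremental RVM, and the one-dimensional random-walk environment, RBF kernel width, and rollout horizon are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
gamma = 0.9  # discount factor

def rollout_return(s, horizon=50):
    """Monte-Carlo partial discounted return from state s in a toy
    1-D random-walk environment where the reward equals the state."""
    g, discount = 0.0, 1.0
    for _ in range(horizon):
        g += discount * s
        discount *= gamma
        s = float(np.clip(s + rng.normal(0.0, 0.05), 0.0, 1.0))
    return g

# "Dictionary" of states collected during exploration.
dictionary = rng.uniform(0.0, 1.0, size=40)
returns = np.array([rollout_return(s) for s in dictionary])

def phi(states, centers, width=0.1):
    """RBF design matrix: one basis function per dictionary state."""
    return np.exp(-((states[:, None] - centers[None, :]) ** 2)
                  / (2.0 * width ** 2))

X = phi(dictionary, dictionary)

# The ARD prior drives most basis-function weights to zero,
# yielding a sparse value-function model.
rvm = ARDRegression()
rvm.fit(X, returns)

def value(s):
    """Approximate value of state s under the fitted sparse model."""
    return rvm.predict(phi(np.atleast_1d(float(s)), dictionary))[0]
```

Since the toy reward grows with the state, the fitted value function should rank higher states above lower ones, which gives a quick sanity check of the regression.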


Keywords: Value function approximation · Sparse Bayesian modeling · Relevance Vector Machine · Incremental learning




References

  1. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)
  2. Bradtke, S.J., Barto, A.G.: Linear least-squares algorithms for temporal difference learning. Machine Learning 22, 33–57 (1996)
  3. Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: International Conference on Machine Learning, pp. 201–208 (2005)
  4. Geist, M., Pietquin, O.: Kalman temporal differences. Journal of Artificial Intelligence Research 39, 483–532 (2010)
  5. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237–285 (1996)
  6. Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. Journal of Machine Learning Research 4, 1107–1149 (2003)
  7. Moore, A.: Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. In: Machine Learning: Proceedings of the Eighth International Conference. Morgan Kaufmann (1991)
  8. Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning. MIT Press (2006)
  9. Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Tech. rep., Cambridge University Engineering Department (1994)
  10. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press (2002)
  11. Seeger, M.: Bayesian inference and optimal design for the sparse linear model. Journal of Machine Learning Research 9, 759–813 (2008)
  12. Singh, S., Sutton, R.S.: Reinforcement learning with replacing eligibility traces. Machine Learning 22, 123–158 (1996)
  13. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  14. Taylor, G., Parr, R.: Kernelized value function approximation for reinforcement learning. In: International Conference on Machine Learning, pp. 1017–1024 (2009)
  15. Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244 (2001)
  16. Tipping, M.E., Faul, A.C.: Fast marginal likelihood maximisation for sparse Bayesian models. In: Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (2003)
  17. Tzikas, D., Likas, A., Galatsanos, N.: Sparse Bayesian modeling with adaptive kernel learning. IEEE Transactions on Neural Networks 20(6), 926–937 (2009)
  18. Watkins, C., Dayan, P.: Q-learning. Machine Learning 8(3), 279–292 (1992)
  19. Xu, X., Hu, D., Lu, X.: Kernel-based least squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks 18(4), 973–992 (2007)
  20. Xu, X., Xie, T., Hu, D., Lu, X.: Kernel least-squares temporal difference learning. International Journal of Information Technology 11(9), 54–63 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Nikolaos Tziortziotis¹
  • Konstantinos Blekas¹

  1. Department of Computer Science, University of Ioannina, Ioannina, Greece
