Adaptive Sparse Grids in Reinforcement Learning

  • Jochen Garcke
  • Irene Klompmaker
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 102)


We propose a model-based online reinforcement learning approach for continuous domains with deterministic transitions that uses a spatially adaptive sparse grid in the planning stage. The model learning employs Gaussian process regression and allows a low sample complexity. The adaptive sparse grid is introduced so that the value function can be represented in the planning stage in higher-dimensional state spaces. This work gives numerical evidence that adaptive sparse grids are applicable in reinforcement learning.
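To illustrate why a sparse grid can make a value function representation feasible in higher-dimensional state spaces, the following minimal sketch counts grid points. It assumes a regular (non-adaptive) sparse grid without boundary points, built from the standard hierarchical subspaces with level multi-index l satisfying |l|_1 <= n + d - 1, and compares it with the full tensor grid at the same level. The function names are hypothetical, and the spatially adaptive grid used in this work would refine further only where needed, so this is not the authors' scheme.

```python
from itertools import product


def sparse_grid_size(dim, level):
    """Number of interior points of a regular sparse grid (assumed construction).

    Each hierarchical subspace W_l, with l_i >= 1, holds 2**(|l|_1 - dim) points.
    The sparse grid keeps only subspaces with |l|_1 <= level + dim - 1.
    """
    total = 0
    for l in product(range(1, level + 1), repeat=dim):
        if sum(l) <= level + dim - 1:
            total += 2 ** (sum(l) - dim)
    return total


def full_grid_size(dim, level):
    """Number of interior points of the full tensor grid with mesh width 2**-level."""
    return (2 ** level - 1) ** dim


if __name__ == "__main__":
    # Compare both grid sizes for a fixed level and growing dimension.
    for dim in (1, 2, 3, 4):
        print(dim, sparse_grid_size(dim, 5), full_grid_size(dim, 5))
```

In one dimension the two counts coincide. The gap widens as the dimension grows, which is the motivation for using sparse grids in the planning stage.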


Keywords: Reinforcement Learning, Markov Decision Process, Reward Function, Sparse Grid, Gaussian Process Regression



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. University of Bonn, Bonn, Germany
