Adaptive Sparse Grids in Reinforcement Learning

Chapter
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 102)

Abstract

We propose a model-based online reinforcement learning approach for continuous domains with deterministic transitions that uses a spatially adaptive sparse grid in the planning stage. The model is learned with Gaussian process regression, which keeps the sample complexity low. The adaptive sparse grid is introduced so that the value function can be represented in the planning stage for higher-dimensional state spaces. This work gives numerical evidence that adaptive sparse grids are applicable to reinforcement learning.
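
To illustrate why a sparse grid can make a value function representation feasible in higher-dimensional state spaces, the following minimal sketch compares the number of interior points of a regular sparse grid with that of a full grid at the same refinement level. It uses the standard regular (non-adaptive) construction without boundary points, not the spatially adaptive scheme of this chapter, and the function names `sparse_grid_size` and `full_grid_size` are hypothetical.

```python
from itertools import product


def sparse_grid_size(dim, level):
    """Count interior points of a regular sparse grid without boundary points.

    Sums over all level multi-indices l with l_i >= 1 and
    |l|_1 <= level + dim - 1; each hierarchical subspace W_l
    contributes prod_i 2^(l_i - 1) points.
    """
    total = 0
    for l in product(range(1, level + 1), repeat=dim):
        if sum(l) <= level + dim - 1:
            points = 1
            for li in l:
                points *= 2 ** (li - 1)
            total += points
    return total


def full_grid_size(dim, level):
    """Count interior points of a full tensor grid with mesh width 2^-level."""
    return (2 ** level - 1) ** dim


if __name__ == "__main__":
    # Compare growth with the dimension at a fixed level.
    for dim in (1, 2, 3, 4):
        print(dim, sparse_grid_size(dim, 5), full_grid_size(dim, 5))
```

In one dimension both counts coincide; as the dimension grows, the full grid count grows exponentially in the dimension while the sparse grid count grows much more slowly.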

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. University of Bonn, Bonn, Germany
