Encyclopedia of Machine Learning and Data Mining

Living Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Value Function Approximation

  • Michail G. Lagoudakis
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7502-7_876-1

Abstract

The goal in sequential decision making under uncertainty is to find good or optimal policies for selecting actions in stochastic environments so as to achieve a long-term objective; such problems are typically modeled as Markov decision processes (MDPs). A key concept in MDPs is the value function, a real-valued function that summarizes in a single number the long-term goodness of a decision and thereby allows optimal decision making to be formulated as an optimization problem. Exact representation of value functions in large real-world problems is infeasible; therefore, a large body of research has been devoted to value-function approximation methods, which sacrifice some representation accuracy for the sake of scalability. These methods have proven effective for deriving good policies in hard decision problems and have laid the foundation for efficient reinforcement learning algorithms, which learn good policies in unknown stochastic environments through interaction.
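For concreteness, the value function and a typical approximation scheme can be sketched as follows; this is a minimal illustration using standard textbook notation (discount factor $\gamma$, reward $r_t$, basis functions $\phi_i$, weight vector $w$), not a formulation taken from the entry itself. The state value function of a policy $\pi$ in a discounted MDP is

$$ V^{\pi}(s) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_t \;\middle|\; s_0 = s,\ \pi \right], $$

and a common linear approximation architecture represents it as a weighted combination of $k$ features,

$$ \hat{V}(s; w) \;=\; \sum_{i=1}^{k} w_i\, \phi_i(s) \;=\; \phi(s)^{\top} w. $$

The approximation becomes useful because $k$ is kept much smaller than the number of states, so the weight vector $w$ can be stored and estimated even when exact tabular representation of $V^{\pi}$ is infeasible.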



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Technical University of Crete, Chania, Greece