Encyclopedia of Machine Learning

2010 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Value Function Approximation

  • Michail G. Lagoudakis
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_870



The goal in sequential decision making under uncertainty is to find good or optimal policies for selecting actions in stochastic environments in order to achieve a long-term goal; such problems are typically modeled as Markov Decision Processes (MDPs). A key concept in MDPs is the value function, a real-valued function that summarizes the long-term goodness of a decision into a single number and allows the formulation of optimal decision making as an optimization problem. Exact representation of value functions in large real-world problems is infeasible, so a large body of research has been devoted to value function approximation methods, which sacrifice some representation accuracy for the sake of scalability. This line of research has delivered effective means of deriving good policies in hard decision problems and laid the foundation for efficient...
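As an illustration of the idea (not part of the entry itself), the sketch below approximates a value function with a linear architecture, V(s) ≈ w·φ(s), fitting the weights w by least-squares temporal-difference (LSTD) learning in the style of Bradtke and Barto (1996), one of the readings listed below. The environment is a hypothetical five-state random walk chosen because its true values V(s) = s/6 are exactly linear in s, so two features suffice; all names and parameters here are illustrative assumptions.

```python
import random

GAMMA = 1.0  # undiscounted episodic task

def phi(s):
    """Feature vector for state s: a bias term and the (scaled) state index."""
    return (1.0, s / 6.0)

def lstd(num_episodes=2000, seed=0):
    """Estimate weights w with V(s) ~ w . phi(s) on a 5-state random walk.

    States 1..5 are nonterminal; stepping left of 1 or right of 5 terminates,
    with reward +1 only on the right exit. Accumulates the LSTD statistics
    A = sum phi(s)(phi(s) - gamma*phi(s'))^T and b = sum phi(s)*r, then
    solves A w = b.
    """
    rng = random.Random(seed)
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for _ in range(num_episodes):
        s = 3  # start each episode in the middle state
        while 1 <= s <= 5:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == 6 else 0.0
            f = phi(s)
            # Terminal states have value 0: use an all-zero feature vector.
            fn = phi(s_next) if 1 <= s_next <= 5 else (0.0, 0.0)
            for i in range(2):
                for j in range(2):
                    A[i][j] += f[i] * (f[j] - GAMMA * fn[j])
                b[i] += f[i] * r
            s = s_next
    # Solve the 2x2 system A w = b by Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det)

w = lstd()
values = {s: w[0] * phi(s)[0] + w[1] * phi(s)[1] for s in range(1, 6)}
# values[s] should be close to the true V(s) = s/6 for s = 1..5.
```

Because the true value function lies in the span of the features, the LSTD fixed point coincides with it; with features poorer than the problem demands, the same procedure returns the best projected approximation instead, which is precisely the accuracy-for-scalability trade-off described above.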


Recommended Reading

  1. Bethke, B., & How, J. P. (2009). Approximate dynamic programming using Bellman residual elimination and Gaussian process regression. Proceedings of the American Control Conference, St. Louis, MO, USA, pp. 745–750.
  2. Bethke, B., How, J. P., & Ozdaglar, A. (2008). Approximate dynamic programming using support vector regression. Proceedings of the IEEE Conference on Decision and Control, Cancun, Mexico, pp. 745–750.
  3. Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont: Athena Scientific.
  4. Bradtke, S. J., & Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1–3), 33–57.
  5. Buşoniu, L., Babuška, R., De Schutter, B., & Ernst, D. (2010). Reinforcement learning and dynamic programming using function approximators. Boca Raton: CRC Press.
  6. de Farias, D. P., & Van Roy, B. (2003). The linear programming approach to approximate dynamic programming. Operations Research, 51(6), 850–865.
  7. Engel, Y., Mannor, S., & Meir, R. (2003). Bayes meets Bellman: the Gaussian process approach to temporal difference learning. Proceedings of the International Conference on Machine Learning (ICML), Washington, DC, pp. 154–161.
  8. Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. Proceedings of the International Conference on Machine Learning (ICML), Bonn, Germany, pp. 201–208.
  9. Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6, 503–556.
  10. Johns, J., Petrik, M., & Mahadevan, S. (2009). Hybrid least-squares algorithms for approximate policy evaluation. Machine Learning, 76(2–3), 243–256.
  11. Lagoudakis, M. G., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107–1149.
  12. Mahadevan, S., & Maggioni, M. (2007). Proto-value functions: a Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 8, 2169–2231.
  13. Menache, I., Mannor, S., & Shimkin, N. (2005). Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research, 134(1), 215–238.
  14. Nedić, A., & Bertsekas, D. P. (2003). Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems: Theory and Applications, 13(1–2), 79–110.
  15. Parr, R., Painter-Wakefield, C., Li, L., & Littman, M. (2007). Analyzing feature generation for value-function approximation. Proceedings of the International Conference on Machine Learning (ICML), Corvallis, OR, USA, pp. 449–456.
  16. Puterman, M. L. (1994). Markov decision processes: discrete stochastic dynamic programming. New York: Wiley.
  17. Rasmussen, C. E., & Kuss, M. (2004). Gaussian processes in reinforcement learning. Advances in Neural Information Processing Systems (NIPS), pp. 751–759.
  18. Sutton, R., & Barto, A. (1998). Reinforcement learning: an introduction. Cambridge: MIT Press.
  19. Taylor, G., & Parr, R. (2009). Kernelized value function approximation for reinforcement learning. Proceedings of the International Conference on Machine Learning (ICML), Montreal, Canada, pp. 1017–1024.
  20. Xu, X., Hu, D., & Lu, X. (2007). Kernel-based least-squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks, 18(4), 973–992.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Michail G. Lagoudakis
