
Approximation of Infinite Horizon Discounted Cost Markov Decision Processes

  • François Dufour
  • Tomás Prieto-Rumeau
Chapter
Part of the Systems & Control: Foundations & Applications book series (SCFA)

Abstract

We deal with a discrete-time infinite horizon Markov decision process with locally compact Borel state and action spaces and possibly unbounded cost function. Based on Lipschitz continuity of the elements of the control model, we propose a state and action discretization procedure for approximating the optimal value function and an optimal policy of the original control model. We provide explicit bounds on the approximation errors.
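As a rough illustration of what a state and action discretization can look like, the sketch below solves a toy discounted-cost problem on uniform grids. Everything specific in it is an assumption made for illustration and is not taken from the chapter: the state and action spaces are both [0, 1], the dynamics are x' = 0.5x + 0.5a + w with three equally likely noise values, the cost |x - 0.5| + 0.1a is Lipschitz continuous, and successor states are projected to the nearest grid point. It does not reproduce the authors' procedure or their error bounds.

```python
def solve_discretized_mdp(n_states=21, n_actions=11, alpha=0.9,
                          tol=1e-10, max_iter=10_000):
    """Value iteration on a grid approximation of a toy MDP (hypothetical model)."""
    # Uniform grids on [0, 1] for states and actions.
    states = [i / (n_states - 1) for i in range(n_states)]
    actions = [j / (n_actions - 1) for j in range(n_actions)]
    noise = (-0.1, 0.0, 0.1)  # equally likely disturbances (assumed)

    def cost(x, a):
        # Lipschitz one-stage cost (assumed for illustration).
        return abs(x - 0.5) + 0.1 * a

    def nearest(y):
        # Clip to [0, 1], then project onto the nearest grid index.
        y = min(1.0, max(0.0, y))
        return int(round(y * (n_states - 1)))

    # succ[i][j] lists the grid indices reachable from (x_i, a_j).
    succ = [[[nearest(0.5 * x + 0.5 * a + w) for w in noise]
             for a in actions] for x in states]

    def q_value(v, i, j):
        expected = sum(v[k] for k in succ[i][j]) / len(noise)
        return cost(states[i], actions[j]) + alpha * expected

    v = [0.0] * n_states
    for _ in range(max_iter):
        v_new = [min(q_value(v, i, j) for j in range(n_actions))
                 for i in range(n_states)]
        gap = max(abs(p - q) for p, q in zip(v_new, v))
        v = v_new
        if gap < tol:
            break

    # Greedy policy with respect to the final value estimate.
    policy = [actions[min(range(n_actions), key=lambda j: q_value(v, i, j))]
              for i in range(n_states)]
    return states, v, policy


if __name__ == "__main__":
    grid, values, policy = solve_discretized_mdp()
    for x, val, a in zip(grid, values, policy):
        print(f"x={x:.2f}  V~{val:.4f}  a~{a:.2f}")
```

Refining both grids and comparing the resulting value estimates gives an informal feel for how the approximation behaves as the discretization becomes finer.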


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Institut de Mathématiques de Bordeaux, Université Bordeaux I, Talence, France
  2. INRIA Bordeaux Sud Ouest, Team CQFD, Bordeaux, France
  3. Department of Statistics, UNED, Madrid, Spain
