Skip to main content
Log in

Convergence of Markov decision processes with constraints and state-action dependent discount factors

  • Articles
  • Published:
Science China Mathematics Aims and scope Submit manuscript

Abstract

This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the limit" one. Furthermore, we show that any countable- state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numeri- cally, and give the corresponding error bound of the approximation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Almudevar A. Approximate Iterative Algorithms. Leiden: CRC Press/Balkema, 2014

    Book  Google Scholar 

  2. Altman E. Denumerable constrained Markov decision processes and finite approximations. Math Methods Oper Res, 1994, 19: 169–191

    Article  MathSciNet  Google Scholar 

  3. Altman E. Constrained Markov Decision Processes. Florida: Chapman & Hall/CRC, 1999

    MATH  Google Scholar 

  4. Alvarez-Mena J, Hernández-Lerma O. Convergence of the optimal values of constrained Markov control processes. Math Methods Oper Res, 2002, 55: 461–484

    Article  MathSciNet  Google Scholar 

  5. Feinberg E A. Constrained discounted Markov decision processes and Hamiltonian cycles. Math Oper Res, 2000, 25: 130–140

    Article  MathSciNet  Google Scholar 

  6. Feinberg E A, Shwartz A. Constrained dynamic programming with two discount factors: Applications and an algorithm. IEEE Trans Automat Control, 1999, 44: 628–631

    Article  MathSciNet  Google Scholar 

  7. González-Hernández J, Hernández-Lerma O. Extreme points of sets of randomized strategies in constrained optimiza-tion and control problems. SIAM J Optim, 2005, 15: 1085–1104

    Article  MathSciNet  Google Scholar 

  8. González-Hernández J, López-Martńez R R, Minjárez-Sosa J A. Approximation, estimation, and control of stochastic systems under a randomized discounted cost criterion. Kybernetika (Prague), 2009, 45: 737–754

    MathSciNet  MATH  Google Scholar 

  9. Guo X P, Hernández-Lerma O. Continuous-Time Markov Decision Processes. Berlin-Heidelberg: Springer-Verlag, 2009

    Book  Google Scholar 

  10. Guo X P, Piunovskiy A. Discounted continuous-time Markov decision processes with constraints: Unbounded transition and loss rates. Math Oper Res, 2011, 36: 105–132

    Article  MathSciNet  Google Scholar 

  11. Guo X P, Zhang W Z. Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints. European J Oper Res, 2014, 238: 486–496

    Article  MathSciNet  Google Scholar 

  12. Hernández-Lerma O, González-Hernández J. Constrained Markov decision processes in Borel spaces: The discounted case. Math Methods Oper Res, 2000, 52: 271–285

    Article  MathSciNet  Google Scholar 

  13. Hernández-Lerma O, González-Hernández J, López-Martńez R R. Constrained average cost Markov control processes in Borel spaces. SIAM J Control Optim, 2003, 42: 442–468

    Article  MathSciNet  Google Scholar 

  14. Hernández-Lerma O, Lasserre J B. Discrete-Time Markov Control Processes. New York: Springer-Verlag, 1996

    Book  Google Scholar 

  15. Hernández-Lerma O, Lasserre J B. Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag, 1999

    Book  Google Scholar 

  16. Hinderer K, Waldmann K-H. Algorithms for countable state Markov decision models with an absorbing set. SIAM J Control Optim, 2005, 43: 2109–2131

    Article  MathSciNet  Google Scholar 

  17. Huang X X, Zou X L, Guo X P. A minimization problem of the risk probability in first passage semi-Markov decision processes with loss rates. Sci China Math, 2015, 58: 1923–1938

    Article  MathSciNet  Google Scholar 

  18. Huang Y H, Wei Q D, Guo X P. Constrained Markov decision processes with first passage criteria. Ann Oper Res, 2013, 206: 197–219

    Article  MathSciNet  Google Scholar 

  19. Mao X, Piunovskiy A. Strategic measures in optimal control problems for stochastic sequences. Stoch Anal Appl, 2000, 18: 755–776

    Article  MathSciNet  Google Scholar 

  20. Piunovskiy A. Optimal Control of Random Sequences in Problems with Constraints. Dordrecht: Kluwer Academic, 1997

    Book  Google Scholar 

  21. Piunovskiy A. Controlled random sequences: The convex analytic approach and constrained problems. Russian Math Surveys, 2000, 53: 1233–1293

    Article  Google Scholar 

  22. Prieto-Rumeau T, Hernández-Lerma O. Discounted continuous-time controlled Markov chains: Convergence of control models. J Appl Probab, 2012, 49: 1072–1090

    Article  MathSciNet  Google Scholar 

  23. Prokhorov Y. Convergence of random processes and limit theorems in probability theory. Theory Probab Appl, 1956, 1: 157–214

    Article  MathSciNet  Google Scholar 

  24. Puterman, M L. Markov Decision Processes. New York: Wiley, 1994

    Book  Google Scholar 

  25. Saldi N, Linder T, Yuksel S. Asymptotic optimality and rates of convergence of quantized stationary policies in stochastic control. IEEE Trans Automat Control, 2015, 60: 553–558

    Article  MathSciNet  Google Scholar 

  26. Sennott L I. Constrained discounted Markov decision chains. Probab Engrg Inform Sci, 1991, 5: 463–475

    Article  MathSciNet  Google Scholar 

  27. Sennott L I. Stochastic Dynamic Programming and the Control of Queueing Systems. New York: Wiley, 1999

    Google Scholar 

  28. Wei Q D, Guo X P. Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Oper Res Lett, 2011, 39: 369–374

    MathSciNet  MATH  Google Scholar 

  29. Wu X, Guo X P. First passage optimality and variance minimization of Markov decision processes with varying discount factors. J Appl Probab, 2015, 52: 441–456

    Article  MathSciNet  Google Scholar 

  30. Wu X, Zhang J Y. Finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors. Discrete Event Dyn Syst, 2016, 26: 669–683

    Article  MathSciNet  Google Scholar 

  31. Wu X, Zou X L, Guo X P. First passage Markov decision processes with constraints and varying discount factors. Front Math China, 2015, 10: 1005–1023

    Article  MathSciNet  Google Scholar 

  32. Zhang W Z, Guo X P. Nonzero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates. Sci China Math, 2012, 55: 2405–2416

    Article  MathSciNet  Google Scholar 

  33. Zhang Y. Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors. TOP, 2013, 21: 378–408

    Article  MathSciNet  Google Scholar 

Download references

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant Nos. 61374067 and 41271076).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xianping Guo.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wu, X., Guo, X. Convergence of Markov decision processes with constraints and state-action dependent discount factors. Sci. China Math. 63, 167–182 (2020). https://doi.org/10.1007/s11425-017-9292-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11425-017-9292-1

Keywords

MSC(2010)

Navigation