Convergence of Markov decision processes with constraints and state-action dependent discount factors

Wu, Xiao; Guo, Xianping

doi:10.1007/s11425-017-9292-1

Published: 15 February 2019

Science China Mathematics, Volume 63, pages 167–182 (2020)

Xiao Wu¹ & Xianping Guo²


Abstract

This paper is concerned with the convergence of a sequence of discrete-time Markov decision processes (DTMDPs) with constraints, state-action dependent discount factors, and possibly unbounded costs. Using the convex analytic approach under mild conditions, we prove that the optimal values and optimal policies of the original DTMDPs converge to those of the "limit" one. Furthermore, we show that any countable-state DTMDP can be approximated by a sequence of finite-state DTMDPs, which are constructed using the truncation technique. Finally, we illustrate the approximation by solving a controlled queueing system numerically, and give the corresponding error bound of the approximation.
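The truncation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' construction, queueing example, or error bound: it drops the constraints altogether (so it uses plain value iteration rather than the convex analytic approach), and every number in it (arrival and service probabilities, costs, the form of the discount factor `alpha`) is a hypothetical choice made only for illustration. The sketch assumes the usual optimality equation for such a criterion, V(x) = min_a { c(x,a) + alpha(x,a) * sum_y p(y|x,a) V(y) }, and solves it on a queue truncated to the states 0..n_max.

```python
def solve_truncated_queue(n_max, arrive=0.4, serve=(0.3, 0.6),
                          hold=1.0, effort=(0.0, 2.0), tol=1e-10):
    """Value iteration on a controlled queue truncated to states 0..n_max.

    All parameters are illustrative assumptions, not values from the paper.
    """
    actions = range(len(serve))

    def alpha(x, a):
        # Hypothetical state-action dependent discount factor in [0.55, 0.9].
        return 0.6 + 0.3 / (1 + x) - 0.05 * a

    def cost(x, a):
        # Holding cost per waiting customer plus the cost of service effort a.
        return hold * x + effort[a]

    def transitions(x, a):
        # One arrival and no departure moves up; the reverse moves down.
        # Truncation: at n_max the upward mass is kept in the current state.
        up = arrive * (1 - serve[a]) if x < n_max else 0.0
        down = serve[a] * (1 - arrive) if x > 0 else 0.0
        moves = [(x, 1.0 - up - down)]
        if up > 0.0:
            moves.append((x + 1, up))
        if down > 0.0:
            moves.append((x - 1, down))
        return moves

    def q_value(x, a, v):
        expected = sum(pr * v[y] for y, pr in transitions(x, a))
        return cost(x, a) + alpha(x, a) * expected

    v = [0.0] * (n_max + 1)
    while True:
        new_v = [min(q_value(x, a, v) for a in actions)
                 for x in range(n_max + 1)]
        gap = max(abs(s - t) for s, t in zip(new_v, v))
        v = new_v
        if gap < tol:  # sup-norm stopping rule; alpha < 1 gives a contraction
            break

    policy = [min(actions, key=lambda a: q_value(x, a, v))
              for x in range(n_max + 1)]
    return v, policy


# Compare two truncation levels at the empty-queue state.
v20, _ = solve_truncated_queue(20)
v40, _ = solve_truncated_queue(40)
print("V_20(0) =", v20[0])
print("V_40(0) =", v40[0])
print("difference:", abs(v20[0] - v40[0]))
```

Under these assumed parameters, raising the truncation level should change the value at small states only slightly, which is the qualitative behaviour a finite-state approximation result describes. The constrained problem treated in the paper would need a different solver, for example a linear program over occupation measures.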

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61374067 and 41271076).

Author information

Authors and Affiliations

School of Mathematics and Statistics, Zhaoqing University, Zhaoqing, 526061, China
Xiao Wu
School of Mathematics, Sun Yat-sen University, Guangzhou, 510275, China
Xianping Guo

Corresponding author

Correspondence to Xianping Guo.

About this article

Cite this article

Wu, X., Guo, X. Convergence of Markov decision processes with constraints and state-action dependent discount factors. Sci. China Math. 63, 167–182 (2020). https://doi.org/10.1007/s11425-017-9292-1

Received: 02 January 2017
Accepted: 27 March 2018
Published: 15 February 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s11425-017-9292-1
