Constrained Markov decision processes in Borel spaces: from discounted to average optimality

Mendoza-Pérez, Armando F.; Jasso-Fuentes, Héctor; De-la-Cruz Courtois, Omar A.

doi:10.1007/s00186-016-0551-3

Original Article
Published: 20 June 2016

Volume 84, pages 489–525, (2016)
Mathematical Methods of Operations Research

Armando F. Mendoza-Pérez¹,
Héctor Jasso-Fuentes² &
Omar A. De-la-Cruz Courtois¹

Abstract

In this paper we study discrete-time Markov decision processes in Borel spaces with a finite number of constraints and with unbounded rewards and costs. Our aim is to provide a simple method to compute constrained optimal control policies when the payoff functions and the constraints are of either infinite-horizon discounted type or average (a.k.a. ergodic) type. To deduce optimality results for the discounted case, we use the Lagrange multipliers method, which rewrites the original problem (with constraints) as a parametric family of discounted unconstrained problems. Using the dynamic programming technique together with elementary differential calculus, we obtain both suitable Lagrange multipliers and a family of control policies associated with these multipliers; this family is optimal for the original problem with constraints. We next apply the vanishing discount factor method to obtain, in a straightforward way, optimal control policies for the average problem with constraints. Finally, to illustrate our results, we provide a simple application to linear–quadratic systems (LQ-systems).
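
As a rough illustration of the two steps described in the abstract, the Lagrangian reformulation and the vanishing discount limit can be written generically as follows. The notation is an assumption made here for illustration (a reward r to be maximized, constraint costs c_1, ..., c_q with bounds θ_1, ..., θ_q, discount factor α, policy π, initial state x); the paper's own formulation and hypotheses are not visible in this abstract and may differ.

```latex
% Illustrative sketch only; all symbols are assumed notation, not the paper's.
\begin{align*}
% Discounted payoff of a policy \pi for a generic one-stage function f
V_\alpha(\pi,x;f) &:= \mathbb{E}_x^{\pi}\Big[\sum_{t=0}^{\infty}\alpha^{t} f(x_t,a_t)\Big],
  \qquad 0<\alpha<1,\\[4pt]
% Constrained discounted problem
&\max_{\pi}\ V_\alpha(\pi,x;r)
  \quad\text{subject to}\quad V_\alpha(\pi,x;c_i)\le\theta_i,\quad i=1,\dots,q,\\[4pt]
% Lagrangian for a multiplier \lambda \ge 0; the second equality is linearity of expectation
L_\alpha(\pi,\lambda) &:= V_\alpha(\pi,x;r)-\sum_{i=1}^{q}\lambda_i\big(V_\alpha(\pi,x;c_i)-\theta_i\big)
  = V_\alpha\Big(\pi,x;\,r-\sum_{i=1}^{q}\lambda_i c_i\Big)+\sum_{i=1}^{q}\lambda_i\theta_i,\\[4pt]
% Long-run average payoff and the heuristic vanishing discount relation
J(\pi,x;f) &:= \liminf_{n\to\infty}\frac{1}{n}\,
  \mathbb{E}_x^{\pi}\Big[\sum_{t=0}^{n-1} f(x_t,a_t)\Big],
  \qquad \lim_{\alpha\uparrow 1}(1-\alpha)\,V_\alpha(\pi,x;f)=J(\pi,x;f)
  \ \text{ (when both limits exist)}.
\end{align*}
```

For each fixed λ, maximizing L_α(π, λ) over π is an unconstrained discounted problem with one-stage reward r − Σ λ_i c_i, which is the kind of parametric family the abstract refers to. The last relation is the usual heuristic behind the vanishing discount approach and holds only under suitable integrability and ergodicity conditions.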

Acknowledgments

The authors wish to thank the editors and the two anonymous referees who have patiently gone through this paper and whose suggestions have improved its presentation and readability.

Author information

Authors and Affiliations

Facultad de Ciencias en Física y Matemáticas, UNACH, Carretera Emiliano Zapata km 8.5, Rancho San Francisco, Ciudad Universitaria, Tuxtla Gutiérrez, Chiapas, C.P. 29050, Mexico
Armando F. Mendoza-Pérez & Omar A. De-la-Cruz Courtois
Departamento de Matemáticas, CINVESTAV–IPN, A. Postal 14-740, Mexico, D.F., 07000, Mexico
Héctor Jasso-Fuentes

Corresponding author

Correspondence to Héctor Jasso-Fuentes.

Additional information

This research was supported in part by CONACyT Grant No. 238045.

About this article

Cite this article

Mendoza-Pérez, A.F., Jasso-Fuentes, H. & De-la-Cruz Courtois, O.A. Constrained Markov decision processes in Borel spaces: from discounted to average optimality. Math Meth Oper Res 84, 489–525 (2016). https://doi.org/10.1007/s00186-016-0551-3

Received: 14 December 2015
Accepted: 08 June 2016
Published: 20 June 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s00186-016-0551-3
