Constrained Markov decision processes in Borel spaces: from discounted to average optimality

  • Original Article
Mathematical Methods of Operations Research

Abstract

In this paper we study discrete-time Markov decision processes in Borel spaces with a finite number of constraints and with unbounded rewards and costs. Our aim is to provide a simple method to compute constrained optimal control policies when the payoff functions and the constraints are of either infinite-horizon discounted type or average (a.k.a. ergodic) type. To deduce optimality results for the discounted case, we use the Lagrange multipliers method, which recasts the original constrained problem as a parametric family of unconstrained discounted problems. Using the dynamic programming technique along with elementary differential calculus, we obtain both suitable Lagrange multipliers and a family of control policies associated with these multipliers; this family is optimal for the original constrained problem. We next apply the vanishing discount factor method to obtain, in a straightforward way, optimal control policies for the average problem with constraints. Finally, to illustrate our results, we provide a simple application to linear–quadratic systems (LQ-systems).
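
The Lagrange step and the vanishing discount step described above can be sketched in symbols. The notation below is an assumption made for illustration and is not taken from the article: a one-stage reward r, one-stage costs c_1, ..., c_q with constraint levels θ_1, ..., θ_q, a discount factor α in (0,1), multipliers λ_i ≥ 0, and a long-run average payoff J. The sketch also assumes that all expectations are finite, so that the discounted payoff is linear in the one-stage function.

```latex
% Illustrative notation only (assumed, not the article's own symbols).
\begin{align*}
V_\alpha(\pi,x;u) &:= E_x^\pi\Big[\sum_{t=0}^{\infty}\alpha^t\,u(x_t,a_t)\Big], \\
\text{constrained problem:}\quad & \sup_\pi V_\alpha(\pi,x;r)
  \quad\text{s.t.}\quad V_\alpha(\pi,x;c_i)\le\theta_i,\quad i=1,\dots,q, \\
\text{Lagrangian payoff:}\quad & r_\lambda := r-\sum_{i=1}^{q}\lambda_i c_i,
  \qquad \lambda_i\ge 0, \\
L_\alpha(\pi,\lambda) &:= V_\alpha(\pi,x;r)
  -\sum_{i=1}^{q}\lambda_i\big(V_\alpha(\pi,x;c_i)-\theta_i\big)
  = V_\alpha(\pi,x;r_\lambda)+\sum_{i=1}^{q}\lambda_i\theta_i, \\
\text{vanishing discount:}\quad & J(\pi,x;u)
  =\lim_{\alpha\uparrow 1}(1-\alpha)\,V_\alpha(\pi,x;u)
  \quad\text{(when the limits involved exist).}
\end{align*}
```

For each fixed λ, maximizing the Lagrangian over policies is an unconstrained discounted problem with one-stage payoff r_λ, which is where dynamic programming applies; the multipliers are then chosen so that the constraints hold. The last line is the usual Abelian-type relation between discounted and average payoffs, which typically requires additional ergodicity or growth conditions.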

Acknowledgments

The authors wish to thank the editors and the two anonymous referees who have patiently gone through this paper and whose suggestions have improved its presentation and readability.

Author information

Corresponding author

Correspondence to Héctor Jasso-Fuentes.

Additional information

This research was supported in part by CONACyT Grant No. 238045.

About this article

Cite this article

Mendoza-Pérez, A.F., Jasso-Fuentes, H. & De-la-Cruz Courtois, O.A. Constrained Markov decision processes in Borel spaces: from discounted to average optimality. Math Meth Oper Res 84, 489–525 (2016). https://doi.org/10.1007/s00186-016-0551-3

  • DOI: https://doi.org/10.1007/s00186-016-0551-3
