
On the value function in constrained control of Markov chains

Published in: Mathematical Methods of Operations Research

Abstract

It is known that the value function of an unconstrained Markov decision process with finitely many states and actions is a piecewise rational function of the discount factor α, and that the value function can be expressed as a Laurent series expansion about α = 1 for α close enough to 1. We show in this paper that this property also holds for the value function of Markov decision processes with additional constraints. More precisely, we show by a constructive proof that there are numbers 0 = α0 < α1 < ... < αm−1 < αm = 1 such that for every j = 1, 2, ..., m, either the problem is infeasible for all discount factors α in the open interval (αj−1, αj), or the value function is a rational function of α on the closed interval [αj−1, αj]. As a consequence, if the constrained problem is feasible in a neighborhood of α = 1, then the value function has a Laurent series expansion about α = 1. Our proof technique for the constrained case also provides a new proof for the unconstrained case.
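The piecewise-rational behaviour in the unconstrained case can be illustrated on a toy example (the two-state chain below is a hypothetical illustration, not taken from the paper). Each deterministic stationary policy induces a Markov chain whose discounted value solves v = r + αPv, so v = (I − αP)⁻¹r is rational in α; the optimal value is the pointwise maximum over finitely many such policies, hence rational on each interval between policy-switch points. A minimal sketch using exact rational arithmetic:

```python
from fractions import Fraction as F

# Hypothetical 2-state MDP: state 0 offers two actions,
#   'stay'  : reward 1, remain in state 0
#   'switch': reward 0, move to state 1
# state 1 has a single action: reward 2, remain in state 1.

def policy_value(P, r, alpha):
    """Exact discounted value (I - alpha*P)^{-1} r for a 2-state chain,
    via an explicit 2x2 matrix inverse in rational arithmetic."""
    a = 1 - alpha * P[0][0]; b = -alpha * P[0][1]
    c = -alpha * P[1][0];    d = 1 - alpha * P[1][1]
    det = a * d - b * c
    v0 = (d * r[0] - b * r[1]) / det
    v1 = (-c * r[0] + a * r[1]) / det
    return [v0, v1]

# The two deterministic stationary policies (state 1's action is forced):
policies = {
    "stay":   ([[F(1), F(0)], [F(0), F(1)]], [F(1), F(2)]),
    "switch": ([[F(0), F(1)], [F(0), F(1)]], [F(0), F(2)]),
}

def optimal_value(alpha):
    """Unconstrained optimal discounted value starting in state 0."""
    return max(policy_value(P, r, alpha)[0] for P, r in policies.values())

# 'stay' yields 1/(1-alpha); 'switch' yields 2*alpha/(1-alpha);
# the optimal policy changes at the breakpoint alpha = 1/2.
for alpha in [F(1, 4), F(1, 2), F(3, 4)]:
    print(alpha, optimal_value(alpha))
```

For this chain the value function equals max(1, 2α)/(1 − α): rational in α on [0, 1/2] and on [1/2, 1), with a single breakpoint at α = 1/2 where the optimal policy switches. The constrained case of the paper replaces this finite maximum over deterministic policies by an optimization over randomized policies, which is why feasibility can also vary with α.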



Cite this article

Altman, E., Hordijk, A. & Kallenberg, L.C.M. On the value function in constrained control of Markov chains. Mathematical Methods of Operations Research 44, 387–399 (1996). https://doi.org/10.1007/BF01193938
