Abstract
For sequential decision processes with countable state spaces, we prove compactness of the set of strategic measures corresponding to nonrandomized policies. For Borel state spaces, this set may fail to be compact (Piunovskiy, Optimal control of random sequences in problems with constraints. Kluwer, Boston, p. 170, 1997), even though the set of strategic measures corresponding to all policies is compact (Schäl, On dynamic programming: compactness of the space of policies. Stoch Process Appl 3(4):345–364, 1975b; Balder, On compactness of the space of policies in stochastic dynamic programming. Stoch Process Appl 32(1):141–150, 1989). We use the compactness result from this paper to show the existence of optimal policies for countable-state constrained optimization of expected discounted and nonpositive rewards, when optimality is considered within the class of nonrandomized policies. This paper also studies the convergence of a value-iteration algorithm for such constrained problems.
References
Altman E (1999) Constrained Markov decision processes. Chapman & Hall/CRC, Boca Raton, USA
Balder EJ (1989) On compactness of the space of policies in stochastic dynamic programming. Stoch Process Appl 32(1): 141–150
Billingsley P (1986) Convergence of probability measures. Wiley, New York
Blackwell D (1967) Positive dynamic programming. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, University of California Press, Berkeley, vol 1, pp 415–418
Borkar VS (1991) Topics in controlled Markov chains. Longman Scientific & Technical, Harlow
Chen RC, Blankenship GL (2004) Dynamic programming equations for discounted constrained stochastic control. IEEE Trans Automat Contr 49(5): 699–709
Chen RC, Feinberg EA (2007) Non-randomized policies for constrained Markov decision processes. Math Meth Oper Res 66(1): 165–179
Dynkin EB, Yushkevich AA (1979) Controlled Markov processes and their applications. Springer, New York
Feinberg EA (2000) Constrained discounted Markov decision processes and Hamiltonian cycles. Math Oper Res 25(1): 130–140
Feinberg EA, Shwartz A (1996) Constrained discounted dynamic programming. Math Oper Res 21(4): 922–945
Hernandez-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes. Springer, New York
Kechris AS (1995) Classical descriptive set theory. Springer, New York
Nikaido H (1968) Convex structures and economic theory. Academic Press, New York
Nowak AS (1988) On the weak topology in a space of probability measures induced by policies. Bull Polish Acad Sci Math 36(3–4): 181–186
Piunovskiy AB (1997) Optimal control of random sequences in problems with constraints. Kluwer, Boston
Royden HL (1968) Real analysis. MacMillan Publishing Co, New York
Schäl M (1975a) Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z Wahrsch verw Gebiete 32(3): 179–196
Schäl M (1975b) On dynamic programming: compactness of the space of policies. Stoch Process Appl 3(4): 345–364
Schäl M (1979) On dynamic programming and statistical decision theory. Ann Stat 7(2): 432–445
Strauch R (1966) Negative dynamic programming. Ann Math Stat 37(4): 871–890
Cite this article
Chen, R.C., Feinberg, E.A. Compactness of the space of non-randomized policies in countable-state sequential decision processes. Math Meth Oper Res 71, 307–323 (2010). https://doi.org/10.1007/s00186-009-0298-1
DOI: https://doi.org/10.1007/s00186-009-0298-1