Blackwell optimal policies in a Markov decision process with a Borel state space
Abstract
After an introduction to sensitive criteria in Markov decision processes and a discussion of definitions, we prove the existence of stationary Blackwell optimal policies under the following main assumptions: (i) the state space is Borel; (ii) the action space is countable and the action sets are finite; (iii) the transition function is given by a transition density; (iv) a simultaneous Doeblin-type recurrence condition holds. The proof is based on an aggregation of randomized stationary policies into measures. The topology on the space of these measures is simultaneously a weak and a strong one, and this fact yields both compactness of the space and continuity of the Laurent coefficients of the expected discounted reward. Another important tool is a lexicographical policy improvement. The exposition is mostly self-contained.
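To fix ideas, the objects named in the abstract can be written in the notation standard in the sensitive-optimality literature (the symbols below are illustrative and not necessarily those of the paper): with discount factor \(\beta \in (0,1)\) and interest rate \(\rho = (1-\beta)/\beta\), the expected discounted reward of a stationary policy admits a Laurent expansion near \(\beta = 1\), and Blackwell optimality asks for optimality at every discount factor sufficiently close to 1.

```latex
% Laurent expansion of the expected discounted reward of a stationary
% policy \pi (notation assumed, in the style of Miller--Veinott):
V_\beta^{\pi}(x) \;=\; \sum_{n=-1}^{\infty} \rho^{\,n}\, u_n^{\pi}(x),
\qquad \rho = \frac{1-\beta}{\beta},
\quad \text{valid for all sufficiently small } \rho > 0.

% A stationary policy \pi^{*} is Blackwell optimal if there exists
% \beta_0 < 1 such that, for every policy \pi and every state x,
V_\beta^{\pi^{*}}(x) \;\ge\; V_\beta^{\pi}(x)
\qquad \text{for all } \beta \in (\beta_0, 1).
```

The continuity of the coefficients \(u_n^{\pi}\) in the policy, together with compactness of the aggregated policy space, is what makes the lexicographical improvement argument work.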
Key words
Discrete-time Markov decision process · Borel state space · transition densities · simultaneous Doeblin condition · Blackwell optimality