Zeitschrift für Operations Research, Volume 40, Issue 3, pp 253–288

Blackwell optimal policies in a Markov decision process with a Borel state space

  • A. A. Yushkevich

Abstract

After an introduction to sensitive criteria in Markov decision processes and a discussion of definitions, we prove the existence of stationary Blackwell optimal policies under the following main assumptions: (i) the state space is Borel; (ii) the action space is countable and the action sets are finite; (iii) the transition function is given by a transition density; (iv) a simultaneous Doeblin-type recurrence condition holds. The proof is based on an aggregation of randomized stationary policies into measures. The topology on the space of these measures is simultaneously a weak and a strong one, and this fact yields compactness of the space and continuity of the Laurent coefficients of the expected discounted reward. Another important tool is a lexicographical policy improvement. The exposition is mostly self-contained.
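For readers new to the criterion, the standard formulation (not specific to this paper; notation is the conventional one) expands the expected discounted reward of a policy $\pi$ in the interest rate $\rho = (1-\beta)/\beta$, where $\beta \in (0,1)$ is the discount factor:

$$
V_\beta^\pi(x) \;=\; \sum_{n=-1}^{\infty} \rho^{\,n}\, y_n^\pi(x),
$$

where $y_{-1}^\pi$ is the average (gain) term and the $y_n^\pi$ are the Laurent coefficients mentioned in the abstract. A policy $\pi^*$ is Blackwell optimal if there exists a $\beta_0 < 1$ such that

$$
V_\beta^{\pi^*}(x) \;\ge\; V_\beta^{\pi}(x)
\quad\text{for all policies } \pi,\ \text{all states } x,\ \text{and all } \beta \in (\beta_0, 1),
$$

i.e., $\pi^*$ is discount optimal uniformly for all discount factors sufficiently close to 1. The lexicographical policy improvement referred to above compares policies by the lexicographic order of their coefficient sequences $(y_{-1}^\pi, y_0^\pi, y_1^\pi, \dots)$.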

Key words

Discrete-time Markov decision process · Borel state space · transition densities · simultaneous Doeblin condition · Blackwell optimality


References

  1. Arapostathis A, Borkar VS, Fernández-Gaucherand E, Ghosh MK, Marcus SI (1993) Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J Control Optim 31: 282–344
  2. Blackwell D (1962) Discrete dynamic programming. Ann Math Stat 33: 719–726
  3. Cavazos-Cadena R, Lasserre JB (1988) Strong 1-optimal stationary policies in denumerable Markov decision processes. Systems Control Letters 11: 65–71
  4. Cavazos-Cadena R, Lasserre JB (1993) A direct approach to Blackwell optimality. Preprint
  5. Chitashvili RJ (1975) A controlled finite Markov chain with an arbitrary set of decisions. Theory Prob 20: 839–847
  6. Chitashvili RJ (1976) A finite controlled Markov chain with small termination probability. Theory Prob 21: 158–163
  7. Dekker R, Hordijk A (1988) Average, sensitive and Blackwell optimal policies in denumerable Markov decision chains with unbounded rewards. Math Oper Res 13: 395–420
  8. Denardo EV (1971) Markov renewal programs with small interest rates. Ann Math Stat 42: 477–496
  9. Denardo EV, Miller BL (1968) An optimality condition for discrete dynamic programming with no discounting. Ann Math Stat 39: 1220–1227
  10. Doob JL (1953) Stochastic Processes. Wiley, New York
  11. Dynkin EB, Yushkevich AA (1979) Controlled Markov processes. Grundlehren der mathematischen Wissenschaften 235. Springer, Berlin Heidelberg New York
  12. Hille E (1948) Functional Analysis and Semigroups. American Mathematical Society Colloquium Publications 31, New York
  13. Hordijk A, Sladký K (1977) Sensitive optimality criteria in countable state dynamic programming. Math Oper Res 2: 1–14
  14. Lasserre JB (1988) Conditions for existence of average and Blackwell optimal stationary policies in denumerable Markov decision processes. J Math Anal Appl 136: 479–489
  15. Lippman SA (1969) Criterion equivalence in discrete dynamic programming. Oper Res 17: 920–923
  16. Miller BL, Veinott AF Jr (1969) Discrete dynamic programming with a small interest rate. Ann Math Stat 40: 366–370
  17. Puterman M (1974) Sensitive discount optimality in controlled one-dimensional diffusions. Ann Prob 2: 408–419
  18. Sladký K (1974) On the set of optimal controls for Markov chains with rewards. Kybernetika 10: 350–367
  19. Sladký K (1978) Sensitive optimality criteria for continuous time Markov decision processes. Trans 8th Prague Conference Inform Theory B: 211–225
  20. Strauch RE (1966) Negative dynamic programming. Ann Math Stat 37: 871–890
  21. Veinott AF Jr (1966) On finding optimal policies in discrete dynamic programming with no discounting. Ann Math Stat 37: 1284–1294
  22. Veinott AF Jr (1969) Discrete dynamic programming with sensitive optimality criteria. Ann Math Stat 40: 1635–1660
  23. Yushkevich AA, Chitashvili RJ (1982) Controlled random sequences and Markov chains. Russian Math Surveys 37, 6(228): 239–274

Copyright information

© Physica-Verlag 1994

Authors and Affiliations

  • A. A. Yushkevich
    1. Department of Mathematics, University of North Carolina at Charlotte, Charlotte, USA
