On Optimal Policies of Multichain Finite State Compact Action Markov Decision Processes
This paper is concerned with finite-state multichain MDPs with a compact action set. The optimality criterion is the long-run average cost. Simple examples illustrate that optimal stationary Markov policies do not always exist. We establish the existence of ε-optimal policies that are stationary Markov, and develop an algorithm that computes these approximately optimal policies. We also establish a necessary and sufficient condition for the existence of an optimal stationary Markov policy; when such a policy exists, the algorithm computes it.
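The non-existence claim can be made concrete with a small sketch. The three-state model below is a hypothetical illustration constructed for this purpose, not an example taken from the paper. State 0 costs 1 per step; under action a in [0, 0.5] the chain moves to an absorbing state of cost 0 with probability a, to an absorbing state of cost 1 with probability a², and otherwise stays in state 0. The names `transition_row`, `average_cost`, and `epsilon_optimal_action` are likewise assumptions. The long-run average cost from state 0 is a/(1+a) for a > 0 but jumps to 1 at a = 0, so the infimum 0 is approached but not attained by any stationary policy.

```python
def transition_row(a):
    """Transition probabilities out of state 0 under action a (hypothetical model)."""
    if not 0.0 <= a <= 0.5:
        raise ValueError("action must lie in the compact set [0, 0.5]")
    to_good = a        # absorbing state with cost 0 per step
    to_bad = a * a     # absorbing state with cost 1 per step
    stay = 1.0 - to_good - to_bad
    return stay, to_good, to_bad


def average_cost(a):
    """Long-run average cost from state 0 when action a is used at every step."""
    _, to_good, to_bad = transition_row(a)
    leave = to_good + to_bad
    if leave == 0.0:
        # a = 0: state 0 is never left, so the cost of 1 per step is paid forever.
        return 1.0
    # a > 0: state 0 is transient, so the average cost equals the probability
    # of being absorbed in the cost-1 state, which simplifies to a / (1 + a).
    return to_bad / leave


def epsilon_optimal_action(eps):
    """An action whose average cost is below eps (for eps > 0) in this model."""
    return min(eps, 0.5)


if __name__ == "__main__":
    for a in (0.5, 0.1, 0.01, 0.001, 0.0):
        print(f"a = {a:<6} average cost = {average_cost(a):.6f}")
```

In this toy model every a > 0 gives a strictly positive cost, while choosing a = ε gives a cost below ε, which is the kind of ε-optimal stationary policy the abstract refers to. The algorithm developed in the paper itself is not reproduced here.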
Keywords: Optimal Policy, Markov Decision Process, Average Cost, Optimality Equation, Deterministic Policy