Sample-Path Optimality in Average Markov Decision Chains Under a Double Lyapunov Function Condition
This work concerns discrete-time average Markov decision chains on a denumerable state space. Besides standard continuity-compactness requirements, the main structural condition on the model is that the cost function admits a Lyapunov function ℓ and that some power of ℓ larger than two also admits a Lyapunov function. In this context, the existence of stationary policies that are optimal in the (strong) sample-path sense is established, and it is shown that the Markov policies obtained from methods commonly used to approximate a solution of the optimality equation are also sample-path average optimal.
With sincere gratitude and appreciation, the authors dedicate this work to Professor Onésimo Hernández-Lerma on the occasion of his 65th birthday, for his friendly and generous support and wise guidance.