Sample-Path Optimality in Average Markov Decision Chains Under a Double Lyapunov Function Condition

  • Rolando Cavazos-Cadena
  • Raúl Montes-de-Oca
Chapter
Part of the Systems & Control: Foundations & Applications book series (SCFA)

Abstract

This work concerns discrete-time average Markov decision chains on a denumerable state space. Besides standard continuity-compactness requirements, the main structural condition on the model is that the cost function admits a Lyapunov function, and that a power larger than two of this Lyapunov function also admits one. In this context, the existence of stationary policies that are optimal in the (strong) sample-path sense is established, and it is shown that the Markov policies obtained from methods commonly used to approximate a solution of the optimality equation are also sample-path average optimal.
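The methods alluded to at the end of the abstract approximate a solution (g, h) of the average-cost optimality equation g + h(x) = min_a [c(x, a) + Σ_y p(y | x, a) h(y)]. As an illustrative sketch only, not taken from the chapter, the following relative value iteration on a hypothetical two-state, two-action model shows the kind of computation such schemes perform; all model data below are invented, and a denumerable chain would in practice be truncated to a finite one first.

```python
# Illustrative sketch: relative value iteration for an average-cost MDP.
# The two-state, two-action model (costs, P) is entirely hypothetical.
import numpy as np

# costs[x, a]: one-stage cost of action a in state x
costs = np.array([[1.0, 2.0],
                  [0.5, 3.0]])
# P[a, x, y]: probability of moving from x to y under action a
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])

def relative_value_iteration(costs, P, ref_state=0, tol=1e-10, max_iter=10_000):
    """Approximate the optimal average cost g and relative value function h."""
    n_states, _ = costs.shape
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman operator: (Th)(x) = min_a [ c(x,a) + sum_y p(y|x,a) h(y) ]
        Th = np.min(costs + np.einsum('axy,y->xa', P, h), axis=1)
        g = Th[ref_state]        # current estimate of the optimal average cost
        h_new = Th - g           # subtract g so the iterates stay bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    # Greedy (stationary Markov) policy induced by the approximate solution
    policy = np.argmin(costs + np.einsum('axy,y->xa', P, h), axis=1)
    return g, h, policy

g, h, policy = relative_value_iteration(costs, P)
print(g, h, policy)   # g ≈ 5/6 and policy [0, 0] for this toy model
```

For this toy model the iteration converges to g = 5/6 with greedy policy (0, 0), which can be checked directly against the optimality equation. The chapter's contribution is precisely that, under the double Lyapunov function condition, the Markov policies produced along such approximation schemes are optimal in the strong sample-path sense, not merely in expectation.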

Acknowledgment

With sincere gratitude and appreciation, the authors dedicate this work to Professor Onésimo Hernández-Lerma on the occasion of his 65th birthday, for his friendly and generous support and wise guidance.


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Saltillo, México
  2. Departamento de Matemáticas, Universidad Autónoma Metropolitana–Iztapalapa, México D.F., México
