
Sample-Path Optimality in Average Markov Decision Chains Under a Double Lyapunov Function Condition

Chapter in: Optimization, Control, and Applications of Stochastic Systems

Abstract

This work concerns discrete-time average Markov decision chains on a denumerable state space. Besides standard continuity-compactness requirements, the main structural condition on the model is that the cost function admits a Lyapunov function and that a power larger than two of the cost function also admits a Lyapunov function. In this context, the existence of optimal stationary policies in the (strong) sample-path sense is established, and it is shown that the Markov policies obtained from methods commonly used to approximate a solution of the optimality equation are also sample-path average optimal.
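As an illustration of the kind of approximation scheme the abstract alludes to, the sketch below runs relative value iteration on a small made-up finite controlled Markov chain to approximate a solution (g, h) of the average-cost optimality equation. This is a generic textbook method, not the chapter's own construction, and all model data (`P`, `C`) are hypothetical; the chapter treats the much harder denumerable-state, unbounded-cost setting under its double Lyapunov condition.

```python
import numpy as np

# Toy controlled Markov chain: 3 states, 2 actions (illustrative numbers only).
# P[a] is the transition matrix under action a; C[s, a] is the one-step cost.
P = np.array([
    [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]],
    [[0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2],
     [0.5, 0.3, 0.2]],
])
C = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [3.0, 0.2]])

def relative_value_iteration(P, C, tol=1e-10, max_iter=10_000):
    """Approximate a solution (g, h) of the average-cost optimality equation
        g + h(s) = min_a [ C(s, a) + sum_{s'} P(s' | s, a) h(s') ]
    by relative value iteration, normalizing at a reference state."""
    n_states = C.shape[0]
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = C(s, a) + expected value of h at the next state
        Q = C + np.einsum('asj,j->sa', P, h)
        h_new = Q.min(axis=1)
        g = h_new[0]            # gain estimate taken at the reference state 0
        h_new = h_new - g       # subtract the gain to keep iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    policy = Q.argmin(axis=1)   # stationary policy attaining the minimum
    return g, h, policy

g, h, policy = relative_value_iteration(P, C)
```

The stationary policy returned here is average optimal in the classical (expected) sense for this toy chain; the chapter's contribution is showing that, under its conditions, such policies are optimal in the stronger sample-path sense as well.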



Acknowledgment

With sincere gratitude and appreciation, the authors dedicate this work to Professor Onésimo Hernández-Lerma on the occasion of his 65th birthday, for his friendly and generous support and wise guidance.

Author information

Correspondence to Rolando Cavazos-Cadena.


Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Cavazos-Cadena, R., Montes-de-Oca, R. (2012). Sample-Path Optimality in Average Markov Decision Chains Under a Double Lyapunov Function Condition. In: Hernández-Hernández, D., Minjárez-Sosa, J. (eds) Optimization, Control, and Applications of Stochastic Systems. Systems & Control: Foundations & Applications. Birkhäuser, Boston. https://doi.org/10.1007/978-0-8176-8337-5_3
