
Sample-Path Optimality in Average Markov Decision Chains Under a Double Lyapunov Function Condition

Chapter in: Optimization, Control, and Applications of Stochastic Systems

Abstract

This work concerns discrete-time average Markov decision chains on a denumerable state space. Besides standard continuity-compactness requirements, the main structural condition on the model is that the cost function admits a Lyapunov function and that a power larger than two of the cost function also admits a Lyapunov function. In this context, the existence of optimal stationary policies in the (strong) sample-path sense is established, and it is shown that the Markov policies obtained from methods commonly used to approximate a solution of the optimality equation are also sample-path average optimal.
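As an illustration of the kind of approximation scheme the abstract alludes to, the sketch below runs relative value iteration on a small made-up finite controlled Markov chain to approximate a solution (g, h) of the average-cost optimality equation. This is a generic textbook method, not the chapter's own construction, and all model data (`P`, `C`) are hypothetical; the chapter treats the much harder denumerable-state, unbounded-cost setting under its double Lyapunov condition.

```python
import numpy as np

# Toy controlled Markov chain: 3 states, 2 actions (illustrative numbers only).
# P[a] is the transition matrix under action a; C[s, a] is the one-step cost.
P = np.array([
    [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]],
    [[0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2],
     [0.5, 0.3, 0.2]],
])
C = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [3.0, 0.2]])

def relative_value_iteration(P, C, tol=1e-10, max_iter=10_000):
    """Approximate a solution (g, h) of the average-cost optimality equation
        g + h(s) = min_a [ C(s, a) + sum_{s'} P(s' | s, a) h(s') ]
    by relative value iteration, normalizing at a reference state."""
    n_states = C.shape[0]
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = C(s, a) + expected value of h at the next state
        Q = C + np.einsum('asj,j->sa', P, h)
        h_new = Q.min(axis=1)
        g = h_new[0]            # gain estimate taken at the reference state 0
        h_new = h_new - g       # subtract the gain to keep iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    policy = Q.argmin(axis=1)   # stationary policy attaining the minimum
    return g, h, policy

g, h, policy = relative_value_iteration(P, C)
```

The stationary policy returned here is average optimal in the classical (expected) sense for this toy chain; the chapter's contribution is showing that, under its conditions, such policies are optimal in the stronger sample-path sense as well.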



Acknowledgment

With sincere gratitude and appreciation, the authors dedicate this work to Professor Onésimo Hernández-Lerma on the occasion of his 65th birthday, for his friendly and generous support and wise guidance.

Author information

Correspondence to Rolando Cavazos-Cadena.


Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Cavazos-Cadena, R., Montes-de-Oca, R. (2012). Sample-Path Optimality in Average Markov Decision Chains Under a Double Lyapunov Function Condition. In: Hernández-Hernández, D., Minjárez-Sosa, J. (eds) Optimization, Control, and Applications of Stochastic Systems. Systems & Control: Foundations & Applications. Birkhäuser, Boston. https://doi.org/10.1007/978-0-8176-8337-5_3
