Abstract
We are concerned with Markov decision processes with Borel state and action spaces in which the transition law and the reward function depend on an unknown parameter. In this framework, we study the recursive adaptive nonstationary value iteration policy, which is proved to be optimal under the same conditions usually imposed to obtain the optimality of other well-known nonrecursive adaptive policies. The results are illustrated by showing the existence of optimal adaptive policies for a class of additive-noise systems with unknown noise distribution.
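The scheme described in the abstract interleaves estimation and control: at each stage the unknown parameter is re-estimated from observed data, and a single value-iteration step is performed under the current estimate. The following is a minimal sketch of that idea for a hypothetical finite additive-noise system on a cyclic state space, using an empirical histogram as the noise-distribution estimate and relative value iteration for the average-reward setting. All names, the reward function, and the true noise distribution `theta_true` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical additive-noise system: x_{t+1} = (x_t + a_t + w_t) mod N,
# where the noise w_t has an unknown distribution theta* on {0, ..., N-1}.
N = 5                      # states 0..N-1
ACTIONS = [0, 1]           # admissible controls
theta_true = np.array([0.6, 0.3, 0.1, 0.0, 0.0])  # unknown to the controller

def reward(x, a):
    # Illustrative one-stage reward (not from the paper).
    return float(x == 0) - 0.1 * a

def vi_update(V, theta):
    """One value-iteration step under the current parameter estimate theta."""
    Q = np.empty((N, len(ACTIONS)))
    for x in range(N):
        for ai, a in enumerate(ACTIONS):
            nxt = (x + a + np.arange(N)) % N   # successor state per noise value
            Q[x, ai] = reward(x, a) + theta @ V[nxt]
    return Q.max(axis=1), Q.argmax(axis=1)

# Recursive adaptive loop: re-estimate, do one VI step, act, observe the noise.
counts = np.ones(N)        # Laplace-smoothed noise histogram
V = np.zeros(N)
x = 0
for t in range(2000):
    theta_hat = counts / counts.sum()
    TV, policy = vi_update(V, theta_hat)
    V = TV - TV[0]         # relative value iteration keeps V bounded
    a = ACTIONS[policy[x]]
    w = rng.choice(N, p=theta_true)
    counts[w] += 1         # w = (x' - x - a) mod N is observable from the transition
    x = (x + a + w) % N

print(np.round(theta_hat, 2))
```

The point of the sketch is the recursive structure: only one value-iteration update is computed per stage, under the estimate available at that stage, rather than solving the dynamic program to convergence after each new observation.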
Additional information
This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grants PCEXCNA-050156 and A128CCOEO550, and in part by the Third World Academy of Sciences under Grant TWAS RG MP 898-152.
Cite this article
Cavazos-Cadena, R., Hernández-Lerma, O. Recursive adaptive control of Markov decision processes with the average reward criterion. Appl Math Optim 23, 193–207 (1991). https://doi.org/10.1007/BF01442397