Abstract
We are concerned with Markov decision processes with Borel state and action spaces in which the transition law and the reward function depend on an unknown parameter. In this framework, we study the recursive adaptive nonstationary value iteration policy, which is proved to be optimal under the same conditions usually imposed to obtain the optimality of other well-known nonrecursive adaptive policies. The results are illustrated by showing the existence of optimal adaptive policies for a class of additive-noise systems with unknown noise distribution.
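The scheme described in the abstract interleaves estimation and control: at each stage the unknown parameter is re-estimated from observed data, and a single value-iteration step is performed under the current estimate. The following is a minimal sketch of that idea for a hypothetical finite additive-noise system on a cyclic state space, using an empirical histogram as the noise-distribution estimate and relative value iteration for the average-reward setting. All names, the reward function, and the true noise distribution `theta_true` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical additive-noise system: x_{t+1} = (x_t + a_t + w_t) mod N,
# where the noise w_t has an unknown distribution theta* on {0, ..., N-1}.
N = 5                      # states 0..N-1
ACTIONS = [0, 1]           # admissible controls
theta_true = np.array([0.6, 0.3, 0.1, 0.0, 0.0])  # unknown to the controller

def reward(x, a):
    # Illustrative one-stage reward (not from the paper).
    return float(x == 0) - 0.1 * a

def vi_update(V, theta):
    """One value-iteration step under the current parameter estimate theta."""
    Q = np.empty((N, len(ACTIONS)))
    for x in range(N):
        for ai, a in enumerate(ACTIONS):
            nxt = (x + a + np.arange(N)) % N   # successor state per noise value
            Q[x, ai] = reward(x, a) + theta @ V[nxt]
    return Q.max(axis=1), Q.argmax(axis=1)

# Recursive adaptive loop: re-estimate, do one VI step, act, observe the noise.
counts = np.ones(N)        # Laplace-smoothed noise histogram
V = np.zeros(N)
x = 0
for t in range(2000):
    theta_hat = counts / counts.sum()
    TV, policy = vi_update(V, theta_hat)
    V = TV - TV[0]         # relative value iteration keeps V bounded
    a = ACTIONS[policy[x]]
    w = rng.choice(N, p=theta_true)
    counts[w] += 1         # w = (x' - x - a) mod N is observable from the transition
    x = (x + a + w) % N

print(np.round(theta_hat, 2))
```

The point of the sketch is the recursive structure: only one value-iteration update is computed per stage, under the estimate available at that stage, rather than solving the dynamic program to convergence after each new observation.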
Additional information
This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grants PCEXCNA-050156 and A128CCOEO550, and in part by the Third World Academy of Sciences under Grant TWAS RG MP 898-152.
Cite this article
Cavazos-Cadena, R., Hernández-Lerma, O. Recursive adaptive control of Markov decision processes with the average reward criterion. Appl Math Optim 23, 193–207 (1991). https://doi.org/10.1007/BF01442397