
Recursive adaptive control of Markov decision processes with the average reward criterion

Published in Applied Mathematics and Optimization.

Abstract

We are concerned with Markov decision processes with Borel state and action spaces; the transition law and the reward function depend on an unknown parameter. In this framework, we study the recursive adaptive nonstationary value iteration policy, which is proved to be optimal under the same conditions usually imposed to obtain the optimality of other well-known nonrecursive adaptive policies. The results are illustrated by showing the existence of optimal adaptive policies for a class of additive-noise systems with unknown noise distribution.
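The flavor of a recursive adaptive value-iteration scheme can be sketched on a toy finite model. This is not the authors' construction (which works on Borel spaces with an abstract unknown parameter); everything below — the cyclic additive-noise dynamics, the Bernoulli noise, the reward function, and the empirical-frequency estimator — is an illustrative assumption. At each stage the agent re-estimates the noise parameter from observed transitions and performs one value-iteration sweep under the current estimate, so estimation and control proceed together rather than in separate phases.

```python
import random

def transition_probs(n_states, s, a, theta):
    """P(s' | s, a) for the toy model: next state is (s + a + w) mod n_states,
    where the additive noise w is Bernoulli(theta)."""
    p = [0.0] * n_states
    p[(s + a) % n_states] += 1.0 - theta
    p[(s + a + 1) % n_states] += theta
    return p

def reward(s, a):
    # Hypothetical one-stage reward: pay 1 for being in state 0.
    return 1.0 if s == 0 else 0.0

def adaptive_vi(theta_true=0.3, n_states=4, horizon=5000, seed=0):
    """Recursive adaptive value iteration on the toy model (a sketch)."""
    rng = random.Random(seed)
    v = [0.0] * n_states        # relative value function estimate
    noise_sum, noise_cnt = 0, 0  # empirical statistics of the observed noise
    s = 0
    for _ in range(horizon):
        # Current parameter estimate (prior guess 0.5 before any data).
        theta_hat = noise_sum / noise_cnt if noise_cnt else 0.5
        # Greedy action at the current state under the estimated model.
        q = {}
        for a in (0, 1):
            p = transition_probs(n_states, s, a, theta_hat)
            q[a] = reward(s, a) + sum(p[j] * v[j] for j in range(n_states))
        a = max(q, key=q.get)
        # One value-iteration sweep under theta_hat, normalized at state 0
        # (relative value iteration keeps v bounded under average reward).
        new_v = []
        for x in range(n_states):
            best = max(
                reward(x, b) + sum(
                    transition_probs(n_states, x, b, theta_hat)[j] * v[j]
                    for j in range(n_states))
                for b in (0, 1))
            new_v.append(best)
        offset = new_v[0]
        v = [val - offset for val in new_v]
        # Simulate the *true* system and fold the observed noise into
        # the estimate; since s and a are known, w is observable here.
        w = 1 if rng.random() < theta_true else 0
        s = (s + a + w) % n_states
        noise_sum += w
        noise_cnt += 1
    return theta_hat, v
```

With enough stages the empirical estimate concentrates near the true parameter, and the value-iteration sweeps are eventually performed under a nearly correct model — the mechanism that, under the paper's conditions, yields average-reward optimality of the adaptive policy.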




Additional information

This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grants PCEXCNA-050156 and A128CCOEO550, and in part by the Third World Academy of Sciences under Grant TWAS RG MP 898-152.


Cite this article

Cavazos-Cadena, R., Hernández-Lerma, O. Recursive adaptive control of Markov decision processes with the average reward criterion. Appl Math Optim 23, 193–207 (1991). https://doi.org/10.1007/BF01442397
