Annals of Operations Research

Volume 28, Issue 1, pp 169–184

Nonparametric estimation and adaptive control in a class of finite Markov decision chains

  • Rolando Cavazos-Cadena
Research Contributions


We consider a class of Markov decision processes with finite state and action spaces which, essentially, is determined by the following condition: the state space is irreducible under the action of any stationary policy. Apart from this restriction, however, the transition law is completely unknown to the controller. In this context, we identify a set of policies under which the frequency estimators of the transition law are strongly consistent; this result is then applied to construct adaptive asymptotically discount-optimal policies.
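The scheme the abstract describes — estimate the unknown transition law by the empirical frequencies of observed state–action transitions, then act greedily on the estimated model (certainty equivalence via value iteration) — can be sketched as follows. The 3-state, 2-action chain, its rewards, the discount factor, and the exploration rate are all hypothetical numbers for illustration; the ε-greedy forced exploration is only a crude stand-in for the more careful policy class under which the paper proves strong consistency of the frequency estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state, 2-action chain; every stationary policy makes the
# state space irreducible, as the paper assumes.  The controller does NOT
# know P_true; it is used only to simulate transitions.
# P_true[x, a, y] = P(X_{t+1} = y | X_t = x, A_t = a)
P_true = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
    [[0.3, 0.4, 0.3], [0.2, 0.2, 0.6]],
    [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]],
])
R = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])   # one-step rewards r(x, a)
beta = 0.9                                            # discount factor
n_states, n_actions = R.shape

counts = np.zeros((n_states, n_actions, n_states))    # observed transition counts

def frequency_estimate(counts):
    """Frequency estimator of the transition law.

    Unvisited (x, a) pairs fall back to the uniform distribution, so the
    estimate is always a valid stochastic kernel."""
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0,
                    counts / np.maximum(totals, 1),
                    1.0 / counts.shape[2])

def greedy_policy(P_hat):
    """Certainty equivalence: value iteration on the estimated model."""
    V = np.zeros(n_states)
    for _ in range(1000):
        Q = R + beta * (P_hat @ V)      # (S, A) action values under P_hat
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < 1e-10:
            break
        V = V_new
    return Q.argmax(axis=1)

# Simulate: act greedily on the current estimate, with forced exploration so
# every (state, action) pair stays recurrent -- the crude stand-in mentioned
# above for the paper's consistency-preserving policies.
x, policy = 0, np.zeros(n_states, dtype=int)
for t in range(5000):
    if t % 100 == 0:                    # re-solve the estimated model periodically
        policy = greedy_policy(frequency_estimate(counts))
    a = rng.integers(n_actions) if rng.random() < 0.3 else policy[x]
    y = rng.choice(n_states, p=P_true[x, a])
    counts[x, a, y] += 1
    x = y

print(np.round(frequency_estimate(counts), 2))
```

Under irreducibility plus persistent exploration, every (x, a) pair is visited infinitely often, so by the strong law of large numbers the frequency estimates converge to the true kernel and the certainty-equivalence policy becomes asymptotically discount-optimal.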


Keywords: finite Markov decision chains; unknown transition law; frequency estimators; asymptotic discount optimality; principle of estimation and control; nonstationary value iteration





Copyright information

© J.C. Baltzer A.G. Scientific Publishing Company 1991

Authors and Affiliations

  • Rolando Cavazos-Cadena
  1. Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Saltillo, COAH, México
