Abstract
We consider a class of discrete-time stochastic control systems, with Borel state and action spaces, and possibly unbounded costs. The processes evolve according to the equation x_{t+1} = F(x_t, a_t, ξ_t), t = 0, 1, ..., where the ξ_t are i.i.d. random vectors whose common distribution is unknown. Assuming observability of {ξ_t}, we use the empirical estimator of its distribution to construct adaptive policies which are asymptotically discounted-cost optimal.
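The scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete below is an assumption for illustration, not from the paper: a scalar system F(x, a, ξ) = 0.9x + a + ξ, a quadratic stage cost, a finite action grid, and a fixed quadratic stand-in for the value function. What the sketch does show is the core idea: the controller never sees the disturbance distribution, only the realized ξ_0, ..., ξ_{t-1}, and it replaces the unknown expectation E[v(F(x, a, ξ))] in the one-step lookahead with an average over the empirical distribution of the observed samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration (not the paper's model): scalar dynamics
# x_{t+1} = F(x_t, a_t, xi_t) with quadratic stage cost and a
# discount factor alpha in (0, 1).
F = lambda x, a, xi: 0.9 * x + a + xi
c = lambda x, a: x ** 2 + a ** 2
alpha = 0.95
actions = np.linspace(-1.0, 1.0, 21)  # assumed finite action grid

# The true disturbance law is unknown to the controller; it only keeps
# the realized samples and uses their empirical distribution (a uniform
# measure on the observations), which is possible because the xi_t are
# assumed observable.
observed_xi = []

def greedy_action(x, v):
    """Certainty-equivalent one-step lookahead: the unknown expectation
    E[v(F(x, a, xi))] is replaced by the empirical average over the
    disturbances observed so far."""
    xi = np.array(observed_xi) if observed_xi else np.zeros(1)
    q = [c(x, a) + alpha * np.mean(v(F(x, a, xi))) for a in actions]
    return actions[int(np.argmin(q))]

# Stand-in value function (in the paper's setting this would come from
# dynamic programming under the estimated model); a quadratic guess here.
v = lambda x: 2.0 * x ** 2

x = 1.0
for t in range(100):
    a = greedy_action(x, v)
    xi_t = rng.normal(0.0, 0.1)  # true law, never revealed to the agent
    observed_xi.append(xi_t)     # update the empirical estimator
    x = F(x, a, xi_t)
```

As t grows, the empirical measure converges weakly to the true distribution, which is the mechanism behind the asymptotic optimality of the adaptive policy; the sketch only exercises the certainty-equivalent control loop, not the optimality proof.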
AMS Subject Classification (2000) 93E10, 90C40
Cite this article
Hilgert, N., Minjárez-Sosa, J.A. Adaptive control of stochastic systems with unknown disturbance distribution: discounted criteria. Math Meth Oper Res 63, 443–460 (2006). https://doi.org/10.1007/s00186-005-0024-6