
Adaptive control of stochastic systems with unknown disturbance distribution: discounted criteria

Published in Mathematical Methods of Operations Research

Abstract

We consider a class of discrete-time stochastic control systems with Borel state and action spaces and possibly unbounded costs. The processes evolve according to the equation x_{t+1} = F(x_t, a_t, ξ_t), t = 0, 1, ..., where the ξ_t are i.i.d. random vectors whose common distribution is unknown. Assuming observability of {ξ_t}, we use the empirical estimator of its distribution to construct adaptive policies that are asymptotically discounted-cost optimal.
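As a rough illustration of the idea (not the paper's actual construction), the Python sketch below uses a hypothetical scalar system x_{t+1} = x_t + a_t + ξ_t with quadratic stage cost x² + a². The controller observes each realized disturbance ξ_t, maintains the empirical sample of the unknown distribution, and plugs the resulting estimate (here only the empirical mean is needed) into a certainty-equivalent control law; the dynamics, cost, and policy are all assumptions made for the demo.

```python
import random

def adaptive_control_demo(steps=200, discount=0.95, seed=0):
    """Sketch: adaptive policy driven by the empirical estimate of the
    unknown i.i.d. disturbance distribution (hypothetical linear system)."""
    rng = random.Random(seed)
    true_noise = lambda: rng.gauss(0.5, 1.0)   # unknown to the controller
    x = 1.0
    samples = []        # observed xi_t: the empirical distribution
    total_cost = 0.0
    xi_bar = 0.0
    for t in range(steps):
        # Empirical estimate of E[xi] from the disturbances seen so far.
        xi_bar = sum(samples) / len(samples) if samples else 0.0
        # Certainty-equivalent action: aim x_{t+1} = x + a + xi at zero.
        a = -(x + xi_bar)
        total_cost += (discount ** t) * (x * x + a * a)
        xi = true_noise()
        x = x + a + xi                          # x_{t+1} = F(x_t, a_t, xi_t)
        samples.append(xi)                      # {xi_t} observable: update estimator
    return xi_bar, total_cost

mean_est, cost = adaptive_control_demo()
```

As more disturbances are observed, the empirical mean converges to the true mean (0.5 here), so the adaptive policy approaches the policy one would use if the distribution were known, mirroring the asymptotic-optimality theme of the paper.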



Author information


Correspondence to J. Adolfo Minjárez-Sosa.

Additional information

AMS Subject Classification (2000) 93E10, 90C40



Cite this article

Hilgert, N., Minjárez-Sosa, J.A. Adaptive control of stochastic systems with unknown disturbance distribution: discounted criteria. Math Meth Oper Res 63, 443–460 (2006). https://doi.org/10.1007/s00186-005-0024-6
