Abstract
The Kumar-Becker-Lin scheme introduces a slowly vanishing cost bias in the parameter estimation part of self-tuning control in order to improve its performance. This paper establishes the a.s. optimality of a variant of this scheme for Markov chains on a countable state space when the action space is compact metric and the parameter space is a compact subset of R^m.
References
Mandl, P., Estimation and Control in Markov Chains, Advances in Applied Probability, Vol. 6, pp. 40–60, 1974.
Schäl, M., Estimation and Control in Discounted Dynamic Programming, Stochastics, Vol. 20, pp. 51–71, 1987.
Borkar, V. S., and Ghosh, M. K., Ergodic and Adaptive Control of Nearest Neighbor Motions, Mathematics of Control, Signals, and Systems (to appear).
Borkar, V. S., and Varaiya, P. P., Adaptive Control of Markov Chains, I: Finite Parameter Set, IEEE Transactions on Automatic Control, Vol. AC-24, pp. 953–957, 1979.
Kumar, P. R., and Becker, A., A New Family of Optimal Adaptive Controllers for Markov Chains, IEEE Transactions on Automatic Control, Vol. AC-27, pp. 137–146, 1982.
Kumar, P. R., and Lin, W., Optimal Adaptive Controllers for Markov Chains, IEEE Transactions on Automatic Control, Vol. AC-27, pp. 765–774, 1982.
Kumar, P. R., A Survey of Some Results in Stochastic Adaptive Control, SIAM Journal on Control and Optimization, Vol. 23, pp. 329–380, 1985.
Hajek, B., Hitting-Time and Occupation-Time Bounds Implied by Drift Analysis with Applications, Advances in Applied Probability, Vol. 14, pp. 502–525, 1982.
Borkar, V. S., Control of Markov Chains with Long-Run Average Cost Criterion: The Dynamic Programming Equations, SIAM Journal on Control and Optimization, Vol. 27, pp. 642–657, 1989.
Beneš, V. E., Existence of Optimal Strategies Based on Specified Information, for a Class of Stochastic Decision Problems, SIAM Journal on Control and Optimization, Vol. 8, pp. 179–188, 1970.
Borkar, V. S., and Varaiya, P. P., Identification and Adaptive Control of Markov Chains, SIAM Journal on Control and Optimization, Vol. 20, pp. 470–488, 1982.
Borkar, V. S., A Convex Analytic Approach to Markov Decision Processes, Probability Theory and Related Fields, Vol. 78, pp. 583–602, 1988.
Borkar, V. S., Control of Markov Chains with Long-Run Average Cost Criterion, Stochastic Differential Systems, Stochastic Control Theory, and Applications, Edited by W. H. Fleming and P. L. Lions, Springer-Verlag, New York, New York, pp. 57–77, 1988.
Loève, M., Probability Theory, Vol. 2, 4th Edition, Springer-Verlag, New York, New York, 1978.
Neveu, J., Discrete Parameter Martingales, North-Holland, Amsterdam, Holland, 1975.
Chow, Y. S., and Teicher, H., Probability Theory: Independence, Interchangeability, Martingales, Springer-Verlag, New York, New York, 1978.
Borkar, V. S., and Bagchi, A., Parameter Estimation in Continuous-Time Stochastic Processes, Stochastics, Vol. 8, pp. 193–212, 1982.
Additional information
Communicated by P. Varaiya
Cite this article
Borkar, V.S. The Kumar-Becker-Lin scheme revisited. J Optim Theory Appl 66, 289–309 (1990). https://doi.org/10.1007/BF00939540
DOI: https://doi.org/10.1007/BF00939540