Abstract
Herein we review certain problems in sequential optimization in which the underlying dynamical system is not fully specified and must be learned while it is being operated. A prototypical example is the multi-armed bandit problem, which was one of Yakowitz’s many research areas. Other problems reviewed include stochastic approximation and adaptive control of Markov chains.
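To make the bandit setting concrete: the player repeatedly chooses among arms with unknown reward distributions, trading off exploration against exploitation. The sketch below is not from the chapter; it is a minimal illustration of an upper-confidence-bound allocation rule (in the spirit of asymptotically efficient rules) on hypothetical Bernoulli arms, with the bonus term `sqrt(2 log t / n_a)` chosen as a standard textbook form.

```python
import math
import random

def ucb1(pull, n_arms, horizon, seed=0):
    """Minimal UCB sketch: play each arm once, then play the arm
    maximizing empirical mean reward plus an exploration bonus."""
    rng = random.Random(seed)
    counts = [0] * n_arms        # number of pulls of each arm
    sums = [0.0] * n_arms        # total reward from each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1          # initialization: pull each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += pull(arm, rng)
    return counts

# Hypothetical example: two Bernoulli arms with success
# probabilities 0.3 and 0.7; the better arm is pulled far more often.
probs = [0.3, 0.7]
counts = ucb1(lambda a, rng: 1.0 if rng.random() < probs[a] else 0.0,
              n_arms=2, horizon=2000)
```

The exploration bonus shrinks as an arm accumulates pulls, so inferior arms are sampled only often enough to keep their confidence bounds below the leader's mean, which is what keeps the regret growing slowly in the horizon.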
© 2002 Springer Science + Business Media, Inc.
Cite this chapter
Lai, T.L. (2002). Sequential Optimization Under Uncertainty. In: Dror, M., L’Ecuyer, P., Szidarovszky, F. (eds) Modeling Uncertainty. International Series in Operations Research & Management Science, vol 46. Springer, New York, NY. https://doi.org/10.1007/0-306-48102-2_3
Print ISBN: 978-0-7923-7463-3
Online ISBN: 978-0-306-48102-4