Abstract
Herein we review certain problems in sequential optimization in which the underlying dynamical system is not fully specified and must be learned while it is being operated. A prototypical example is the multi-armed bandit problem, which was one of Yakowitz’s many research areas. Other problems reviewed include stochastic approximation and adaptive control of Markov chains.
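To make the bandit setting concrete: the player repeatedly chooses among arms with unknown reward distributions, trading off exploration against exploitation. The sketch below is not from the chapter; it is a minimal illustration of an upper-confidence-bound allocation rule (in the spirit of asymptotically efficient rules) on hypothetical Bernoulli arms, with the bonus term `sqrt(2 log t / n_a)` chosen as a standard textbook form.

```python
import math
import random

def ucb1(pull, n_arms, horizon, seed=0):
    """Minimal UCB sketch: play each arm once, then play the arm
    maximizing empirical mean reward plus an exploration bonus."""
    rng = random.Random(seed)
    counts = [0] * n_arms        # number of pulls of each arm
    sums = [0.0] * n_arms        # total reward from each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1          # initialization: pull each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += pull(arm, rng)
    return counts

# Hypothetical example: two Bernoulli arms with success
# probabilities 0.3 and 0.7; the better arm is pulled far more often.
probs = [0.3, 0.7]
counts = ucb1(lambda a, rng: 1.0 if rng.random() < probs[a] else 0.0,
              n_arms=2, horizon=2000)
```

The exploration bonus shrinks as an arm accumulates pulls, so inferior arms are sampled only often enough to keep their confidence bounds below the leader's mean, which is what keeps the regret growing slowly in the horizon.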
© 2002 Springer Science + Business Media, Inc.
Cite this chapter
Lai, T.L. (2002). Sequential Optimization Under Uncertainty. In: Dror, M., L’Ecuyer, P., Szidarovszky, F. (eds) Modeling Uncertainty. International Series in Operations Research & Management Science, vol 46. Springer, New York, NY. https://doi.org/10.1007/0-306-48102-2_3
Print ISBN: 978-0-7923-7463-3
Online ISBN: 978-0-306-48102-4