Sequential Optimization Under Uncertainty

Part of the book series: International Series in Operations Research & Management Science (ISOR, volume 46)

Abstract

Herein we review certain problems in sequential optimization when the underlying dynamical system is not fully specified but has to be learned during the operation of the system. A prototypical example is the multi-armed bandit problem, which was one of Yakowitz’s many research areas. Other problems under review include stochastic approximation and adaptive control of Markov chains.
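
To make the flavor of these problems concrete, the Python sketch below plays a two-armed Bernoulli bandit with a generic upper-confidence-bound index rule and runs a one-dimensional Robbins-Monro stochastic approximation recursion. It is an illustrative sketch only, not the specific allocation rules or adaptive control schemes reviewed in the chapter; the function names, arm means, step-size constant, and test function are hypothetical choices made for the example.

import math
import random

def ucb_bandit(arm_means, horizon, seed=0):
    # Multi-armed bandit played with a simple upper-confidence-bound index:
    # empirical mean plus an exploration bonus that shrinks as an arm
    # accumulates plays.  arm_means are the true (unknown to the learner)
    # Bernoulli success probabilities, used only to simulate rewards.
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of plays of each arm
    sums = [0.0] * k      # cumulative reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play every arm once to initialize the indices
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

def robbins_monro(noisy_f, x0, steps, a=1.0):
    # Robbins-Monro recursion x_{n+1} = x_n - (a/n) * Y_n, where Y_n is a
    # noisy observation of f(x_n); under the classical conditions the
    # iterates converge to a root of f.
    x = x0
    for n in range(1, steps + 1):
        x -= (a / n) * noisy_f(x)
    return x

# Hypothetical usage: two arms with unknown means 0.4 and 0.6, and a noisy
# observation of f(x) = x - 2 whose root the recursion should approach.
if __name__ == "__main__":
    rng = random.Random(1)
    print(ucb_bandit([0.4, 0.6], horizon=10_000))
    print(robbins_monro(lambda x: (x - 2.0) + rng.gauss(0.0, 1.0),
                        x0=0.0, steps=10_000))

With these hypothetical inputs, the index rule concentrates its plays on the better arm while still sampling the other occasionally, and the recursion drifts toward the root of the regression function; both illustrate the trade-off between learning the unknown system and controlling it that the chapter examines.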

Copyright information

© 2002 Springer Science + Business Media, Inc.

About this chapter

Cite this chapter

Lai, T.L. (2002). Sequential Optimization Under Uncertainty. In: Dror, M., L’Ecuyer, P., Szidarovszky, F. (eds) Modeling Uncertainty. International Series in Operations Research & Management Science, vol 46. Springer, New York, NY. https://doi.org/10.1007/0-306-48102-2_3
