Sequential Optimization Under Uncertainty

  • Tze Leung Lai
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 46)


We review problems in sequential optimization in which the underlying dynamical system is not fully specified but must be learned while the system is in operation. A prototypical example is the multi-armed bandit problem, which was one of Yakowitz's many research areas. Other problems under review include stochastic approximation and adaptive control of Markov chains.
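As a concrete illustration of the multi-armed bandit problem (not part of the original chapter), the sketch below implements an upper-confidence-bound allocation rule in the spirit of Lai and Robbins (1985). The Gaussian reward model and the simple sqrt(2 ln t / n) exploration bonus are illustrative assumptions; the actual Lai–Robbins construction uses sharper, distribution-dependent upper confidence bounds.

```python
import math
import random

def ucb_bandit(reward_fns, horizon, seed=0):
    """Play a K-armed bandit for `horizon` rounds with a UCB index rule.

    reward_fns: list of zero-argument callables, one per arm, each
    returning a random reward.  Returns (total reward, play counts).
    """
    random.seed(seed)
    k = len(reward_fns)
    counts = [0] * k      # number of times each arm has been played
    sums = [0.0] * k      # cumulative reward collected from each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # play every arm once to initialize the indices
        else:
            # index = sample mean + exploration bonus; the bonus shrinks
            # as an arm accumulates plays, balancing exploration against
            # exploitation
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = reward_fns[arm]()
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total, counts

# Two Gaussian arms with means 0.0 and 0.5: the rule should concentrate
# its plays on the better arm as the horizon grows.
total, counts = ucb_bandit([lambda: random.gauss(0.0, 1.0),
                            lambda: random.gauss(0.5, 1.0)],
                           horizon=2000)
```

Index rules of this type incur expected regret growing only logarithmically in the horizon, which Lai and Robbins (1985) show is the best achievable order.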


Keywords: Adaptive control, switching cost, stochastic approximation, control rule, bandit problem




References

  1. Agrawal, R., D. Teneketzis and V. Anantharam. (1989a). Asymptotically efficient adaptive allocation schemes for controlled I.I.D. processes: Finite parameter space. IEEE Trans. Automat. Contr. 34, 258–267.
  2. Agrawal, R., D. Teneketzis and V. Anantharam. (1989b). Asymptotically efficient adaptive allocation schemes for controlled Markov chains: Finite parameter space. IEEE Trans. Automat. Contr. 34, 1249–1259.
  3. Anantharam, V., P. Varaiya and J. Walrand. (1987). Asymptotically efficient allocation rules for multiarmed bandit problems with multiple plays. Part II: Markovian rewards. IEEE Trans. Automat. Contr. 32, 975–982.
  4. Banks, J. S. and R.K. Sundaram. (1992). Denumerable-armed bandits. Econometrica 60, 1071–1096.
  5. Banks, J. S. and R.K. Sundaram. (1994). Switching costs and the Gittins index. Econometrica 62, 687–694.
  6. Benveniste, A., M. Metivier, and P. Priouret. (1987). Adaptive Algorithms and Stochastic Approximations. Springer-Verlag, New York.
  7. Berry, D. A. (1972). A Bernoulli two-armed bandit. Ann. Math. Statist. 43, 871–897.
  8. Blum, J. (1954). Approximation methods which converge with probability one. Ann. Math. Statist. 25, 382–386.
  9. Borkar, V. and P. Varaiya. (1979). Adaptive control of Markov chains. I: Finite parameter set. IEEE Trans. Automat. Contr. 24, 953–958.
  10. Brezzi, M. and T.L. Lai. (2000a). Incomplete learning from endogenous data in dynamic allocation. Econometrica 68, 1511–1516.
  11. Brezzi, M. and T.L. Lai. (2000b). Optimal learning and experimentation in bandit problems. To appear in J. Economic Dynamics & Control.
  12. Chang, F. and T.L. Lai. (1987). Optimal stopping and dynamic allocation. Adv. Appl. Probab. 19, 829–853.
  13. Chernoff, H. (1967). Sequential models for clinical trials. Proc. Fifth Berkeley Symp. Math. Statist. & Probab. 4, 805–812. Univ. California Press.
  14. Fabian, V. (1967). Stochastic approximation of minima with improved asymptotic speed. Ann. Math. Statist. 38, 191–200.
  15. Fabian, V. (1971). Stochastic approximation. In Optimizing Methods in Statistics (J. Rustagi, ed.), 439–470. Academic Press, New York.
  16. Fabius, J. and W.R. van Zwet. (1970). Some remarks on the two-armed bandit. Ann. Math. Statist. 41, 1906–1916.
  17. Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. Roy. Statist. Soc. Ser. B 41, 148–177.
  18. Gittins, J.C. (1989). Multi-Armed Bandit Allocation Indices. Wiley, New York.
  19. Gittins, J.C. and D.M. Jones. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics (J. Gani et al., ed.), 241–266. North Holland, Amsterdam.
  20. Graves, T. L. and T.L. Lai. (1997). Asymptotically efficient adaptive choice of control laws in controlled Markov chains. SIAM J. Contr. Optimiz. 35, 715–743.
  21. Kaelbling, L.P., M.L. Littman and A.W. Moore. (1996). Reinforcement learning: A survey. J. Artificial Intelligence Res. 4, 237–285.
  22. Kiefer, J. and J. Wolfowitz. (1952). Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23, 462–466.
  23. Kirkpatrick, S., C.D. Gelatt and M.P. Vecchi. (1983). Optimization by simulated annealing. Science 220, 671–680.
  24. Klimov, G.P. (1974/1978). Time-sharing service systems I, II. Theory Probab. Appl. 19, 532–551 and 23, 314–321.
  25. Kumar, P.R. (1985). A survey of some results in stochastic adaptive control. SIAM J. Contr. Optimiz. 23, 329–380.
  26. Kushner, H.J. and D.S. Clark. (1978). Stochastic Approximation for Constrained and Unconstrained Systems. Springer-Verlag, New York.
  27. Kushner, H.J. and G. Yin. (1997). Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York.
  28. Lai, T.L. (1987). Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15, 1091–1114.
  29. Lai, T.L. and H. Robbins. (1979). Adaptive design and stochastic approximation. Ann. Statist. 7, 1196–1221.
  30. Lai, T.L. and H. Robbins. (1985). Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6, 4–22.
  31. Lai, T.L. and S. Yakowitz. (1995). Machine learning and nonparametric bandit theory. IEEE Trans. Automat. Contr. 40, 1199–1209.
  32. Lai, T.L. and Z. Ying. (1988). Open bandit processes and optimal scheduling of queuing networks. Adv. Appl. Probab. 20, 447–472.
  33. Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Trans. Automat. Contr. 22, 551–575.
  34. Mandl, P. (1974). Estimation and control of Markov chains. Adv. Appl. Probab. 6, 40–60.
  35. Mortensen, D. (1985). Job search and labor market analysis. Handbook of Labor Economics 2, 849–919.
  36. Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58, 527–535.
  37. Robbins, H. and S. Monro. (1951). A stochastic approximation method. Ann. Math. Statist. 22, 400–407.
  38. Rothschild, M. (1974). A two-armed bandit theory of market pricing. J. Economic Theory 9, 185–202.
  39. Sacks, J. (1958). Asymptotic distribution of stochastic approximation procedures. Ann. Math. Statist. 29, 375–405.
  40. Spall, J.C. (1992). Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Contr. 37, 332–341.
  41. Spall, J.C. and J.A. Cristion. (1994). Nonlinear adaptive control using neural networks: Estimation with a smoothed form of simultaneous perturbation gradient approximation. Statistica Sinica 4, 1–27.
  42. Varaiya, P.P., J.C. Walrand and C. Buyukkoc. (1985). Extensions of the multiarmed bandit problem: the discounted case. IEEE Trans. Automat. Contr. 30, 426–439.
  43. Whittle, P. (1981). Arm-acquiring bandits. Ann. Probab. 9, 284–292.
  44. Yakowitz, S. (1989). A statistical foundation for machine learning, with application to Go-moku. Computers & Math. 17, 1085–1102.
  45. Yakowitz, S. (1993). A global stochastic approximation. SIAM J. Contr. Optimiz. 31, 30–40.
  46. Yakowitz, S., J. Jayawardena and S. Li. (1992). Theory for automatic learning under partially observed Markov-dependent noise. IEEE Trans. Automat. Contr. 37, 1316–1324.
  47. Yakowitz, S. and M. Kollier. (1992). Machine learning for blackjack counting strategies. J. Statist. Planning & Inference 13, 295–309.
  48. Yakowitz, S. and W. Lowe. (1991). Nonparametric bandit methods. Ann. Operat. Res. 28, 297–312.
  49. Yakowitz, S. and E. Lugosi. (1990). Random search in the presence of noise, with application to machine learning. SIAM J. Scient. & Statist. Comput. 11, 702–712.
  50. Yakowitz, S. and J. Mai. (1995). Methods and theory for off-line machine learning. IEEE Trans. Automat. Contr. 40, 161–165.

Copyright information

© Springer Science + Business Media, Inc. 2002

Authors and Affiliations

  • Tze Leung Lai, Stanford University, USA
