Sequential Optimization Under Uncertainty
Herein we review certain problems in sequential optimization in which the underlying dynamical system is not fully specified but must be learned while the system is in operation. A prototypical example is the multi-armed bandit problem, which was one of Yakowitz’s many research areas. Other problems under review include stochastic approximation and adaptive control of Markov chains.
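To make the bandit setting concrete, here is a minimal sketch of an index-type allocation rule (the classical UCB1 rule, not taken from this paper): each arm's empirical mean is inflated by an exploration bonus, and the arm with the largest index is pulled. The function name `ucb1`, the Bernoulli arms, and the parameter values are illustrative assumptions.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Simulate the UCB1 index rule on Bernoulli arms.

    arm_means: true success probabilities (unknown to the rule).
    Returns (total reward, pull counts per arm) over `horizon` pulls.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k     # number of pulls of each arm
    sums = [0.0] * k     # accumulated reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize the indices
        else:
            # index = empirical mean + exploration bonus sqrt(2 log t / n_i)
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total, counts = ucb1([0.3, 0.7], horizon=2000)
# Over time the better arm (index 1) receives the bulk of the pulls,
# so the regret relative to always pulling it grows only logarithmically.
```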
Keywords: Adaptive Control, Switching Cost, Stochastic Approximation, Control Rule, Bandit Problem
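Stochastic approximation, another topic reviewed here, can likewise be sketched in a few lines. The following illustrates the classical Robbins–Monro iteration, which locates a root of a regression function from noisy observations using step sizes $a_n = a/n$; the function name `robbins_monro`, the test function, and the parameter values are illustrative assumptions, not from this paper.

```python
import random

def robbins_monro(noisy_obs, x0, steps=5000, a=1.0, seed=0):
    """Robbins-Monro iteration x_{n+1} = x_n - (a/n) * Y_n,
    where Y_n is a noisy observation of M(x_n) and we seek M(x*) = 0.
    The step sizes a/n satisfy sum a_n = inf, sum a_n^2 < inf."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, steps + 1):
        y = noisy_obs(x, rng)   # unbiased but noisy measurement of M(x)
        x = x - (a / n) * y
    return x

# Example: solve M(x) = 2x - 4 = 0 (root x* = 2) from measurements
# corrupted by standard Gaussian noise.
root = robbins_monro(lambda x, rng: 2.0 * x - 4.0 + rng.gauss(0.0, 1.0),
                     x0=0.0)
```

The diminishing step sizes average out the observation noise while still moving the iterate all the way to the root, which is the key design choice of the scheme.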