
Nonparametric bandit methods


Abstract

A bandit is a finite collection of random variables. Bandit problems are Markov decision problems in which, at each decision time, the decision maker selects a random variable (referred to as a bandit “arm”) and observes an outcome. The selection is based on the observation history. The objective is to choose arms sequentially so as to minimize the growth rate (with decision time) of the number of suboptimal selections.
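
To make the setting concrete, here is a minimal Python sketch of the problem as just described: arm distributions are unknown to the decision maker, each selection may depend only on the observation history, and a strategy is scored by its number of suboptimal selections. The two normal arms and the play harness are illustrative assumptions, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical two-armed bandit: each arm is a random variable whose
    # distribution is unknown to the decision maker.
    arms = [lambda: rng.normal(0.5, 1.0),   # suboptimal arm (mean 0.5)
            lambda: rng.normal(1.0, 1.0)]   # optimal arm   (mean 1.0)

    def play(strategy, horizon):
        """Run a selection strategy; score it by its suboptimal selections."""
        history = [[] for _ in arms]      # observation history, one list per arm
        suboptimal = 0
        for t in range(horizon):
            k = strategy(history, t)      # selection based on the history alone
            history[k].append(arms[k]())  # observe an outcome of the chosen arm
            suboptimal += (k != 1)        # arm 1 has the larger mean here
        return suboptimal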

The appellation “bandit” refers to mechanical gambling machines, and the tradition stems from the question of allocating competing treatments to a sequence of patients having the same disease. Our motivation is “machine learning,” in which a game-playing or assembly-line-adjusting computer is faced with a sequence of statistically similar decision problems and, as a resource, has access to an expanding database relevant to these problems.

The setting for the present study is nonparametric and infinite horizon. The central aim is to present a methodology which postulates finite moments or, alternatively, bounded bandit arms. Under these circumstances, the proposed strategies are shown to be asymptotically optimal and to converge at guaranteed rates. In the bounded-arm case, the rate is optimal.
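
As a hedged illustration of this kind of rule (a sketch in the spirit of forced-sampling schemes, not the paper's exact strategy), the following plays the arm with the best sample mean except at a sparse forced-sampling schedule that keeps every arm observed. Under bounded arms, rules of this type can hold the number of suboptimal selections to logarithmic growth in the horizon, which is the best achievable rate in that case.

    import math

    def forced_exploration_strategy(history, t):
        # "Play the leader" except at a sparse forced-sampling schedule:
        # each arm is forced up to roughly log(t) observations, so no arm
        # is starved; otherwise the best sample mean is exploited.
        quota = math.ceil(math.log(t + 2))
        for k in range(len(history)):
            if len(history[k]) < quota:
                return k                  # forced sample of an under-observed arm
        means = [sum(obs) / len(obs) for obs in history]
        return max(range(len(history)), key=lambda k: means[k])

With the harness above, play(forced_exploration_strategy, 10_000) returns the count of suboptimal selections, which should grow like the logarithm of the horizon rather than linearly.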

We extend the theory to the case in which the bandit population is infinite, and share some computational experience.
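
For the infinite-population extension, one natural device (an assumption for illustration, not necessarily the construction analyzed in the paper) is to maintain a slowly growing finite roster of arms drawn from the population and to apply the finite-arm rule to that roster. Continuing the sketch above (it reuses rng, math, and forced_exploration_strategy), with draw_arm as a hypothetical hook into the population:

    def run_infinite_population(horizon, draw_arm):
        # Maintain a slowly growing roster of arms sampled from an infinite
        # population, and apply the finite-arm rule above to the roster.
        # `draw_arm` is a hypothetical hook returning a fresh arm (a callable).
        arms, history = [], []
        total = 0.0
        for t in range(horizon):
            while len(arms) < int(math.log(t + 2)) + 1:  # admit new arms slowly
                arms.append(draw_arm())
                history.append([])
            k = forced_exploration_strategy(history, t)
            x = arms[k]()
            history[k].append(x)
            total += x
        return total / horizon            # average observed outcome

    # Example population: normal arms whose means are uniform on [0, 1].
    avg = run_infinite_population(
        10_000, lambda: (lambda m=rng.uniform(0.0, 1.0): rng.normal(m, 1.0)))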




Cite this article

Yakowitz, S., Lowe, W. Nonparametric bandit methods. Ann Oper Res 28, 297–312 (1991). https://doi.org/10.1007/BF02055587
