Abstract
A bandit is a finite collection of random variables. Bandit problems are Markov decision problems in which, at each decision time, the decision maker selects a random variable (referred to as a bandit “arm”) and observes an outcome; the selection is based on the observation history. The objective is to choose arms sequentially so as to minimize the growth rate (with decision time) of the number of suboptimal selections.
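The sequential protocol above can be sketched as a simulation loop. This is a minimal illustration, not the paper's method: the arms, the `greedy_after_warmup` rule, and the uniform reward model are all hypothetical choices made for the example.

```python
import random

def play_bandit(arms, select, horizon, seed=0):
    """Run a bandit problem: at each decision time, `select` picks an arm
    index from the observation history alone; the outcome of that arm is
    then observed. Returns the number of suboptimal selections."""
    rng = random.Random(seed)
    history = [[] for _ in arms]                     # observed outcomes per arm
    best = max(range(len(arms)), key=lambda i: arms[i][0])
    suboptimal = 0
    for _ in range(horizon):
        i = select(history, rng)
        mean, spread = arms[i]                       # bounded arm: uniform on [mean-spread, mean+spread]
        history[i].append(mean + spread * (2 * rng.random() - 1))
        if i != best:
            suboptimal += 1
    return suboptimal

def greedy_after_warmup(history, rng):
    # Hypothetical rule for illustration: sample each arm once,
    # then always play the arm with the best empirical mean.
    for i, obs in enumerate(history):
        if not obs:
            return i
    return max(range(len(history)),
               key=lambda i: sum(history[i]) / len(history[i]))
```

With two well-separated bounded arms, this naive rule makes only the single warm-up pull of the inferior arm; with overlapping arms it can lock onto the wrong one, which is why more careful index rules are needed.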
The appellation “bandit” refers to mechanical gambling machines, and the tradition stems from the question of allocating competing treatments to a sequence of patients having the same disease. Our motivation is “machine learning,” in which a game-playing or assembly-line-adjusting computer is faced with a sequence of statistically similar decision problems and, as a resource, has access to an expanding database relevant to these problems.
The setting for the present study is nonparametric and infinite-horizon. The central aim is to present a methodology which postulates only finite moments or, alternatively, bounded bandit arms. Under these circumstances, the proposed strategies are shown to be asymptotically optimal and to converge at guaranteed rates. In the bounded-arm case, the rate is optimal.
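For the bounded-arm case, an index rule of the Lai–Robbins type illustrates how logarithmic growth of suboptimal selections is achievable. The sketch below uses the UCB1 index (due to Auer et al.), which is an assumption for illustration and not necessarily the strategy proposed here; the Bernoulli arms are likewise hypothetical.

```python
import math
import random

def ucb1(arm_means, horizon, seed=1):
    """Index rule for rewards bounded in [0, 1]: play the arm with the
    highest empirical mean plus a confidence bonus. The number of
    suboptimal selections grows only logarithmically in the horizon."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k                      # pulls per arm
    sums = [0.0] * k                      # total reward per arm
    best = max(range(k), key=lambda i: arm_means[i])
    suboptimal = 0
    for t in range(horizon):
        if t < k:
            i = t                         # initialization: play each arm once
        else:
            i = max(range(k),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        r = 1.0 if rng.random() < arm_means[i] else 0.0   # Bernoulli outcome
        counts[i] += 1
        sums[i] += r
        if i != best:
            suboptimal += 1
    return suboptimal
```

Over 2000 plays of two arms with means 0.9 and 0.1, the inferior arm is selected only a handful of times, in line with the logarithmic bound; a fixed or purely random allocation would select it roughly half the time.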
We extend the theory to the case in which the bandit population is infinite, and share some computational experience.
Yakowitz, S., Lowe, W. Nonparametric bandit methods. Ann Oper Res 28, 297–312 (1991). https://doi.org/10.1007/BF02055587