Some Optimal Strategies for Bandit Problems with Beta Prior Distributions

Annals of the Institute of Statistical Mathematics

Abstract

A bandit problem with infinitely many Bernoulli arms is considered. The parameters of the Bernoulli arms are independent and identically distributed random variables drawn from a common beta(a, b) distribution. We investigate the k-failure strategy, a modification of Robbins's stay-with-a-winner/switch-on-a-loser strategy, together with three other strategies recently proposed by Berry et al. (1997, Ann. Statist., 25, 2103–2116). We show that the k-failure strategy performs poorly when b is greater than 1, and that the 1-failure strategy is the best of the k-failure strategies when b is less than or equal to 1. Using the formulas derived by Berry et al. (1997), we obtain the asymptotic expected failure rates of these three strategies for beta prior distributions. Numerical estimates and simulations for a variety of beta prior distributions are presented to illustrate the performance of these strategies.
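To make the k-failure strategy concrete, the following is a minimal simulation sketch, not the authors' code. It assumes the usual reading of the strategy: keep playing the current arm until it has produced k failures, then discard it and draw a fresh arm whose success probability is sampled from beta(a, b). The function name `simulate_k_failure` and its parameters `n_plays` and `seed` are illustrative and do not come from the paper.

```python
import random


def simulate_k_failure(a, b, k, n_plays, seed=None):
    """Estimate the failure rate of a k-failure strategy over n_plays pulls.

    Assumption (not taken from the paper's text): an arm is kept until it
    has produced k failures, then replaced by a new arm whose success
    probability is drawn from beta(a, b).
    """
    rng = random.Random(seed)
    p = rng.betavariate(a, b)      # success probability of the current arm
    failures_on_arm = 0            # failures observed on the current arm
    total_failures = 0             # failures observed over all plays

    for _ in range(n_plays):
        if rng.random() >= p:      # this play is a failure
            total_failures += 1
            failures_on_arm += 1
            if failures_on_arm == k:
                # Discard the arm and draw a new one from the prior.
                p = rng.betavariate(a, b)
                failures_on_arm = 0

    return total_failures / n_plays


if __name__ == "__main__":
    # Compare a prior with b > 1 against one with b <= 1 for a few k.
    for a, b in [(1.0, 2.0), (1.0, 0.5)]:
        for k in (1, 2, 3):
            rate = simulate_k_failure(a, b, k, n_plays=100_000, seed=0)
            print(f"beta({a}, {b}), k={k}: failure rate ~ {rate:.4f}")
```

Running the sketch for priors with b greater than 1 and with b at most 1 gives a rough empirical feel for the contrast described in the abstract. It does not reproduce the paper's numerical results, and it does not cover the three strategies of Berry et al. (1997).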

References

  • Banks, J. S. and Sundaram, R. K. (1992). Denumerable-armed bandits, Econometrica, 60, 1071–1096.

  • Berry, D. A. and Fristedt, B. (1985). Bandit Problems: Sequential Allocations of Experiments, Chapman and Hall, London.

  • Berry, D. A., Chen, R. W., Zame, A., Heath, D. C. and Shepp, L. A. (1997). Bandit problems with infinitely many arms, Ann. Statist., 25, 2103–2116.

  • Gittins, J. C. (1989). Multi-armed Bandit Allocation Indices, Wiley, New York.

  • Herschkorn, S. J., Pekoz, E. and Ross, S. M. (1995). Policies without memory for the infinite-armed Bernoulli bandit under the average-reward criterion, Probab. Engrg. Inform. Sci., 10, 21–28.

  • Robbins, H. (1952). Some aspects of the sequential design of experiments, Bull. Amer. Math. Soc., 58, 527–536.

Cite this article

Lin, CT., Shiau, C.J. Some Optimal Strategies for Bandit Problems with Beta Prior Distributions. Annals of the Institute of Statistical Mathematics 52, 397–405 (2000). https://doi.org/10.1023/A:1004130209258
