
On the two-armed bandit problem with non-observed Poissonian switching of arms

Mathematical Methods of Operations Research

Abstract

We consider the symmetric case of the Poissonian variant of the two-armed bandit problem. We suppose that the bandit arms are switched sequentially at the jump times of a non-observed Poisson process. The myopic policy is proved to be optimal for all sufficiently large switching rates. In this case, explicit formulas that closely approximate the value function are also found.
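The setting can be illustrated with a small discrete-time simulation. This is only a sketch under stated assumptions, not the paper's model or proof: the abstract does not specify the reward structure, so the code assumes that exactly one arm is "good" at any time, that the good arm produces reward events at rate `a` and the other at rate `b`, and that the two arms swap roles at the jumps of a hidden Poisson process with rate `mu`. The names `a`, `b`, `mu`, `dt`, and all function names are hypothetical. The myopic policy here simply pulls the arm that is currently more likely to be the good one.

```python
import random

def bayes_update(p, arm, event, a, b, dt):
    """Posterior P(arm 0 is good) after one time step spent on `arm`.

    `event` is True if a reward event was observed during the step.
    """
    # Event rate of the pulled arm under each hypothesis (assumed model).
    rate_if_0_good = a if arm == 0 else b
    rate_if_1_good = b if arm == 0 else a
    l0 = rate_if_0_good * dt if event else 1.0 - rate_if_0_good * dt
    l1 = rate_if_1_good * dt if event else 1.0 - rate_if_1_good * dt
    return p * l0 / (p * l0 + (1.0 - p) * l1)

def predict(p, mu, dt):
    """Account for a possible unobserved swap during a step of length dt."""
    return p * (1.0 - mu * dt) + (1.0 - p) * mu * dt

def myopic_arm(p):
    """Pull the arm with the higher current probability of being good."""
    return 0 if p >= 0.5 else 1

def simulate(a=2.0, b=0.5, mu=5.0, dt=0.001, horizon=50.0, seed=0):
    """Return the average number of reward events per unit time."""
    rng = random.Random(seed)
    good = 0          # hidden state: index of the currently good arm
    p = 0.5           # belief that arm 0 is the good one
    total = 0
    for _ in range(int(horizon / dt)):
        arm = myopic_arm(p)
        rate = a if arm == good else b
        event = rng.random() < rate * dt
        total += event
        p = predict(bayes_update(p, arm, event, a, b, dt), mu, dt)
        if rng.random() < mu * dt:   # hidden Poisson switch of the arms
            good = 1 - good
    return total / horizon

if __name__ == "__main__":
    print(simulate())
```

The time step `dt` must be small enough that `a * dt`, `b * dt`, and `mu * dt` are all well below 1, since the sketch approximates the continuous-time Poisson processes by at most one event per step.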




Cite this article

Donchev, D.S. On the two-armed bandit problem with non-observed Poissonian switching of arms. Mathematical Methods of Operations Research 47, 401–422 (1998). https://doi.org/10.1007/BF01198403
