Abstract
We consider the symmetric case of the Poissonian variant of the two-armed bandit problem. We suppose that the bandit arms switch sequentially at the jump times of a non-observed Poisson process. The myopic policy is proved to be optimal for all sufficiently large switching rates. In this case, explicit formulas that give good approximations to the value function are found as well.
Cite this article
Donchev, D.S. On the two-armed bandit problem with non-observed Poissonian switching of arms. Mathematical Methods of Operations Research 47, 401–422 (1998). https://doi.org/10.1007/BF01198403