On the bandit problem

  • Ulrich Herkenrath
  • Radu Theodorescu
Part II: Research Reports
Part of the Lecture Notes in Control and Information Sciences book series (LNCIS, volume 16)


In this paper we first propose an approach to the so-called two-armed bandit problem that is based essentially on the theory of random systems with complete connections. We then apply stochastic approximation techniques to find an optimal strategy. For detailed proofs, see [2–5].

In Section 1 we present some basic definitions and several results from the theory of random systems with complete connections. We then adapt several concepts concerning general control systems, developed in a previous paper [2], to the present setting. We go on to treat the two-armed bandit problem under two possible decision procedures: the first is based on learning techniques, the second on sequential techniques. In both cases we examine the expediency and the optimality of the procedure. In Section 2 we propose an optimal strategy for the two-armed bandit problem using the Kiefer-Wolfowitz stochastic approximation procedure, and we apply the same technique to a market pricing problem.
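As a rough illustration of the Kiefer-Wolfowitz idea mentioned above, the following minimal sketch searches for the maximizer of a regression function that can only be observed with noise. It is not the procedure of the paper: the test function `noisy_quadratic`, the gain sequences a_n = a/n and c_n = c/n^(1/3), the symmetric difference quotient divided by 2·c_n, and all parameter names are assumptions chosen for illustration.

```python
import random


def kiefer_wolfowitz(noisy_f, x0, n_steps=5000, a=1.0, c=1.0, seed=0):
    """Kiefer-Wolfowitz-style search for the maximizer of a noisy function.

    noisy_f(x, rng) returns one noisy observation of the regression
    function at x. The gains a_n = a / n and c_n = c / n**(1/3) are an
    assumed choice; they satisfy the usual summability conditions.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        a_n = a / n                      # step size, tends to zero
        c_n = c / n ** (1.0 / 3.0)       # finite-difference width, tends to zero
        y_plus = noisy_f(x + c_n, rng)   # noisy observation to the right of x
        y_minus = noisy_f(x - c_n, rng)  # noisy observation to the left of x
        # Move in the direction of the estimated slope (ascent step).
        x += a_n * (y_plus - y_minus) / (2.0 * c_n)
    return x


def noisy_quadratic(x, rng):
    """Hypothetical test function: maximum at x = 2, unit Gaussian noise."""
    return -(x - 2.0) ** 2 + rng.gauss(0.0, 1.0)


estimate = kiefer_wolfowitz(noisy_quadratic, x0=0.0)
print(f"estimated maximizer: {estimate:.3f}")
```

The estimate should settle near the true maximizer 2 of the hypothetical test function. In a bandit setting the observed quantity would be a random payoff rather than a noisy quadratic.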

AMS 1970 subject classification

Primary: 93A10, 62L20. Secondary: 93C55, 90A15.

Key words and phrases

control systems, learning automata, learning algorithms, optimality, expediency, two-armed bandit problem, stochastic approximation




  [1] Blum, J.R., Approximation methods which converge with probability one. Ann. Math. Statist. 25 (1954), 382–386.
  [2] Herkenrath, U., Theodorescu, R., General control systems. Information Sci. 14 (1978), 57–73.
  [3] Herkenrath, U., Theodorescu, R., On certain aspects of the two-armed bandit problem. Elektron. Informationsverarbeit. Kybernetik (1978).
  [4] Herkenrath, U., Theodorescu, R., Expediency and optimality for general control systems. Coll. Internat. C.N.R.S., Cachan, July 4–8, 1977.
  [5] Herkenrath, U., Theodorescu, R., On a stochastic approximation procedure applied to the bandit problem. (submitted for publication).
  [6] Iosifescu, M., Theodorescu, R., Random processes and learning. Springer, New York 1969.
  [7] Kiefer, J., Wolfowitz, J., Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23 (1952), 462–466.
  [8] Norman, M.F., On the linear model with two absorbing barriers. J. Math. Psychology 5 (1968), 225–241.
  [9] Norman, M.F., Markov processes and learning models. Academic Press, New York 1972.
  [10] Rothschild, M., A two-armed bandit theory of market pricing. J. Econom. Theory 9 (1974), 185–202.
  [11] Vogel, W., A sequential design for the two armed bandit. Ann. Math. Statist. 31 (1960), 430–443.
  [12] Vogel, W., An asymptotic minimax theorem for the two armed bandit problem. Ann. Math. Statist. 31 (1960), 444–451.
  [13] Wasan, M.T., Stochastic approximation. Cambridge University Press, Cambridge 1969.
  [14] Witten, I.H., Finite-time performance of some two-armed bandit controllers. IEEE Trans. Syst., Man, Cybern. SMC-3 (1973), 194–197.

Copyright information

© Springer-Verlag 1979

Authors and Affiliations

  • Ulrich Herkenrath (1)
  • Radu Theodorescu (2)
  1. Institute of Applied Mathematics, University of Bonn, Bonn, Federal Republic of Germany
  2. Department of Mathematics, Laval University, Quebec, Canada
