Volume 32, Issue 1, pp 395–407

On monotone optimal decision rules and the stay-on-a-winner rule for the two-armed bandit

  • M. Kolonko
  • H. Benzing


Consider the following optimization problem: Find a decision rule δ such that w(x, δ(x)) = max_a w(x, a) for all x under the constraint δ(x) ∈ D(x). We give conditions for the existence of monotone optimal decision rules δ. The term ‘monotone’ is used in a general sense. The well-known stay-on-a-winner rules for the two-armed bandit can be characterized as monotone decision rules by including the stage number into x and using a special ordering on x. This enables us to give simple conditions for the existence of optimal rules that are stay-on-a-winner rules. We extend results of Berry and Kalin/Theodorescu to the case of dependent arms.
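The optimization problem stated in the abstract can be illustrated on a small finite instance. The sketch below is not taken from the paper: the state set, the constraint sets D(x) = {0, ..., x}, the reward w(x, a) = x·a − a², and the helper names `optimal_rule` and `is_monotone` are all hypothetical choices made for illustration. "Monotone" is taken here in the ordinary sense of non-decreasing on the integers, whereas the paper uses the term in a more general sense with a special ordering on x.

```python
from typing import Callable, Dict, Iterable


def optimal_rule(
    states: Iterable[int],
    feasible: Callable[[int], Iterable[int]],
    w: Callable[[int, int], int],
) -> Dict[int, int]:
    """Return delta with delta[x] = largest maximiser of w(x, a) over a in D(x)."""
    delta = {}
    for x in states:
        actions = list(feasible(x))            # the constraint set D(x)
        best = max(w(x, a) for a in actions)   # max_a w(x, a)
        # Break ties by taking the largest maximiser (an illustrative convention).
        delta[x] = max(a for a in actions if w(x, a) == best)
    return delta


def is_monotone(delta: Dict[int, int]) -> bool:
    """Check that delta is non-decreasing in x (usual order on the integers)."""
    xs = sorted(delta)
    return all(delta[x] <= delta[y] for x, y in zip(xs, xs[1:]))


# Hypothetical toy instance (assumption, not from the paper).
states = range(7)
feasible = lambda x: range(x + 1)      # D(x) = {0, ..., x}
w = lambda x, a: x * a - a * a         # w(x, a) = x*a - a^2

delta = optimal_rule(states, feasible, w)
print(delta)               # {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}
print(is_monotone(delta))  # True
```

In this toy instance the rule satisfies w(x, δ(x)) = max_a w(x, a) with δ(x) ∈ D(x) by construction, and it happens to be non-decreasing in x. Whether a monotone optimal rule exists in general depends on conditions on w and D such as those the paper establishes.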






  1. Benzing, H., K. Hinderer and M. Kolonko: On the k-Armed Bernoulli Bandit: Monotonicity of the Total Reward under an Arbitrary Prior Distribution. Math. Operationsforsch. Statist., Ser. Optimization 15, 1984, 583–595.
  2. Berry, D.A.: A Bernoulli Two-Armed Bandit. Ann. Math. Stat. 43, 1972, 871–897.
  3. Bradt, R.N., S.M. Johnson and S. Karlin: On Sequential Designs for Maximizing the Sum of n Observations. Ann. Math. Stat. 27, 1956, 1060–1074.
  4. Bertsekas, D.P., and S.E. Shreve: Stochastic Optimal Control. Academic Press, New York 1978.
  5. Hengartner, W., D. Kalin and R. Theodorescu: On the Bernoulli Two-Armed Bandit Problem. Math. Operationsforsch. Statist., Ser. Optimization 12, 1981, 307–316.
  6. Hinderer, K.: On the Structure of Solutions of Stochastic Dynamic Programs. In: Proc. 7th Conf. on Prob. Theory, Aug. 29–Sept. 4, 1982, Brasov, Romania.
  7. Kalin, D.: A Note on Monotone Optimal Policies for Markov Decision Processes. Math. Progr. 15, 1978, 220–222.
  8. Kalin, D.: Über Markoff'sche Entscheidungsmodelle mit halbgeordnetem Zustandsraum. Methods of Operations Research 33, 1979, 233–245.
  9. Kalin, D.: Beiträge zu strukturierten Markoffschen Entscheidungsmodellen. Habilitationsschrift, Bonn 1981.
  10. Kalin, D., and R. Theodorescu: A Note on Structural Properties of the Bernoulli Two-Armed Bandit Problem. Math. Operationsforsch. Statist., Ser. Optimization 13, 1982, 469–472.
  11. Lehmann, E.L.: Some Concepts of Dependence. Ann. Math. Stat. 37, 1966, 1137–1153.
  12. Serfozo, R.: Monotone Optimal Policies for Markov Decision Processes. In: Stochastic Systems: Modelling, Identification and Optimization, II, Mathematical Programming Study 6, 1976, 202–216.
  13. Topkis, D.M.: Minimizing a Submodular Function on a Lattice. Op. Res. 26, 1978, 305–321.

Copyright information

© Physica-Verlag Ges.m.b.H. 1985

Authors and Affiliations

  • M. Kolonko (1)
  • H. Benzing (1)
  1. Institut für Mathematische Statistik der Universität Karlsruhe, Karlsruhe 1, Germany
