Improved Rates for the Stochastic Continuum-Armed Bandit Problem

  • Peter Auer
  • Ronald Ortner
  • Csaba Szepesvári
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4539)


Considering one-dimensional continuum-armed bandit problems, we propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improved rates. In particular, we introduce a novel assumption that is complementary to the previous smoothness conditions, while at the same time smoothness of the mean payoff function is required only at the maxima. Under these new assumptions, we derive new bounds on the expected regret. Specifically, we show that, apart from logarithmic factors, the expected regret scales with the square root of the number of trials, provided that the mean payoff function has finitely many maxima and its second derivatives are continuous and non-vanishing at the maxima. This improves a previous result of Cope by weakening the assumptions on the function. We also derive matching lower bounds. To complement the bounds on the expected regret, we provide high-probability bounds which exhibit similar scaling.
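The algorithms in this line of work build on a common strategy: discretize the interval of arms and run a finite-armed index policy such as UCB1 [4] on the resulting grid, as in Kleinberg's approach [1]. The following is a minimal sketch of that generic strategy, not the authors' improved algorithm; the names `discretized_ucb` and `payoff`, the number of grid points, and the example mean payoff function are all illustrative choices, not taken from the paper.

```python
import math
import random

def discretized_ucb(payoff, n_arms, horizon):
    """Run UCB1 on a uniform discretization of [0, 1].

    payoff(x) returns a stochastic reward in [0, 1] for action x.
    The arms are the midpoints of n_arms equal subintervals.
    Returns the total reward collected over `horizon` rounds.
    """
    arms = [(i + 0.5) / n_arms for i in range(n_arms)]
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm
    total_reward = 0.0
    for t in range(horizon):
        if t < n_arms:
            i = t  # play each arm once to initialize
        else:
            # UCB1 index: empirical mean plus exploration bonus
            i = max(range(n_arms),
                    key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t + 1) / counts[j]))
        r = payoff(arms[i])
        counts[i] += 1
        sums[i] += r
        total_reward += r
    return total_reward

# Hypothetical example: noisy mean payoff with a single peak at x = 0.7,
# with a non-vanishing second derivative at the maximum.
def payoff(x):
    mean = max(0.0, 1.0 - 4.0 * (x - 0.7) ** 2)
    return min(1.0, max(0.0, mean + random.uniform(-0.1, 0.1)))
```

The interesting trade-off, which the paper's conditions control, is the choice of `n_arms` as a function of the horizon: a finer grid reduces the discretization error near the maxima but increases the exploration cost of the finite-armed policy.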






References

  1. Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: Advances in Neural Information Processing Systems 17 (NIPS), pp. 697–704 (2004)
  2. Agrawal, R.: The continuum-armed bandit problem. SIAM J. Control Optim. 33, 1926–1951 (1995)
  3. Cope, E.: Regret and convergence bounds for a class of continuum-armed bandit problems. Submitted (2006)
  4. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multi-armed bandit problem. Mach. Learn. 47, 235–256 (2002)
  5. Cesa-Bianchi, N., Lugosi, G., Stoltz, G.: Minimizing regret with label efficient prediction. IEEE Trans. Inform. Theory 51, 2152–2162 (2005)

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Peter Auer (1)
  • Ronald Ortner (1)
  • Csaba Szepesvári (2)
  1. University of Leoben, A-8700 Leoben, Austria
  2. University of Alberta, Edmonton, T6G 2E8, Canada
