
Improved Second-Order Bounds for Prediction with Expert Advice

  • Nicolò Cesa-Bianchi
  • Yishay Mansour
  • Gilles Stoltz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3559)

Abstract

This work studies external regret in sequential prediction games with arbitrary payoffs (nonnegative or nonpositive). External regret measures the difference between the payoff obtained by the forecasting strategy and the payoff of the best single action. We focus on two important parameters: M, the largest absolute value of any payoff, and Q*, the sum of squared payoffs of the best action. Given these parameters, we first derive a simple new forecasting strategy with regret at most of order \(\sqrt{Q^* \ln N} + M \ln N\), where N is the number of actions. We then extend the results to the case where the parameters are unknown and derive similar bounds. Next, we devise a refined analysis of the weighted majority forecaster, which yields bounds of the same flavour. Finally, we apply the proof techniques developed here to the adversarial multi-armed bandit setting and prove performance bounds for an online algorithm in the case where there is no lower bound on the probability of each action.
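To make the quantities M, Q*, and the regret bound concrete, here is a minimal simulation sketch of an exponentially weighted average forecaster of the general kind analysed in work of this flavour. It is illustrative only: the function name, the specific tuning eta = min(1/(2M), sqrt(ln N / Q*)), and the offline computation of Q* (possible only in the known-parameters case discussed in the abstract) are assumptions of this sketch, not the paper's exact algorithm.

```python
import numpy as np

def exp_weighted_forecaster(payoffs, eta):
    """Exponentially weighted average forecaster (hypothetical sketch).

    payoffs: (T, N) array; payoffs[t, i] is the payoff (possibly
             negative) of action i at round t.
    eta:     learning rate > 0.

    Returns the forecaster's expected cumulative payoff and the
    cumulative payoff of the best single action in hindsight.
    """
    T, N = payoffs.shape
    log_w = np.zeros(N)                 # log-weights, uniform start
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max()) # stabilised exponentiation
        p /= p.sum()                    # distribution over actions
        total += float(p @ payoffs[t])  # expected payoff this round
        log_w += eta * payoffs[t]       # multiplicative-weights update
    best = float(payoffs.sum(axis=0).max())
    return total, best

# Usage: payoffs in [-1, 1], so M = 1; Q* is computed offline here,
# which is only legitimate in the known-parameters setting.
rng = np.random.default_rng(0)
payoffs = rng.uniform(-1.0, 1.0, size=(1000, 10))
best_action = payoffs.sum(axis=0).argmax()
Q_star = float((payoffs[:, best_action] ** 2).sum())
eta = min(1.0 / 2.0, np.sqrt(np.log(10) / Q_star))  # assumed tuning
total, best = exp_weighted_forecaster(payoffs, eta)
print("external regret:", best - total)
```

The point of the sketch is the role of the two parameters: eta shrinks like \(\sqrt{\ln N / Q^*}\) when the best action's squared payoffs are large, and is capped by a multiple of 1/M so that no single round can move the weights too far, mirroring the two terms in the bound above.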



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Nicolò Cesa-Bianchi (1)
  • Yishay Mansour (2)
  • Gilles Stoltz (3)
  1. DSI, Università di Milano, Milano, Italy
  2. School of Computer Science, Tel-Aviv University, Tel Aviv, Israel
  3. DMA, Ecole Normale Supérieure, Paris, France
