Machine Learning, Volume 80, Issue 2–3, pp 165–188

Extracting certainty from uncertainty: regret bounded by variation in costs

Abstract

Prediction from expert advice is a fundamental problem in machine learning. A major pillar of the field is the existence of learning algorithms whose average loss approaches that of the best expert in hindsight (in other words, whose average regret approaches zero). Traditionally, the regret of online algorithms has been bounded in terms of the number of prediction rounds.

Cesa-Bianchi, Mansour and Stoltz (Mach. Learn. 66(2–3):321–352, 2007) posed the question of whether it is possible to bound the regret of an online algorithm by the variation of the observed costs. In this paper we resolve this question and prove such bounds in the fully adversarial setting, in two important online learning scenarios: prediction from expert advice, and online linear optimization.
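For intuition about the expert-advice setting and what "regret" measures, the following is a minimal sketch of the standard Hedge (multiplicative weights) algorithm. It is illustrative only: it is not the variation-dependent algorithm of this paper, and the fixed learning rate `eta` is an assumption made for the example.

```python
import math

def hedge(losses, eta=0.5):
    """Standard Hedge for prediction with expert advice (illustrative,
    not this paper's variation-based algorithm).

    losses[t][i] is expert i's loss in round t, assumed in [0, 1].
    Returns (learner's cumulative expected loss, best expert's loss);
    their difference is the regret.
    """
    n = len(losses[0])
    weights = [1.0] * n  # one weight per expert, uniform at the start
    total_loss = 0.0
    for round_losses in losses:
        z = sum(weights)
        probs = [w / z for w in weights]
        # The learner's expected loss is the probability-weighted expert loss.
        total_loss += sum(p * l for p, l in zip(probs, round_losses))
        # Exponentially down-weight each expert by its observed loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    best = min(sum(l[i] for l in losses) for i in range(n))
    return total_loss, best
```

With two experts, one always incurring loss 0 and the other loss 1, the learner's cumulative loss stays within a small additive constant of the best expert's, so the average regret vanishes as the number of rounds grows.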

Keywords

Individual sequences · Prediction with expert advice · Online learning · Regret minimization

References

  1. Allenberg-Neeman, C., & Neeman, B. (2004). Full information game with gains and losses. In 15th international conference on algorithmic learning theory.
  2. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2003). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1), 48–77.
  3. Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge: Cambridge University Press.
  4. Cesa-Bianchi, N., Mansour, Y., & Stoltz, G. (2007). Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2–3), 321–352.
  5. Cover, T. (1991). Universal portfolios. Mathematical Finance, 1, 1–19.
  6. Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
  7. Hannan, J. (1957). Approximation to Bayes risk in repeated play. In M. Dresher, A. W. Tucker, & P. Wolfe (Eds.), Contributions to the theory of games (Vol. III, pp. 97–139).
  8. Hazan, E., & Kale, S. (2009a). On stochastic and worst-case models for investing. In Advances in neural information processing systems (NIPS) (Vol. 22).
  9. Hazan, E., & Kale, S. (2009b). Better algorithms for benign bandits. In ACM-SIAM symposium on discrete algorithms (SODA09).
  10. Helmbold, D. P., Kivinen, J., & Warmuth, M. K. (1999). Relative loss bounds for single neurons. IEEE Transactions on Neural Networks, 10(6), 1291–1304.
  11. Herbster, M., & Warmuth, M. K. (2001). Tracking the best linear predictor. Journal of Machine Learning Research, 1, 281–309.
  12. Kalai, A., & Vempala, S. (2005). Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3), 291–307.
  13. Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1), 1–63.
  14. Littlestone, N., & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212–261.
  15. Vovk, V. (1998). A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2), 153–173.
  16. Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In ICML (pp. 928–936).

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. IBM Almaden Research Center, San Jose, USA
  2. Yahoo! Research, Santa Clara, USA