Abstract
This work studies external regret in sequential prediction games with both positive and negative payoffs. External regret measures the difference between the cumulative payoff of the best fixed action in hindsight and the cumulative payoff obtained by the forecasting strategy. In this setting, we derive new and sharper regret bounds for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule. Our analysis has two main advantages: first, no prior knowledge about the payoff sequence is needed, not even its range; second, our bounds are expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds. In addition, our most refined bounds have the natural and desirable property of being stable under rescalings and general translations of the payoff sequence.
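To make the objects in the abstract concrete, here is a minimal Python sketch of the two kinds of forecasters, of external regret, and of a sum of squared payoffs. It rests on our own assumptions. It uses a fixed learning rate `eta`, whereas the bounds described above concern tunings that need no prior knowledge of the payoff range, and this sketch does not reproduce those tunings. For the second forecaster it assumes the multiplicative rule w_i <- w_i * (1 + eta * x_i), because the abstract does not state the rule. All function names are hypothetical, and the squared-payoff sum only illustrates the kind of second-order quantity involved, not the paper's exact terms.

```python
import math
from typing import List, Sequence

Payoffs = Sequence[Sequence[float]]  # payoffs[t][i]: payoff of action i at round t


def exp_weights_distribution(cum: Sequence[float], eta: float) -> List[float]:
    """Exponentially weighted average: p_i proportional to exp(eta * cum_i)."""
    # Shifting by the max leaves the normalized distribution unchanged
    # and avoids overflow in math.exp.
    m = max(cum)
    w = [math.exp(eta * (c - m)) for c in cum]
    s = sum(w)
    return [wi / s for wi in w]


def run_exp_weights(payoffs: Payoffs, eta: float) -> float:
    """Cumulative expected payoff of the exponentially weighted forecaster."""
    cum = [0.0] * len(payoffs[0])
    total = 0.0
    for x_t in payoffs:
        p = exp_weights_distribution(cum, eta)  # uses payoffs up to round t-1 only
        total += sum(pi * xi for pi, xi in zip(p, x_t))
        cum = [c + xi for c, xi in zip(cum, x_t)]
    return total


def run_multiplicative(payoffs: Payoffs, eta: float) -> float:
    """ASSUMED second forecaster: w_i <- w_i * (1 + eta * x_i) after each round."""
    w = [1.0] * len(payoffs[0])
    total = 0.0
    for x_t in payoffs:
        s = sum(w)
        p = [wi / s for wi in w]
        total += sum(pi * xi for pi, xi in zip(p, x_t))
        factors = [1.0 + eta * xi for xi in x_t]
        # Weights must stay positive, which requires eta * x_i > -1.
        if min(factors) <= 0.0:
            raise ValueError("need 1 + eta * x_i > 0 for every payoff")
        w = [wi * f for wi, f in zip(w, factors)]
        # Renormalize to avoid underflow/overflow; the next p is unaffected.
        s = sum(w)
        w = [wi / s for wi in w]
    return total


def external_regret(payoffs: Payoffs, forecaster_total: float) -> float:
    """Best fixed action's cumulative payoff minus the forecaster's."""
    n_actions = len(payoffs[0])
    best = max(sum(x_t[i] for x_t in payoffs) for i in range(n_actions))
    return best - forecaster_total


def sum_squared_payoffs(payoffs: Payoffs, i: int) -> float:
    """Second-order quantity: sum over rounds of x_{i,t}^2 for action i."""
    return sum(x_t[i] ** 2 for x_t in payoffs)


if __name__ == "__main__":
    X = [[1.0, -1.0], [0.5, -0.5], [-1.0, 1.0], [1.0, 0.0]]  # toy signed payoffs
    for eta in (0.0, 0.5):
        print(eta,
              external_regret(X, run_exp_weights(X, eta)),
              external_regret(X, run_multiplicative(X, eta)))
    print(sum_squared_payoffs(X, 0))
```

With `eta = 0` both sketches reduce to the uniform forecaster. Because the exponentially weighted distribution depends only on differences of cumulative payoffs, adding the same constant to every action's payoff in a round leaves that forecaster's regret unchanged, which the sketch lets you check numerically.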
Additional information
Editor: Avrim Blum
An extended abstract appeared in the Proceedings of the 18th Annual Conference on Learning Theory, Springer, 2005. The work of all authors was supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778.
The work was done while Yishay Mansour was a fellow at the Institute for Advanced Studies, Hebrew University. His work was also supported by grant no. 1079/04 from the Israel Science Foundation and by an IBM faculty award.
About this article
Cite this article
Cesa-Bianchi, N., Mansour, Y. & Stoltz, G. Improved second-order bounds for prediction with expert advice. Mach Learn 66, 321–352 (2007). https://doi.org/10.1007/s10994-006-5001-7
DOI: https://doi.org/10.1007/s10994-006-5001-7