Abstract
We investigate on-line prediction of individual sequences. Given a class of predictors, the goal is to predict as well as the best predictor in the class, where the loss is measured by the self-information (logarithmic) loss function. The excess loss (regret) is closely related to the redundancy of the associated lossless universal code. Using Shtarkov's theorem and tools from empirical process theory, we prove a general upper bound on the best possible (minimax) regret. The bound depends on certain metric properties of the class of predictors. We apply the bound to both parametric and nonparametric classes of predictors. Finally, we point out a suboptimal behavior of the popular Bayesian weighted average algorithm.
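For context, the setting can be written out in standard notation (this is the usual formulation from the universal prediction literature, not an excerpt from the paper). A predictor assigns a probability $p(x^n) = \prod_{t=1}^{n} p(x_t \mid x^{t-1})$ to each sequence $x^n = (x_1, \dots, x_n)$ over a finite alphabet, and its cumulative logarithmic loss is $-\log p(x^n)$. Its worst-case regret against a class $\mathcal{F}$ is

$$ R_n(p, \mathcal{F}) \;=\; \max_{x^n} \Bigl( -\log p(x^n) \;+\; \log \sup_{f \in \mathcal{F}} f(x^n) \Bigr), $$

and Shtarkov's theorem identifies the minimax value

$$ \min_p R_n(p, \mathcal{F}) \;=\; \log \sum_{x^n} \sup_{f \in \mathcal{F}} f(x^n), $$

achieved by the normalized maximum likelihood distribution $p^*(x^n) \propto \sup_{f \in \mathcal{F}} f(x^n)$.

The Bayesian weighted average algorithm mentioned at the end of the abstract predicts with a prior-weighted mixture of the class. The following is a minimal sketch, assuming a finite class of Bernoulli experts with a uniform prior; the expert grid, the example sequence, and all function names are illustrative choices, not anything taken from the paper:

    import math

    def log_sum_exp(values):
        # numerically stable log(sum(exp(v) for v in values))
        m = max(values)
        return m + math.log(sum(math.exp(v - m) for v in values))

    def bernoulli_log_prob(p, x):
        # log-probability a Bernoulli(p) expert assigns to outcome x in {0, 1}
        return math.log(p) if x == 1 else math.log(1.0 - p)

    def bayes_mixture_loss(sequence, expert_probs):
        # cumulative log loss of the uniform-prior Bayes mixture over the experts
        n = len(expert_probs)
        log_w = [math.log(1.0 / n)] * n  # log posterior weights, start uniform
        loss = 0.0
        for x in sequence:
            # log of w_i * p_i(x) for each expert i
            joint = [lw + bernoulli_log_prob(p, x)
                     for lw, p in zip(log_w, expert_probs)]
            log_pred = log_sum_exp(joint)          # log predictive probability of x
            loss -= log_pred                       # self-information loss of the mixture
            log_w = [j - log_pred for j in joint]  # Bayes update (renormalized)
        return loss

    def best_expert_loss(sequence, expert_probs):
        # cumulative log loss of the single best expert in hindsight
        return min(-sum(bernoulli_log_prob(p, x) for x in sequence)
                   for p in expert_probs)

    experts = [0.1, 0.3, 0.5, 0.7, 0.9]   # hypothetical finite class
    seq = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # illustrative sequence
    mixture = bayes_mixture_loss(seq, experts)
    best = best_expert_loss(seq, experts)
    print(f"mixture loss {mixture:.4f}  best expert {best:.4f}  "
          f"regret {mixture - best:.4f}  log N = {math.log(len(experts)):.4f}")

On any binary sequence, the uniform mixture over N experts has regret at most log N against the best expert in the class; the suboptimality the abstract refers to concerns richer (parametric and nonparametric) classes, where the Bayes mixture need not attain the minimax regret.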
References
Azuma, K. (1967). Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, 19, 357–367.
Barron, A. & Xie, Q. (1996). Asymptotic minimax regret for data compression, gambling, and prediction. Unpublished manuscript presented at an informal meeting on prediction of individual sequences held at the University of California (Santa Cruz).
Cover, T. (1991). Universal portfolios. Mathematical Finance, 1, 1–29.
Cover, T. & Ordentlich, E. (1996). Universal portfolios with side information. IEEE Transactions on Information Theory, 42:2, 348–363.
Cover, T. & Thomas, J. (1991). Elements of Information Theory. New York: John Wiley and Sons.
Feder, M. (1991). Gambling using a finite state machine. IEEE Transactions on Information Theory, 37, 1459–1465.
Freund, Y. (1996). Predicting a binary sequence almost as well as the optimal biased coin. In Proceedings of the 9th Annual Conference on Computational Learning Theory (pp. 89–98).
Haussler, D. & Barron, A. (1993). How well does the Bayes method work in on-line prediction of {+1, −1} values? In Proceedings of the 3rd NEC Symposium (pp. 74–100).
Haussler, D., Kivinen, J., & Warmuth, M. (1998). Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44, 1906–1925.
Merhav, N. & Feder, M. (1998). Universal prediction. IEEE Transactions on Information Theory, 44:6, 2124–2147.
Opper, M. & Haussler, D. (1997). Worst case prediction over sequences under log loss. In The Mathematics of Information Coding, Extraction, and Distribution. Springer-Verlag.
Rissanen, J. (1976). Generalized Kraft's inequality and arithmetic coding. IBM Journal of Research and Development, 20, 198–203.
Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42, 40–47.
De Santis, A., Markowsky, G., & Wegman, M. (1988). Learning probabilistic prediction functions. In Proceedings of the 1st Annual Workshop on Computational Learning Theory (pp. 312–328).
Shtarkov, Y. (1987). Universal sequential coding of single messages. Problems of Information Transmission, 23:3, 3–17.
Talagrand, M. (1996). Majorizing measures: The generic chaining. Annals of Probability, 24, 1049–1103. (Special Invited Paper).
Vovk, V. (1990). Aggregating strategies. In Proceedings of the 3rd Annual Workshop on Computational Learning Theory (pp. 372–383).
Vovk, V. (1998). A game of prediction with expert advice. Journal of Computer and System Sciences, 56:2, 153–173.
Weinberger, M., Merhav, N., & Feder, M. (1994). Optimal sequential probability assignment for individual sequences. IEEE Transactions on Information Theory, 40, 384–396.
Yamanishi, K. (1995). A loss bound model for on-line stochastic algorithms. Information and Computation, 119:1, 39–54.
Yamanishi, K. (1998). A decision-theoretic extension of stochastic complexity and its application to learning. IEEE Transactions on Information Theory, 44, 1424–1440.
Cite this article
Cesa-Bianchi, N., Lugosi, G. Worst-Case Bounds for the Logarithmic Loss of Predictors. Machine Learning 43, 247–264 (2001). https://doi.org/10.1023/A:1010848128995