
Tight worst-case loss bounds for predicting with expert advice

  • Conference paper
  • In: Computational Learning Theory (EuroCOLT 1995)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 904)

Abstract

We consider on-line algorithms for predicting binary outcomes when the algorithm has available the predictions made by N experts. For a sequence of trials, we compute total losses for both the algorithm and the experts under a loss function. At the end of the trial sequence, we compare the total loss of the algorithm to the total loss of the best expert, i.e., the expert with the least loss on the particular trial sequence. Vovk has introduced a simple algorithm for this prediction problem and proved that for a large class of loss functions, with binary outcomes the total loss of the algorithm exceeds the total loss of the best expert by at most c ln N, where c is a constant determined by the loss function. This upper bound does not depend on any assumptions about how the experts' predictions or the outcomes are generated, and the trial sequence can be arbitrarily long. We give a straightforward alternative method for finding the correct value of c and show by a lower bound that for this value of c, the upper bound is asymptotically tight. The lower bound is based on a probabilistic adversary argument. The class of loss functions for which the c ln N upper bound holds includes the square loss, the logarithmic loss, and the Hellinger loss. We also consider another class of loss functions, including the absolute loss, for which we have an \(\Omega (\sqrt {\ell \log N} )\) lower bound, where \(\ell\) is the number of trials.
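The flavor of the c ln N bound is easiest to see for the logarithmic loss, where c = 1 and Vovk's aggregating algorithm reduces to predicting with a Bayes mixture of the experts. The sketch below is an illustrative reimplementation of that special case (plain Python; the function name and data layout are our own, not the paper's), maintaining per-expert weights proportional to exp(−L_i), where L_i is expert i's cumulative log loss, and predicting with the weighted average.

```python
import math

def mixture_predict(expert_preds, outcomes):
    """Aggregate N experts under logarithmic loss.

    expert_preds: list over trials; each entry is a list of N
        probabilities (in (0, 1)) that the trial's outcome is 1.
    outcomes: list of 0/1 outcomes, one per trial.
    Returns (algorithm's total log loss, best expert's total log loss).
    """
    n = len(expert_preds[0])
    log_w = [0.0] * n            # log-weights; uniform prior over experts
    total = 0.0
    expert_loss = [0.0] * n
    for preds, y in zip(expert_preds, outcomes):
        # Weighted-average (Bayes mixture) prediction, computed stably.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        p = sum(wi * pi for wi, pi in zip(w, preds)) / sum(w)
        total += -math.log(p if y == 1 else 1.0 - p)
        for i, pi in enumerate(preds):
            loss_i = -math.log(pi if y == 1 else 1.0 - pi)
            expert_loss[i] += loss_i
            log_w[i] -= loss_i   # multiplicative update: w_i *= e^{-loss_i}
    return total, min(expert_loss)
```

Because the mixture assigns the observed sequence probability at least 1/N times what the best expert assigns it, the algorithm's total log loss exceeds the best expert's by at most ln N on every sequence, with no assumptions on how the predictions or outcomes are generated.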


References

  1. Cesa-Bianchi, N., Freund, Y., Helmbold, D. P., Haussler, D., Schapire, R. E., Warmuth, M. K.: How to use expert advice. Technical Report UCSC-CRL-94-33, Univ. of Calif. Computer Research Lab, Santa Cruz, CA, 1994. An extended abstract appeared in STOC '93.

  2. Cesa-Bianchi, N., Freund, Y., Helmbold, D. P., Warmuth, M. K.: On-line prediction and conversion strategies. In Computational Learning Theory: EuroCOLT '93, pages 205–216. Oxford University Press, Oxford, UK, 1994.

  3. Cesa-Bianchi, N., Long, P., Warmuth, M. K.: Worst-case quadratic loss bounds for on-line prediction of linear functions by gradient descent. Technical Report UCSC-CRL-93-36, Univ. of Calif. Computer Research Lab, Santa Cruz, CA, 1993. An extended abstract appeared in COLT '93.

  4. Chung, T. H.: Approximate methods for sequential decision making using expert advice. In Proc. 7th ACM Workshop on Computational Learning Theory, pages 183–189. ACM Press, New York, NY, 1994.

  5. Cover, T.: Behavior of sequential predictors of binary sequences. In Proc. 4th Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, pages 263–272. Publishing House of the Czechoslovak Academy of Sciences, 1965.

  6. Dawid, A. P.: Prequential analysis, stochastic complexity and Bayesian inference. Bayesian Statistics (to appear).

  7. DeSantis, A., Markowsky, G., Wegman, M. N.: Learning probabilistic prediction functions. In Proc. 29th IEEE Symposium on Foundations of Computer Science, pages 110–119. IEEE Computer Society Press, Los Alamitos, CA, 1988.

  8. Feder, M., Merhav, N., Gutman, M.: Universal prediction of individual sequences. IEEE Transactions on Information Theory 38 (1992) 1258–1270.

  9. Galambos, J.: The Asymptotic Theory of Extreme Order Statistics. R. E. Krieger, Malabar, FL, 1987. Second Edition.

  10. Kivinen, J., Warmuth, M. K.: Using experts for predicting continuous outcomes. In Computational Learning Theory: EuroCOLT '93, pages 109–120. Oxford University Press, Oxford, UK, 1994.

  11. Kivinen, J., Warmuth, M. K.: Exponentiated gradient versus gradient descent for linear predictors. Technical Report UCSC-CRL-94-16, Univ. of Calif. Computer Research Lab, Santa Cruz, CA, June 1994.

  12. Littlestone, N., Long, P. M., Warmuth, M. K.: On-line learning of linear functions. In Proc. 23rd ACM Symposium on Theory of Computing, pages 465–475. ACM Press, New York, NY, 1991.

  13. Littlestone, N., Warmuth, M. K.: The weighted majority algorithm. Information and Computation 108 (1994) 212–261.

  14. Merhav, N., Feder, M.: Universal sequential learning and decisions from individual data sequences. In Proc. 5th ACM Workshop on Computational Learning Theory, pages 413–427. ACM Press, New York, NY, 1992.

  15. Mycielski, J.: A learning algorithm for linear operators. Proceedings of the American Mathematical Society 103 (1988) 547–550.

  16. Vovk, V.: Aggregating strategies. In Proc. 3rd Workshop on Computational Learning Theory, pages 371–383. Morgan Kaufmann, San Mateo, CA, 1990.

  17. Vovk, V.: Universal forecasting algorithms. Information and Computation 96 (1992) 245–277.

  18. Weinberger, M. J., Merhav, N., Feder, M.: Optimal sequential probability assignment for individual sequences. IEEE Transactions on Information Theory 40 (1994) 384–396.


Editor information

Paul Vitányi

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Haussler, D., Kivinen, J., Warmuth, M.K. (1995). Tight worst-case loss bounds for predicting with expert advice. In: Vitányi, P. (eds) Computational Learning Theory. EuroCOLT 1995. Lecture Notes in Computer Science, vol 904. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59119-2_169

  • DOI: https://doi.org/10.1007/3-540-59119-2_169

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-59119-1

  • Online ISBN: 978-3-540-49195-8

  • eBook Packages: Springer Book Archive
