Machine Learning, Volume 35, Issue 3, pp 247–282

Derandomizing Stochastic Prediction Strategies

Abstract

In this paper we continue the study of games of prediction with expert advice with uncountably many experts. A convenient interpretation of such games is to construe the pool of experts as one “stochastic predictor”, who chooses one of the experts in the pool at random according to the prior distribution on the experts and then replicates the (deterministic) predictions of the chosen expert. We notice that if the stochastic predictor's total loss is at most L with probability at least p, then the learner's loss can be bounded by \(cL + a \ln \frac{1}{p}\) for the usual constants c and a. This interpretation is used to revamp known results and obtain new results on tracking the best expert. It is also applied to merging overconfident experts and to fitting polynomials to data.
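To make the shape of this bound concrete, the following is a minimal sketch for the simplest special case only: finitely many experts, binary outcomes, and the logarithmic loss, for which the constants are c = a = 1 and the learner can be taken to be the Bayesian mixture of the experts. The function names (`run_mixture`, `prior_mass_within`, `bound_holds`) and the random data are illustrative assumptions, not taken from the paper, which treats uncountable pools and more general games.

```python
import math
import random


def log_loss(outcome, prob_one):
    """Logarithmic loss of predicting P(outcome = 1) = prob_one."""
    p = prob_one if outcome == 1 else 1.0 - prob_one
    return -math.log(p)


def run_mixture(prior, expert_preds, outcomes):
    """Run the Bayesian mixture learner under log loss.

    prior        -- prior weights of the experts (summing to 1)
    expert_preds -- expert_preds[i][t] is expert i's P(outcome = 1) at step t
    outcomes     -- sequence of 0/1 outcomes
    Returns (learner_loss, expert_losses).
    """
    weights = list(prior)
    learner_loss = 0.0
    expert_losses = [0.0] * len(prior)
    for t, y in enumerate(outcomes):
        total = sum(weights)
        # The learner predicts the weighted average of the experts' predictions.
        mix = sum(w * expert_preds[i][t] for i, w in enumerate(weights)) / total
        learner_loss += log_loss(y, mix)
        # Each expert's weight shrinks by exp(-loss) (learning rate 1).
        for i in range(len(weights)):
            loss = log_loss(y, expert_preds[i][t])
            expert_losses[i] += loss
            weights[i] *= math.exp(-loss)
    return learner_loss, expert_losses


def prior_mass_within(prior, expert_losses, L):
    """Probability p that the stochastic predictor's total loss is at most L."""
    return sum(pi for pi, li in zip(prior, expert_losses) if li <= L)


def bound_holds(prior, learner_loss, expert_losses, L, tol=1e-9):
    """Check learner_loss <= c*L + a*ln(1/p) with c = a = 1 (log loss)."""
    p = prior_mass_within(prior, expert_losses, L)
    if p == 0.0:
        return True  # the bound is vacuous when no expert achieves loss <= L
    return learner_loss <= L + math.log(1.0 / p) + tol


if __name__ == "__main__":
    rng = random.Random(0)
    n_experts, n_steps = 5, 40
    prior = [1.0 / n_experts] * n_experts
    # Predictions are kept away from 0 and 1 so that every loss is finite.
    preds = [[rng.uniform(0.05, 0.95) for _ in range(n_steps)]
             for _ in range(n_experts)]
    outcomes = [rng.randint(0, 1) for _ in range(n_steps)]
    learner_loss, expert_losses = run_mixture(prior, preds, outcomes)
    for L in sorted(expert_losses):
        p = prior_mass_within(prior, expert_losses, L)
        print(f"L={L:.3f}  p={p:.2f}  bound={L + math.log(1 / p):.3f}  "
              f"learner={learner_loss:.3f}")
```

In this special case the bound follows directly from the fact that the mixture's total loss equals \(-\ln \sum_i \pi_i e^{-L_i}\), where \(\pi_i\) is the prior weight of expert i and \(L_i\) its total loss: keeping only the experts with \(L_i \le L\) gives at most \(L + \ln \frac{1}{p}\).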

on-line learning, prediction with expert advice, tracking the best expert, regression

Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • V. Vovk, Department of Computer Science, Royal Holloway, University of London, Egham, UK
