On the Convergence Speed of MDL Predictions for Bernoulli Sequences

  • Jan Poland
  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3244)


We consider the Minimum Description Length principle for online sequence prediction. If the underlying model class is discrete, then the total expected square loss is a particularly interesting performance measure: (a) this quantity is bounded, implying convergence with probability one, and (b) it additionally specifies a rate of convergence. Generally, for MDL only exponential loss bounds hold, as opposed to the linear bounds for a Bayes mixture. We show that this is even the case if the model class contains only Bernoulli distributions. We derive a new upper bound on the prediction error for countable Bernoulli classes. This implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes. The results apply to many Machine Learning tasks including classification and hypothesis testing. We provide arguments that our theorems generalize to countable classes of i.i.d. models.
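The comparison between MDL and Bayes mixture predictions can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: a small finite stand-in for a countable Bernoulli class (the hypothetical `thetas`), prior weights `2^-(i+1)`, MDL read as "predict with the single model that maximizes prior weight times likelihood", and the per-step loss taken as the squared distance between the true parameter and the prediction. One sampled sequence only approximates the expected loss the abstract refers to.

```python
import math
import random


def predict(thetas, weights, ones, zeros):
    """Return (bayes, mdl) predictions for P(next bit = 1).

    thetas  : candidate Bernoulli parameters, each strictly inside (0, 1)
    weights : prior weights of the candidates (need not sum to one)
    ones, zeros : counts of the bits observed so far
    """
    # Unnormalised log posterior: log prior weight + log-likelihood.
    log_post = [
        math.log(w) + ones * math.log(t) + zeros * math.log(1.0 - t)
        for t, w in zip(thetas, weights)
    ]
    best = max(log_post)
    # Subtract the maximum before exponentiating to avoid underflow.
    post = [math.exp(lp - best) for lp in log_post]
    total = sum(post)
    # Bayes mixture: posterior-weighted average of all candidates.
    bayes = sum(p * t for p, t in zip(post, thetas)) / total
    # MDL (assumed reading): the single candidate with maximal weight * likelihood.
    mdl = thetas[log_post.index(best)]
    return bayes, mdl


def cumulative_square_loss(thetas, weights, true_theta, steps, rng):
    """Sum the squared prediction errors of both predictors along one sampled sequence."""
    ones = zeros = 0
    loss_bayes = loss_mdl = 0.0
    for _ in range(steps):
        bayes, mdl = predict(thetas, weights, ones, zeros)
        loss_bayes += (true_theta - bayes) ** 2
        loss_mdl += (true_theta - mdl) ** 2
        # Draw the next bit from the true Bernoulli source.
        if rng.random() < true_theta:
            ones += 1
        else:
            zeros += 1
    return loss_bayes, loss_mdl


# Hypothetical model class and prior, chosen only for illustration.
thetas = [0.5, 0.25, 0.75, 0.125, 0.875]
weights = [2.0 ** -(i + 1) for i in range(len(thetas))]

if __name__ == "__main__":
    rng = random.Random(0)
    bayes_loss, mdl_loss = cumulative_square_loss(thetas, weights, 0.875, 500, rng)
    print("Bayes mixture:", bayes_loss)
    print("MDL:", mdl_loss)
```

Giving the true parameter a small prior weight, as in the usage lines above, is the kind of setting where the two cumulative losses can differ noticeably. A single run like this does not establish any of the bounds stated in the abstract.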


Keywords (added by machine, not by the authors): Convergence Speed; True Parameter; Kolmogorov Complexity; Cumulative Loss; Universal Turing Machine





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Jan Poland (1)
  • Marcus Hutter (1)
  1. IDSIA, Manno (Lugano), Switzerland
