Convergence of Discrete MDL for Sequential Prediction

  • Jan Poland
  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3120)

Abstract

We study the properties of the Minimum Description Length principle for sequence prediction, considering a two-part MDL estimator which is chosen from a countable class of models. This applies in particular to the important case of universal sequence prediction, where the model class corresponds to all algorithms for some fixed universal Turing machine (this correspondence is by enumerable semimeasures, hence the resulting models are stochastic). We prove convergence theorems similar to Solomonoff's theorem of universal induction, which also holds for general Bayes mixtures. The bound characterizing the convergence speed for MDL predictions is exponentially larger than the corresponding bound for Bayes mixtures. We observe that there are at least three different ways of using MDL for prediction. One of these has worse prediction properties: its predictions converge only if the MDL estimator stabilizes. We establish sufficient conditions for this stabilization to occur. Finally, we prove some immediate consequences for complexity relations and randomness criteria.
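
To make the setting concrete, here is a minimal Python sketch of two-part MDL sequence prediction, using a small finite class of Bernoulli models as a stand-in for the paper's countable class of enumerable semimeasures. The model grid, the uniform prior weights, and the prediction rule shown are illustrative assumptions, not the paper's construction.

    import math

    # Toy model class: Bernoulli(theta) measures stand in for the countable
    # class of (semi)measures. Each model nu carries a prior weight w_nu;
    # -log w_nu plays the role of the model's description length.
    thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
    weights = [1.0 / len(thetas)] * len(thetas)  # uniform prior, for illustration

    def log_likelihood(theta, seq):
        """log nu_theta(x_1:t) for a Bernoulli model over bits."""
        ones = sum(seq)
        zeros = len(seq) - ones
        return ones * math.log(theta) + zeros * math.log(1.0 - theta)

    def mdl_estimator(seq):
        """Two-part MDL: choose the model minimizing the total code length
        -log w_nu - log nu(x_1:t), i.e. model cost plus data cost."""
        best_theta, best_score = None, float("inf")
        for theta, w in zip(thetas, weights):
            score = -math.log(w) - log_likelihood(theta, seq)
            if score < best_score:
                best_theta, best_score = theta, score
        return best_theta

    def mdl_predict(seq):
        """One way of using MDL for prediction: predict the next bit with
        the currently selected MDL estimator (re-selected after each step)."""
        return mdl_estimator(seq)  # P(next bit = 1) under the chosen model

    # With 9 ones in 10 bits, the estimator settles on theta = 0.9; the
    # prediction converges as long as the selected model stabilizes.
    seq = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
    print(mdl_estimator(seq), mdl_predict(seq))

Note that re-selecting the estimator at every step is only one of the usage modes alluded to in the abstract; the stabilization condition matters precisely because the selected model may otherwise keep switching.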

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Jan Poland¹
  • Marcus Hutter¹

  1. IDSIA, Manno (Lugano), Switzerland
