We study the Minimum Description Length (MDL) principle for online sequence estimation and prediction in a proper learning setup. If the underlying model class is discrete, the total expected square loss is a particularly interesting performance measure: (a) this quantity is finitely bounded, which implies convergence with probability one, and (b) it additionally specifies the convergence speed. For MDL, in general one can only obtain loss bounds that are finite but exponentially larger than those for Bayes mixtures. We show that this remains the case even if the model class contains only Bernoulli distributions. We derive a new upper bound on the prediction error for countable Bernoulli classes. This implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes. We discuss the application to Machine Learning tasks such as classification and hypothesis testing, and the generalization to countable classes of i.i.d. models.
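To make the compared quantities concrete, the following is a minimal sketch, not the construction analyzed in the paper. It assumes a small finite Bernoulli class with prior weights, takes the MDL predictor to be the single model maximizing weight times likelihood, and takes the Bayes predictor to be the posterior-weighted mixture. The function name `simulate`, the parameter grid, and the use of the squared difference between the predicted and true probability of a 1 as the per-step loss are all illustrative assumptions. The sketch accumulates that loss along one sampled sequence rather than computing an expectation.

```python
import math
import random


def simulate(thetas, weights, true_theta, steps, seed=0):
    """Cumulative square loss of MDL and Bayes-mixture predictions.

    thetas     -- Bernoulli parameters of the class, each strictly in (0, 1)
    weights    -- positive prior weights, one per parameter
    true_theta -- parameter that generates the data (proper learning: it
                  is assumed to be a member of the class)
    """
    rng = random.Random(seed)
    # Unnormalized log posterior: log prior weight + log-likelihood so far.
    log_post = [math.log(w) for w in weights]
    mdl_loss = 0.0
    bayes_loss = 0.0
    for _ in range(steps):
        # MDL (assumed form): predict with the one model that currently
        # maximizes weight * likelihood.
        best = max(range(len(thetas)), key=lambda i: log_post[i])
        mdl_pred = thetas[best]

        # Bayes mixture: posterior-weighted average of the parameters.
        shift = max(log_post)  # subtract the max for numerical stability
        post = [math.exp(lp - shift) for lp in log_post]
        bayes_pred = sum(p * th for p, th in zip(post, thetas)) / sum(post)

        # Per-step square loss against the true probability of a 1.
        mdl_loss += (mdl_pred - true_theta) ** 2
        bayes_loss += (bayes_pred - true_theta) ** 2

        # Sample the next bit and update every model's log-likelihood.
        x = 1 if rng.random() < true_theta else 0
        for i, th in enumerate(thetas):
            log_post[i] += math.log(th if x == 1 else 1.0 - th)
    return mdl_loss, bayes_loss


if __name__ == "__main__":
    grid = [0.1, 0.25, 0.5, 0.75, 0.9]  # hypothetical model class
    prior = [0.2] * len(grid)           # uniform prior weights
    mdl, bayes = simulate(grid, prior, true_theta=0.75, steps=2000)
    print(f"MDL cumulative square loss:   {mdl:.4f}")
    print(f"Bayes cumulative square loss: {bayes:.4f}")
```

Averaging the returned losses over many seeds gives a rough empirical stand-in for the total expected square loss. A single run on a small finite class should not be read as evidence for or against the bounds stated above.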
Keywords: Artificial Intelligence, Machine Learning, Hypothesis Testing, Prediction Error, Model Class