An Introduction to the Minimum Description Length Principle

  • Jun’ichi Takeuchi
Part of the Mathematics for Industry book series (MFI, volume 5)


We give a brief introduction to the minimum description length (MDL) principle. The MDL principle is a mathematical formulation of Occam’s razor, which says that ‘simple explanations of a given phenomenon are to be preferred over complex ones.’ This is recognized as one of the basic stances of scientists, and it plays an important role in statistics and machine learning. In particular, Rissanen proposed the MDL criterion for statistical model selection, based on information theory, in 1978. Since then, a large body of literature has been published, and the notion of the MDL principle was established in the 1990s. In this article, we review some important results on the MDL principle.
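As a small illustration of the model selection idea mentioned above, the following sketch assumes the commonly stated two-part form of the MDL criterion, in which a model with k free parameters is charged the negative maximized log-likelihood plus (k/2) log n for n observations. It compares a fixed fair-coin model (k = 0) with a one-parameter Bernoulli model (k = 1) on a binary sequence. The function names and the choice of models are hypothetical and are not taken from the chapter.

```python
import math


def fair_coin_code_length(bits):
    """Code length in nats under a fixed fair coin (no free parameters)."""
    return len(bits) * math.log(2)


def bernoulli_code_length(bits):
    """Two-part code length in nats under a Bernoulli model (k = 1).

    Assumed form: -log p(x | theta_hat) + (k / 2) * log(n).
    """
    n = len(bits)
    ones = sum(bits)
    zeros = n - ones
    # Negative maximized log-likelihood; a zero count contributes nothing.
    nll = 0.0
    for count in (ones, zeros):
        if count > 0:
            nll -= count * math.log(count / n)
    k = 1  # one free parameter: the probability of a 1
    return nll + 0.5 * k * math.log(n)


def select_model(bits):
    """Return the name of the model with the shorter code, plus both lengths."""
    lengths = {
        "fair coin": fair_coin_code_length(bits),
        "bernoulli": bernoulli_code_length(bits),
    }
    return min(lengths, key=lengths.get), lengths


if __name__ == "__main__":
    skewed = [1] * 90 + [0] * 10
    balanced = [0, 1] * 50
    print(select_model(skewed)[0])
    print(select_model(balanced)[0])
```

On a strongly skewed sequence the extra parameter pays for itself, while on a balanced sequence the penalty term makes the simpler fair-coin model the shorter description, which is the trade-off Occam’s razor expresses.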


Bayes mixture, Laplace estimator, MDL, Model selection, Minimax regret, Universal code


  1. H. Akaike, A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)
  2. S. Amari, Differential-Geometrical Methods in Statistics, 2nd edn. (Springer, Berlin Heidelberg, 1990)
  3. S. Amari, H. Nagaoka, Methods of Information Geometry (AMS & Oxford University Press, Oxford, 2000)
  4. A.R. Barron, T.M. Cover, Minimum complexity density estimation. IEEE Trans. Inf. Theory 37(4), 1034–1054 (1991)
  5. A.R. Barron, J. Rissanen, B. Yu, The minimum description length principle in coding and modeling. IEEE Trans. Inf. Theory 44(6), 2743–2760 (1998)
  6. A.R. Barron, J. Takeuchi, Mixture models achieving optimal coding regret, in Proceedings of 1998 Information Theory Workshop (1998), p. 16
  7. L. Brown, Fundamentals of Statistical Exponential Families (Institute of Mathematical Statistics, Hayward, 1986)
  8. B. Clarke, A.R. Barron, Jeffreys prior is asymptotically least favorable under entropy risk. J. Stat. Plan. Inference 41, 37–60 (1994)
  9. T.M. Cover, J.A. Thomas, Elements of Information Theory, 2nd edn. Wiley Series in Telecommunications and Signal Processing (Wiley-Interscience, New York, 2006)
  10. P. Grünwald, I.J. Myung, M. Pitt, Advances in Minimum Description Length: Theory and Applications (MIT Press, Cambridge, 2005)
  11. P. Grünwald, The Minimum Description Length Principle (MIT Press, Cambridge, 2007)
  12. P. Jacquet, W. Szpankowski, Markov types and minimax redundancy for Markov sources. IEEE Trans. Inf. Theory 50(7), 1393–1402 (2004)
  13. H. Jeffreys, Theory of Probability, 3rd edn. (University of California Press, Berkeley, 1961)
  14. P. Kontkanen, P. Myllymäki, A linear-time algorithm for computing the multinomial stochastic complexity. Inf. Process. Lett. 103, 227–233 (2007)
  15. R.E. Krichevsky, V.K. Trofimov, The performance of universal encoding. IEEE Trans. Inf. Theory 27(2), 199–207 (1981)
  16. A.A. Maurer, Medieval Philosophy. Etienne Gilson Series (Pontifical Institute of Medieval Studies, Toronto, 1982)
  17. A. Rényi, On measures of entropy and information, in Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1 (1961), pp. 547–561
  18. J. Rissanen, Modeling by shortest data description. Automatica 14, 465–471 (1978)
  19. J. Rissanen, Fisher information and stochastic complexity. IEEE Trans. Inf. Theory 42(1), 40–47 (1996)
  20. Y.M. Shtar’kov, Universal sequential coding of single messages. Probl. Inf. Transm. 23, 3–17 (1988)
  21. T. Silander, T. Roos, P. Myllymäki, Learning locally minimax optimal Bayesian networks. Int. J. Approx. Reason. 51(5), 544–557 (2010)
  22. J. Takeuchi, A.R. Barron, Asymptotically minimax regret for exponential families, in Proceedings of the 20th Symposium on Information Theory and Its Applications (SITA’97) (1997), pp. 665–668
  23. J. Takeuchi, A.R. Barron, Asymptotically minimax regret by Bayes mixtures, in Proceedings of 1998 IEEE International Symposium on Information Theory (1998), p. 318
  24. J. Takeuchi, A.R. Barron, Asymptotically minimax regret by Bayes mixtures for non-exponential families, in Proceedings of 2013 IEEE Information Theory Workshop (2013a), pp. 204–208
  25. J. Takeuchi, A.R. Barron, Asymptotically minimax prediction for mixture families, in Proceedings of the 36th Symposium on Information Theory and Its Applications (SITA’13) (2013b), pp. 653–657
  26. J. Takeuchi, A.R. Barron, T. Kawabata, Statistical curvature and stochastic complexity, in Proceedings of the 2nd Symposium on Information Geometry and Its Applications (2006), pp. 29–36
  27. J. Takeuchi, T. Kawabata, A.R. Barron, Properties of Jeffreys mixture for Markov sources. IEEE Trans. Inf. Theory 59(1), 438–457 (2013)
  28. Q. Xie, A.R. Barron, Asymptotic minimax regret for data compression, gambling and prediction. IEEE Trans. Inf. Theory 46(2), 431–445 (2000)

Copyright information

© Springer Japan 2014

Authors and Affiliations

  1. Kyushu University, Fukuoka, Japan
