
Minimum Description Length Principle

  • Reference work entry
Encyclopedia of Machine Learning and Data Mining

Abstract

The minimum description length (MDL) principle states that one should prefer the model that yields the shortest description of the data when the complexity of the model itself is also accounted for. MDL provides a versatile approach to statistical modeling. It is applicable to model selection and regularization. Modern versions of MDL lead to robust methods that are well suited for choosing an appropriate model complexity based on the data, thus extracting the maximum amount of information from the data without over-fitting. The modern versions of MDL go well beyond the familiar \(\frac{k} {2} \log n\) formula.
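As a rough illustration of the familiar \(\frac{k} {2} \log n\) formula mentioned above, the following sketch selects a polynomial degree by minimizing a crude two-part code length: a data cost of \(\frac{n} {2} \log (\mathrm{RSS}/n)\) plus a model cost of \(\frac{k} {2} \log n\). The Gaussian noise assumption, the polynomial model class, the synthetic data, and the function names `two_part_code_length` and `select_degree` are all assumptions made for this example. It shows only the classic asymptotic criterion, not the modern MDL methods the entry refers to.

```python
import numpy as np


def two_part_code_length(x, y, degree):
    """Crude two-part code length (in nats) of y under a degree-`degree` polynomial.

    Data cost:  (n/2) * log(RSS / n), the Gaussian negative log-likelihood at the
                fitted parameters, dropping constants shared by all degrees.
    Model cost: (k/2) * log(n), where k counts the free parameters.
    """
    n = len(x)
    coeffs = np.polyfit(x, y, degree)          # least-squares fit
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals ** 2))
    k = degree + 2                             # degree + 1 coefficients, plus noise variance
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)


def select_degree(x, y, max_degree=8):
    """Return the degree with the shortest code length, and all code lengths."""
    lengths = [two_part_code_length(x, y, d) for d in range(max_degree + 1)]
    return int(np.argmin(lengths)), lengths


# Synthetic data (an assumption for this example): a quadratic plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(scale=0.2, size=x.size)

best_degree, lengths = select_degree(x, y)
print("selected degree:", best_degree)
for d, length in enumerate(lengths):
    print(f"degree {d}: code length {length:.1f} nats")
```

Degrees that underfit pay a large data cost, while each extra coefficient adds \(\frac{1} {2} \log n\) nats of model cost, so the criterion trades goodness of fit against complexity.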

Recommended Reading

Good review articles on MDL include Barron et al. (1998) and Hansen and Yu (2001). The textbook by Grünwald (2007) is a comprehensive and detailed reference covering developments until 2007.

  • Barron A, Cover T (1991) Minimum complexity density estimation. IEEE Trans Inf Theory 37(4):1034–1054

  • Barron A, Rissanen J, Yu B (1998) The minimum description length principle in coding and modeling. IEEE Trans Inf Theory 44(6):2743–2760

  • Fayyad U, Irani K (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Bajcsy R (ed) Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambéry. Morgan Kaufmann

  • Grünwald P (2007) The Minimum Description Length Principle. MIT Press, Cambridge

  • Hansen M, Yu B (2001) Model selection and the principle of minimum description length. J Am Stat Assoc 96(454):746–774

  • Lam W, Bacchus F (1994) Learning Bayesian belief networks: an approach based on the MDL principle. Comput Intell 10:269–293

  • Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471

  • Rissanen J (1984) Universal coding, information, prediction, and estimation. IEEE Trans Inf Theory 30:629–636

  • Rissanen J (1986) Stochastic complexity and modeling. Ann Stat 14(3):1080–1100

  • Rissanen J (1996) Fisher information and stochastic complexity. IEEE Trans Inf Theory 42(1):40–47

  • Rissanen J (2000) MDL denoising. IEEE Trans Inf Theory 46(7):2537–2543

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Silander T, Roos T, Myllymäki P (2010) Learning locally minimax optimal Bayesian networks. Int J Approx Reason 51(5):544–557

  • Speed T, Yu B (1993) Model selection and prediction: normal regression. Ann Inst Stat Math 45(1):35–54

  • Wallace C, Boulton D (1968) An information measure for classification. Comput J 11(2):185–194

  • Wei C (1992) On predictive least squares principles. Ann Stat 20(1):1–42

  • Weinberger M, Rissanen J, Feder M (1995) A universal finite memory source. IEEE Trans Inf Theory 41(3):643–652

  • Shtarkov YM (1987) Universal sequential coding of single messages. Probl Inf Transm 23(3):3–17

Author information

Correspondence to Teemu Roos.

Copyright information

© 2017 Springer Science+Business Media New York

About this entry

Cite this entry

Roos, T. (2017). Minimum Description Length Principle. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_894