Abstract
The minimum description length (MDL) principle states that one should prefer the model that yields the shortest description of the data when the complexity of the model itself is also accounted for. MDL provides a versatile approach to statistical modeling. It is applicable to model selection and regularization. Modern versions of MDL lead to robust methods that are well suited for choosing an appropriate model complexity based on the data, thus extracting the maximum amount of information from the data without over-fitting. The modern versions of MDL go well beyond the familiar \(\frac{k} {2} \log n\) formula.
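As a rough illustration of the familiar \(\frac{k} {2} \log n\) formula mentioned above, the sketch below scores a model by its negative maximized log-likelihood plus \(\frac{k} {2} \log n\), where \(k\) is the number of free parameters and \(n\) the number of predicted symbols. The choice of binary Markov chains as the model class, and the names `code_length` and `select_order`, are assumptions made for this example only; they do not come from the entry. Modern MDL criteria refine this approximation.

```python
import math
from collections import Counter


def code_length(bits, order, max_order):
    """Approximate code length (in nats) of a binary sequence under a
    Markov chain of the given order, using -log ML + (k/2) log n.

    The first `max_order` symbols are treated as given for every order,
    so that all candidate models encode the same n symbols.
    """
    n = len(bits) - max_order
    pair_counts = Counter()      # (context, next symbol) -> count
    context_counts = Counter()   # context -> count
    for i in range(max_order, len(bits)):
        context = tuple(bits[i - order:i])
        pair_counts[(context, bits[i])] += 1
        context_counts[context] += 1
    # Negative log-likelihood at the maximum-likelihood parameters.
    nll = -sum(
        c * math.log(c / context_counts[context])
        for (context, _), c in pair_counts.items()
    )
    k = 2 ** order               # one free parameter per context
    return nll + 0.5 * k * math.log(n)


def select_order(bits, max_order):
    """Return the Markov order with the shortest approximate code length."""
    return min(
        range(max_order + 1),
        key=lambda order: code_length(bits, order, max_order),
    )


# An alternating sequence is fully predictable from one previous symbol.
alternating = [i % 2 for i in range(200)]
print(select_order(alternating, max_order=3))
```

For the alternating sequence, orders 1 to 3 all fit the data perfectly, so the penalty term decides among them and the smallest sufficient order has the shortest description.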
Recommended Reading
Good review articles on MDL include Barron et al. (1998) and Hansen and Yu (2001). The textbook by Grünwald (2007) is a comprehensive and detailed reference covering developments until 2007.
Barron A, Cover T (1991) Minimum complexity density estimation. IEEE Trans Inf Theory 37(4):1034–1054
Barron A, Rissanen J, Yu B (1998) The minimum description length principle in coding and modeling. IEEE Trans Inf Theory 44:2734–2760
Fayyad U, Irani K (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Bajcsy R (ed) Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery. Morgan Kaufmann
Grünwald P (2007) The Minimum Description Length Principle. MIT Press, Cambridge
Hansen M, Yu B (2001) Model selection and the principle of minimum description length. J Am Stat Assoc 96(454):746–774
Lam W, Bacchus F (1994) Learning Bayesian belief networks: an approach based on the MDL principle. Comput Intell 10:269–293
Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471
Rissanen J (1984) Universal coding, information, prediction, and estimation. IEEE Trans Inf Theory 30:629–636
Rissanen J (1986) Stochastic complexity and modeling. Ann Stat 14(3):1080–1100
Rissanen J (1996) Fisher information and stochastic complexity. IEEE Trans Inf Theory 42(1):40–47
Rissanen J (2000) MDL denoising. IEEE Trans Inf Theory 46(7):2537–2543
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Silander T, Roos T, Myllymäki P (2010) Learning locally minimax optimal Bayesian networks. Int J Approx Reason 51(5):544–557
Speed T, Yu B (1993) Model selection and prediction: normal regression. Ann Inst Stat Math 45(1):35–54
Wallace C, Boulton D (1968) An information measure for classification. Comput J 11(2):185–194
Wei C (1992) On predictive least squares principles. Ann Stat 20(1):1–42
Weinberger M, Rissanen J, Feder M (1995) A universal finite memory source. IEEE Trans Inf Theory 41(3):643–652
Shtarkov Y (1987) Universal sequential coding of single messages. Probl Inf Transm 23(3):3–17
Copyright information
© 2017 Springer Science+Business Media New York
About this entry
Cite this entry
Roos, T. (2017). Minimum Description Length Principle. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_894
DOI: https://doi.org/10.1007/978-1-4899-7687-1_894
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7685-7
Online ISBN: 978-1-4899-7687-1
eBook Packages: Computer Science; Reference Module Computer Science and Engineering