Abstract
Machine learning has been formalized as the problem of estimating a conditional distribution as the ‘concept’ to be learned. The learning algorithm is based upon the MDL (Minimum Description Length) principle. The asymptotically optimal learning rate is determined for a typical example.
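To make the idea concrete, here is a minimal sketch of MDL-style model selection for a conditional distribution. It is an illustration under stated assumptions, not the algorithm of the chapter: the model class (a piecewise-constant estimate of P(y = 1 | x) on equal-width bins of [0, 1)), the crude two-part code length (negative log-likelihood plus (m/2) log n for m parameters), and the names code_length, mdl_select and demo are all hypothetical choices made for this example.

```python
import math
import random


def code_length(xs, ys, m):
    """Two-part code length in nats (an assumed, simplified criterion).

    Model: P(y = 1 | x) is constant on each of m equal-width bins of [0, 1).
    Length = negative log-likelihood at the per-bin maximum-likelihood
    estimates + (m / 2) * log(n) for encoding the m parameters.
    """
    n = len(xs)
    ones = [0] * m    # count of y = 1 per bin
    total = [0] * m   # count of samples per bin
    for x, y in zip(xs, ys):
        j = min(int(x * m), m - 1)  # bin index, clamped so x = 1.0 is safe
        total[j] += 1
        ones[j] += y
    nll = 0.0
    for k, t in zip(ones, total):
        # Contribution of the y = 1 and y = 0 counts in this bin;
        # zero counts contribute nothing (0 * log 0 is taken as 0).
        for c in (k, t - k):
            if c > 0:
                nll -= c * math.log(c / t)
    return nll + 0.5 * m * math.log(n)


def mdl_select(xs, ys, max_bins=20):
    """Return the number of bins with the shortest total code length."""
    return min(range(1, max_bins + 1), key=lambda m: code_length(xs, ys, m))


def demo(n=2000, seed=0):
    """Synthetic data: P(y = 1 | x) is 0.1 for x < 0.5 and 0.9 otherwise."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < (0.9 if x >= 0.5 else 0.1) else 0 for x in xs]
    return mdl_select(xs, ys)


if __name__ == "__main__":
    print("selected number of bins:", demo())
```

The penalty term grows with the number of bins while the fit term shrinks, so the minimizer trades model complexity against fit; more refined code lengths (for example stochastic complexity) replace the simple (m/2) log n term used in this sketch.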
Copyright information
© 1996 Birkhäuser Boston
About this chapter
Cite this chapter
Rissanen, J., Yu, B. (1996). Learning by MDL. In: Kueker, D.W., Smith, C.H. (eds) Learning and Geometry: Computational Approaches. Progress in Computer Science and Applied Logic, vol 14. Birkhäuser Boston. https://doi.org/10.1007/978-1-4612-4088-4_1
DOI: https://doi.org/10.1007/978-1-4612-4088-4_1
Publisher Name: Birkhäuser Boston
Print ISBN: 978-1-4612-8646-2
Online ISBN: 978-1-4612-4088-4
eBook Packages: Springer Book Archive