
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7070)

Abstract

The No Free Lunch theorems are often used to argue that domain-specific knowledge is required to design successful algorithms. We use algorithmic information theory to argue the case for a universal bias that allows an algorithm to succeed in all interesting problem domains. Additionally, we give a new algorithm for off-line classification, inspired by Solomonoff induction, with good performance on all structured (compressible) problems under reasonable assumptions. This includes a proof of the efficacy of the well-known heuristic of randomly selecting training data in the hope of reducing the misclassification rate.
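For orientation, the universal bias the abstract appeals to is usually formalised as Solomonoff's universal prior. The statement below follows the standard algorithmic information theory literature rather than anything specific to this chapter; the machine U, the programs p, and the length function \ell are the conventional symbols, not notation taken from the text.

\[
  M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
\]

Here U is a universal monotone Turing machine, the sum ranges over programs p whose output begins with x (written x*), and \ell(p) is the length of p in bits. Because the weight 2^{-\ell(p)} decays exponentially with program length, structured (compressible) data receives almost all of the prior mass, which is the sense in which Occam's razor can act as a universal bias.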

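The chapter's classifier is not reproduced on this page, but the intuition behind "good performance on all compressible problems" can be made concrete with a toy sketch. The Python fragment below is an illustration in the spirit of compression-based classification, using zlib as a crude, computable stand-in for Kolmogorov complexity; the function names and training data are invented for the example, and this is emphatically not the algorithm analysed in the chapter.

import zlib

def compressed_size(data: bytes) -> int:
    # Length of the zlib-compressed representation: a crude, computable
    # upper-bound proxy for Kolmogorov complexity.
    return len(zlib.compress(data, 9))

def classify(x: bytes, examples: dict) -> str:
    # Assign x to the class whose training data it extends most cheaply:
    # a small increase in compressed size suggests that x shares
    # structure with that class.
    def extra_cost(label: str) -> int:
        joined = b"\n".join(examples[label])
        return compressed_size(joined + b"\n" + x) - compressed_size(joined)
    return min(examples, key=extra_cost)

# Toy usage: the test string repeats the "ab" motif, so appending it to
# the "ab-pattern" class barely increases that class's compressed size.
training = {
    "ab-pattern": [b"abababababababab", b"abababab"],
    "xy-pattern": [b"xyxyxyxyxyxyxyxy", b"xyxyxy"],
}
print(classify(b"ababababababab", training))  # expected: "ab-pattern"

Real compressors only approximate Kolmogorov complexity from above, so a sketch like this inherits none of the guarantees proved in the chapter; it merely makes the compressible-structure intuition tangible.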



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lattimore, T., Hutter, M. (2013). No Free Lunch versus Occam’s Razor in Supervised Learning. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. Lecture Notes in Computer Science, vol 7070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44958-1_17

  • DOI: https://doi.org/10.1007/978-3-642-44958-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44957-4

  • Online ISBN: 978-3-642-44958-1

  • eBook Packages: Computer Science (R0)
