
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7070)

Abstract

The No Free Lunch theorems are often used to argue that domain-specific knowledge is required to design successful algorithms. We use algorithmic information theory to argue the case for a universal bias that allows an algorithm to succeed in all interesting problem domains. Additionally, we give a new algorithm for off-line classification, inspired by Solomonoff induction, with good performance on all structured (compressible) problems under reasonable assumptions. This includes a proof of the efficacy of the well-known heuristic of randomly selecting training data in the hope of reducing the misclassification rate.
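For orientation, the universal bias the abstract appeals to is usually formalised as Solomonoff's universal prior. The statement below follows the standard algorithmic information theory literature rather than anything specific to this chapter; the machine U, the programs p, and the length function \ell are the conventional symbols, not notation taken from the text.

\[
  M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}
\]

Here U is a universal monotone Turing machine, the sum ranges over programs p whose output begins with x (written x*), and \ell(p) is the length of p in bits. Because the weight 2^{-\ell(p)} decays exponentially with program length, structured (compressible) data receives almost all of the prior mass, which is the sense in which Occam's razor can act as a universal bias.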

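The chapter's classifier is not reproduced on this page, but the intuition behind "good performance on all compressible problems" can be made concrete with a toy sketch. The Python fragment below is an illustration in the spirit of compression-based classification, using zlib as a crude, computable stand-in for Kolmogorov complexity; the function names and training data are invented for the example, and this is emphatically not the algorithm analysed in the chapter.

import zlib

def compressed_size(data: bytes) -> int:
    # Length of the zlib-compressed representation: a crude, computable
    # upper-bound proxy for Kolmogorov complexity.
    return len(zlib.compress(data, 9))

def classify(x: bytes, examples: dict) -> str:
    # Assign x to the class whose training data it extends most cheaply:
    # a small increase in compressed size suggests that x shares
    # structure with that class.
    def extra_cost(label: str) -> int:
        joined = b"\n".join(examples[label])
        return compressed_size(joined + b"\n" + x) - compressed_size(joined)
    return min(examples, key=extra_cost)

# Toy usage: the test string repeats the "ab" motif, so appending it to
# the "ab-pattern" class barely increases that class's compressed size.
training = {
    "ab-pattern": [b"abababababababab", b"abababab"],
    "xy-pattern": [b"xyxyxyxyxyxyxyxy", b"xyxyxy"],
}
print(classify(b"ababababababab", training))  # expected: "ab-pattern"

Real compressors only approximate Kolmogorov complexity from above, so a sketch like this inherits none of the guarantees proved in the chapter; it merely makes the compressible-structure intuition tangible.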



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lattimore, T., Hutter, M. (2013). No Free Lunch versus Occam’s Razor in Supervised Learning. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. Lecture Notes in Computer Science, vol 7070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44958-1_17

  • DOI: https://doi.org/10.1007/978-3-642-44958-1_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44957-4

  • Online ISBN: 978-3-642-44958-1

  • eBook Packages: Computer Science (R0)
