No Free Lunch versus Occam’s Razor in Supervised Learning

  • Tor Lattimore
  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7070)


Abstract

The No Free Lunch theorems are often used to argue that domain-specific knowledge is required to design successful algorithms. We use algorithmic information theory to argue the case for a universal bias allowing an algorithm to succeed in all interesting problem domains. Additionally, we give a new algorithm for off-line classification, inspired by Solomonoff induction, with good performance on all structured (compressible) problems under reasonable assumptions. This includes a proof of the efficacy of the well-known heuristic of randomly selecting training data in the hope of reducing the misclassification rate.
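The phrase "structured (compressible) problems" can be made concrete with a computable stand-in for Kolmogorov complexity. The sketch below is an illustration only, not the paper's algorithm: it uses zlib compression as a rough, upper-bound proxy for description length, the standard practical substitute since Kolmogorov complexity itself is incomputable.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size over raw size: values far below 1 indicate structure;
    values near (or above) 1 indicate incompressible, random-looking data."""
    return len(zlib.compress(data, 9)) / len(data)

regular = b"0101" * 256   # a highly structured 1 KiB string
noise = os.urandom(1024)  # incompressible with overwhelming probability

print(compression_ratio(regular))  # far below 1
print(compression_ratio(noise))    # near or above 1 (zlib adds header overhead)
```

On this view, a learning problem is "interesting" exactly when its labelling function compresses well, which is the class of problems the universal bias favours.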


Keywords: supervised learning, Kolmogorov complexity, no free lunch, Occam's razor
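The heuristic whose efficacy the paper proves — selecting training data at random — can be demonstrated with a toy experiment. The threshold concept, split sizes, seed, and 1-nearest-neighbour learner below are illustrative choices of ours, not taken from the paper:

```python
import random

def one_nn_error(train, test):
    """Misclassification rate of 1-nearest-neighbour on labelled points (x, y)."""
    errors = 0
    for x, y in test:
        nearest = min(train, key=lambda p: abs(p[0] - x))
        errors += (nearest[1] != y)
    return errors / len(test)

random.seed(0)
data = [(x, int(x >= 50)) for x in range(100)]  # threshold concept at x = 50

# Adversarial split: train only on a contiguous block that never sees label 1.
block_train = data[:20]
block_test = data[20:]

# The heuristic: choose the training set uniformly at random.
idx = set(random.sample(range(100), 20))
rand_train = [data[i] for i in idx]
rand_test = [data[i] for i in range(100) if i not in idx]

print(one_nn_error(block_train, block_test))  # high: the boundary is unseen
print(one_nn_error(rand_train, rand_test))    # low: the boundary is well covered
```

Random selection tends to spread training points across the input space, so simple structured concepts are pinned down with high probability, whereas a fixed deterministic split can be defeated by an adversarially placed boundary.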





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tor Lattimore (1)
  • Marcus Hutter (1, 2, 3)

  1. Australian National University, Canberra, Australia
  2. ETH Zürich, Switzerland
  3. NICTA, Australia
