Algorithmic Probability and Friends: Bayesian Prediction and Artificial Intelligence, pp. 223–235
No Free Lunch versus Occam’s Razor in Supervised Learning
Abstract
The No Free Lunch theorems are often used to argue that domain-specific knowledge is required to design successful algorithms. We use algorithmic information theory to argue the case for a universal bias that allows an algorithm to succeed in all interesting problem domains. Additionally, we give a new algorithm for off-line classification, inspired by Solomonoff induction, with good performance on all structured (compressible) problems under reasonable assumptions. This includes a proof of the efficacy of the well-known heuristic of randomly selecting training data in the hope of reducing the misclassification rate.
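The chapter's algorithm is not reproduced here; the sketch below is only a rough illustration of the compression-as-Occam's-razor idea behind it. It assumes zlib-compressed length as a computable stand-in for Kolmogorov complexity and brute-forces the test labeling that makes the whole labeled data set most compressible; the names `complexity` and `classify`, the data format, and the use of zlib are illustrative choices, not part of the paper.

```python
import itertools
import zlib


def complexity(data: bytes) -> int:
    """Crude computable upper bound on Kolmogorov complexity: zlib-compressed length."""
    return len(zlib.compress(data, 9))


def classify(train, test_xs, labels=(0, 1)):
    """Label test_xs by searching for the assignment that makes the full
    labeled data set most compressible (Occam's razor as a selection rule).

    train:   list of (x, y) pairs, x a string feature, y a label
    test_xs: list of string features to label
    """
    best_assignment, best_cost = None, float("inf")
    for assignment in itertools.product(labels, repeat=len(test_xs)):
        labeled = list(train) + list(zip(test_xs, assignment))
        blob = "\n".join(f"{x},{y}" for x, y in labeled).encode()
        cost = complexity(blob)  # shorter description = simpler hypothesis
        if cost < best_cost:
            best_assignment, best_cost = assignment, cost
    return list(best_assignment)


if __name__ == "__main__":
    # Toy data with an obvious regularity; how well the heuristic recovers it
    # depends on how well zlib captures that regularity at this tiny scale,
    # so treat the output as illustration only.
    train = [("aaaa", 0), ("aaab", 0), ("bbbb", 1), ("bbba", 1)]
    print(classify(train, ["aaab", "bbbb"]))
```

In the paper's setting the role of zlib is played by Kolmogorov complexity itself (which is incomputable), so this sketch should be read as a practical analogue of preferring the simplest consistent labeling rather than as the proposed method.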
Keywords
Supervised learning · Kolmogorov complexity · No free lunch · Occam's razor