> There is only limited value in knowledge derived from experience. The knowledge imposes a pattern, and falsifies, for the pattern is new in every moment.

T.S. Eliot

## Abstract

The important recent book by Schurz (2019) appreciates that the no-free-lunch theorems (NFL) have major implications for the problem of (meta) induction. Here I review the NFL theorems, emphasizing that they do not only concern the case where there is a uniform prior—they prove that there are “as many priors” (loosely speaking) for which any induction algorithm *A* out-generalizes some induction algorithm *B* as vice-versa. Importantly though, in addition to the NFL theorems, there are many *free lunch* theorems. In particular, the NFL theorems can only be used to compare the expected performance of an induction algorithm *A*, considered in isolation, with the expected performance of an induction algorithm *B*, considered in isolation. There is a rich set of free lunches which instead concern the statistical *correlations* among the generalization errors of induction algorithms. As I describe, the meta-induction algorithms that Schurz advocates as a “solution to Hume’s problem” are simply examples of such a free lunch based on correlations among the generalization errors of induction algorithms. I end by pointing out that the prior that Schurz advocates, which is uniform over bit frequencies rather than bit patterns, is contradicted by thousands of experiments in statistical physics and by the great success of the maximum entropy procedure in inductive inference.
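The meta-inductive strategy discussed in the abstract weights a pool of induction algorithms by their observed track records, so that the combined forecaster exploits correlations among their generalization errors. A standard way to make this concrete is the exponentially weighted average forecaster of Cesa-Bianchi and Lugosi (2006); the sketch below is a minimal illustration of that construction, not Schurz's exact algorithm, and the function name is mine:

```python
import math

def exp_weights_predictions(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted average forecaster over a pool of 'experts'
    (here: induction algorithms). expert_preds[i][t] is expert i's
    prediction at round t; outcomes[t] is the realized value. Returns the
    meta-forecaster's per-round squared losses."""
    n = len(expert_preds)
    weights = [1.0] * n  # start with uniform trust in every expert
    losses = []
    for t, y in enumerate(outcomes):
        total = sum(weights)
        # weighted-average prediction of the meta-forecaster
        p = sum(w * expert_preds[i][t] for i, w in enumerate(weights)) / total
        losses.append((p - y) ** 2)
        # multiplicatively penalize each expert by its own squared loss
        weights = [w * math.exp(-eta * (expert_preds[i][t] - y) ** 2)
                   for i, w in enumerate(weights)]
    return losses
```

With one perfect expert and one anti-correlated expert, the meta-forecaster's loss quickly concentrates on the perfect one, regardless of which prior over targets generated the data; this is the sense in which the free lunch rests on correlations among errors rather than on any single algorithm's expected performance.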


## Notes

To see this relationship, note that cross-validation chooses among a set of learning algorithms (rather than theories), and does so according to which of those performs best at out-of-sample prediction (evaluating that performance by forming “folds” of the single provided data set).
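The fold-based selection this note describes can be sketched as follows; this is a minimal illustration under my own naming conventions, not code from the paper:

```python
def k_fold_select(algorithms, xs, ys, k=5):
    """Choose, from a set of learning algorithms, the one with the lowest
    average out-of-sample squared error across k folds of a single data
    set. Each algorithm is a function (train_xs, train_ys) -> predictor."""
    n = len(xs)
    # partition the indices into k interleaved folds
    folds = [list(range(i, n, k)) for i in range(k)]

    def cv_error(alg):
        err = 0.0
        for fold in folds:
            # train on everything outside the fold ...
            train = [i for i in range(n) if i not in fold]
            predict = alg([xs[i] for i in train], [ys[i] for i in train])
            # ... and score out-of-sample on the held-out fold
            err += sum((predict(xs[i]) - ys[i]) ** 2 for i in fold)
        return err / n

    return min(algorithms, key=cv_error)
```

Note that the selection criterion is out-of-sample predictive performance of the algorithms, exactly the respect in which cross-validation parallels meta-induction over a pool of predictors.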

As an historical aside, it’s interesting to note that Parrondo went on to make some of the seminal contributions to stochastic thermodynamics and non-equilibrium statistical physics (Parrondo 2015).

The interested reader is directed to Adam et al. (2019) for further discussion reconciling the NFL theorems and computational learning theory.

It is also true if we condition on a particular one of the two allowed *f*’s, as in sampling theory statistics, in which case the prior is irrelevant, and NFL does not apply.

## References

Adam, S. P., Alexandropoulos, S. A. N., Pardalos, P. M., & Vrahatis, M. N. (2019). No free lunch theorem: A review. In *Approximation and Optimization* (pp. 57–82). Springer.

Bishop, C. M. (2006). *Pattern recognition and machine learning*. Springer.

Breiman, L. (1996). Stacked regressions. *Machine Learning, 24*(1), 49–64.

Cesa-Bianchi, N., Long, P. M., & Warmuth, M. K. (1996). Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. *IEEE Transactions on Neural Networks, 7*(3), 604–619.

Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., & Warmuth, M. K. (1997). How to use expert advice. *Journal of the ACM, 44*(3), 427–485.

Cesa-Bianchi, N., & Lugosi, G. (2006). *Prediction, learning, and games*. Cambridge University Press.

Clarke, B. (2003). Bayes model averaging and stacking when model approximation error cannot be ignored. *Journal of Machine Learning Research, 4*, 683–712.

Ghasemian, A., Hosseinmardi, H., Galstyan, A., Airoldi, E. M., & Clauset, A. (2020). Stacking models for nearly optimal link prediction in complex networks. *Proceedings of the National Academy of Sciences, 117*(38), 23393–23400.

Guimerà, R. (2020). One model to rule them all in network science? *Proceedings of the National Academy of Sciences, 117*(41), 25195–25197.

Harmer, G. P., & Abbott, D. (1999). Losing strategies can win by Parrondo’s paradox. *Nature, 402*(6764), 864.

Hume, D. (2003). *A treatise of human nature*. Courier Corporation.

Jaynes, E. T. (1968). Prior probabilities. *IEEE Transactions on Systems Science and Cybernetics, 4*(3), 227–241.

Jaynes, E. T., & Bretthorst, G. L. (2003). *Probability theory: The logic of science*. Cambridge University Press.

Kohavi, R., & Wolpert, D. H. (1996). Bias plus variance decomposition for zero-one loss functions. In *ICML* (Vol. 96, pp. 275–283).

Kroese, D. P., Botev, Z., Taimre, T., & Vaisman, R. (2019). *Data science and machine learning: Mathematical and statistical methods*. CRC Press.

Parrondo, J. M. R., & Español, P. (1996). Criticism of Feynman’s analysis of the ratchet as an engine. *American Journal of Physics, 64*(9), 1125–1130.

Parrondo, J. M. R., Horowitz, J. M., & Sagawa, T. (2015). Thermodynamics of information. *Nature Physics, 11*(2), 131–139.

Peel, L., Larremore, D. B., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. *Science Advances, 3*(5), e1602548.

Reichenbach, H. (1938). *Experience and prediction: An analysis of the foundations and the structure of knowledge*. University of Chicago Press.

Rubinstein, R. Y., & Kroese, D. P. (2016). *Simulation and the Monte Carlo method* (Vol. 10). Wiley.

Schurz, G. (2019). *Hume’s problem solved: The optimality of meta-induction*. MIT Press.

Smyth, P., & Wolpert, D. (1999). Linearly combining density estimators via stacking. *Machine Learning, 36*(1–2), 59–83.

Sterkenburg, T. F. (2019). The meta-inductive justification of induction. *Episteme*.

Tracey, B., Wolpert, D., & Alonso, J. J. (2013). Using supervised learning to improve Monte Carlo integral estimation. *AIAA Journal, 51*(8), 2015–2023.

Wolpert, D. H. (1990). The relationship between Occam’s razor and convergent guessing. *Complex Systems, 4*, 319–368.

Wolpert, D. H. (1995). The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework. In *The mathematics of generalization* (pp. 117–215). Addison-Wesley.

Wolpert, D. H. (1996a). The lack of a priori distinctions between learning algorithms. *Neural Computation, 8*(7), 1341–1390.

Wolpert, D. H. (1996b). The existence of a priori distinctions between learning algorithms. *Neural Computation, 8*(7), 1391–1420.

Wolpert, D. H. (1997). On bias plus variance. *Neural Computation, 9*, 1211–1244.

Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. *IEEE Transactions on Evolutionary Computation, 1*(1), 67–82.

Wolpert, D. H., & Macready, W. (2005). Coevolutionary free lunches. *IEEE Transactions on Evolutionary Computation, 9*(6), 721–735.

Wolpert, D., & Rajnarayan, D. (2013). Using machine learning to improve stochastic optimization. In *Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence*.

Yao, Y., Vehtari, A., Simpson, D., & Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion). *Bayesian Analysis, 13*(3), 917–1007.

## Acknowledgements

I would like to thank the Santa Fe Institute for support.


## About this article

### Cite this article

Wolpert, D.H. The Implications of the No-Free-Lunch Theorems for Meta-induction.
*J Gen Philos Sci* (2023). https://doi.org/10.1007/s10838-022-09609-2
