Putnam’s Diagonal Argument and the Impossibility of a Universal Learning Machine

Sterkenburg, Tom F.

doi:10.1007/s10670-018-9975-x

Published: 21 February 2018

Volume 84, pages 633–656, (2019)

Erkenntnis

Tom F. Sterkenburg ORCID: orcid.org/0000-0002-4860-727X¹

Abstract

Putnam construed the aim of Carnap’s program of inductive logic as the specification of a “universal learning machine,” and presented a diagonal proof against the very possibility of such a thing. Yet the ideas of Solomonoff and Levin lead to a mathematical foundation of precisely those aspects of Carnap’s program that Putnam took issue with, and in particular, resurrect the notion of a universal mechanical rule for induction. In this paper, I take up the question whether the Solomonoff–Levin proposal is successful in this respect. I expose the general strategy to evade Putnam’s argument, leading to a broader discussion of the outer limits of mechanized induction. I argue that this strategy ultimately still succumbs to diagonalization, reinforcing Putnam’s impossibility claim.

References

Achinstein, P. (1963). Confirmation theory, order, and periodicity. Philosophy of Science, 30, 17–35.
Blackwell, D., & Dubins, L. (1962). Merging of opinion with increasing information. The Annals of Mathematical Statistics, 33, 882–886.
Carnap, R. (1950). Logical foundations of probability. Chicago, IL: The University of Chicago Press.
Carnap, R. (1963a). Replies and systematic expositions. In Schilpp (1963), pp. 859–1013
Carnap, R. (1963b). Variety, analogy, and periodicity in inductive logic. Philosophy of Science, 30(3), 222–227.
Dawid, A. P. (1985a). Calibration-based empirical probability. The Annals of Statistics, 13(4), 1251–1274.
Dawid, A. P. (1985b). The impossibility of inductive inference. Comment on Oakes (1985). Journal of the American Statistical Association, 80(390), 339.
Diaconis, P. W., & Freedman, D. A. (1986). On the consistency of Bayes estimates. The Annals of Statistics, 14(1), 1–26.
Downey, R. G., & Hirschfeldt, D. R. (2010). Algorithmic randomness and complexity. New York: Springer.
Earman, J. (1992). Bayes or bust? A critical examination of Bayesian confirmation theory. Cambridge, MA: MIT Press.
Gillies, D. A. (2001a). Popper and computer induction. BioEssays, 23, 859–860.
Gillies, D. A. (2001b). Bayesianism and the fixity of the theoretical framework. In D. Corfield & J. Williamson (Eds.), Foundations of Bayesianism (pp. 363–379). Berlin: Springer.
Goodman, N. (1946). A query on confirmation. The Journal of Philosophy, 43(14), 383–385.
Goodman, N. (1947). On infirmities of confirmation-theory. Philosophy and Phenomenological Research, 8(1), 149–151.
Hintikka, J. (1965). Towards a theory of inductive generalization. In Y. Bar-Hillel (Ed.), Logic, methodology and philosophy of science. Proceedings of the 1964 international congress (pp. 274–288). Amsterdam: North-Holland.
Howson, C. (2000). Hume’s problem: Induction and the justification of belief. New York: Oxford University Press.
Huttegger, S. M. (2015). Merging of opinions and probability kinematics. The Review of Symbolic Logic, 8(4), 611–648.
Hutter, M. (2003). Convergence and loss bounds for Bayesian sequence prediction. IEEE Transactions on Information Theory, 49(8), 2061–2067.
Hutter, M. (2007). On universal prediction and Bayesian confirmation. Theoretical Computer Science, 384(1), 33–48.
Kelly, K. T. (2004). Learning theory and epistemology. In I. Niiniluoto, M. Sintonen, & J. Woleński (Eds.), Handbook of epistemology (pp. 183–203). Dordrecht: Kluwer. Page numbers refer to reprint in H. Arló-Costa, V. F. Hendricks, & J. F. A. K. van Benthem (Eds.) (2016), Readings in formal epistemology.
Kelly, K. T., Juhl, C. F., & Glymour, C. (1994). Reliability, realism, and relativism. In P. Clark & B. Hale (Eds.), Reading Putnam (pp. 98–160). Oxford: Blackwell.
Leike, J., & Hutter, M. (2015). On the computability of Solomonoff induction and knowledge-seeking. In K. Chaudhuri, C. Gentile, S. Zilles (Eds.), Algorithmic learning theory: proceedings of the twenty-sixth international conference (ALT 2015) (pp. 364–378). Springer.
Levin, L. A. (2010). Some theorems on the algorithmic approach to probability theory and information theory. Annals of Pure and Applied Logic, 162, 224–235. Translation of PhD dissertation, 1971. Russia: Moscow State University.
Li, M., & Vitányi, P. M. B. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). New York: Springer.
Nies, A. (2009). Computability and randomness. Oxford: Oxford University Press.
Oakes, D. (1985). Self-calibrating priors do not exist. Journal of the American Statistical Association, 80(390), 340–341.
Poland, J., & Hutter, M. (2005). Asymptotics of discrete MDL for online prediction. IEEE Transactions on Information Theory, 51(11), 3780–3795.
Putnam, H. (1963a). ‘Degree of confirmation’ and inductive logic. In Schilpp (1963), pp. 761–783. Reprinted in Putnam (1975), pp. 270–292.
Putnam, H. (1963b). Probability and confirmation. In The voice of America forum lectures. Washington, D.C.: U.S. Information Agency. Page numbers refer to reprint in Putnam (1975), pp. 293–304.
Putnam, H. (1974). The ‘corroboration’ of theories. In P. A. Schilpp (Ed.), The philosophy of Karl Popper, Book I. The library of living philosophers (Vol. 14, pp. 221–240). LaSalle, IL: Open Court. Reprinted in Putnam (1975), pp. 250–269.
Putnam, H. (1975). Mathematics, matter, and method. Cambridge: Cambridge University Press.
Rathmanner, S., & Hutter, M. (2011). A philosophical treatise of universal induction. Entropy, 13(6), 1076–1136.
Reichenbach, H. (1933). Die logischen Grundlagen des Wahrscheinlichkeitsbegriffs. Erkenntnis, 3, 401–425.
Reichenbach, H. (1935). Wahrscheinlichkeitslehre: eine Untersuchung über die logischen und mathematischen Grundlagen der Wahrscheinlichkeitsrechnung. Leiden: Sijthoff.
Reichenbach, H. (1938). Experience and prediction. Chicago, IL: University of Chicago Press.
Reimann, J. (2009). Randomness—Beyond Lebesgue measure. In S. B. Cooper, H. Geuvers, A. Pillay, & J. Väänänen (Eds.), Logic colloquium 2006 (pp. 247–279). Chicago, IL: Association for Symbolic Logic.
Romeijn, J.-W. (2004). Hypotheses and inductive predictions. Synthese, 141(3), 333–364.
Salmon, W. C. (1967). The foundations of scientific inference. Pittsburgh, PA: University of Pittsburgh Press.
Salmon, W. C. (1991). Hans Reichenbach’s vindication of induction. Erkenntnis, 35, 99–122.
Schervish, M. J. (1985). Comment on Dawid (1985a). The Annals of Statistics, 13(4), 1274–1282.
Schilpp, P. A. (Ed.). (1963). The philosophy of Rudolf Carnap. The library of living philosophers (Vol. 11). LaSalle, IL: Open Court.
Shen, A. K., Uspensky, V. A., & Vereshchagin, N. K. (2017). Kolmogorov complexity and algorithmic randomness. Providence, RI: American Mathematical Society.
Skyrms, B. (1991). Carnapian inductive logic for Markov chains. Erkenntnis, 35, 439–460.
Skyrms, B. (1996). Carnapian inductive logic and Bayesian statistics. In T. Ferguson, L. Shapley, & J. MacQueen (Eds.), Statistics, probability and game theory: Papers in honor of David Blackwell (pp. 321–336). Beachwood: Institute of Mathematical Statistics.
Soare, R. I. (2016). Turing computability: Theory and applications. New York: Springer.
Solomonoff, R. J. (1964). A formal theory of inductive inference. Parts I and II. Information and Control, 7(1), 1–22; 7(2), 224–254.
Solomonoff, R. J. (1978). Complexity-based induction systems: Comparisons and convergence theorems. IEEE Transactions on Information Theory, 24(4), 422–432.
Sterkenburg, T. F. (2016). Solomonoff prediction and Occam’s razor. Philosophy of Science, 83(4), 459–479.
Tao, T. (2011). An introduction to measure theory. Providence, RI: American Mathematical Society.
Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42), 230–265.
van Fraassen, B. C. (1989). Laws and symmetry. Oxford: Clarendon Press.
van Fraassen, B. C. (2000). The false hopes of traditional epistemology. Philosophy and Phenomenological Research, 60(2), 253–280.
Zvonkin, A. K., & Levin, L. A. (1970). The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 26(6), 83–124. Translation of the Russian original. Uspekhi Matematicheskikh Nauk, 25(6), 85–127, 1970.

Author information

Authors and Affiliations

Munich Center for Mathematical Philosophy, LMU Munich, Geschwister-Scholl-Platz 1, 80539, Munich, Germany
Tom F. Sterkenburg

Corresponding author

Correspondence to Tom F. Sterkenburg.

Additional information

This paper was written while I was with the Machine Learning group, Centrum Wiskunde & Informatica, Amsterdam, and the Faculty of Philosophy, University of Groningen. I want to thank Peter Grünwald, Wouter Koolen, Jan Leike, and Nishant Mehta for helpful discussions, Jan-Willem Romeijn for valuable advice on earlier versions of the paper, and finally the anonymous reviewers for their careful comments, which did much to improve it.

Appendix

Theorem 2 is in the literature (Li and Vitányi 2008, 352ff; Hutter 2003, 2062; Poland and Hutter 2005, 3781) usually presented as a consequence of (variations of) the following stronger result, first shown by Solomonoff (1978, 426f). Let us introduce as a measure of the divergence between two distributions $P_1$ and $P_2$ over $\{0,1\}$ the squared Hellinger distance

$$\begin{aligned} H(P_1,P_2) := \sum _{x \in \{0,1\}}\left( \sqrt{P_1(x)}-\sqrt{P_2(x)}\right) ^2. \end{aligned} \tag{1}$$

Then, for every $\mu \in \Delta _1$, the expected infinite sum of divergences between $Q_U$ and $\mu $

$$\begin{aligned} \mathbf {E}_{X^\omega \sim \mu }\left[ \sum _{n=0}^\infty H\left( \mu (\cdot \mid X^n),Q_U(\cdot \mid X^n) \right) \right] \end{aligned} \tag{2}$$

is bounded by a constant.
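The squared Hellinger distance (1) is easy to evaluate for distributions over $\{0,1\}$. The following is a minimal sketch, not part of the original argument; the function name `squared_hellinger` and the convention of representing each distribution by the probability it gives to outcome 1 are assumptions made for illustration.

```python
import math


def squared_hellinger(p1: float, p2: float) -> float:
    """Squared Hellinger distance (1) between two distributions over {0, 1}.

    Assumption for this sketch: each distribution is represented by the
    probability it assigns to outcome 1, so outcome 0 gets 1 - p.
    """
    return sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2
        for a, b in ((p1, p2), (1.0 - p1, 1.0 - p2))
    )


# Identical distributions have distance 0; the distance is symmetric.
same = squared_hellinger(0.3, 0.3)
forward = squared_hellinger(0.2, 0.7)
backward = squared_hellinger(0.7, 0.2)
```

With this definition (no factor of one half), the distance ranges from 0 for identical distributions to 2 for distributions with disjoint support.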

To see how $(\text {I: }\Delta _1)$ follows from this constant bound, suppose that $Q_U$ does not satisfy $(\text {I: }\Delta _1)$: there is a $\mu \in \Delta _1$ such that with probability $\epsilon >0$ there is a $\delta > 0$ such that $\left| \mu (x_{n+1} \mid \pmb {x}^n)-Q_U(x_{n+1} \mid \pmb {x}^n)\right| >\delta $ infinitely often. But that means that with positive probability the infinite sum of squared Hellinger distances is infinite, and the expectation (2) cannot be bounded by a constant.
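The step from a gap of $\delta$ in the one-step probabilities to a divergent sum can be spelled out as follows. This is a sketch of the missing step, with $p$ and $q$ introduced here as shorthand for $\mu (x_{n+1} \mid \pmb {x}^n)$ and $Q_U(x_{n+1} \mid \pmb {x}^n)$. Whenever $|p-q| > \delta$, the fact that $\sqrt{p}+\sqrt{q} \le 2$ gives

$$\begin{aligned} \left| \sqrt{p}-\sqrt{q}\right| = \frac{|p-q|}{\sqrt{p}+\sqrt{q}} \ge \frac{|p-q|}{2} > \frac{\delta }{2}, \qquad \text {hence} \qquad H\left( \mu (\cdot \mid \pmb {x}^n),Q_U(\cdot \mid \pmb {x}^n)\right) \ge \left( \sqrt{p}-\sqrt{q}\right) ^2 > \frac{\delta ^2}{4}. \end{aligned}$$

So each of the infinitely many such $n$ contributes more than $\delta ^2/4$ to the sum of squared Hellinger distances, which therefore diverges on the sequences in question.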

The proof of the constant bound on (2) starts with the fact that the distance $H(P_1,P_2)$ is bounded by the Kullback-Leibler divergence

$$\begin{aligned} D(P_1 \parallel P_2) := \mathbf {E}_{X \sim P_1}\left[ -\log \frac{P_2(X)}{P_1(X)} \right] . \end{aligned} \tag{3}$$

The term $-\log P(\pmb {x})$ expresses the logarithmic loss of $P$ on sequence $\pmb {x}$, a standard measure of prediction error; the difference $-\log P_2(\pmb {x})-\left( -\log P_1(\pmb {x})\right) =-\log \frac{P_2(\pmb {x})}{P_1(\pmb {x})}$ expresses the surplus prediction error or regret of $P_2$ relative to $P_1$ on sequence $\pmb {x}$. Thus the Kullback-Leibler divergence (3) expresses the $P_1$-expected regret of $P_2$ relative to $P_1$.
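The inequality between the squared Hellinger distance (1) and the Kullback-Leibler divergence (3) can be checked numerically on a grid of distributions over $\{0,1\}$. The sketch below is an illustration only: the function names, the grid, and the use of the natural logarithm are assumptions (the text does not fix the base of the logarithm; a base-2 logarithm only makes the divergence larger).

```python
import math


def squared_hellinger(p1: float, p2: float) -> float:
    """Squared Hellinger distance (1); p1, p2 are probabilities of outcome 1."""
    return sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2
        for a, b in ((p1, p2), (1.0 - p1, 1.0 - p2))
    )


def kl_divergence(p1: float, p2: float) -> float:
    """KL divergence (3), D(P1 || P2), in nats; assumes 0 < p2 < 1."""
    total = 0.0
    for a, b in ((p1, p2), (1.0 - p1, 1.0 - p2)):
        if a > 0.0:  # outcomes with P1-probability 0 contribute nothing
            total += a * -math.log(b / a)
    return total


# Hypothetical grid of interior probabilities 0.05, 0.10, ..., 0.95.
grid = [i / 20 for i in range(1, 20)]
violations = [
    (p, q)
    for p in grid
    for q in grid
    if squared_hellinger(p, q) > kl_divergence(p, q) + 1e-12
]
```

On this grid `violations` stays empty, in line with the inequality; a numerical check of this kind is of course no substitute for the proof.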

Using $H(P_1,P_2) \le D(P_1 \parallel P_2)$ one can work out that (2) is bounded by

$$\begin{aligned} \mathbf {E}_{X^\omega \sim \mu }\left[ \sum _{n=0}^\infty -\log \frac{Q_U(X_{n+1}\mid X^n)}{\mu (X_{n+1}\mid X^n)}\right] . \end{aligned} \tag{4}$$
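The passage from (2) to (4) can be sketched in two steps. This sketch is an addition for the reader's convenience and glosses over the interchange of the infinite sum and the expectation. First apply $H \le D$ to each term, then use the definition (3) together with the tower property of conditional expectation:

$$\begin{aligned} \mathbf {E}_{X^\omega \sim \mu }\left[ \sum _{n=0}^\infty H\left( \mu (\cdot \mid X^n),Q_U(\cdot \mid X^n)\right) \right]&\le \mathbf {E}_{X^\omega \sim \mu }\left[ \sum _{n=0}^\infty D\left( \mu (\cdot \mid X^n) \parallel Q_U(\cdot \mid X^n)\right) \right] \\&= \mathbf {E}_{X^\omega \sim \mu }\left[ \sum _{n=0}^\infty \mathbf {E}_{X_{n+1} \sim \mu (\cdot \mid X^n)}\left[ -\log \frac{Q_U(X_{n+1}\mid X^n)}{\mu (X_{n+1}\mid X^n)}\right] \right] \\&= \mathbf {E}_{X^\omega \sim \mu }\left[ \sum _{n=0}^\infty -\log \frac{Q_U(X_{n+1}\mid X^n)}{\mu (X_{n+1}\mid X^n)}\right] . \end{aligned}$$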

Now by the universality of $Q_U$ in the class of $\Sigma _1$ measures we know that $Q_U$ majorizes $\mu $: there is a constant $c \in (0,1]$ such that $Q_U(\pmb {x}) \ge c \cdot \mu (\pmb {x})$ for every finite $\pmb {x}$. Indeed we can identify $c$ with $w(\mu )$, where $w$ is the prior over hypothesis class ${\mathcal {H}}_{\Sigma _1}$ in the classical Bayesian representation $\xi ^{\Sigma _1}_w$ of $Q_U$. This fact allows us to derive that for every sequence $\pmb {x}^m$ of any length $m$

$$\begin{aligned} \sum _{n=0}^{m-1} -\log \frac{Q_U(x_{n+1}\mid \pmb {x}^n)}{\mu (x_{n+1}\mid \pmb {x}^n)}&= -\log \prod _{n=0}^{m-1} \frac{Q_U(x_{n+1}\mid \pmb {x}^n)}{\mu (x_{n+1}\mid \pmb {x}^n)} \\&= -\log \frac{Q_U(\pmb {x}^m)}{\mu (\pmb {x}^m)} \\&\le -\log w(\mu ). \end{aligned} \tag{5}$$

This concludes the proof that (2) is bounded by a constant: since the bound (5) holds for any individual sequence of any length, it also holds for (4) and thus for (2).
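The individual-sequence bound (5) can be illustrated with a toy Bayesian mixture. Since $Q_U$ itself is not computable, the sketch below substitutes a finite class of i.i.d. Bernoulli measures with a uniform prior; the class, the prior, and all names (`thetas`, `prior`, `mixture_prob`, `cumulative_regret`) are hypothetical stand-ins, so this shows the telescoping argument at work rather than the Solomonoff-Levin predictor.

```python
import math
from itertools import product

# Hypothetical finite hypothesis class standing in for the Sigma_1 measures:
# i.i.d. Bernoulli measures, each identified by its probability of a 1.
thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
prior = {theta: 1.0 / len(thetas) for theta in thetas}  # uniform prior w


def bernoulli_prob(theta, bits):
    """Probability that the Bernoulli(theta) measure gives a finite sequence."""
    ones = sum(bits)
    return theta ** ones * (1.0 - theta) ** (len(bits) - ones)


def mixture_prob(bits):
    """Bayesian mixture xi_w(x) = sum over theta of w(theta) * theta(x)."""
    return sum(w * bernoulli_prob(theta, bits) for theta, w in prior.items())


def cumulative_regret(theta, bits):
    """Left-hand side of (5): summed one-step log-loss regret of the mixture
    relative to Bernoulli(theta) along the given sequence."""
    regret = 0.0
    for n in range(len(bits)):
        prefix, extended = bits[:n], bits[: n + 1]
        q_cond = mixture_prob(extended) / mixture_prob(prefix)
        mu_cond = bernoulli_prob(theta, extended) / bernoulli_prob(theta, prefix)
        regret += -math.log(q_cond / mu_cond)
    return regret


# Worst case over every hypothesis and every binary sequence of length 1..8.
max_regret = max(
    cumulative_regret(theta, bits)
    for theta in thetas
    for m in range(1, 9)
    for bits in product((0, 1), repeat=m)
)
bound = -math.log(1.0 / len(thetas))  # -log w(mu) for the uniform prior
```

In this toy setting `max_regret` stays below `bound` on every sequence, not merely in expectation, which is the sense in which (5) is an individual-sequence bound.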

The absolute optimality property mentioned in Sect. 8 is just this individual sequence bound (5), which continues to hold for any $\nu $ that is $\Sigma _1$. To reformulate, for any such $\nu $, the sum of surplus prediction errors (regrets) of $Q_U$ relative to $\nu $ will always (for any sequence $\pmb {x}^m$ of any length $m$) be bounded by a constant:

$$\begin{aligned} \sum _{n=0}^{m-1} \left( - \log Q_U(x_{n+1} \mid \pmb {x}^n) - \left( - \log \nu (x_{n+1} \mid \pmb {x}^n)\right) \right) \le -\log w(\nu ). \end{aligned}$$

About this article

Cite this article

Sterkenburg, T.F. Putnam’s Diagonal Argument and the Impossibility of a Universal Learning Machine. Erkenn 84, 633–656 (2019). https://doi.org/10.1007/s10670-018-9975-x

Received: 02 February 2017
Accepted: 30 January 2018
Published: 21 February 2018
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s10670-018-9975-x
