
Putnam’s Diagonal Argument and the Impossibility of a Universal Learning Machine

Abstract

Putnam construed the aim of Carnap’s program of inductive logic as the specification of a “universal learning machine,” and presented a diagonal proof against the very possibility of such a thing. Yet the ideas of Solomonoff and Levin lead to a mathematical foundation of precisely those aspects of Carnap’s program that Putnam took issue with, and in particular resurrect the notion of a universal mechanical rule for induction. In this paper, I take up the question whether the Solomonoff–Levin proposal is successful in this respect. I expose the general strategy for evading Putnam’s argument, which leads to a broader discussion of the outer limits of mechanized induction. I argue that this strategy ultimately still succumbs to diagonalization, reinforcing Putnam’s impossibility claim.


References

  • Achinstein, P. (1963). Confirmation theory, order, and periodicity. Philosophy of Science, 30, 17–35.

  • Blackwell, D., & Dubins, L. (1962). Merging of opinions with increasing information. The Annals of Mathematical Statistics, 33, 882–886.

  • Carnap, R. (1950). Logical foundations of probability. Chicago, IL: The University of Chicago Press.

  • Carnap, R. (1963a). Replies and systematic expositions. In Schilpp (1963), pp. 859–1013.

  • Carnap, R. (1963b). Variety, analogy, and periodicity in inductive logic. Philosophy of Science, 30(3), 222–227.

  • Dawid, A. P. (1985a). Calibration-based empirical probability. The Annals of Statistics, 13(4), 1251–1274.

  • Dawid, A. P. (1985b). The impossibility of inductive inference. Comment on Oakes (1985). Journal of the American Statistical Association, 80(390), 339.

  • Diaconis, P. W., & Freedman, D. A. (1986). On the consistency of Bayes estimates. The Annals of Statistics, 14(1), 1–26.

  • Downey, R. G., & Hirschfeldt, D. R. (2010). Algorithmic randomness and complexity. New York: Springer.

  • Earman, J. (1992). Bayes or bust? A critical examination of Bayesian confirmation theory. Cambridge, MA: MIT Press.

  • Gillies, D. A. (2001a). Popper and computer induction. BioEssays, 23, 859–860.

  • Gillies, D. A. (2001b). Bayesianism and the fixity of the theoretical framework. In D. Corfield & J. Williamson (Eds.), Foundations of Bayesianism (pp. 363–379). Berlin: Springer.

  • Goodman, N. (1946). A query on confirmation. The Journal of Philosophy, 43(14), 383–385.

  • Goodman, N. (1947). On infirmities of confirmation-theory. Philosophy and Phenomenological Research, 8(1), 149–151.

  • Hintikka, J. (1965). Towards a theory of inductive generalization. In Y. Bar-Hillel (Ed.), Logic, methodology and philosophy of science: Proceedings of the 1964 international congress (pp. 274–288). Amsterdam: North-Holland.

  • Howson, C. (2000). Hume’s problem: Induction and the justification of belief. New York: Oxford University Press.

  • Huttegger, S. M. (2015). Merging of opinions and probability kinematics. The Review of Symbolic Logic, 8(4), 611–648.

  • Hutter, M. (2003). Convergence and loss bounds for Bayesian sequence prediction. IEEE Transactions on Information Theory, 49(8), 2061–2067.

  • Hutter, M. (2007). On universal prediction and Bayesian confirmation. Theoretical Computer Science, 384(1), 33–48.

  • Kelly, K. T. (2004). Learning theory and epistemology. In I. Niiniluoto, M. Sintonen, & J. Woleński (Eds.), Handbook of epistemology (pp. 183–203). Dordrecht: Kluwer. Page numbers refer to the reprint in H. Arló-Costa, V. F. Hendricks, & J. F. A. K. van Benthem (Eds.) (2016), Readings in formal epistemology.

  • Kelly, K. T., Juhl, C. F., & Glymour, C. (1994). Reliability, realism, and relativism. In P. Clark & B. Hale (Eds.), Reading Putnam (pp. 98–160). Oxford: Blackwell.

  • Leike, J., & Hutter, M. (2015). On the computability of Solomonoff induction and knowledge-seeking. In K. Chaudhuri, C. Gentile, & S. Zilles (Eds.), Algorithmic learning theory: Proceedings of the twenty-sixth international conference (ALT 2015) (pp. 364–378). Springer.

  • Levin, L. A. (2010). Some theorems on the algorithmic approach to probability theory and information theory. Annals of Pure and Applied Logic, 162, 224–235. Translation of the author’s 1971 PhD dissertation (Moscow State University).

  • Li, M., & Vitányi, P. M. B. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). New York: Springer.

  • Nies, A. (2009). Computability and randomness. Oxford: Oxford University Press.

  • Oakes, D. (1985). Self-calibrating priors do not exist. Journal of the American Statistical Association, 80(390), 340–341.

  • Poland, J., & Hutter, M. (2005). Asymptotics of discrete MDL for online prediction. IEEE Transactions on Information Theory, 51(11), 3780–3795.

  • Putnam, H. (1963a). ‘Degree of confirmation’ and inductive logic. In Schilpp (1963), pp. 761–783. Reprinted in Putnam (1975), pp. 270–292.

  • Putnam, H. (1963b). Probability and confirmation. In The voice of America forum lectures. Washington, DC: U.S. Information Agency. Page numbers refer to the reprint in Putnam (1975), pp. 293–304.

  • Putnam, H. (1974). The ‘corroboration’ of theories. In P. A. Schilpp (Ed.), The philosophy of Karl Popper, Book I. The Library of Living Philosophers (Vol. 14, pp. 221–240). LaSalle, IL: Open Court. Reprinted in Putnam (1975), pp. 250–269.

  • Putnam, H. (1975). Mathematics, matter, and method. Cambridge: Cambridge University Press.

  • Rathmanner, S., & Hutter, M. (2011). A philosophical treatise of universal induction. Entropy, 13(6), 1076–1136.

  • Reichenbach, H. (1933). Die logischen Grundlagen des Wahrscheinlichkeitsbegriffs. Erkenntnis, 3, 401–425.

  • Reichenbach, H. (1935). Wahrscheinlichkeitslehre: Eine Untersuchung über die logischen und mathematischen Grundlagen der Wahrscheinlichkeitsrechnung. Leiden: Sijthoff.

  • Reichenbach, H. (1938). Experience and prediction. Chicago, IL: University of Chicago Press.

  • Reimann, J. (2009). Randomness—Beyond Lebesgue measure. In S. B. Cooper, H. Geuvers, A. Pillay, & J. Väänänen (Eds.), Logic colloquium 2006 (pp. 247–279). Chicago, IL: Association for Symbolic Logic.

  • Romeijn, J.-W. (2004). Hypotheses and inductive predictions. Synthese, 141(3), 333–364.

  • Salmon, W. C. (1967). The foundations of scientific inference. Pittsburgh, PA: University of Pittsburgh Press.

  • Salmon, W. C. (1991). Hans Reichenbach’s vindication of induction. Erkenntnis, 35, 99–122.

  • Schervish, M. J. (1985). Comment on Dawid (1985a). The Annals of Statistics, 13(4), 1274–1282.

  • Schilpp, P. A. (Ed.). (1963). The philosophy of Rudolf Carnap. The Library of Living Philosophers (Vol. 11). LaSalle, IL: Open Court.

  • Shen, A. K., Uspensky, V. A., & Vereshchagin, N. K. (2017). Kolmogorov complexity and algorithmic randomness. Providence, RI: American Mathematical Society.

  • Skyrms, B. (1991). Carnapian inductive logic for Markov chains. Erkenntnis, 35, 439–460.

  • Skyrms, B. (1996). Carnapian inductive logic and Bayesian statistics. In T. Ferguson, L. Shapley, & J. MacQueen (Eds.), Statistics, probability and game theory: Papers in honor of David Blackwell (pp. 321–336). Beachwood: Institute of Mathematical Statistics.

  • Soare, R. I. (2016). Turing computability: Theory and applications. New York: Springer.

  • Solomonoff, R. J. (1964). A formal theory of inductive inference. Parts I and II. Information and Control, 7(1), 1–22 and 7(2), 224–254.

  • Solomonoff, R. J. (1978). Complexity-based induction systems: Comparisons and convergence theorems. IEEE Transactions on Information Theory, 24(4), 422–432.

  • Sterkenburg, T. F. (2016). Solomonoff prediction and Occam’s razor. Philosophy of Science, 83(4), 459–479.

  • Tao, T. (2011). An introduction to measure theory. Providence, RI: American Mathematical Society.

  • Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42), 230–265.

  • van Fraassen, B. C. (1989). Laws and symmetry. Oxford: Clarendon Press.

  • van Fraassen, B. C. (2000). The false hopes of traditional epistemology. Philosophy and Phenomenological Research, 60(2), 253–280.

  • Zvonkin, A. K., & Levin, L. A. (1970). The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 25(6), 83–124. Translation of the Russian original in Uspekhi Matematicheskikh Nauk, 25(6), 85–127, 1970.

Author information

Correspondence to Tom F. Sterkenburg.

Additional information

This paper was written while I was with the Machine Learning group, Centrum Wiskunde & Informatica, Amsterdam, and the Faculty of Philosophy, University of Groningen. I want to thank Peter Grünwald, Wouter Koolen, Jan Leike, and Nishant Mehta for helpful discussions, Jan-Willem Romeijn for valuable advice on earlier versions of the paper, and finally the anonymous reviewers for their careful comments, which did much to improve it.

Appendix

In the literature (Li and Vitányi 2008, 352ff; Hutter 2003, 2062; Poland and Hutter 2005, 3781), Theorem 2 is usually presented as a consequence of (variations of) the following stronger result, first shown by Solomonoff (1978, 426f). As a measure of the divergence between two distributions \(P_1\) and \(P_2\) over \(\{0,1\}\), let us introduce the squared Hellinger distance

$$\begin{aligned} H(P_1,P_2) := \sum _{x \in \{0,1\}}\left( \sqrt{P_1(x)}-\sqrt{P_2(x)}\right) ^2. \end{aligned}$$
(1)

Then, for every \(\mu \in \Delta _1\), the expected infinite sum of divergences between \(Q_U\) and \(\mu \)

$$\begin{aligned} {{\mathrm{{\mathbf {E}}}}}_{X^\omega \sim \mu }\left[ \sum _{n=0}^\infty H\left( \mu (\cdot \mid X^n),Q_U(\cdot \mid X^n) \right) \right] \end{aligned}$$
(2)

is bounded by a constant.
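
For concreteness, here is a minimal numeric sketch (in Python; the function name is my own) of the squared Hellinger distance (1), with each distribution over \(\{0,1\}\) identified with the probability it assigns to outcome 1:

```python
import math

def hellinger_sq(p1: float, p2: float) -> float:
    """Squared Hellinger distance (1) between two distributions over {0, 1},
    each given by the probability it assigns to outcome 1."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2
               for a, b in ((p1, p2), (1 - p1, 1 - p2)))

# Identical predictive distributions are at distance 0;
# maximally opposed ones are at the maximal distance 2.
assert hellinger_sq(0.5, 0.5) == 0.0
assert abs(hellinger_sq(1.0, 0.0) - 2.0) < 1e-12
```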

To see how \((\text {I: }\Delta _1)\) follows from this constant bound, suppose that \(Q_U\) does not satisfy \((\text {I: }\Delta _1)\): there is a \(\mu \in \Delta _1\) such that with probability \(\epsilon >0\) there is a \(\delta > 0\) such that \(\left| \mu (x_{n+1} \mid \pmb {x}^n)-Q_U(x_{n+1} \mid \pmb {x}^n)\right| >\delta \) infinitely often. Since \(\left( \sqrt{P_1(x)}-\sqrt{P_2(x)}\right) ^2 \ge \tfrac{1}{4}\left( P_1(x)-P_2(x)\right) ^2\) for each outcome \(x\), every such \(n\) contributes at least \(\delta ^2/4\) to the sum of squared Hellinger distances. But that means that with positive probability this infinite sum is infinite, and the expectation (2) cannot be bounded by a constant.

The proof of the constant bound on (2) starts with the fact that the distance \(H(P_1,P_2)\) is bounded by the Kullback-Leibler divergence

$$\begin{aligned} D(P_1 \parallel P_2) := {{\mathrm{{\mathbf {E}}}}}_{X \sim P_1}\left[ -\log \frac{P_2(X)}{P_1(X)} \right] . \end{aligned}$$
(3)

The term \(-\log P(\pmb {x})\) expresses the logarithmic loss of \(P\) on sequence \(\pmb {x}\), a standard measure of prediction error; the difference \(-\log P_2(\pmb {x})-\left( -\log P_1(\pmb {x})\right) =-\log \frac{P_2(\pmb {x})}{P_1(\pmb {x})}\) expresses the surplus prediction error or regret of \(P_2\) relative to \(P_1\) on sequence \(\pmb {x}\). Thus the Kullback-Leibler divergence (3) expresses the \(P_1\)-expected regret of \(P_2\) relative to \(P_1\).
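
As a small illustration of these notions (a sketch in Python with hypothetical names; logarithms are taken base 2, so losses are in bits), both quantities can be computed from the probabilities each predictor assigned to the outcomes that actually occurred, since by the chain rule these conditional probabilities multiply out to \(P(\pmb{x})\):

```python
import math

def log_loss(probs_of_outcomes: list[float]) -> float:
    """Cumulative logarithmic loss -log P(x): the sum of -log of the
    conditional probabilities the predictor gave the realized outcomes."""
    return sum(-math.log2(p) for p in probs_of_outcomes)

def regret(p2_probs: list[float], p1_probs: list[float]) -> float:
    """Surplus prediction error of P2 relative to P1 on the same sequence:
    -log P2(x) - (-log P1(x))."""
    return log_loss(p2_probs) - log_loss(p1_probs)

# P1 assigns the realized outcome probability 0.9 each round, P2 only 0.6:
# P2 incurs a positive regret of about 2.9 bits over five rounds.
print(regret([0.6] * 5, [0.9] * 5))
```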

Using \(H(P_1,P_2) \le D(P_1 \parallel P_2)\) one can work out that (2) is bounded by

$$\begin{aligned} {{\mathrm{{\mathbf {E}}}}}_{X^\omega \sim \mu }\left[ \sum _{n=0}^\infty -\log \frac{Q_U(X_{n+1}\mid X^n)}{\mu (X_{n+1}\mid X^n)}\right] . \end{aligned}$$
(4)

Now by the universality of \(Q_U\) in the class of \(\Sigma _1\) measures we know that \(Q_U\) majorizes \(\mu \): there is a constant \(c \in (0,1]\) such that \(Q_U(\pmb {x}) \ge c \cdot \mu (\pmb {x})\) for every finite \(\pmb {x}\). Indeed we can identify \(c\) with \(w(\mu )\), where \(w\) is the prior over hypothesis class \({{\mathcal {H}}_{\Sigma _1}}\) in the classical Bayesian representation \(\xi ^{\Sigma _1}_w\) of \(Q_U\). This fact allows us to derive that for every sequence \(\pmb {x}^m\) of any length \(m\)

$$\begin{aligned} \sum _{n=0}^{m-1} -\log \frac{Q_U(x_{n+1}\mid \pmb {x}^n)}{\mu (x_{n+1}\mid \pmb {x}^n)}&= -\log \prod _{n=0}^{m-1} \frac{Q_U(x_{n+1}\mid \pmb {x}^n)}{\mu (x_{n+1}\mid \pmb {x}^n)} \nonumber \\&= -\log \frac{Q_U(\pmb {x}^m)}{\mu (\pmb {x}^m)} \nonumber \\&\le -\log w(\mu ). \end{aligned}$$
(5)

This concludes the proof that (2) is bounded by a constant: since the bound (5) holds for any individual sequence of any length, it also holds for (4) and thus for (2).
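
To make the telescoping argument behind (5) concrete, here is a toy check (a sketch in Python; the finite Bernoulli class, uniform prior, and all names are illustrative assumptions standing in for the \(\Sigma_1\) class, the prior \(w\), and the mixture \(Q_U\)): the cumulative regret of the mixture relative to any component measure, on any individual sequence, stays below \(-\log w\) of that component.

```python
import math

thetas = [0.1, 0.5, 0.9]                      # toy class: Bernoulli(theta)
w = {th: 1.0 / len(thetas) for th in thetas}  # uniform prior over the class

def bernoulli_prob(theta: float, xs: list[int]) -> float:
    """mu_theta(x^m): probability of the bit sequence xs under Bernoulli(theta)."""
    p = 1.0
    for x in xs:
        p *= theta if x == 1 else 1.0 - theta
    return p

def mixture_prob(xs: list[int]) -> float:
    """Q(x^m) = sum_theta w(theta) * mu_theta(x^m): the Bayesian mixture."""
    return sum(w[th] * bernoulli_prob(th, xs) for th in thetas)

xs = [1, 0, 1, 1, 1, 0, 1, 1]                 # an arbitrary individual sequence
for th in thetas:
    # The cumulative regret telescopes to -log(Q(x^m) / mu_theta(x^m)),
    # which is at most -log w(theta) since Q majorizes each component.
    reg = -math.log2(mixture_prob(xs) / bernoulli_prob(th, xs))
    assert reg <= -math.log2(w[th]) + 1e-9
    print(f"theta={th}: regret {reg:.3f} <= {-math.log2(w[th]):.3f} bits")
```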

The absolute optimality property mentioned in Sect. 8 is just this individual-sequence bound (5), which continues to hold for measures \(\nu \) that are \(\Sigma _1\). To reformulate: for any such \(\nu \), the sum of surplus prediction errors (regrets) of \(Q_U\) relative to \(\nu \) will always (for any sequence \(\pmb {x}^m\) of any length \(m\)) be bounded by a constant:

$$\begin{aligned} \sum _{n=0}^{m-1} \left( - \log Q_U(x_{n+1} \mid \pmb {x}^n) - \left( - \log \nu (x_{n+1} \mid \pmb {x}^n)\right) \right) \le -\log w(\nu ). \end{aligned}$$


Cite this article

Sterkenburg, T.F. Putnam’s Diagonal Argument and the Impossibility of a Universal Learning Machine. Erkenn 84, 633–656 (2019). https://doi.org/10.1007/s10670-018-9975-x
