Abstract
Putnam construed the aim of Carnap’s program of inductive logic as the specification of a “universal learning machine,” and presented a diagonal proof against the very possibility of such a thing. Yet the ideas of Solomonoff and Levin lead to a mathematical foundation of precisely those aspects of Carnap’s program that Putnam took issue with, and in particular, resurrect the notion of a universal mechanical rule for induction. In this paper, I take up the question whether the Solomonoff–Levin proposal is successful in this respect. I expose the general strategy to evade Putnam’s argument, leading to a broader discussion of the outer limits of mechanized induction. I argue that this strategy ultimately still succumbs to diagonalization, reinforcing Putnam’s impossibility claim.
References
Achinstein, P. (1963). Confirmation theory, order, and periodicity. Philosophy of Science, 30, 17–35.
Blackwell, D., & Dubins, L. (1962). Merging of opinion with increasing information. The Annals of Mathematical Statistics, 33, 882–886.
Carnap, R. (1950). Logical foundations of probability. Chicago, IL: The University of Chicago Press.
Carnap, R. (1963a). Replies and systematic expositions. In Schilpp (1963), pp. 859–1013.
Carnap, R. (1963b). Variety, analogy, and periodicity in inductive logic. Philosophy of Science, 30(3), 222–227.
Dawid, A. P. (1985a). Calibration-based empirical probability. The Annals of Statistics, 13(4), 1251–1274.
Dawid, A. P. (1985b). The impossibility of inductive inference. Comment on Oakes (1985). Journal of the American Statistical Association, 80(390), 339.
Diaconis, P. W., & Freedman, D. A. (1986). On the consistency of Bayes estimates. The Annals of Statistics, 14(1), 1–26.
Downey, R. G., & Hirschfeldt, D. R. (2010). Algorithmic randomness and complexity. New York: Springer.
Earman, J. (1992). Bayes or bust? A critical examination of Bayesian confirmation theory. Cambridge, MA: MIT Press.
Gillies, D. A. (2001a). Popper and computer induction. BioEssays, 23, 859–860.
Gillies, D. A. (2001b). Bayesianism and the fixity of the theoretical framework. In D. Corfield & J. Williamson (Eds.), Foundations of Bayesianism (pp. 363–379). Berlin: Springer.
Goodman, N. (1946). A query on confirmation. The Journal of Philosophy, 43(14), 383–385.
Goodman, N. (1947). On infirmities of confirmation-theory. Philosophy and Phenomenological Research, 8(1), 149–151.
Hintikka, J. (1965). Towards a theory of inductive generalization. In Y. Bar-Hillel (Ed.), Logic, methodology and philosophy of science. Proceedings of the 1964 international congress (pp. 274–288). Amsterdam: North-Holland.
Howson, C. (2000). Hume’s problem: Induction and the justification of belief. New York: Oxford University Press.
Huttegger, S. M. (2015). Merging of opinions and probability kinematics. The Review of Symbolic Logic, 8(4), 611–648.
Hutter, M. (2003). Convergence and loss bounds for Bayesian sequence prediction. IEEE Transactions on Information Theory, 49(8), 2061–2067.
Hutter, M. (2007). On universal prediction and Bayesian confirmation. Theoretical Computer Science, 384(1), 33–48.
Kelly, K. T. (2004). Learning theory and epistemology. In I. Niiniluoto, M. Sintonen, & J. Woleński (Eds.), Handbook of epistemology (pp. 183–203). Dordrecht: Kluwer. Page numbers refer to reprint in H. Arló-Costa, V. F. Hendricks, & J. F. A. K. van Benthem (Eds.) (2016), Readings in formal epistemology.
Kelly, K. T., Juhl, C. F., & Glymour, C. (1994). Reliability, realism, and relativism. In P. Clark & B. Hale (Eds.), Reading Putnam (pp. 98–160). Oxford: Blackwell.
Leike, J., & Hutter, M. (2015). On the computability of Solomonoff induction and knowledge-seeking. In K. Chaudhuri, C. Gentile, & S. Zilles (Eds.), Algorithmic learning theory: Proceedings of the twenty-sixth international conference (ALT 2015) (pp. 364–378). Springer.
Levin, L. A. (2010). Some theorems on the algorithmic approach to probability theory and information theory. Annals of Pure and Applied Logic, 162, 224–235. Translation of PhD dissertation, 1971. Russia: Moscow State University.
Li, M., & Vitányi, P. M. B. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). New York: Springer.
Nies, A. (2009). Computability and randomness. Oxford: Oxford University Press.
Oakes, D. (1985). Self-calibrating priors do not exist. Journal of the American Statistical Association, 80(390), 340–341.
Poland, J., & Hutter, M. (2005). Asymptotics of discrete MDL for online prediction. IEEE Transactions on Information Theory, 51(11), 3780–3795.
Putnam, H. (1963a). ‘Degree of confirmation’ and inductive logic. In Schilpp (1963), pp. 761–783. Reprinted in Putnam (1975), pp. 270–292.
Putnam, H. (1963b). Probability and confirmation. In The voice of America forum lectures. Washington, D.C.: U.S. Information Agency. Page numbers refer to reprint in Putnam (1975), pp. 293–304.
Putnam, H. (1974). The ‘corroboration’ of theories. In P. A. Schilpp (Ed.), The philosophy of Karl Popper, Book I. The library of living philosophers (Vol. 14, pp. 221–240). LaSalle, IL: Open Court. Reprinted in Putnam (1975), pp. 250–269.
Putnam, H. (1975). Mathematics, matter, and method. Cambridge: Cambridge University Press.
Rathmanner, S., & Hutter, M. (2011). A philosophical treatise of universal induction. Entropy, 13(6), 1076–1136.
Reichenbach, H. (1933). Die logischen Grundlagen des Wahrscheinlichkeitsbegriffs. Erkenntnis, 3, 401–425.
Reichenbach, H. (1935). Wahrscheinlichkeitslehre: eine Untersuchung über die logischen und mathematischen Grundlagen der Wahrscheinlichkeitsrechnung. Leiden: Sijthoff.
Reichenbach, H. (1938). Experience and prediction. Chicago, IL: University of Chicago Press.
Reimann, J. (2009). Randomness—Beyond Lebesgue measure. In S. B. Cooper, H. Geuvers, A. Pillay, & J. Väänänen (Eds.), Logic colloquium 2006 (pp. 247–279). Chicago, IL: Association for Symbolic Logic.
Romeijn, J.-W. (2004). Hypotheses and inductive predictions. Synthese, 141(3), 333–364.
Salmon, W. C. (1967). The foundations of scientific inference. Pittsburgh, PA: University of Pittsburgh Press.
Salmon, W. C. (1991). Hans Reichenbach’s vindication of induction. Erkenntnis, 35, 99–122.
Schervish, M. J. (1985). Comment on Dawid (1985a). The Annals of Statistics, 13(4), 1274–1282.
Schilpp, P. A. (Ed.). (1963). The philosophy of Rudolf Carnap. The library of living philosophers (Vol. 11). LaSalle, IL: Open Court.
Shen, A. K., Uspensky, V. A., & Vereshchagin, N. K. (2017). Kolmogorov complexity and algorithmic randomness. Providence, RI: American Mathematical Society.
Skyrms, B. (1991). Carnapian inductive logic for Markov chains. Erkenntnis, 35, 439–460.
Skyrms, B. (1996). Carnapian inductive logic and Bayesian statistics. In T. Ferguson, L. Shapley, & J. MacQueen (Eds.), Statistics, probability and game theory: Papers in honor of David Blackwell (pp. 321–336). Beachwood: Institute of Mathematical Statistics.
Soare, R. I. (2016). Turing computability: Theory and applications. New York: Springer.
Solomonoff, R. J. (1964). A formal theory of inductive inference. Parts I and II. Information and Control, 7, 1–22, 224–254.
Solomonoff, R. J. (1978). Complexity-based induction systems: Comparisons and convergence theorems. IEEE Transactions on Information Theory, 24(4), 422–432.
Sterkenburg, T. F. (2016). Solomonoff prediction and Occam’s razor. Philosophy of Science, 83(4), 459–479.
Tao, T. (2011). An introduction to measure theory. Providence, RI: American Mathematical Society.
Turing, A. M. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(42), 230–265.
van Fraassen, B. C. (1989). Laws and symmetry. Oxford: Clarendon Press.
van Fraassen, B. C. (2000). The false hopes of traditional epistemology. Philosophy and Phenomenological Research, 60(2), 253–280.
Zvonkin, A. K., & Levin, L. A. (1970). The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russian Mathematical Surveys, 26(6), 83–124. Translation of the Russian original. Uspekhi Matematicheskikh Nauk, 25(6), 85–127, 1970.
Additional information
This paper was written while I was with the Machine Learning group, Centrum Wiskunde & Informatica, Amsterdam, and the Faculty of Philosophy, University of Groningen. I want to thank Peter Grünwald, Wouter Koolen, Jan Leike, and Nishant Mehta for helpful discussions, Jan-Willem Romeijn for valuable advice on earlier versions of the paper, and finally the anonymous reviewers for their careful comments, which did much to improve it.
Appendix
In the literature (Li and Vitányi 2008, 352ff; Hutter 2003, 2062; Poland and Hutter 2005, 3781), Theorem 2 is usually presented as a consequence of (variations of) the following stronger result, first shown by Solomonoff (1978, 426f). Let us introduce as a measure of the divergence between two distributions \(P_1\) and \(P_2\) over \(\{0,1\}\) the squared Hellinger distance
\[ H(P_1,P_2) := \sum_{x \in \{0,1\}} \left( \sqrt{P_1(x)} - \sqrt{P_2(x)} \right)^2. \tag{1} \]
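Then, for every \(\mu \in \Delta _1\), the expected infinite sum of divergences between \(Q_U\) and \(\mu \)
\[ \mathbb{E}_{\mu}\left[ \sum_{n=0}^{\infty} H\left( \mu(\cdot \mid \pmb{x}^n),\, Q_U(\cdot \mid \pmb{x}^n) \right) \right] \tag{2} \]
is bounded by a constant.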
To see how \((\text {I: }\Delta _1)\) follows from this constant bound, suppose that \(Q_U\) does not satisfy \((\text {I: }\Delta _1)\): there is a \(\mu \in \Delta _1\) such that with probability \(\epsilon >0\) there is a \(\delta > 0\) such that \(\left| \mu (x_{n+1} \mid \pmb {x}^n)-Q_U(x_{n+1} \mid \pmb {x}^n)\right| >\delta \) infinitely often. But that means that with positive probability the infinite sum of squared Hellinger distances is infinite, and the expectation (2) cannot be bounded by a constant.
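The step from persistent deviations to an infinite sum can be spelled out. The following is a sketch that assumes the usual definition of the squared Hellinger distance, \(H(P_1,P_2)=\sum _{x}(\sqrt{P_1(x)}-\sqrt{P_2(x)})^2\), and uses the shorthand \(p=\mu (x_{n+1} \mid \pmb {x}^n)\) and \(q=Q_U(x_{n+1} \mid \pmb {x}^n)\), which is introduced here only for illustration. Whenever \(|p-q|>\delta \), the corresponding term of the sum satisfies

```latex
\begin{align*}
H\bigl(\mu(\cdot \mid \pmb{x}^n),\, Q_U(\cdot \mid \pmb{x}^n)\bigr)
  &\ge \left(\sqrt{p}-\sqrt{q}\right)^2
   = \frac{(p-q)^2}{\left(\sqrt{p}+\sqrt{q}\right)^2} \\
  &\ge \frac{(p-q)^2}{4}
   > \frac{\delta^2}{4},
\end{align*}
```

where the second inequality uses \(\sqrt{p}+\sqrt{q}\le 2\). A sequence on which this happens infinitely often therefore contributes infinitely many terms larger than \(\delta ^2/4\), so its sum of squared Hellinger distances diverges.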
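The proof of the constant bound on (2) starts with the fact that the distance \(H(P_1,P_2)\) is bounded by the Kullback-Leibler divergence
\[ D(P_1 \parallel P_2) := \mathbb{E}_{P_1}\left[ -\log \frac{P_2(\pmb{x})}{P_1(\pmb{x})} \right]. \tag{3} \]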
The term \(-\log P(\pmb {x})\) expresses the logarithmic loss of P on sequence \(\pmb {x}\), a standard measure of prediction error; the difference \(-\log P_2(\pmb {x})-\left( -\log P_1(\pmb {x})\right) =-\log \frac{P_2(\pmb {x})}{P_1(\pmb {x})}\) expresses the surplus prediction error or regret of \(P_2\) relative to \(P_1\) on sequence \(\pmb {x}\). Thus the Kullback-Leibler divergence (3) expresses the \(P_1\)-expected regret of \(P_2\) relative to \(P_1\).
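The inequality \(H(P_1,P_2) \le D(P_1 \parallel P_2)\) on which the proof relies can be checked numerically for one-step distributions over \(\{0,1\}\). The sketch below is only an illustration, not part of the proof: it assumes the standard form of the squared Hellinger distance, measures the Kullback-Leibler divergence in bits, and the function names and the grid of parameter values are hypothetical choices.

```python
import math


def squared_hellinger(p, q):
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q).

    Assumes the form sum_x (sqrt(P1(x)) - sqrt(P2(x)))^2 over x in {0, 1}.
    """
    return ((math.sqrt(p) - math.sqrt(q)) ** 2
            + (math.sqrt(1 - p) - math.sqrt(1 - q)) ** 2)


def kl_divergence(p, q):
    """D(P1 || P2) in bits: the P1-expected regret of P2 relative to P1."""
    total = 0.0
    for p1, p2 in ((p, q), (1 - p, 1 - q)):
        if p1 > 0:
            # -log2(P2(x) / P1(x)) is the regret of P2 on outcome x.
            total += p1 * -math.log2(p2 / p1)
    return total


# Hypothetical grid of Bernoulli parameters 0.05, 0.10, ..., 0.95.
grid = [i / 20 for i in range(1, 20)]

# Collect every pair where the Hellinger term exceeds the KL divergence.
violations = [
    (p, q)
    for p in grid
    for q in grid
    if squared_hellinger(p, q) > kl_divergence(p, q) + 1e-12
]

print("violations of H <= D on the grid:", violations)
```

On this grid no pair should violate the inequality; the small tolerance only guards against floating-point rounding when the two distributions coincide.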
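Using \(H(P_1,P_2) \le D(P_1 \parallel P_2)\) one can work out that (2) is bounded by
\[ \lim_{m \to \infty} \mathbb{E}_{\mu}\left[ -\log \frac{Q_U(\pmb{x}^m)}{\mu(\pmb{x}^m)} \right]. \tag{4} \]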
Now by the universality of \(Q_U\) in the class of \(\Sigma _1\) measures we know that \(Q_U\) majorizes \(\mu \): there is a constant \(c \in (0,1]\) such that for every finite \(\pmb {x}\) we have \(Q_U(\pmb {x}) \ge c \cdot \mu (\pmb {x})\). Indeed we can identify c with \(w(\mu )\), where w is the prior over hypothesis class \({{\mathcal {H}}_{\Sigma _1}}\) in the classical Bayesian representation \(\xi ^{\Sigma _1}_w\) of \(Q_U\). This fact allows us to derive that for every sequence \(\pmb {x}^m\) of any length m
\[ -\log \frac{Q_U(\pmb{x}^m)}{\mu(\pmb{x}^m)} \le -\log w(\mu). \tag{5} \]
This concludes the proof that (2) is bounded by a constant: since the bound (5) holds for any individual sequence of any length, it also holds for (4) and thus for (2).
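The individual-sequence bound, that the regret of a Bayesian mixture relative to any component \(\mu \) never exceeds \(-\log w(\mu )\), can be illustrated on a small computable stand-in. Since \(Q_U\) itself is not computable, the sketch below replaces it by a mixture over three i.i.d. Bernoulli measures; the hypothesis class, the prior weights, and all function names are assumptions made for illustration and do not appear in the paper.

```python
import itertools
import math


def bernoulli_measure(theta):
    """Return the i.i.d. Bernoulli(theta) measure on finite binary sequences."""
    def measure(bits):
        ones = sum(bits)
        return theta ** ones * (1 - theta) ** (len(bits) - ones)
    return measure


# Hypothetical finite hypothesis class with prior weights w summing to 1.
thetas = [0.2, 0.5, 0.8]
prior = {0.2: 0.25, 0.5: 0.5, 0.8: 0.25}
hypotheses = {theta: bernoulli_measure(theta) for theta in thetas}


def mixture(bits):
    """Bayesian mixture xi_w(x) = sum over mu of w(mu) * mu(x)."""
    return sum(prior[t] * hypotheses[t](bits) for t in thetas)


def regret(bits, theta):
    """Surplus log loss (in bits) of the mixture relative to Bernoulli(theta)."""
    return -math.log2(mixture(bits) / hypotheses[theta](bits))


def max_regret(theta, max_len):
    """Largest regret over all binary sequences of length 1..max_len."""
    return max(
        regret(bits, theta)
        for m in range(1, max_len + 1)
        for bits in itertools.product((0, 1), repeat=m)
    )


# The bound -log2 w(mu) for each component, and the worst observed regret.
bounds = {t: -math.log2(prior[t]) for t in thetas}
worst = {t: max_regret(t, 10) for t in thetas}

for t in thetas:
    print(f"theta={t}: worst regret {worst[t]:.4f} <= bound {bounds[t]:.4f}")
```

The check is exhaustive only up to length 10, but the reason it succeeds is length-independent: the mixture majorizes each component with constant \(w(\mu )\), which is the same fact that drives the proof for \(Q_U\).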
The absolute optimality property mentioned in Sect. 8 is just this individual sequence bound (5), which continues to hold for \(\nu \) that are \(\Sigma _1\). To reformulate, for any such \(\nu \), the sum of surplus prediction errors (regrets) of \(Q_U\) relative to \(\nu \) will always (for any sequence \(\pmb {x}^m\) of any length m) be bounded by a constant:
\[ -\log \frac{Q_U(\pmb{x}^m)}{\nu(\pmb{x}^m)} \le -\log w(\nu). \tag{6} \]
Cite this article
Sterkenburg, T.F. Putnam’s Diagonal Argument and the Impossibility of a Universal Learning Machine. Erkenn 84, 633–656 (2019). https://doi.org/10.1007/s10670-018-9975-x
DOI: https://doi.org/10.1007/s10670-018-9975-x