Abstract
We present a computational model of mathematical reasoning according to which mathematics is a fundamentally stochastic process. That is, in our model, whether or not a given formula is deemed a theorem in some axiomatic system is not a matter of certainty, but is instead governed by a probability distribution. We then show that this framework gives a compelling account of several aspects of mathematical practice. These include: 1) the way in which mathematicians generate research programs, 2) the applicability of Bayesian models of mathematical heuristics, 3) the role of abductive reasoning in mathematics, 4) the way in which multiple proofs of a proposition can strengthen our degree of belief in that proposition, and 5) the nature of the hypothesis that there are multiple formal systems that are isomorphic to physically possible universes. Thus, by embracing a model of mathematics as not perfectly predictable, we generate a new and fruitful perspective on the epistemology and practice of mathematics.
Notes
 1.
Note that just like the authors of all other papers written about mathematics, we believe that the deductive reasoning in this essay is correct. The fact that we acknowledge the possibility of erroneous deductive reasoning, and that in fact the unavoidability of erroneous reasoning is the topic of this essay, doesn’t render our belief in the correctness of our reasoning about that topic any more or less legitimate than the analogous belief by those other authors.
 2.
This is equivalent to requiring that an NDR machine is a “sequential information source” [8]. In the current context, it imposes restrictions on how likely the NDR machine is to remove claims from the claims tape.
 3.
Note that even if a claims set C is small, it might only arise with non-negligible probability in large claims lists, i.e., claims lists produced after many iterations of the NDR machine. For example, this might happen in the NDR machine of the community of mathematicians if the claims in C would not even make sense to mathematicians until the community has been investigating mathematics for a long time.
 4.
Note the implicit convention that \({\overline{P}}(v \mid q)\) concerns the probability of a claims list containing a single claim in which the answer v arises for the precise question q, not the probability of a claims list that has an answer v in some claim, and that also has the question q in some (perhaps different) claim.
 5.
In general, even if a mathematician updates their beliefs in a Bayesian manner, the priors and likelihoods they use to do so may be “wrong”, in the sense that they differ from the ones used by the far-future community of mathematicians. The use of purely Bayesian reasoning, by itself, provides no advantage over using non-Bayesian reasoning, unless the subjective priors and likelihoods of the current community of mathematicians happen to agree with those of the far-future community of mathematicians. In the rest of this section we assume that there is such agreement. See [5, 23] for how to analyze expected performance of a Bayesian decision-maker once we allow for the possibility that the priors they use to make decisions differ from the real-world priors that determine the expected loss of their decision-making.
 6.
Note that this argument doesn’t require the answer distribution of the far-future community of mathematicians to be mistake-free. (The possibility that “correct” mathematics contains inconsistencies with some nonzero probability is discussed below, in Sect. 10.5.) Note also that the simple algebra leading from Eq. (10.7) to Eq. (10.12) would still hold even if q and/or \(q'\) were not currently an open question, and in particular even if one or both of them were in the current claims list C. However, in that case, the conclusion of the argument would not concern the process of abduction narrowly construed, since the conclusion would also involve the probability that the far-future community of mathematicians overturns claims that are accepted by the current community of mathematicians.
 7.
Technically, the update function only needs to be defined on the “finitary” subset of \({\mathbb {R}}\times {\mathbb {Z}}\times \Lambda ^\infty \), namely, those elements of \({\mathbb {R}}\times {\mathbb {Z}}\times \Lambda ^\infty \) for which the tape contents have a nonblank value in only finitely many positions.
References
S. Aaronson, Why philosophers should care about computational complexity, in Computability: Turing, Gödel, Church, and Beyond, pp. 261–327 (MIT Press, 2013)
S. Arora, B. Barak, Computational Complexity: A Modern Approach (Cambridge University Press, 2009)
J.D. Barrow, Theories of Everything: The Quest for Ultimate Explanation (Clarendon Press, Oxford, 1991)
J.D. Barrow, Gödel and physics, in Kurt Gödel and the Foundations of Mathematics: Horizons of Truth, p. 255 (2011)
J.L. Carroll, A Bayesian decision theoretical approach to supervised learning, selective sampling, and empirical function optimization (2010)
R. Fagin, Y. Moses, J.Y. Halpern, M.Y. Vardi, Reasoning About Knowledge (MIT Press, 2003)
K. Gödel, On Undecidable Propositions of Formal Mathematical Systems (Institute for Advanced Study, 1934)
P. Grunwald, P. Vitányi, Shannon information and Kolmogorov complexity. arXiv preprint arXiv:cs/0410002 (2004)
D. Hilbert, Die Grundlagen der Mathematik, in Die Grundlagen der Mathematik, pp. 1–21 (Springer, 1928)
D. Hume, A Treatise of Human Nature (Courier Corporation, 2012). Book 1, Part 4, Section 1
P. Hut, M. Alford, M. Tegmark, On math, matter and mind. Found. Phys. 36(6), 765–794 (2006)
D. Lewis, Counterfactuals (Basil Blackwell, Oxford, 1973)
C.S. Peirce, Collected Papers of Charles Sanders Peirce, vol. 2 (Harvard University Press, 1960)
H. Poincaré, Mathematical creation. The Monist 321–335 (1910)
J. Schmidhuber, A computer scientist’s view of life, the universe, and everything, in Foundations of Computer Science, pp. 201–208 (Springer, 1997)
B. Settles, Active learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences (2009)
M. Tegmark, Is “the theory of everything” merely the ultimate ensemble theory? Ann. Phys. 270(1), 1–51 (1998)
M. Tegmark, The mathematical universe. Found. Phys. 38(2), 101–150 (2008)
M. Tegmark, The multiverse hierarchy. arXiv preprint arXiv:0905.1283 (2009)
M. Tegmark, Our Mathematical Universe: My Quest for the Ultimate Nature of Reality (Vintage, 2014)
S. Viteri, S. DeDeo, Explosive proofs of mathematical truths. arXiv preprint arXiv:2004.00055 (2020)
E.P. Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences, vol. 13, pp. 1–14 (1960)
D.H. Wolpert, The lack of a priori distinctions between learning algorithms. Neural Comput. 8(7), 1341–1390 (1996)
D.H. Wolpert, The stochastic thermodynamics of computation. J. Phys. A Math. Theor. 52(19), 193001 (2019)
D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67–82 (1997)
A Probabilistic Turing Machines
Perhaps the most famous class of computational machines is the class of Turing machines. One reason for their fame is that it seems one can model any computational machine that is constructible by humans as a Turing machine. A bit more formally, the Church-Turing thesis states that “a function on the natural numbers is computable by a human being following an algorithm, ignoring resource limitations, if and only if it is computable by a Turing machine.”
There are many different definitions of Turing machines (TMs) that are “computationally equivalent” to one another. For us, it will suffice to define a TM as a 7-tuple \((R,\Lambda ,b,v,r^\varnothing ,r^A,\rho )\) where:

1. R is a finite set of computational states;
2. \(\Lambda \) is a finite alphabet containing at least three symbols;
3. \(b \in \Lambda \) is a special blank symbol;
4. \(v \in {\mathbb {Z}}\) is a pointer;
5. \(r^\varnothing \in R\) is the start state;
6. \(r^A \in R\) is the halt state; and
7. \(\rho : R \times {\mathbb {Z}}\times \Lambda ^\infty \rightarrow R \times {\mathbb {Z}}\times \Lambda ^\infty \) is the update function. It is required that for all triples (r, v, T), if we write \((r', v', T') = \rho (r, v, T)\), then \(v'\) differs from v by at most 1, and the vector \(T'\) is identical to the vector T in all components, with the possible exception of the component with index v (see Note 7).
We sometimes refer to R as the states of the “head” of the TM, and refer to the third argument of \(\rho \) as a tape, writing a value of the tape (i.e., of the semi-infinite string of elements of the alphabet) as T.
Any TM \((R,\Lambda ,b,v,r^\varnothing , r^A, \rho )\) starts with \(r = r^\varnothing \), the pointer set to a specific initial value (e.g., 0), and with T consisting of a finite contiguous set of nonblank symbols, with all other symbols equal to b. The TM operates by iteratively applying \(\rho \) until the computational state equals \(r^A\), at which time it stops, i.e., any instantaneous description (ID; defined below) with the head in the halt state is a fixed point of \(\rho \).
If running a TM on a given initial state of the tape results in the TM eventually halting, the largest blank-delimited string that contains the position of the pointer when the TM halts is called the TM’s output. The initial state of T (excluding the blanks) is sometimes called the associated input, or program. (However, the reader should be warned that the term “program” has been used by some physicists to mean specifically the shortest input to a TM that results in it computing a given output.) We also say that the TM computes an output from an input. In general, there will be inputs for which the TM never halts. The set of all those inputs to a TM that cause it to eventually halt is called its halting set.
The set of triples that are possible arguments to the update function of a given TM is sometimes called the set of instantaneous descriptions (IDs) of the TM. Note that as an alternative to the definition in (7) above, we could define the update function of any TM as a map over an associated space of IDs.
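As an illustration of the definitions above, the following is a minimal Python sketch of a single-tape TM acting on IDs (r, v, T). It is not taken from the chapter: the sparse-dictionary tape, the local rule table `FLIP_RULES`, the state names `"start"` and `"halt"`, and the step budget `max_steps` are all assumptions made for the example. The update function built here changes only the tape cell at index v and moves the pointer by at most 1, in line with condition (7).

```python
from typing import Callable, Dict, Optional, Tuple

BLANK = "b"                      # blank symbol b (concrete choice is an assumption)
Tape = Dict[int, str]            # sparse tape: any position not stored holds BLANK
ID = Tuple[str, int, Tape]       # instantaneous description (r, v, T)
Rule = Dict[Tuple[str, str], Tuple[str, str, int]]


def make_update(rule: Rule, halt: str) -> Callable[[ID], ID]:
    """Build rho from a local rule: (state, symbol at v) -> (state', symbol', move)."""
    def rho(config: ID) -> ID:
        r, v, tape = config
        if r == halt:
            return config                      # halting IDs are fixed points of rho
        r2, sym2, move = rule[(r, tape.get(v, BLANK))]
        assert move in (-1, 0, 1)              # pointer moves by at most 1
        new_tape = dict(tape)
        if sym2 == BLANK:
            new_tape.pop(v, None)
        else:
            new_tape[v] = sym2                 # only the cell at index v may change
        return (r2, v + move, new_tape)
    return rho


def output(config: ID) -> str:
    """Largest blank-delimited string containing the pointer position."""
    _, v, tape = config
    if tape.get(v, BLANK) == BLANK:
        return ""
    lo = hi = v
    while tape.get(lo - 1, BLANK) != BLANK:
        lo -= 1
    while tape.get(hi + 1, BLANK) != BLANK:
        hi += 1
    return "".join(tape[i] for i in range(lo, hi + 1))


def run(rho: Callable[[ID], ID], start: str, halt: str,
        input_string: str, max_steps: int = 10_000) -> Optional[str]:
    """Iterate rho from the start ID; return the output, or None if the budget runs out."""
    config: ID = (start, 0, {i: s for i, s in enumerate(input_string)})
    for _ in range(max_steps):
        if config[0] == halt:
            return output(config)
        config = rho(config)
    return None                                # no halt observed (the TM may never halt)


# Hypothetical example machine: flip every bit, then step back onto the result and halt.
FLIP_RULES: Rule = {
    ("start", "0"): ("start", "1", +1),
    ("start", "1"): ("start", "0", +1),
    ("start", BLANK): ("halt", BLANK, -1),
}
flip_rho = make_update(FLIP_RULES, "halt")

print(run(flip_rho, "start", "halt", "0110"))  # prints 1001
```

The step budget is only a practical device for the sketch. Since some inputs never halt, a `None` result does not by itself show that the input lies outside the halting set.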
In one particularly popular variant of this definition of TMs the single tape is replaced by multiple tapes. Typically one of those tapes contains the input, one contains the TM’s output (if and) when the TM halts, and there are one or more intermediate “work tapes” that are in essence used as scratch pads. The advantage of using this more complicated variant of TMs is that it is often easier to prove theorems for such machines than for single-tape TMs. However, there is no difference in their computational power. More precisely, one can transform any single-tape TM into an equivalent multi-tape TM (i.e., one that computes the same partial function), and vice versa, as shown by Arora and Barak [2].
A universal Turing machine (UTM), M, is one that can be used to emulate any other TM. More precisely, in terms of the single-tape variant of TMs, a UTM M has the property that for any other TM \(M'\), there is an invertible map f from the set of possible states of the tape of \(M'\) into the set of possible states of the tape of M, such that if we:

1. apply f to an input string \(\sigma '\) of \(M'\) to fix an input string \(\sigma \) of M;
2. run M on \(\sigma \) until it halts;
3. apply \(f^{-1}\) to the resultant output of M;
then we get exactly the output computed by \(M'\) if it is run directly on \(\sigma '\).
An important theorem of computer science is that there exist universal TMs (UTMs). Intuitively, this just means that there exist programming languages which are “universal”, in that we can use them to implement any desired program in any other language, after appropriate translation of that program from that other language. The physical Church-Turing (CT) thesis considers UTMs, and we implicitly restrict attention to them as well.
Suppose we have two strings \(s^1\) and \(s^2\) where \(s^1\) is a proper prefix of \(s^2\). If we run the TM on \(s^1\), it can detect when it gets to the end of its input, by noting that the following symbol on the tape is a blank. Therefore, it can behave differently after having reached the end of \(s^1\) from how it behaves when it reaches the end of the first \(\ell (s^1)\) bits in \(s^2\). As a result, it may be that both of those input strings are in its halting set, but result in different outputs. A prefix (free) TM is one in which this can never happen: there is no string in its halting set that is a proper prefix of another string in its halting set. For technical reasons, it is conventional in the physics literature to focus on prefix TMs, and we do so here.
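The prefix-free condition can be checked directly on any finite set of strings. The following is a minimal sketch, not part of the chapter; the function name `is_prefix_free` is hypothetical, and a real halting set is generally infinite, so this only illustrates the definition on a finite sample.

```python
from typing import Iterable


def is_prefix_free(strings: Iterable[str]) -> bool:
    """Return True if no string in the set is a proper prefix of another."""
    ordered = sorted(set(strings))
    # In lexicographic order, a string that is a prefix of any other member
    # is also a prefix of its immediate successor, so adjacent pairs suffice.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free({"0", "10", "110"}))   # prints True
print(is_prefix_free({"0", "01"}))          # prints False
```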
The coin-flipping distribution of a prefix TM M is the probability distribution over the strings in M’s halting set generated by IID “tossing a coin” to generate those strings, in a Bernoulli process, and then normalizing. So any string \(\sigma \) in the halting set has probability \(2^{-|\sigma |} / \Omega \) under the coin-flipping distribution, where \(|\sigma |\) denotes the length of \(\sigma \) and \(\Omega \) is the normalization constant for the TM in question.
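To make the formula concrete, here is a small sketch that evaluates the coin-flipping distribution on a toy, finite, prefix-free set of binary strings. The set and the function names are assumptions chosen for illustration. For an actual TM the halting set is generally infinite, so \(\Omega \) cannot simply be summed as it is here.

```python
from fractions import Fraction
from typing import Dict, Iterable


def normalization_constant(halting_set: Iterable[str]) -> Fraction:
    """Omega: the sum of 2^(-length) over the distinct strings in the set."""
    return sum((Fraction(1, 2 ** len(s)) for s in set(halting_set)), Fraction(0))


def coin_flipping_distribution(halting_set: Iterable[str]) -> Dict[str, Fraction]:
    """Probability 2^(-length) / Omega for each string in a non-empty finite set."""
    strings = set(halting_set)
    omega = normalization_constant(strings)
    return {s: Fraction(1, 2 ** len(s)) / omega for s in strings}


toy_halting_set = {"0", "10", "110"}             # hypothetical prefix-free example
print(normalization_constant(toy_halting_set))   # prints 7/8
dist = coin_flipping_distribution(toy_halting_set)
print(dist["0"], dist["10"], dist["110"])        # prints 4/7 2/7 1/7
```

Exact fractions are used so that the normalized probabilities sum to exactly 1.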
Finally, for our purposes, a Probabilistic Turing Machine (PTM) is a conventional TM as defined by conditions (1)–(7), except that the update function \(\rho \) is generalized to be a conditional distribution. The conditional distribution is not arbitrary, however. In particular, we typically require that there is zero probability that applying such an update conditional distribution violates condition (7). Depending on how we use a PTM to model NDR machines, we may introduce other requirements as well.
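One way to picture a PTM update is to sample the next ID from a distribution that depends on the current state and the symbol under the pointer. The sketch below is illustrative only: the rule table `NOISY_RULE`, the state names, the probabilities, and the helper `satisfies_condition_7` are all hypothetical. Because every sampled outcome writes only at index v and moves the pointer by at most 1, condition (7) holds with probability 1.

```python
import random
from typing import Dict, List, Tuple

BLANK = "b"                       # blank symbol (concrete choice is an assumption)
Tape = Dict[int, str]             # sparse tape: missing positions hold BLANK
ID = Tuple[str, int, Tape]        # instantaneous description (r, v, T)
Outcome = Tuple[str, str, int]    # (next state, symbol written at v, pointer move)
Rule = Dict[Tuple[str, str], List[Tuple[float, Outcome]]]


def ptm_step(rule: Rule, halt: str, config: ID, rng: random.Random) -> ID:
    """Sample the next ID from the conditional distribution given the current ID."""
    r, v, tape = config
    if r == halt:
        return config                                  # halting IDs stay fixed
    probs, outcomes = zip(*rule[(r, tape.get(v, BLANK))])
    assert abs(sum(probs) - 1.0) < 1e-9                # each row is a distribution
    r2, sym2, move = rng.choices(outcomes, weights=probs, k=1)[0]
    assert move in (-1, 0, 1)
    new_tape = dict(tape)
    if sym2 == BLANK:
        new_tape.pop(v, None)
    else:
        new_tape[v] = sym2                             # only cell v may change
    return (r2, v + move, new_tape)


def satisfies_condition_7(old: ID, new: ID) -> bool:
    """Check that the pointer moved by at most 1 and only cell v was touched."""
    _, v, tape = old
    _, v2, tape2 = new
    others = (set(tape) | set(tape2)) - {v}
    same = all(tape.get(i, BLANK) == tape2.get(i, BLANK) for i in others)
    return abs(v2 - v) <= 1 and same


# Hypothetical noisy machine: copies each bit, but flips it with probability 0.1.
NOISY_RULE: Rule = {
    ("start", "0"): [(0.9, ("start", "0", +1)), (0.1, ("start", "1", +1))],
    ("start", "1"): [(0.9, ("start", "1", +1)), (0.1, ("start", "0", +1))],
    ("start", BLANK): [(1.0, ("halt", BLANK, -1))],
}

rng = random.Random(0)            # seeded only to make the sketch repeatable
config: ID = ("start", 0, {i: s for i, s in enumerate("0101")})
while config[0] != "halt":
    config = ptm_step(NOISY_RULE, "halt", config, rng)
print(config[0], config[1])       # prints halt 3
```

In this toy machine the pointer trajectory is deterministic and only the written symbols are random, so the machine always halts after five steps on a four-symbol input. A general PTM could randomize the state and pointer move as well.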
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wolpert, D.H., Kinney, D. (2021). Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes. In: Aguirre, A., Merali, Z., Sloan, D. (eds) Undecidability, Uncomputability, and Unpredictability. The Frontiers Collection. Springer, Cham. https://doi.org/10.1007/978-3-030-70354-7_10
DOI: https://doi.org/10.1007/978-3-030-70354-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70353-0
Online ISBN: 978-3-030-70354-7
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)