1 Introduction

In this chapter we aim to give a nontechnical account of the mathematical theory of randomness. This theory can be seen as an extension of classical probability theory that allows us to talk about individual random objects. Besides answering the philosophical question of what it means to be random, the theory of randomness has applications ranging from biology, computer science, physics, and linguistics to mathematics itself.

The theory comes in two flavors: a theory of randomness for finite objects (for which the textbook by Li and Vitányi 2008 is the standard reference) and a theory for infinite ones. The latter theory, as well as the relation between the two theories of randomness, is surveyed by Downey et al. (2006), and developed more fully in the recent textbooks by Downey and Hirschfeldt (2010) and Nies (2009). Built on the theory of computation, the theory of randomness has itself deeply influenced computability theory in recent years.

We warn the reader who is afraid of mathematics that there will be formulas and mathematical notation, but we promise that they will be explained at a nontechnical level. Some more background information about the concepts involved is given in footnotes and in two appendices. It is fair to say, however, that to come to a better understanding of the subject, there is of course no way around the formulas, and we quote Euclid, who supposedly told King Ptolemy I, when the latter asked about an easier way of learning geometry than Euclid’s Elements, that “there is no royal road to geometry”.Footnote 1

2 What Is Randomness?

Classical probability theory talks about random objects, for example by saying that if you randomly select four cards from a standard deck, the probability of getting four aces is very small. However, every configuration of four cards has the same small probability of appearing, so there is no qualitative difference between individual configurations in this setting. Similarly, if we flip a fair coin one hundred times, and we get a sequence of one hundred tails in succession, we may feel that this outcome is very special, but how do we justify our excitement over this outcome? Is the probability for this outcome not exactly the same as that of any other sequence of one hundred heads and tails?

Probability theory has been, and continues to be, a highly successful theory, with applications in almost every branch of mathematics. It was put on a sound mathematical foundation by Kolmogorov (1933), and in its modern formulation it is part of the branch of mathematics called measure theory. (See Appendix A.) In this form it allows us to talk not only about randomness in discrete domains (such as cards and coin flips), but also in continuous domains such as numbers on the real line. However, it is important to realize that even in this general setting, probability theory is a theory about sets of objects, not of individual objects. In particular, it does not answer the question of what an individual random object is, or how we could call a sequence of fifty zeros less random than any other sequence of the same length. Consider the following two sequences of coin flips, where 0 stands for heads and 1 for tails:

$$\begin{aligned}&0000000000 0000000000 0000000000 0000000000 0000000000 \\&0000111001 1111011110 0111100100 1010111100 1111010111 \end{aligned}$$

The first sequence consists of fifty 0’s, and the second was obtained by flipping a coin fifty times.Footnote 2 Is there any way in which we can make our feeling that the first sequence is special, and that the second is less so, mathematically precise?

3 Can Randomness Be Defined?

A common misconception about the notion of randomness is that it cannot be formally defined, by applying a tautological reasoning of the form: As soon as something can be precisely defined, it ceases to be random. The following quotation by the Dutch topologist Freudenthal (1969) (taken from van Lambalgen 1987) may serve to illustrate this point:

It may be taken for granted that any attempt at defining disorder in a formal way will lead to a contradiction. This does not mean that the notion of disorder is contradictory. It is so, however, as soon as I try to formalize it.

A recent discussion of randomness and definability, and what can happen if we equate “random” with “not definable”, is in Doyle (2011).Footnote 3 The problem is not that the notion of definability is inherently vague (because it is not), but that no absolute notion of randomness can exist, and that in order to properly define the notion, one has to specify with respect to what the supposed random objects should be random. This is precisely what happens in the modern theory of randomness: A random object is defined as an object that is random with respect to a given type of definition, or class of sets. As the class may vary, this yields a scale of notions of randomness, which may be adapted to the specific context in which the notion is to be applied.

The first person to attempt to give a mathematical definition of randomness was von Mises (1919), and his proposed definition met with a great deal of opposition of the kind indicated above. Von Mises formalized the intuition that a random sequence should be unpredictable. Without giving technical details, his definition can be described as follows. Suppose that X is an infinite binary sequence, that is, a sequence

$$ X(0), X(1), X(2), X(3), {\ldots } $$

where for each natural number n (starting from 0), X(n) is either 0 or 1. Suppose further that the values of X are unknown to us. We now play a game: At every stage of the game we point to a new location n in the sequence, and then the value of X(n) is revealed to us. Now, according to von Mises, for X to be called random, we should not be able to predict in this way the values of X with probability better than \(\frac{1}{2}\), no matter how we select the locations in X. A strategy to select locations in X is formalized by a selection function, and hence this notion says that no selection function should be able to give us an edge in predicting values from X. However, as in the above discussion on absolute randomness, in this full generality, this notion is vacuous! To counter this, von Mises proposed to restrict attention to “acceptable” selection rules, without further specifying which these should be. He called the sequences satisfying his requirement for randomness Kollektiv’s.Footnote 4
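To make the idea of a selection function concrete, here is a minimal Python sketch. It is a simplification under our own assumptions: locations are considered in increasing order, a selection rule is modeled as a function of the values seen so far, and the names `selected_frequency` and `after_a_one` are hypothetical. The alternating sequence 0, 1, 0, 1, ... has the right overall frequency of 1's, yet a very simple rule predicts it perfectly, so it should not count as random.

```python
from fractions import Fraction


def selected_frequency(bits, select):
    """Frequency of 1's among the locations chosen by the selection rule.

    `select(prefix)` sees only the values before location n and decides
    whether location n is selected (an assumed, simplified model).
    """
    chosen = [bits[n] for n in range(len(bits)) if select(bits[:n])]
    return Fraction(sum(chosen), len(chosen)) if chosen else None


def after_a_one(prefix):
    """Select exactly the locations that directly follow a 1."""
    return bool(prefix) and prefix[-1] == 1


# The alternating sequence 0, 1, 0, 1, ... of length 1000.
alternating = [n % 2 for n in range(1000)]

# Selecting every location gives frequency 1/2, as for a fair coin.
overall = selected_frequency(alternating, lambda prefix: True)

# Selecting only the locations after a 1 gives frequency 0: an "edge".
biased = selected_frequency(alternating, after_a_one)
```

A finite computation like this can only illustrate the definition, which concerns limiting frequencies along the infinite sequence.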

Later Wald (1936, 1937) showed that von Mises’ notion of Kollektiv is nonempty if we restrict to any countable set of selection functions.Footnote 5 Wald did not specify a canonical choice for such a set, but later Church (1940) suggested that the (countable) set of computable selection rules would be such a canonical choice. We thus arrive at the notion of Mises–Wald–Church randomness, defined as the set of Kollektiv’s based on computable selection rules. This notion of random sequence already contains several of the key ingredients of the modern theory of randomness, namely:

  • the insight that randomness is a relative notion, not an absolute one, in that it depends on the choice of the set of selection rules;

  • it is founded on the theory of computation, by restricting attention to the computable selection functions (cf. Sect. 4).

Ville (1939) later showed that von Mises’ notion of Kollektiv is flawed in the sense that there are basic statistical laws that are not satisfied by them. Nevertheless, the notion of Mises–Wald–Church randomness has been decisive for the subsequent developments in the theory of randomness.Footnote 6

The Mises–Wald–Church notion formalized the intuition that a random sequence should be unpredictable. This was taken further by Ville using the notion of martingale. We discuss this approach in Sect. 7. The approach using Kolmogorov complexity formalizes the intuition that a random sequence, since it is lacking in recognizable structure, is hard to describe. We discuss this approach in Sect. 5. Finally, the notion of randomness proposed by Martin-Löf formalizes the intuitions underlying classical probability and measure theory. This is discussed in Sect. 6. It is a highly remarkable fact that these approaches are intimately related, and ultimately turn out to be essentially equivalent. As the theory of computation is an essential ingredient in all of this, we have to briefly discuss it before we can proceed.

4 Computability Theory

The theory of computation arose in the 1930s out of concerns about what is provable in mathematics and what is not. Gödel’s famous incompleteness theorem from 1931 states, informally speaking, that in any formal system strong enough to reason about arithmetic, there always exist true statements that are not provable in the system. This shows that there can never be a definitive formal system encompassing all of mathematics. Although it is a statement about mathematical provability, the proof of the incompleteness theorem shows that it is in essence a result about computability. The recursive functions used by Gödel in his proof of the incompleteness theorem were later shown by Turing (1936) to define the same class of functions as those computable by a Turing machine. Subsequently, many equivalent definitions of the same class of computable functions were found, leading to a robust foundation for a general theory of computation, called recursion theory, referring to the recursive functions in Gödel’s proof. Nowadays the area is mostly called computability theory, to emphasize that it is about what is computable and what is not, rather than about recursion.

Turing machines serve as a very basic model of computation that is nevertheless able to perform any type of algorithmic computation.Footnote 7 The fortunate circumstance that there are so many equivalent definitions of the same class of computable functions allows us to treat this notion very informally, without giving a precise definition of what a Turing machine is. Thus, a computable function is a function for which there is an algorithm, i.e. a finite step-by-step procedure, that computes it. It is an empirical fact that any reasonable formalization of this concept leads to the same class of functions.Footnote 8

Having a precise mathematical definition of the notion of computability allows us to prove that certain functions or problems are not computable. One of the most famous examples is Turing’s Halting Problem:

Definition 4.1

The Halting Problem is the problem, given a Turing machine M and an input x, to decide whether M produces an output on x in a finite number of steps (as opposed to continuing indefinitely).

Turing (1936) showed that the Halting Problem is undecidable, that is, that there is no algorithm deciding it. (Note the self-referential flavor of this statement: There is no algorithm deciding the behavior of algorithms.) Not only does this point to a fundamental obstacle in computer science (which did not yet exist at the time that Turing proved this result), but it also entails the undecidability of a host of other problems.Footnote 9 Its importance for the theory of randomness will become clear in what follows.
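The self-referential flavor of the argument can be sketched in Python. This is an informal illustration and not Turing's proof: programs are modeled as Python functions without input, and the names `make_diagonal` and `pessimist` are hypothetical. Given any claimed halting decider, we build a program that does the opposite of what the decider predicts about it.

```python
def make_diagonal(claimed_halts):
    """Build a program on which the claimed halting decider must be wrong."""

    def diagonal():
        if claimed_halts(diagonal):
            # The decider says "halts", so we loop forever instead.
            while True:
                pass
        # The decider says "does not halt", so we halt at once.
        return "halted"

    return diagonal


def pessimist(program):
    """A (wrong) candidate decider claiming that no program ever halts."""
    return False


diagonal = make_diagonal(pessimist)

# The decider predicts that `diagonal` does not halt ...
prediction = pessimist(diagonal)
# ... but running it shows that it does.
outcome = diagonal()
```

For a candidate decider that answers "halts" on `diagonal`, the constructed program loops forever, so the decider is wrong in that case as well. We do not run that case here, for obvious reasons.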

5 Kolmogorov Complexity

An old and venerable philosophical principle, called Occam’s razor, says that when given the choice between several hypotheses or explanations, one should always select the simplest one. The problem in applying this principle has always been to determine which is the simplest explanation: that which is simple in one context may be complicated in another, and there does not seem to be a canonical choice for a frame of reference.

A similar problem arises when we consider the two sequences from Sect. 2: We would like to say that the first one, consisting of only 0’s, is simpler than the second, because it has a shorter description. But what are we to choose as our description mechanism? When we require, as seems reasonable, that an object can be effectively reconstructed from its description, the notion of Turing machine comes to mind. For simplicity we will for the moment only consider finite binary strings. (This is not a severe restriction, since many objects such as numbers and graphs can be represented as binary strings in a natural way.) Thus, given a Turing machine M, we define a string y to be a description of a string x if \(M(y) = x\), i.e. M produces x when given y as input. Now we can take the length of the string y as a measure of the complexity of x. However, this definition still depends on the choice of M. Kolmogorov observed that a canonical choice for M would be a universal Turing machine, that is, a machine that is able to simulate all other Turing machines. It is an elementary fact of computability theory that such universal machines exist. We thus arrive at the following definition:

Definition 5.1

Fix a universal Turing machine U. The Kolmogorov complexity of a finite binary string x is the smallest length of a string y such that

$$ U(y) = x. $$

We denote the Kolmogorov complexity of the string x by C(x).

Hence, to say that \(C(x)=n\) means that there is a string y of length n such that \(U(y)=x\), and that there is no such y of length smaller than n. Note that the definition of C(x) still depends on the choice of U. However, and this is the essential point, the theory of Kolmogorov complexity is independent of the choice of U in the sense that when we choose a different universal Turing machine \(U^{\prime }\) as our frame of reference, the whole theory only shifts by a fixed constant.Footnote 10 For this reason, the reference to U is suppressed from this point onwards, and we will simply speak about the Kolmogorov complexity of a string.

Armed with this definition of descriptive complexity, we can now define what it means for a finite string to be random. The idea is that a string is random if it has no description that is shorter than the string itself, that is, if there is no way to describe the string more efficiently than by listing it completely.

Definition 5.2

A finite string x is Kolmogorov random if C(x) is at least the length of x itself.

For example, a sequence of 1000 zeros is far from random, since its shortest description is much shorter than the string itself: The string itself has length 1000, but we have just described it using only a few words.Footnote 11 More generally, if a string contains a regular pattern that can be used to efficiently describe it, then it is not random. Thus this notion of randomness is related to the compression of strings: If \(U(y)=x\), and y is shorter than x, we may think of y as a compressed version of x, and random strings are those that cannot be compressed.
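The connection with compression can be tried out with an ordinary compressor. The following Python sketch uses the standard `zlib` module as a stand-in for the description mechanism. This is an assumption made for illustration only: `zlib` is not a universal Turing machine, and the compressed length is merely a rough upper bound on descriptive complexity for this particular method. The helper name `compressed_length` is hypothetical.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of `data`."""
    return len(zlib.compress(data, 9))


# A highly regular string: the character "0" repeated 1000 times.
zeros = b"0" * 1000

# 1000 pseudorandom bytes from a seeded generator (reproducible, and
# therefore not random in the sense of this chapter, but without any
# pattern that zlib can exploit).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(1000))

zeros_size = compressed_length(zeros)
noise_size = compressed_length(noise)
```

The regular string shrinks to a small fraction of its length, while the second string does not shrink in any useful way.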

A major hindrance in using Kolmogorov complexity is the fact that the complexity function C is noncomputable. A precise proof of this fact is given in Appendix B (see Corollary B.2), but it is also intuitively plausible, since to compute the complexity of x we have to see for which inputs y the universal machine U produces x as output. But as we have seen in Sect. 4, this is in general impossible to do by the undecidability of the Halting Problem! This leaves us with a definition that may be wonderful for theoretical purposes, but that one would not expect to be of much practical relevance. One of the miracles of Kolmogorov complexity is that the subject does indeed have genuine applications, many of which are discussed in the book by Li and Vitányi (2008). We will briefly discuss applications in Sect. 11.

We will not go into the delicate subject of the history of Kolmogorov complexity, other than saying that it was invented by Solomonoff, Kolmogorov, and Chaitin (in that order), and we refer to Li and Vitányi (2008) and Downey and Hirschfeldt (2010) for further information.

6 Martin-Löf Randomness

The notion of Martin-Löf randomness, introduced by Martin-Löf in (1966), is based on classical probability theory, which in its modern formulation is phrased in terms of measure theory. In Appendix A the notion of a measure space is explained in some detail, but for now we keep the discussion as light as possible.

The unit interval [0, 1] consists of all the numbers on the real line between 0 and 1. We wish to discuss probabilities in this setting by assigning to subsets A of the unit interval, called events, a probability, which informally should be the probability that we end up in A when we “randomly” pick a real from [0, 1]. The uniform or Lebesgue measure on [0, 1] assigns the measure \(b-a\) to every interval [a, b], i.e. the measure of an interval is simply its length. For example, the interval \([0,\frac{1}{2}]\) has measure \(\frac{1}{2}\), and the interval \([\frac{3}{4},1]\) has measure \(\frac{1}{4}\). Note that [0, 1] itself has measure 1.

Given this, we can also define the measure of more complicated sets by considering combinations of intervals. For example, we give the combined event consisting of the union of the intervals \([0,\frac{1}{2}]\) and \([\frac{3}{4},1]\) the measure \(\frac{1}{2} + \frac{1}{4} = \frac{3}{4}\). Since the measures of the subsets of [0, 1] defined in this way satisfy the laws of probability (cf. Appendix A), we can think of them as probabilities.

A series of intervals is called a cover for an event A if A is contained in the union of all the intervals in the series. Now an event A is defined to have measure 0 if it is possible to cover A with intervals in such a way that the total sum of the lengths of all the intervals can be chosen arbitrarily small.

For example, for every real x in [0, 1], the event A consisting only of the real x has measure 0, since for every n, x is contained in the interval \([x-\frac{1}{n},x+\frac{1}{n}]\), and the length of the latter interval is \(\frac{2}{n}\), which tends to 0 as n tends to infinity.

These definitions suffice to do probability theory on [0, 1], and to speak informally about picking reals “at random”, but we now wish to define what it means for a single real x to be random. We can view any event of measure 0 as a “test for randomness”, where the elements not included in the event pass the test, and those in it fail. All the usual statistical laws, such as the law of large numbers, correspond to such tests. Now we would like to define x to be random if x passes all statistical tests, i.e. x is not in any set of measure 0. But, as we have just seen in the example above, every set consisting of a single real x has measure 0, so every real lies in some set of measure 0, and hence in its full generality this definition is vacuous. (The reader may compare this to the situation we already encountered above in Sect. 3 when we discussed Kollektiv’s.)

However, as Martin-Löf observed, we obtain a viable definition if we restrict ourselves to a countable collection of measure 0 sets. More precisely, let us say that an event A has effective measure 0 if there is a computable series of covers of A, with the measure of the covers in the series tending to 0. Phrased more informally: A has effective measure 0 if there is an algorithm witnessing that A has measure 0, by producing an appropriate series of covers for A. Now we can finally define:

Definition 6.1

A real x is Martin-Löf random if x is not contained in any event of effective measure 0.

It can be shown that with this modification random reals exist.Footnote 12 Moreover, almost every real in [0, 1] is random, in the sense that the set of nonrandom reals is of effective measure 0.
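To see what an algorithm witnessing measure 0 might look like, consider the event consisting of the single computable real \(\sqrt{2}-1\). The Python sketch below produces, for every n, an interval with rational endpoints of length \(2^{-n}\) that contains this real. The choice of this particular real, the bound \(2^{-n}\), and the function name `cover` are our own assumptions for illustration. The same idea works for any computable real, which is why no computable real is Martin-Löf random.

```python
from fractions import Fraction
from math import isqrt


def cover(n: int) -> tuple[Fraction, Fraction]:
    """Return a rational interval of length 2**-n containing sqrt(2) - 1."""
    scale = 2 ** n
    # isqrt(2 * scale**2) equals floor(sqrt(2) * scale), computed exactly
    # with integer arithmetic, so no rounding errors can occur.
    low = Fraction(isqrt(2 * scale * scale), scale) - 1
    return low, low + Fraction(1, scale)


# The first few covers: their lengths 1/2, 1/4, 1/8, ... tend to 0.
covers = [cover(n) for n in range(1, 6)]
```

Each call to `cover` is a finite computation, so the whole series of covers is produced by a single algorithm, as the definition of effective measure 0 requires.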

Note the analogy between Definition 6.1 and the way that Church modified von Mises’ definition of Kollektiv, as described in Sect. 3: There we restricted to the computable selection functions, here we restrict to the effective measure 0 events.

Identifying a real number x with its decimal expansion,Footnote 13 we have thus obtained a definition of randomness for infinite sequences. The question now immediately presents itself what the relation, if any, of this definition is with the definition of randomness of finite sequences from Sect. 5. A first guess could be that an infinite sequence is random in the sense of Martin-Löf if and only if all of its finite initial segments are random in the sense of Kolmogorov, but this turns out to be false. A technical modification to Definition 5.1 is needed to make this work.

A string y is called a prefix of a string \(y'\) if y is an initial segment of \(y'\). For example, the string 001 is a prefix of the string 001101. Let us now impose the following restriction on descriptions: If \(U(y) = x\), i.e. y is a description of x, and \(U(y')=x'\) for a description \(y'\) different from y, then we require that y is not a prefix of \(y'\). This restriction may seem arbitrary, but we can motivate it as follows. Suppose that we identify persons by their phone numbers. It is then a natural restriction that no phone number is a prefix of another, since if the phone number y of x were a prefix of a phone number \(y'\) of \(x'\), then when trying to call \(x'\) we would end up talking to x. Indeed, in practice phone numbers are not prefixes of one another. We say that the set of phone numbers is prefix-free. We now require that the set of descriptions y used as inputs for the universal machine U in Definition 5.1 is prefix-free. Of course, this changes the definition of the complexity function C(x): Since there are fewer descriptions available, in general the descriptive complexity of strings will be higher. The complexity of strings under this new definition is called the prefix-free complexity. The underlying idea of the prefix-free complexity is the same as that of Kolmogorov complexity, but technically the theory of it differs from Kolmogorov complexity in several important ways. For us, at this point of the discussion, the most important feature of it is the following landmark result. It was proven in 1973 by Claus-Peter Schnorr, one of the pioneers of the subject.

Theorem 6.2

(Schnorr 1973) An infinite sequence X is Martin-Löf random if and only if there is a constant c such that every initial segment of X of length n has prefix-free complexity at least \(n-c\).

The reader should take a moment to let the full meaning and beauty of this theorem sink in. It offers no less than an equivalence between two seemingly unrelated theories. One is the theory of randomness for finite sequences, based on descriptive complexity, and the other is the theory of infinite sequences, based on measure theory. The fact that there is a relation between these theories at all is truly remarkable.

7 Martingales

Thus far we have seen three different formalizations of intuitions underlying randomness:

  (i) Mises–Wald–Church randomness, formalizing unpredictability using selection functions,

  (ii) Kolmogorov complexity, based on descriptive complexity,

  (iii) Martin-Löf randomness, based on measure theory.

Theorem 6.2 provided the link between (ii) and (iii), and (i) was discussed in Sect. 3. We already mentioned Ville, who showed that the notion in (i) was flawed in a certain sense. Ville also showed an alternative way to formalize the notion of unpredictability of an infinite sequence, using the notion of a martingale, which we now discuss.Footnote 14 Continuing our game-theoretic discussion of Sect. 3, we imagine that we are playing against an unknown infinite binary sequence X. At each stage of the game, we are shown a finite initial part

$$ X(0), X(1), X(2), \ldots , X(n-1) $$

of the sequence X, and we are asked to bet on the next value X(n). Suppose that at this stage of the game, we have a capital of d dollars. Now we may split the amount d into parts \(b_0\) and \(b_1\), and bet the amount \(b_0\) that X(n) is 0, and the amount \(b_1\) that X(n) is 1. After placing our bets, we receive a payoff \(d_0 = 2b_0\) if \(X(n)=0\), and a payoff \(d_1 = 2b_1\) if \(X(n)=1\). Hence the payoffs satisfy the equation

$$\begin{aligned} \frac{d_0 + d_1}{2} = d. \end{aligned} \tag{1}$$

For example, we may let \(b_0 = b_1 = \frac{1}{2}d\), in which case our payoff will be d, no matter what X(n) is. So this is the same as not betting at all, and leaving our capital intact. But we can also set \(b_0 = d\) and \(b_1 =0\). In this case, if \(X(n)=0\) we receive a payoff of 2d, and we have doubled our capital. However, if it turns out that \(X(n)=1\), we receive 0, and we have lost everything. Hence this placement of the bets should be made only when we are quite sure that \(X(n)=0\). Any other placement of bets between these two extremes can be made, reflecting our willingness to bet on \(X(n)=0\) or \(X(n)=1\).

After betting on X(n), the value X(n) is revealed, we receive our payoff for this round, and the game continues with betting on \(X(n+1)\).

Now the idea of Ville’s definition is that we should not be able to win an infinite amount of money by betting on a random sequence. For a given binary string \(\sigma \), let \(\sigma \widehat{{}}0\) denote the string \(\sigma \) extended by a 0, and \(\sigma \widehat{{}}1\) the string \(\sigma \) extended by a 1. Formally, a martingale is a function d such that for every finite string \(\sigma \) the martingale equality

$$\begin{aligned} \frac{d(\sigma \widehat{{}}0) + d(\sigma \widehat{{}}1)}{2} = d(\sigma ) \end{aligned}$$

holds. The meaning of this equation is that when we are seeing the initial segment \(\sigma \), and we have a capital \(d(\sigma )\), we can bet the amount \(\frac{1}{2}d(\sigma \widehat{{}}0)\) that the next value will be a zero, and \(\frac{1}{2}d(\sigma \widehat{{}}1)\) that the next value will be a one, just as above in Eq. (1). Thus the martingale d represents a particular betting strategy. Now for a random sequence X, the amounts of capital

$$ d\big (X(0),\ldots , X(n-1)\big ) $$

that we win when betting on X should not tend to infinity.Footnote 15
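As an illustration, here is a small Python sketch of one particular martingale. The betting strategy is our own hypothetical choice: starting with capital 1, we always bet three quarters of our current capital on 0 and one quarter on 1, so that the capital is multiplied by \(\frac{3}{2}\) after a 0 and by \(\frac{1}{2}\) after a 1. Binary strings are represented as Python strings of the characters "0" and "1".

```python
from fractions import Fraction
from itertools import product


def d(sigma: str) -> Fraction:
    """Capital after betting along the finite binary string `sigma`."""
    capital = Fraction(1)
    for bit in sigma:
        # Payoff is twice the amount bet on the value that occurred:
        # 2 * (3/4) = 3/2 after a 0, and 2 * (1/4) = 1/2 after a 1.
        capital *= Fraction(3, 2) if bit == "0" else Fraction(1, 2)
    return capital


def satisfies_martingale_equality(max_length: int) -> bool:
    """Check the martingale equality on all strings up to `max_length`."""
    for length in range(max_length + 1):
        for bits in product("01", repeat=length):
            sigma = "".join(bits)
            if (d(sigma + "0") + d(sigma + "1")) / 2 != d(sigma):
                return False
    return True


# On the all-zero sequence the capital grows like (3/2)**n, without bound.
capital_on_zeros = d("0" * 10)

# On the alternating sequence it shrinks like (3/4)**(n/2).
capital_on_alternating = d("01" * 5)
```

This strategy wins unboundedly on the all-zero sequence, which is therefore not random in Ville's sense, while on a sequence with as many 1's as 0's it loses money.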

As in the case of Mises–Wald–Church randomness and the case of Martin-Löf randomness, this definition only makes sense when we restrict ourselves to a countable class of martingales.Footnote 16 A natural choice would be to consider the computable martingales. The resulting notion of randomness was studied in Schnorr (1971), and it turns out to be weaker than Martin-Löf randomness. However, there exists another natural class of martingales, the so-called c.e.-martingales,Footnote 17 such that the resulting notion of randomness is equivalent to Martin-Löf randomness.

Thus Ville’s approach to formalizing the notion of unpredictability using martingales gives yet a third equivalent way to define the same notion of randomness.

8 Randomness and Provability

By Gödel’s incompleteness theorem (see Sect. 4), in any reasonable formal system of arithmetic, there exist formulas that are true yet unprovable. A consequence of this result is that there is no algorithm to decide the truth of arithmetical formulas. It follows from the undecidability of the Halting Problem (see Definition 4.1) that the set of formulas that are provable is also undecidable.Footnote 18 However, the set of provable formulas is computably enumerable, meaning that there is an algorithm that lists all the provable statements. Computably enumerable, or c.e., sets, play an important role in computability theory. For example, the set H representing the Halting Problem is an example of a c.e. set, because we can in principle make an infinite list of all the halting computations.Footnote 19 The complement \(\overline{H}\) of the set H, consisting of all nonconvergent computations, is not c.e. For if it were, we could decide membership in H as follows: Given a pair M and x, effectively list both H and its complement \(\overline{H}\) until the pair appears in one of them, thus answering the question whether the computation M(x) converges. Since H is not computable, it follows that \(\overline{H}\) cannot be c.e. Because the set of all provable statements is c.e., it also follows that not all statements of the form

$$ \text {``}M(x)\;\text {does not halt''} $$

are provable. Hence there exist computations that do not halt, but for which this fact is not provable! Thus we obtain a specific example of a true, but unprovable statement. The same kind of reasoning applies if we replace H by any other noncomputable c.e. set.
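The claim that the halting computations can be listed, while the nonhalting ones cannot, rests on a technique often called dovetailing: run all computations in parallel, one step at a time, and report each one as soon as it halts. The Python sketch below is a toy model under our own assumptions: Python generators stand in for Turing machines, each `yield` counts as one computation step, only finitely many programs are considered, and the names `halts_after`, `loops_forever`, and `enumerate_halting` are hypothetical.

```python
from itertools import islice


def halts_after(k):
    """Return a toy program that performs k steps and then halts."""

    def program():
        for _ in range(k):
            yield  # one computation step

    return program


def loops_forever():
    """A toy program that never halts."""
    while True:
        yield


def enumerate_halting(programs):
    """Yield the index of every program that halts, by dovetailing.

    All programs are advanced one step per round. A program that never
    halts is never reported, but it does not block the others either.
    """
    running = {i: program() for i, program in enumerate(programs)}
    while running:
        for i in list(running):
            try:
                next(running[i])
            except StopIteration:
                del running[i]
                yield i


programs = [halts_after(5), loops_forever, halts_after(2)]

# Take the first two reports; asking for a third would wait forever,
# because the enumeration cannot tell that program 1 will never halt.
first_two = list(islice(enumerate_halting(programs), 2))
```

The enumeration never announces that a program does not halt. This asymmetry is exactly the difference between H being c.e. and its complement failing to be so.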

Now consider the set R of all strings that are Kolmogorov random, and let non-R be the set of all strings that are not Kolmogorov random. We have the following facts:

  (i) non-R is c.e. This is easily seen as follows: If x is not random, there is a description y shorter than x such that \(U(y)=x\). Since the set of halting computations is c.e., it follows that non-R is also c.e.

  (ii) R is not c.e. This is proved in Theorem B.1 in Appendix B.

By applying the same reasoning as for H above, we conclude from this that there are statements of the form

$$ \text {``}x\;\text {is random''} $$

that are true, but not provable. This is Chaitin’s version of the incompleteness theorem, cf. Chaitin (1974).Footnote 20

9 Other Notions of Randomness

Mises–Wald–Church random sequences were defined using computable selection functions, and Martin-Löf random sequences with computable covers, which in Ville’s approach correspond to c.e.-martingales. As Wald already pointed out in the case of Kollektiv’s, all of these notions can be defined relative to any countable collection of selection functions, respectively covers and martingales. Choosing computable covers in the case of Martin-Löf randomness gave the fundamental and appealing connection with Kolmogorov randomness (Theorem 6.2), but there are situations in which this is either too weak, or too strong. Viewing the level of computability of covers and martingales as a parameter that we can vary allows us to introduce notions of randomness that are either weaker or stronger than the ones we have discussed so far.

In his groundbreaking book, Schnorr (1971) discussed alternatives to the notion of Martin-Löf randomness, thus challenging the status of this notion (not claimed by Martin-Löf himself) as the “true” notion of randomness.Footnote 21

The study of the randomness notions corresponding to various levels of computability has not yielded a single “true” notion of randomness. Rather, a picture has emerged in which every notion has a corresponding context in which it can fruitfully be applied. This ranges from low levels of complexity in computational complexity theory (see e.g. the survey paper by Lutz 1997), to the levels of computability (computable and c.e.) that we have been discussing in the previous sections, to higher levels of computability, all the way up to the higher levels of set theory. In studying notions of randomness across these levels, randomness has also served as a unifying theme between various areas of mathematical logic.

The general theory also serves as a background for the study of specific cases. Consider the example of \(\pi \). Since \(\pi \) is a computable real number, its decimal expansion is perfectly predictable, and hence \(\pi \) is not random in any of the senses discussed above. However, the distribution of the digits \(0,\ldots ,9\) in \(\pi \) appears to be “random”. Real numbers with a decimal expansion in which every digit occurs with frequency \(\frac{1}{10}\), and more generally, every block of digits of length n occurs with frequency \(\frac{1}{10^n}\), are called normal to base 10. Normality can be seen as a very weak notion of randomness, where we consider just one type of statistical test, instead of infinitely many as in the case of Martin-Löf randomness. It is in fact not known if \(\pi \) is normal to base 10, but it is conjectured that \(\pi \) is indeed “random” in this weak sense. For a recent discussion of the notion of normality, see Becher and Slaman (2014).
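The frequencies in the definition of normality are easy to measure on a finite part of a decimal expansion. In the Python sketch below we use the digits of \(\sqrt{2}\) instead of those of \(\pi \), purely because they can be computed exactly with integer arithmetic in one line. This substitution, the use of overlapping blocks, and the names `sqrt2_digits` and `block_frequencies` are our own assumptions. Such a computation can suggest, but of course never prove, that a number is normal.

```python
from collections import Counter
from fractions import Fraction
from math import isqrt


def sqrt2_digits(n: int) -> str:
    """The first n decimal digits of sqrt(2) after the decimal point."""
    # isqrt(2 * 10**(2n)) is floor(sqrt(2) * 10**n); drop the leading "1".
    return str(isqrt(2 * 10 ** (2 * n)))[1:]


def block_frequencies(digits: str, length: int) -> dict[str, Fraction]:
    """Relative frequency of every (overlapping) block of the given length."""
    total = len(digits) - length + 1
    counts = Counter(digits[i:i + length] for i in range(total))
    return {block: Fraction(count, total) for block, count in counts.items()}


digits = sqrt2_digits(10_000)

# For a number normal to base 10, each of these tends to 1/10 ...
single_digit_frequencies = block_frequencies(digits, 1)

# ... and each of these tends to 1/100.
pair_frequencies = block_frequencies(digits, 2)
```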

10 Pseudorandom Number Generators and Complexity Theory

In many contexts, it is desirable to have a good source of random numbers, for example when one wants to take an unbiased random sample, in the simulation of economic or atmospheric models, or when using statistical methods to estimate things that are difficult to compute directly (the so-called Monte Carlo method). In such cases, one may turn to physical devices (which raises the question about randomness of physical sources), or one may try to generate random strings using a computer. However, the outcome of a deterministic procedure on a computer cannot be random in any of the senses discussed above. (By Theorem B.1 in Appendix B, there is no purely algorithmic way of effectively generating infinitely many random strings, and it is easy to see that a Martin-Löf random set cannot be computable.) Hence the best an algorithm can do is to produce an outcome that is pseudorandom, that is, “random enough”, where the precise meaning of “random enough” depends on the context. In practice this usually means that the outcome should pass a number of standard statistical tests. Such procedures are called pseudorandom number generators. That the outcomes of a pseudorandom number generator should not be taken as truly random was pointed out by the great mathematician and physicist John von Neumann, when he remarked that

Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.Footnote 22
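
As an illustration of how simple such an “arithmetical method” can be, here is a sketch in Python of a linear congruential generator, one of the oldest types of pseudorandom number generator, used in a small Monte Carlo estimate of \(\pi \). The class and function names are ours, and the constants are one commonly quoted textbook choice; the generator is far too weak for serious use and is meant only to show that the output is completely determined by the seed.

```python
class LinearCongruentialGenerator:
    """Toy generator: x_{k+1} = (a * x_k + c) mod m. Illustrative constants, not for real use."""

    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_float(self) -> float:
        """Advance the state and return a number in the interval [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


def estimate_pi(rng, samples: int) -> float:
    """Monte Carlo method: the fraction of points in the unit square that fall
    inside the quarter disc of radius 1 approximates pi / 4."""
    inside = 0
    for _ in range(samples):
        x, y = rng.next_float(), rng.next_float()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


# The same seed always reproduces exactly the same "random" estimate.
print(estimate_pi(LinearCongruentialGenerator(seed=12345), 100_000))
print(estimate_pi(LinearCongruentialGenerator(seed=12345), 100_000))
```

Both calls print the same value, which is exactly what distinguishes a pseudorandom sequence from a random one: anyone who knows the seed can predict every digit.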

Randomized algorithms are algorithms that employ randomness during computations, and that allow for a small probability of error in their answers. For example, the first feasibleFootnote 23 algorithms to determine whether a number is prime were randomized algorithms.Footnote 24 An important theme in computational complexity theory is the extent to which it is possible to derandomize randomized algorithms, i.e. to convert them to deterministic algorithms. This is connected to fundamental open problems about the relation between deterministic algorithms, nondeterministic algorithms, and randomized computation.Footnote 25 Besides being of theoretical interest, this matter is of great practical importance, for example in the security of cryptographic schemes that are currently widely used. For an overview of current research we refer the reader to Arora and Barak (2009). It is also interesting to note that randomness plays an important part in many of the proofs of results about deterministic algorithms that do not otherwise mention randomness.
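
To give an impression of what a randomized algorithm looks like, the following Python sketch implements the Miller-Rabin primality test, a well-known randomized test. It is our choice of illustration and need not be the test referred to above. The algorithm never rejects a prime; for a composite number, each round with a randomly chosen base wrongly accepts with probability at most \(\frac{1}{4}\), so after 20 rounds the error probability is at most \(4^{-20}\). The function name and parameters are ours.

```python
import random


def is_probable_prime(n: int, rounds: int = 20, rng=None) -> bool:
    """Miller-Rabin test: False means n is certainly composite,
    True means n is prime except with probability at most 4**(-rounds)."""
    rng = rng or random.Random()
    if n < 2:
        return False
    # Dispose of small cases by trial division.
    for p in (2, 3, 5, 7):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base: the only use of randomness
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # the base a witnesses that n is composite
    return True


print(is_probable_prime(2**61 - 1))  # a Mersenne prime
print(is_probable_prime(29341))      # 13 * 37 * 61, a Carmichael number
```

Derandomizing this algorithm would mean replacing the random choice of the base by a deterministic rule without losing efficiency or correctness.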

11 Applications

As pointed out in Sect. 5 and Corollary B.2, due to the undecidability of the Halting Problem, the notion of Kolmogorov complexity is inherently noncomputable. This means that there is no algorithm that, given a finite sequence, can compute its complexity, or decide whether it is random or not. Can such a concept, apart from mathematical and philosophical applications, have any practical applications? Perhaps surprisingly, the answer is “yes”. A large number of applications, ranging from philosophy to physics and biology, are discussed in the monograph by Li and Vitányi (2008). Instead of attempting to give an overview of all applications, for which we do not have the space, we give an example of one striking application, namely the notion of information distance. Information distance is a notion built on Kolmogorov complexity that was introduced by Bennett et al. (1998). It satisfies the properties of a metric (up to constants), and it gives a well-defined notion of distance between arbitrary pairs of binary strings. The computational status of information distance (and its normalized version) was unclear for a while, but like the notion of Kolmogorov complexity itself, it turned out to be noncomputable (Terwijn et al. 2011). However, it is possible to approximate the ideal notion using existing, computable compressors. This gives a computable approximation of information distance, which can in principle be applied to any pair of binary strings, be they music files, the genetic code of mammals, or texts in any language. By computing the information distance between various files from a given domain, one can use the notion to classify anything that can be coded as a binary string. The results obtained in this way are startling. For example, the method is able to correctly classify pieces of music by their composers, animals by their genetic code, or languages by their common roots, purely on the basis of similarity of their binary encodings, and without any expert knowledge.
Apart from these applications, the notion of information distance is an example of a provably intractable notion, which nevertheless has important practical consequences. This provides a strong case for the study of such theoretical notions.
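
The approximation of information distance by means of a compressor can be sketched in a few lines of Python. The formula below is the one commonly called the normalized compression distance; it is not spelled out in the text above, and the choice of zlib as the compressor, the function names, and the toy input strings are our assumptions, made only for illustration. The idea is that the compressed length \(C(x)\) stands in for the noncomputable Kolmogorov complexity of \(x\).

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data: a computable stand-in for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data: two nearly identical texts and one block of pseudorandom bytes.
text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = b"the quick brown fox jumps over the lazy cat " * 20
gen = random.Random(0)
noise = bytes(gen.getrandbits(8) for _ in range(len(text_a)))

print("similar texts:  ", ncd(text_a, text_b))
print("text vs. noise: ", ncd(text_a, noise))
```

The two similar texts come out much closer to each other than either is to the noise. Classification experiments of the kind described above proceed in the same spirit: one computes such distances for all pairs of files and then groups the files by these distances.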