1 Introduction

In 1909, Émile Borel [1] introduced normal numbers and proved that almost all numbers are normal.Footnote 1 Today, several different proofs of Borel’s theorem exist. These include some that are generally considered to be elementary, e.g., [3], [6], and [7].

One of the earliest proofs can be found in Felix Hausdorff’s Grundzüge der Mengenlehre [5] from 1914. But Hausdorff only proved that almost all numbers are simply normal in base 2 and then claimed it would be “evident” that the statement was true for other bases as well. He did not define normal numbers and gave no indication how to prove a stronger version of his result. As we will show in this article, Hausdorff’s argument is not hard to generalize, although the way to do it might not be totally obvious either.

To the author’s knowledge, nobody has picked up Hausdorff’s elegant idea so far. The proofs in [4] and [8] argue along similar lines but require more technical finesse and are less direct.

This article is intended to be accessible to undergraduates at the beginning of their studies and we thus will not presuppose a lot of previous knowledge except for basic combinatorics, basic calculus, and a bit of set theory (up to the definition of countable). Everything else will be defined and proved, including enough (informal) measure theory to state and prove the main theorem.

2 “Almost all”

The idea of a measure is to assign non-negative numbers to sets (of real numbers) in such a way that these numbers can intuitively be interpreted as the sizes of the sets. Two obviously meaningful requirements for a measure are that the empty set is assigned the measure zero and that the measure is additive: the measure of the union of two (or finitely many) sets which are mutually disjoint must be the sum of their measures. In order to be useful in analysis, measures are actually required to be \(\sigma\)-additive: the above must also hold for countably many sets (in which case the sum of the measures becomes a series).

The most important measure, and the one to be used in this article, is the Lebesgue measure which we will denote by the letter \(\lambda\). The Lebesgue measure of an interval of real numbers is its length, e.g., \(\lambda([1,5])=5-1=4\). And the open interval \((1,5)\) has the same measure. This implies that the endpoints “do not count”: finite sets like \(\{1,5\}\) are null sets, i.e., their measure is zero. Generally, a set is a null set if, for every \(\epsilon> 0\), one can find countably many intervals such that the set is a subset of the union of these intervals and the sum of the measures of the intervals is at most \(\epsilon\). An important example of a null set is the set \(\mathbb{Q}\) of rational numbers. More generally:

Lemma 1

Every countable set is a null set.

Proof

Let \(A\) be a countable set and let \((a_{n})_{n\in\mathbb{N}}\) be an enumeration of \(A\).Footnote 2 Furthermore, let \(\epsilon\) be an arbitrary positive number. For each \(n\in\mathbb{N}\), let \(I_{n}\) be an interval of length \(\epsilon/2^{n+1}\) which includes \(a_{n}\). \(A\) is then covered by the intervals \(I_{n}\) and we have

$$\sum_{n=0}^{\infty}\lambda(I_{n})=\sum_{n=0}^{\infty}\frac{\epsilon}{2^{n+1}}=\epsilon.$$

\(\square\)
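The covering used in this proof can be tried out on a finite piece of an enumeration. The following is only a sketch: the list `points` is a hypothetical sample (the first few fractions with small denominators, repetitions included), and the names `cover`, `points`, and `eps` are ours. Each interval is centered at its point, which is one of many admissible choices.

```python
from fractions import Fraction

def cover(points, eps):
    """Return one interval per point; the n-th interval has length eps / 2**(n+1)."""
    intervals = []
    for n, a in enumerate(points):
        half = eps / 2 ** (n + 2)      # half of the length eps / 2**(n+1)
        intervals.append((a - half, a + half))
    return intervals

# Hypothetical sample: the beginning of some enumeration of a countable set.
points = [Fraction(p, q) for q in range(1, 6) for p in range(q + 1)]
eps = Fraction(1, 100)

intervals = cover(points, eps)
total_length = sum(hi - lo for lo, hi in intervals)

# The partial sums of the geometric series always stay below eps.
print(total_length < eps)
```

For finitely many points the total length is \(\epsilon(1-2^{-N})\), where \(N\) is the number of points, so the bound \(\epsilon\) is only reached in the limit.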

Lemma 2

The countable union of null sets is again a null set.

Proof

The main idea is that, for a given \(\epsilon\), we will cover the first null set with intervals which have a total measure of at most \(\epsilon/2\), the second one with intervals with a total measure of at most \(\epsilon/4\), and so on, i.e., we will again use the geometric series. The details are left to the reader.\(\square\)

We also note in passing that a subset of a null set is a null set and that more generally the additivity of \(\lambda\) implies \(\lambda(A)\leq\lambda(B)\) for \(A\subseteq B\) because \(B\) is the disjoint union of \(A\) and \(B\setminus A\) and thus \(\lambda(B)=\lambda(A)+\lambda(B\setminus A)\).

If \(A\) is a set with a positive measure and \(P\) is a property the elements of \(A\) can have or not have, then we say that almost all elements of \(A\) have property \(P\) if the set of elements of \(A\) not having property \(P\) is a null set.

3 Normal numbers

For \(\alpha\in\mathbb{R}\) let

$$\alpha=\sum_{j=-d}^{\infty}\alpha_{j,r}r^{-j}$$

be its representation in base (or radix) \(r\in\mathbb{N}\setminus\{0,1\}\). This means that all \(\alpha_{j,r}\) are digits in this base, i.e., elements of the set \(\Sigma_{r}=\{0,\dots,r-1\}\). In order to ensure uniqueness, we also require:

  (i) \(\alpha_{-d,r}\neq 0\) for \(\alpha\neq 0\) and \(d=-1\) for \(\alpha=0\).

  (ii) There is no \(k\) with \(\alpha_{j,r}=r-1\) for all \(j\geq k\).

We will usually write \(\alpha_{j}\) instead of \(\alpha_{j,r}\) when \(r\) is clear from the context.

For example, for \(\alpha=\pi\) and \(r=10\) we have \(d=0\), \(\alpha_{0}=3\), \(\alpha_{1}=1\), \(\alpha_{2}=4\), and so on. For \(\beta=1/2\) and \(r=2\) we have \(d=-1\), \(\beta_{1}=1\) and \(\beta_{j}=0\) for \(j> 1\). Note that the alternative representation \(d=-2\) and \(\beta_{j}=1\) for all \(j\) is forbidden by the second requirement above.Footnote 3
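For rational numbers, the digits after the radix point can be computed exactly. The following sketch (the function name `fractional_digits` is ours) repeatedly multiplies the fractional part by the base and splits off the integer part. This procedure never produces a representation ending in infinitely many digits \(r-1\), so it agrees with requirement (ii).

```python
from fractions import Fraction

def fractional_digits(alpha, r, count):
    """First `count` digits after the radix point of the rational alpha >= 0 in base r."""
    frac = alpha - alpha.numerator // alpha.denominator   # fractional part
    digits = []
    for _ in range(count):
        frac *= r
        d = frac.numerator // frac.denominator            # next digit
        digits.append(d)
        frac -= d
    return digits

# beta = 1/2 in base 2: the digit 1 followed by zeros
print(fractional_digits(Fraction(1, 2), 2, 5))
```

Exact fractions are used instead of floating point numbers because rounding errors would eventually corrupt the digits.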

To make up for the fact that we do not have enough digits for bases greater than 10, we will sometimes write \([a]\) for the \(a\)-th digit. For example, \(\pi\) in base \(r=100\) will start like this:

$$\pi=[3].[14][15][92]\dots$$

This means that \(\alpha_{2}\) is \([15]\), the 15th digit in base 100.Footnote 4

We now define

$$p_{b,n,\alpha,r}=|\{j\in\mathbb{N}^{+}:j\leq n\land\alpha_{j,r}=b\}|$$
(1)

for \(b\in\Sigma_{r}=\{0,\dots,r-1\}\). This counts how often the digit \(b\) occurs among the first \(n\) digits after the radix point.Footnote 5 Again, we might omit the \(r\) (and even the \(\alpha\)).

As \(\alpha=\pi\) starts like this

$$3.14159265\textbf{3}58979\textbf{3}2\textbf{3}846264\textbf{3}\textbf{3}8\textbf{3}279502884197169\textbf{3}99\textbf{3}7510\dots$$

in base \(r=10\), we have \(p_{3,50,\pi}=8\).
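The count can be reproduced mechanically. In this small sketch, `PI_DIGITS` holds the 50 decimal digits after the radix point as printed above, and `digit_count` is our name for a direct transcription of definition (1).

```python
def digit_count(b, n, digits):
    """Number of positions j with 1 <= j <= n whose digit equals b, as in (1)."""
    return sum(1 for d in digits[:n] if d == b)

# The first 50 digits of pi after the radix point, copied from the text.
PI_DIGITS = "14159265358979323846264338327950288419716939937510"
digits = [int(c) for c in PI_DIGITS]

print(digit_count(3, 50, digits))
```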

Definition 1

A number \(\alpha\in\mathbb{R}\) is called simply normal in base \(r\), if

$$\lim_{n\to\infty}p_{b,n,\alpha,r}/n=1/r$$
(2)

holds for all digits \(b\in\Sigma_{r}\), i.e., if each digit occurs with the same relative frequency “in the long run.”

An example that demonstrates how this property depends on the base is the number \(1/3\) which is simply normal in base 2, but obviously not simply normal in the usual base 10.

Definition 2

\(\alpha\) is called normal in base \(r\) if \(r^{m}\alpha\) is simply normal in base \(r^{n}\) for all \(n\in\mathbb{N}^{+}\) and all \(m\in\mathbb{N}\).

So, the number \(\alpha=1/3\) from above is simply normal in base \(r=2\), but it is not normal in this base because it is not simply normal in base 4 (using \(m=0\) and \(n=2\)): in base 4 we have \(\alpha_{j}=1\) for all \(j\in\mathbb{N}^{+}\) and thus the limit in (2) is 1 for the digit \(b=1\) and 0 for the other three digits, but never \(1/4\).
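The behavior of \(1/3\) in different bases can be observed numerically. The sketch below (with our own helper names `fractional_digits` and `relative_frequencies`) computes the relative frequencies \(p_{b,n,\alpha,r}/n\) for \(n=1000\). A finite computation like this can of course only illustrate the limit in (2), not prove anything about it.

```python
from collections import Counter
from fractions import Fraction

def fractional_digits(alpha, r, count):
    """First `count` digits after the radix point of the rational alpha >= 0 in base r."""
    frac = alpha - alpha.numerator // alpha.denominator
    digits = []
    for _ in range(count):
        frac *= r
        d = frac.numerator // frac.denominator
        digits.append(d)
        frac -= d
    return digits

def relative_frequencies(alpha, r, n):
    """List of p_{b,n,alpha,r} / n for the digits b = 0, ..., r-1."""
    counts = Counter(fractional_digits(alpha, r, n))
    return [Fraction(counts[b], n) for b in range(r)]

for r in (2, 4, 10):
    print(r, [str(f) for f in relative_frequencies(Fraction(1, 3), r, 1000)])
```

In base 2 both digits have frequency \(1/2\), whereas in base 4 only the digit 1 and in base 10 only the digit 3 occurs.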

Lemma 3

If \(r^{m}\alpha\) is simply normal in base  \(r^{n}\) for all \(n\in\mathbb{N}^{+}\) and all \(m<n\) , then \(\alpha\) is normal in base  \(r\) .

Proof

Let \(n\) be fixed and use \(r^{n}\) as the base. For \(m\geq n\) we find non-negative integers \(c\) and \(k\) with \(c<n\) and \(m=kn+c\), and we have

$$r^{m}\alpha=r^{c}\cdot(r^{n})^{k}\alpha.$$

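The digits after the radix point of \((r^{n})^{k}\alpha\) are the same as those of \(\alpha\) beginning at the \((k+1)\)-th digit, because multiplying by \((r^{n})^{k}\) moves the radix point \(k\) places to the right. And thus the digits after the radix point of \(r^{m}\alpha\) are the same as those of \(r^{c}\alpha\) beginning at the \((k+1)\)-th digit.

The reason for this is that the digits in base \(r^{n}\) are obtained by combining groups of \(n\) digits in base \(r\). Multiplication by \(r^{c}\) thus has the same effect on both sequences of digits. As removing finitely many leading digits does not change the limits in (2), \(r^{m}\alpha\) is simply normal in base \(r^{n}\) if and only if \(r^{c}\alpha\) is, and the latter holds by assumption because \(c<n\). \(\square\)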

The following example demonstrates the “shift effect” described in the proof above (for a number \(\alpha\) which is obviously not simply normal in base 10 or 1000):Footnote 6

$$\begin{aligned}\displaystyle&\displaystyle r=10\\ \displaystyle&\displaystyle n=3\\ \displaystyle&\displaystyle m=7=2\cdot n+1\\ \displaystyle&\displaystyle\alpha=0.[123]\overline{[345][42]}_{1000}=0.123\overline{345042}_{10}\\ \displaystyle&\displaystyle r^{m}\alpha=10^{7}\cdot 0.123\overline{345042}_{10}=1233450.\overline{423450}_{10}\\ \displaystyle&\displaystyle\quad\quad=[1][233][450].\overline{[423][450]}_{1000}\\ \displaystyle&\displaystyle r^{c}\alpha=10\cdot 0.123\overline{345042}_{10}=1.23\overline{345042}_{10}=[1].[233]\overline{[450][423]}_{1000}\end{aligned}$$
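The example can be checked with exact arithmetic. In the following sketch, \(\alpha\) is entered as a fraction (the repeating block 345042 corresponds to \(345042/999999\)), and `fractional_digits` is our helper for computing digits after the radix point.

```python
from fractions import Fraction

def fractional_digits(alpha, r, count):
    """First `count` digits after the radix point of the rational alpha >= 0 in base r."""
    frac = alpha - alpha.numerator // alpha.denominator
    digits = []
    for _ in range(count):
        frac *= r
        d = frac.numerator // frac.denominator
        digits.append(d)
        frac -= d
    return digits

# alpha = 0.123 345042 345042 ... in base 10
alpha = Fraction(123, 1000) + Fraction(345042, 999999) / 1000
r, n, m = 10, 3, 7
k, c = divmod(m, n)            # m = k*n + c with c < n, here k = 2 and c = 1

shifted = fractional_digits(r ** m * alpha, r ** n, 6)
reference = fractional_digits(r ** c * alpha, r ** n, 6 + k)

print(shifted)
print(reference)
# Dropping the first k digits of r**c * alpha gives the digits of r**m * alpha.
print(shifted == reference[k:])
```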

We will call a finite sequence \(w=b_{1}b_{2}b_{3}\dots b_{n}\) of digits in base \(r\) an \(r\)-word (or simply a word) and write \(|w|\) for its length \(n\). For the word \(\alpha_{n_{1},r}\dots\alpha_{n_{2},r}\) consisting of the digits of \(\alpha\) beginning at position \(n_{1}\) and ending at position \(n_{2}\), we will write \(\alpha_{[n_{1},n_{2}],r}\). For an arbitrary word \(w\) we define

$$p_{w,n,\alpha,r}=|\{j\in\mathbb{N}^{+}:j+|w|-1\leq n\land\alpha_{[j,j+|w|-1],r}=w\}|.$$

This number counts how often the block \(w\) of digits appears as a substring of \(\alpha_{[1,n],r}\), i.e., of the first \(n\) digits of \(\alpha\). Note that this definition agrees with (1) for words consisting of just one digit.

As an example, consider \(\alpha=0.11010111011\). We have \(p_{101,11,\alpha}=3\) which means that the word 101 occurs three times among the first 11 digits. Note that it does not matter that the first two occurrences overlap.
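A direct transcription of this definition makes it clear that overlapping occurrences are counted separately. The function name `word_count` is ours, and digits are represented as characters of a string for convenience.

```python
def word_count(w, n, digits):
    """Number of start positions at which w occurs within the first n digits."""
    prefix = digits[:n]
    return sum(1 for j in range(len(prefix) - len(w) + 1)
               if prefix[j:j + len(w)] == w)

# The digits of alpha = 0.11010111011 after the radix point.
print(word_count("101", 11, "11010111011"))
```

Note that Python's built-in `str.count` would not be suitable here because it only counts non-overlapping occurrences.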

Lemma 4

If \(\alpha\) is normal in base  \(r\) , then we have for all \(r\) -words  \(w\)

$$\lim_{n\to\infty}p_{w,n,\alpha,r}/n=1/r^{|w|}.$$

So, if \(\alpha\) is normal in some base, then each finite sequence of digits of this base, no matter how long, appears infinitely often in the representation of \(\alpha\) and with the same frequency “in the long run” as all other sequences of the same length. Which is pretty fascinating if you think about it. Imagine the text of your favorite book stored in a computer file and viewed as a sequence of ones and zeros. If \(\alpha\) is normal in base 2, then your book will appear infinitely often in the binary representation of \(\alpha\), as will any other book—and your favorite songs as well!

One would think that such “magic” numbers are pretty rare or do not exist at all. But the whole purpose of this article is to prove that they are “normal” in the sense that numbers which do not have this strange property are extremely scarce. On the other hand, we do not know many normal numbers. The ones we do know about were “bred” for this purpose while the numbers we deal with on a daily basis are either obviously not normal, like the rational numbers, or not known to be normal. It is for example an open question whether \(\sqrt{2}\), \(\pi\), or \(\mathrm{e}\) are normal.

By the way, the property described in Lemma 4 is sometimes used to define normality. And it is in fact equivalent to our definition. However, proving the equivalence requires a lot of technical effort which we will forego. See [8] if you are interested.

Proving Lemma 4 is not that hard, though. But instead of a formal proof (which would probably be confusing because of the notation), we will go through an example which is hopefully illuminating enough to illustrate the general idea.Footnote 7 Consider the base \(r=2\), a number \(\alpha\) normal in this base, and the word \(w=11\) consisting of two digits. Furthermore, let \(\epsilon\) be some positive real number. Because \(\alpha\) is normal in base 2, it is simply normal in base 4. That means we can find a number \(n_{1}\) such that for \(n\geq n_{1}\) approximately \(n/4\) of the first \(n\) digits in base 4 are the digit 3. Approximately here is supposed to mean that the actual number deviates from \(n/4\) by no more than \(\epsilon n\). But that implies that among the first \(2n\) digits in the base 2 representation of \(\alpha\) we will have \(n/4\pm\epsilon n\) occurrences of the word 11 (which corresponds to the digit 3 in base 4).

$$\begin{aligned} \text{Base 4:} & \quad\texttt{ 0 2 1 0 0 \underline{3} 2 \underline{3} 1 \underline{3} \underline{3} 2 0 1 0 0 1 2 2 \underline{3} 0 1 \underline{3} 2 0} \\ \text{Base 2:} & \quad\texttt{0010010000\underline{11}10\underline{11}01\underline{11}\underline{11}1000010000011010\underline{11}0001\underline{11}1000} \end{aligned}$$

And because \(\alpha\) is normal in base 2, \(2\alpha\) is also simply normal in base 4. Which entails that we can find a number \(n_{2}\) which has the same property for \(2\alpha\) that \(n_{1}\) has for \(\alpha\). And we can certainly arrange for \(n_{2}\) to be at least as big as \(n_{1}\). So, for \(n\geq n_{2}\) we will again find \(n/4\pm\epsilon n\) occurrences of the word 11, this time among the first \(2n\) digits of the base 2 representation of \(2\alpha\). But that is just the base 2 representation of \(\alpha\) shifted by one digit and so these are new occurrences we have not counted yet.

$$\begin{aligned} \text{Base 4:} & \quad\texttt{ 1 0 2 0 1 \underline{3} 1 2 \underline{3} \underline{3} \underline{3} 0 0 2 0 0 \underline{3} 1 1 2 0 \underline{3} \underline{3} 0 1} \\ \text{Base 2:} & \quad\texttt{0100100001\underline{11}0110\underline{11}\underline{11}\underline{11}0000100000\underline{11}01011000\underline{11}\underline{11}0001} \end{aligned}$$

Combined with the ones we already had we now have \(n/2\pm\epsilon 2n\) places where 11 is a substring. Apart from the possible deviation by \(\epsilon 2n\) that is one quarter of \(2n\) base 2 digits and that is what we needed to show.
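The bookkeeping of this argument can be replayed on the 50 binary digits used in the illustration. In the sketch below, `BITS` is that digit string copied from the text, and `aligned_count` (our name) counts the occurrences of 11 that start at every second position. Offset 0 corresponds to the digit 3 in the base 4 representation of \(\alpha\), offset 1 to the digit 3 in the base 4 representation of \(2\alpha\).

```python
# The 50 binary digits from the illustration, in chunks of ten.
BITS = ("0010010000" "1110110111" "1110000100" "0001101011" "0001111000")

def aligned_count(bits, offset):
    """Occurrences of '11' starting at the 0-based positions offset, offset+2, ..."""
    return sum(1 for j in range(offset, len(bits) - 1, 2) if bits[j:j + 2] == "11")

from_alpha = aligned_count(BITS, 0)       # digit 3 in base 4 of alpha
from_2alpha = aligned_count(BITS, 1)      # digit 3 in base 4 of 2*alpha
total = sum(1 for j in range(len(BITS) - 1) if BITS[j:j + 2] == "11")

print(from_alpha, from_2alpha, total)
```

Every occurrence of 11 starts either at an odd or at an even position, so the two aligned counts always add up to the total. For such a short prefix the total need not be close to one quarter of the number of digits; the lemma is only a statement about the limit.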

The final definition is the following:

Definition 3

\(\alpha\) is called (absolutely) normal if it is normal in every integer base greater than 1.

4 The main lemma

The proof that almost all numbers are normal relies on a technical lemma which generalizes a computation from [5, p. 420 f]:

Lemma 5

If \(r\) is an integer greater than 1, then there is a positive constant  \(D\) such that the inequality

$$\sum_{p=0}^{n}\binom{n}{p}\frac{(r-1)^{n-p}}{r^{n}}\left(\frac{p}{n}-\frac{1}{r}\right)^{\!4}\leq\frac{D}{n^{2}}$$

holds for all \(n\in\mathbb{N}^{+}\) .

Proof

We fix positive integers \(s\) and \(n\) and define some functions recursively:

$$\begin{aligned}\displaystyle f_{0}(x,y)=\sum_{p=0}^{n}\binom{n}{p}x^{sp}y^{n-p}\end{aligned}$$
$$\begin{aligned}\displaystyle f_{k+1}(x,y)=x\cdot\frac{\partial}{\partial x}f_{k}(x,y)-y\cdot\frac{\partial}{\partial y}f_{k}(x,y)\quad\quad\quad(k\in\mathbb{N})\end{aligned}$$

By working with individual items in the sum, it is easy to check that

$$\begin{aligned}f_{k}(x,y)=\sum_{p=0}^{n}\binom{n}{p}\bigl((s+1)p-n\bigr)^{k}x^{sp}y^{n-p}\end{aligned}$$
(3)

holds for all \(k\).

By the binomial theorem, we know that \(f_{0}\) can also be written as

$$f_{0}(x,y)=(x^{s}+y)^{n}.$$

It is a tedious—but completely elementary—exercise to compute \(f_{4}\) based on this representation.Footnote 8 We get

$$\begin{aligned}\displaystyle f_{4}(x,y)=n(x^{s}+y)^{n-4}Q\\ \displaystyle\text{with }Q=6n^{2}(s+1)^{2}x^{s}y(sx^{s}-y)^{2}+n^{3}(sx^{s}-y)^{4}+\\ \displaystyle\quad\mathrel{\phantom{=}}(s+1)^{4}x^{s}y(x^{2s}-4x^{s}y+y^{2})+\\ \displaystyle\quad\mathrel{\phantom{=}}n(s+1)^{3}x^{s}y(7(s+1)x^{s}y-4sx^{2s}-4y^{2}).\end{aligned}$$

We now set \(r=s+1\), \(x=1/\sqrt[s]{r}\), and \(y=(r-1)/r\). This implies \(sx^{s}-y=0\) and two of the four items in the sum \(Q\) vanish. The remaining terms simplify to

$$\begin{aligned}\displaystyle f_{4}(x,y)=3(r-1)^{2}n^{2}+(r^{3}-7r^{2}+12r-6)n.\end{aligned}$$
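Readers who do not want to carry out the computation by hand can at least test the result numerically. The following sketch evaluates the sum from (3) for \(k=4\) with the substituted values, i.e., with \(x^{sp}y^{n-p}=(r-1)^{n-p}/r^{n}\) and \((s+1)p-n=rp-n\), and compares it with the polynomial above. The function names are ours, and a check for a few values of \(r\) and \(n\) is of course no substitute for the proof.

```python
from fractions import Fraction
from math import comb

def sum_from_3(r, n):
    """The sum (3) for k = 4 at x**s = 1/r and y = (r-1)/r, computed exactly."""
    numerator = sum(comb(n, p) * (r - 1) ** (n - p) * (r * p - n) ** 4
                    for p in range(n + 1))
    return Fraction(numerator, r ** n)

def closed_form(r, n):
    """The polynomial 3(r-1)^2 n^2 + (r^3 - 7r^2 + 12r - 6) n."""
    return 3 * (r - 1) ** 2 * n ** 2 + (r ** 3 - 7 * r ** 2 + 12 * r - 6) * n

checks = [(r, n, sum_from_3(r, n) == closed_form(r, n))
          for r in (2, 3, 10) for n in (1, 2, 5, 20)]
print(all(ok for _, _, ok in checks))
```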

That is a second degree polynomial in \(n\) and we thus know that

$$\begin{aligned}\displaystyle f_{4}(x,y)\leq Cn^{2}\end{aligned}$$

for some constant \(C\) independent of \(n\).

If we now replace \(f_{4}(x,y)\) with the term from (3), we get

$$\sum_{p=0}^{n}\binom{n}{p}\frac{(r-1)^{n-p}}{r^{n}}(rp-n)^{4}\leq Cn^{2}.$$

Dividing by \((rn)^{4}\) yields the inequality we are after with \(D=C/r^{4}\). \(\square\)

5 Almost all numbers are normal

For the rest of this text, we will concentrate on numbers in the interval \([0,1)\). We fix some base \(r\geq 2\). If we look at a specific sequence of \(n\) digits, then the set of numbers starting with this sequence is an interval with Lebesgue measure \(1/r^{n}\). For example, in base 10, the set of numbers starting with the sequence 141 is the interval \([0.141,0.142)\) which includes numbers like \(\pi-3\). Its measure is \(1/1000\).

We now also fix a specific digit \(b\) of \(\Sigma_{r}\). We want to know the measure of the set of numbers that have exactly \(p\) occurrences of this digit among their first \(n\) digits. That is also easy to compute: There are \(\binom{n}{p}\) ways to pick \(p\) of the available \(n\) positions. For the remaining \(n-p\) positions we can pick any of the other \(r-1\) digits and there are \((r-1)^{n-p}\) ways to do that. And each of the sequences thus created results in an interval of length \(1/r^{n}\) disjoint from all other intervals of the same type. The measure therefore is

$$\binom{n}{p}\cdot(r-1)^{n-p}\cdot\frac{1}{r^{n}}.$$
(4)
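For small \(r\) and \(n\), formula (4) can be confirmed by brute force. The sketch below enumerates all \(r^{n}\) digit sequences of length \(n\), each of which stands for an interval of length \(1/r^{n}\), and counts those with exactly \(p\) occurrences of the digit \(b\). The function names are ours.

```python
from fractions import Fraction
from itertools import product
from math import comb

def measure_by_enumeration(r, n, b, p):
    """Total length of the intervals whose first n digits contain b exactly p times."""
    hits = sum(1 for word in product(range(r), repeat=n) if word.count(b) == p)
    return Fraction(hits, r ** n)

def measure_by_formula(r, n, p):
    """Formula (4)."""
    return Fraction(comb(n, p) * (r - 1) ** (n - p), r ** n)

print(measure_by_enumeration(3, 4, 1, 2), measure_by_formula(3, 4, 2))
```

As expected, the result does not depend on which digit \(b\) is chosen, and the measures for \(p=0,\dots,n\) add up to 1.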

For a positive real number \(\epsilon\) we now look at the set

$$M_{b}(n,\epsilon)=\{\alpha\in[0,1):|p_{b,n,\alpha,r}/n-1/r|\geq\epsilon\}$$

of all numbers where the relative frequency of \(b\)’s among the first \(n\) digits deviates from the “expected” value \(1/r\) by at least \(\epsilon\). With (4), we can compute the measure of this set as

$$\begin{aligned}\displaystyle\lambda(M_{b}(n,\epsilon))=\sum_{\substack{p=0\\ |p/n-1/r|\geq\epsilon}}^{n}\binom{n}{p}\frac{(r-1)^{n-p}}{r^{n}}.\end{aligned}$$

Using the constant \(D\) from Lemma 5 we get

$$\begin{aligned}\displaystyle\epsilon^{4}\cdot\lambda(M_{b}(n,\epsilon))=\sum_{\substack{p=0\\ |p/n-1/r|\geq\epsilon}}^{n}\binom{n}{p}\frac{(r-1)^{n-p}}{r^{n}}\cdot\epsilon^{4}\end{aligned}$$
$$\begin{aligned}\displaystyle\quad\quad\quad\leq\sum_{\substack{p=0\\ |p/n-1/r|\geq\epsilon}}^{n}\binom{n}{p}\frac{(r-1)^{n-p}}{r^{n}}\biggl(\frac{p}{n}-\frac{1}{r}\biggr)^{4}\leq\frac{D}{n^{2}}\end{aligned}$$

and thus

$$\begin{aligned}\lambda(M_{b}(n,\epsilon))\leq\frac{D}{\epsilon^{4}}\cdot\frac{1}{n^{2}}.\end{aligned}$$
(5)
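To get a feeling for inequality (5), one can compute \(\lambda(M_{b}(n,\epsilon))\) exactly and compare it with the right-hand side. The sketch below needs a concrete value for \(D\). As an assumption of ours, it uses \(D=(3(r-1)^{2}+|r^{3}-7r^{2}+12r-6|)/r^{4}\), which is one admissible choice that can be read off from the polynomial in the proof of Lemma 5 because \(n\leq n^{2}\); smaller constants would work as well.

```python
from fractions import Fraction
from math import comb

def measure_M(r, n, eps):
    """Exact measure of M_b(n, eps): sum of (4) over all p with |p/n - 1/r| >= eps."""
    numerator = sum(comb(n, p) * (r - 1) ** (n - p)
                    for p in range(n + 1)
                    if abs(Fraction(p, n) - Fraction(1, r)) >= eps)
    return Fraction(numerator, r ** n)

def constant_D(r):
    """Assumption: one admissible constant for Lemma 5, not necessarily the best."""
    return Fraction(3 * (r - 1) ** 2 + abs(r ** 3 - 7 * r ** 2 + 12 * r - 6), r ** 4)

r, eps = 10, Fraction(1, 20)
for n in (10, 100, 1000):
    bound = constant_D(r) / (eps ** 4 * n ** 2)
    print(n, float(measure_M(r, n, eps)), float(bound))
```

For small \(n\) the bound exceeds 1 and is therefore useless, but it decreases like \(1/n^{2}\), which is all that the subsequent argument requires.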

Let \(M_{b}(\epsilon)\) be the set of numbers \(\alpha\in[0,1)\) where the relative frequency \(p_{b,n,\alpha,r}/n\) deviates from \(1/r\) by at least \(\epsilon\) for infinitely many \(n\). In other words, \(\alpha\in M_{b}(\epsilon)\) iff for each \(m\in\mathbb{N}\) there is an \(n\geq m\) such that \(\alpha\in M_{b}(n,\epsilon)\):

$$\begin{aligned}\displaystyle M_{b}(\epsilon)&\displaystyle=\bigcap_{m=1}^{\infty}S_{b}(m,\epsilon)\\ \displaystyle S_{b}(m,\epsilon)&\displaystyle=\bigcup_{n=m}^{\infty}M_{b}(n,\epsilon)\end{aligned}$$

By (5), we have

$$\lambda(S_{b}(m,\epsilon))\leq\sum_{n=m}^{\infty}\lambda(M_{b}(n,\epsilon))\leq\frac{D}{\epsilon^{4}}\sum_{n=m}^{\infty}\frac{1}{n^{2}}$$

and because the series on the right converges, the measure of \(S_{b}(m,\epsilon)\) will become arbitrarily small if \(m\) is just big enough. \(M_{b}(\epsilon)\) must therefore be a null set as it is contained in all \(S_{b}(m,\epsilon)\).
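How big \(m\) has to be can be made explicit. Because of \(1/n^{2}\leq 1/(n-1)-1/n\) for \(n\geq 2\), the tail of the series telescopes and is at most \(1/(m-1)\). The following sketch uses this estimate to find some \(m\) for which the bound on \(\lambda(S_{b}(m,\epsilon))\) drops below a given \(\delta\). The function names are ours, and the value of \(D\) is a hypothetical one chosen only for illustration.

```python
from fractions import Fraction
from math import ceil

def tail_bound(m):
    """Upper bound 1/(m-1) for the sum of 1/n**2 over all n >= m (requires m >= 2)."""
    return Fraction(1, m - 1)

def sufficient_m(D, eps, delta):
    """Some m >= 2 with D / eps**4 * tail_bound(m) <= delta."""
    return max(2, ceil(D / (eps ** 4 * delta)) + 1)

D = Fraction(657, 10000)        # hypothetical value of the constant from Lemma 5
eps, delta = Fraction(1, 20), Fraction(1, 100)

m = sufficient_m(D, eps, delta)
print(m, D / eps ** 4 * tail_bound(m) <= delta)
```

The resulting \(m\) is not the smallest possible one, but the argument only needs the existence of some such \(m\) for every \(\delta>0\).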

Finally, let \(M_{b}\) be the set of all numbers in \([0,1)\) that are not simply normal in base \(r\) because condition (2) is violated by at least the digit \(b\). By the definition of a limit, \(M_{b}\) will look like this

$$M_{b}=\bigcup_{k=1}^{\infty}M_{b}(1/k)$$

and as a countable union of null sets it is itself a null set by Lemma 2. The set of the elements of \([0,1)\) which are not simply normal in base \(r\) is then also a null set as it is the union of the \(r\) sets \(M_{0}\) to \(M_{r-1}\). We have just proved:

Theorem 1

If \(r\geq 2\) is an arbitrary base, then almost all numbers are simply normal in this base.Footnote 9

We are not quite done yet, but the rest is fairly easy. Let us again fix a base \(r\). If we multiply each element of \([0,1)\) by a factor \(r^{m}\) for some \(m> 0\), then the set of products is “spread” over the following intervals:

$$r^{m}\cdot[0,1)=[0,1)\cup[1,2)\cup\dots\cup[r^{m}-1,r^{m})$$

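But as the set of numbers not simply normal in base \(r\) in each of these intervals is a null set (we have just proved that for \([0,1)\), and adding an integer does not change the digits after the radix point), their union is also a null set, again by Lemma 2. Dividing all elements of this union by \(r^{m}\) yields another null set because the covering intervals merely shrink by the factor \(r^{m}\). So, for fixed \(m\), the set of numbers \(\alpha\in[0,1)\) such that \(r^{m}\alpha\) is not simply normal in base \(r\) is a null set.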

Another application of Lemma 2 yields that the set of numbers \(\alpha\in[0,1)\) such that \(r^{m}\alpha\) is not simply normal in base \(r\) for at least one \(m\in\mathbb{N}\) is a null set. But the same argument also works for the bases \(r^{2}\), \(r^{3}\), and so on. Invoking Lemma 2 a third time we get:

Corollary 1

If \(r\geq 2\) is an arbitrary base, then almost all numbers are normal in this base.

Finally, you guessed it, we use Lemma 2 for the last time, utilizing that there are only countably many bases:

Corollary 2

Almost all numbers are absolutely normal.