Hausdorff's forgotten proof that almost all numbers are normal

In 1914, Felix Hausdorff published an elegant proof that almost all numbers are simply normal in base 2. We generalize this proof to show that almost all numbers are normal. The result is arguably the most elementary proof of this theorem so far and should be accessible to undergraduates in their first year.


Introduction
In 1909, Émile Borel [1] introduced normal numbers and proved that almost all numbers are normal. Today, several different proofs of Borel's theorem exist, including some that are generally considered to be elementary, e.g., [6], [7], and [3].
One of the earliest proofs can be found in Felix Hausdorff's Grundzüge der Mengenlehre [5] from 1914. But Hausdorff only proved that almost all numbers are simply normal in base 2 and then claimed it would be "evident" that the statement was true for other bases as well. He didn't define normal numbers and gave no indication how to prove a stronger version of his result. As we will show in this article, Hausdorff's argument isn't hard to generalize, although the way to do it might not be totally obvious either.
Anyway, to the author's knowledge, nobody has picked up Hausdorff's elegant idea so far. [4] and [8] contain proofs which argue along similar lines but require more technical finesse and are less direct.
This article is intended to be accessible to undergraduates at the beginning of their studies and we thus won't presuppose a lot of previous knowledge except for basic combinatorics, basic calculus, and a bit of set theory (up to the definition of countable). Everything else will be defined and proved, including enough (informal) measure theory to state and prove the main theorem.

"Almost all"
The idea of a measure is to assign non-negative numbers to sets (of real numbers) in such a way that these numbers can intuitively be interpreted as the sizes of the sets. Two obviously meaningful requirements for a measure are that the empty set is assigned the measure zero and that the measure is additive: the measure of the union of two (or finitely many) sets which are mutually disjoint must be the sum of their measures. In order to be useful in analysis, measures are actually required to be σ-additive: the above must also hold for countably many sets (in which case the sum of the measures becomes a series).
The most important measure, and the one to be used in this article, is the Lebesgue measure, which we'll denote by the letter $\lambda$. The Lebesgue measure of an interval of real numbers is its length, e.g., $\lambda([1,5]) = 4$. The open interval $(1,5)$ has the same measure. This implies that the endpoints "don't count": finite sets like $\{1, 5\}$ are null sets, i.e., their measure is zero. Generally, a set is a null set if, for every $\varepsilon > 0$, one can find countably many intervals such that the set is a subset of the union of these intervals and the sum of the measures of the intervals is at most $\varepsilon$. An important example of a null set is the set $\mathbb{Q}$ of rational numbers. More generally:
Lemma 2.1. Every countable set is a null set.
Proof. Let $A$ be a countable set and let $(a_n)_{n\in\mathbb{N}}$ be an enumeration of $A$. Furthermore, let $\varepsilon$ be an arbitrary positive number. For each $n \in \mathbb{N}$, let $I_n$ be an interval of length $\varepsilon/2^{n+1}$ which includes $a_n$. $A$ is then covered by the intervals $I_n$ and we have:
$$\sum_{n\in\mathbb{N}} \lambda(I_n) = \sum_{n=0}^{\infty} \frac{\varepsilon}{2^{n+1}} = \varepsilon$$
Lemma 2.2. The union of countably many null sets is a null set.
Proof. The main idea is that, for a given $\varepsilon$, we'll cover the first null set with intervals which have a total measure of at most $\varepsilon/2$, the second one with intervals with a total measure of at most $\varepsilon/4$, and so on, i.e., we'll again use the geometric series. The details are left to the reader.
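The geometric-series bound behind the proof of Lemma 2.1 can be checked numerically. The following is only a minimal sketch: the function name `cover_lengths` is made up for this illustration, and it assumes that the enumeration starts at $n = 0$. It uses exact rational arithmetic to show that the total length of the first intervals $I_n$ always stays below $\varepsilon$.

```python
from fractions import Fraction

def cover_lengths(eps, count):
    """Lengths eps / 2**(n+1) of the intervals I_n for n = 0, ..., count - 1.

    Hypothetical helper for illustration; assumes the enumeration starts at 0.
    """
    return [eps / 2 ** (n + 1) for n in range(count)]

eps = Fraction(1, 100)
for count in (1, 5, 20):
    total = sum(cover_lengths(eps, count))
    # Each partial sum equals eps * (1 - 2**(-count)) and is therefore below eps.
    print(count, total, total < eps)
```

However many points of the countable set are covered, the intervals used so far have total length less than $\varepsilon$, which is exactly what the definition of a null set asks for.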
We also note in passing that a subset of a null set is a null set and that, more generally, the additivity of $\lambda$ implies $\lambda(A) \le \lambda(B)$ whenever $A \subseteq B$.
If $A$ is a set with a positive measure and $P$ is a property the elements of $A$ can have or not have, then we say that almost all elements of $A$ have property $P$ if the set of elements of $A$ not having property $P$ is a null set.

Normal numbers
. This means that all $\alpha_{j,r}$ are digits in this base, i.e., elements of the set $\Sigma_r = \{0, \dots, r-1\}$. In order to ensure uniqueness, we also require:
We sometimes write $\alpha_j$ instead of $\alpha_{j,r}$ if $r$ is implicit.
For example, for $\alpha = \pi$ and $r = 10$ we have $d = 0$, $\alpha_0 = 3$, $\alpha_1 = 1$, $\alpha_2 = 4$, and so on. For $\beta = 1/2$ and $r = 2$ we have $d = -1$, $\beta_1 = 1$, and $\beta_j = 0$ for $j > 1$. Note that the alternative representation $d = -2$ and $\beta_j = 1$ for all $j$ is forbidden by the second requirement above. (We will soon see that this decision is irrelevant in the context of normality.)
To make up for the fact that we don't have enough digits for bases greater than 10, we will sometimes write $[a]$ for the $a$-th digit. For example, $\pi$ in base $r = 100$ will start like this:
$$\pi = [3].[14][15][92][65][35]\dots$$
This means that $\alpha_2$ is $[15]$, the 15th digit in base 100. (The 15th digit is usually written as F in hexadecimal notation.)
We now define for $b \in \Sigma_r$:
$$p_{b,n,\alpha,r} = |\{j \in \mathbb{N}^+ : j \le n \wedge \alpha_{j,r} = b\}| \tag{1}$$
This counts how often the digit $b$ occurs among the first $n$ digits after the radix point. (We will usually only be concerned with digits after the radix point and will from now on in most cases omit the phrase "after the radix point.") Again, we might omit the $r$ (and even the $\alpha$).
As $\alpha = \pi$ starts like this
3.14159265358979323846264338327950288419716939937510...
in base $r = 10$, we have $p_{3,50,\pi} = 8$.
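The count stated for $\pi$ is easy to verify by machine. The sketch below is illustrative only: the helper name `digit_count` is an assumption, and it handles bases up to 10 only, where every digit is a single character.

```python
# First 50 decimal digits of pi after the radix point, as quoted in the text.
PI_DIGITS = "14159265358979323846264338327950288419716939937510"

def digit_count(b, n, digits):
    """Number of positions j with 1 <= j <= n whose digit equals b.

    Hypothetical helper; `digits` holds the digits after the radix point
    and the base is assumed to be at most 10 (one character per digit).
    """
    return sum(1 for d in digits[:n] if d == str(b))

print(digit_count(3, 50, PI_DIGITS))  # 8, matching the value given in the text
```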
A number $\alpha \in \mathbb{R}$ is called simply normal in base $r$ if
$$\lim_{n\to\infty} \frac{p_{b,n,\alpha,r}}{n} = \frac{1}{r} \tag{2}$$
holds for all digits $b \in \Sigma_r$, i.e., if each digit occurs with the same relative frequency "in the long run." An example that demonstrates how this property depends on the base is the number $1/3$, which is simply normal in base 2, but obviously not simply normal in the usual base 10.
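The base dependence in the example $1/3$ can be made concrete with a few lines of code. This is a sketch under stated assumptions: the function names are invented for this illustration, and the input must be a rational number in $[0, 1)$ given as a `Fraction`. It computes the relative digit frequencies among the first $n$ digits.

```python
from fractions import Fraction

def digits_after_point(alpha, r, n):
    """First n base-r digits after the radix point of a rational alpha in [0, 1)."""
    digits = []
    for _ in range(n):
        alpha *= r
        d = int(alpha)   # the integer part is the next digit
        digits.append(d)
        alpha -= d
    return digits

def relative_frequencies(alpha, r, n):
    """Relative frequency of each digit 0, ..., r - 1 among the first n digits."""
    digits = digits_after_point(alpha, r, n)
    return [Fraction(digits.count(b), n) for b in range(r)]

third = Fraction(1, 3)
# Base 2: 1/3 = 0.010101..., so both digits have relative frequency 1/2.
print(relative_frequencies(third, 2, 1000))
# Base 10: 1/3 = 0.333..., so only the digit 3 ever occurs.
print(relative_frequencies(third, 10, 1000))
```

A finite computation like this illustrates the limit in (2) but of course does not prove anything about it.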
$\alpha$ is called normal in base $r$ if $r^m\alpha$ is simply normal in base $r^n$ for all $n \in \mathbb{N}^+$ and all $m \in \mathbb{N}$. So, the number $\alpha = 1/3$ from above is simply normal in base $r = 2$, but it is not normal in this base because it is not simply normal in base 4 (using $m = 0$ and $n = 2$): in base 4 we have $\alpha_j = 1$ for all $j \in \mathbb{N}^+$ and thus the limit in (2) is 1 for the digit $b = 1$ and 0 for the other three digits, but never $1/4$.
Lemma 3.1. If $r^m\alpha$ is simply normal in base $r^n$ for all $n \in \mathbb{N}^+$ and all $m < n$, then $\alpha$ is normal in base $r$.
Proof. Let $n$ be fixed and use $r^n$ as the base. For $m \ge n$ we find non-negative integers $c$ and $k$ with $c < n$ and $m = kn + c$, and we have:
$$r^m\alpha = r^c (r^n)^k \alpha$$
The digits after the radix point of $(r^n)^k\alpha$ are the same as those of $\alpha$ beginning after the $k$-th digit. And thus the digits after the radix point of $r^m\alpha$ are the same as those of $r^c\alpha$ beginning after the $k$-th digit.
The reason for this is that the digits in base $r^n$ are obtained by combining groups of $n$ digits in base $r$. Multiplication by $r^c$ thus has the same effect on both sequences of digits. As dropping finitely many digits does not change the limit in (2), $r^m\alpha$ is simply normal in base $r^n$ whenever $r^c\alpha$ is.
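The shift argument of the proof can be checked on a concrete digit string. In the sketch below, the digit string and both helper names are hypothetical and chosen only for illustration. The base is $r = 10$ with $n = 3$ (so the large base is 1000) and $m = 4 = 1 \cdot 3 + 1$, i.e., $k = 1$ and $c = 1$.

```python
def base_power_digits(digits, n):
    """Group a base-r digit string into complete blocks of n digits (base r**n digits)."""
    return [digits[j:j + n] for j in range(0, len(digits) - n + 1, n)]

def shift(digits, m):
    """Digits after the radix point of r**m * alpha, given those of alpha."""
    return digits[m:]

alpha = "123456789012345678901234"  # hypothetical digits after the radix point, base 10
n, m = 3, 4                          # base 1000, multiplication by 10**4
k, c = divmod(m, n)                  # m = k*n + c with c < n

lhs = base_power_digits(shift(alpha, m), n)      # base-1000 digits of 10**m * alpha
rhs = base_power_digits(shift(alpha, c), n)[k:]  # those of 10**c * alpha after the k-th digit
print(lhs)
print(rhs)
```

Both printed lists agree, which is the statement used in the proof: the base-$r^n$ digits of $r^m\alpha$ are those of $r^c\alpha$ with the first $k$ digits dropped.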
The following example demonstrates the "shift effect" described in the proof above (for a number $\alpha$ which is obviously not simply normal in base 10 or 1000):
We will call a finite sequence $w = b_1 b_2 b_3 \dots b_n$ of digits in base $r$ an $r$-word (or simply a word) and write $|w|$ for its length $n$. For the word $\alpha_{n_1,r} \dots \alpha_{n_2,r}$ consisting of the digits of $\alpha$ beginning at position $n_1$ and ending at position $n_2$, we'll write $\alpha_{[n_1,n_2],r}$. For an arbitrary word $w$ we define:
$$p_{w,n,\alpha,r} = |\{j \in \mathbb{N}^+ : j + |w| - 1 \le n \wedge \alpha_{[j,j+|w|-1],r} = w\}|$$
This number counts how often the block $w$ of digits appears as a substring of $\alpha_{[1,n],r}$, i.e., of the first $n$ digits of $\alpha$. Note that this definition agrees with (1) for words consisting of just one digit.
As an example, consider $\alpha = 0.11010111011$. We have $p_{101,11,\alpha} = 3$, which means that the word 101 occurs three times among the first 11 digits. Note that it doesn't matter that the first two occurrences overlap.
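The word count, including overlapping occurrences, translates directly into code. This is a minimal sketch: the name `word_count` is an assumption, and digits are represented as characters of a string, so it only covers bases up to 10.

```python
def word_count(w, n, digits):
    """Number of start positions at which w occurs within the first n digits.

    Overlapping occurrences are all counted, as in the definition of p_{w,n,alpha,r}.
    Hypothetical helper; `digits` holds the digits after the radix point.
    """
    prefix = digits[:n]
    return sum(1 for j in range(len(prefix) - len(w) + 1)
               if prefix[j:j + len(w)] == w)

alpha_digits = "11010111011"
print(word_count("101", 11, alpha_digits))  # 3, as in the example above
print(alpha_digits.count("101"))            # 2: str.count skips overlapping matches
```

The second line shows why Python's built-in `str.count` is not a substitute here: it counts only non-overlapping matches.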

Lemma 3.2. If $\alpha$ is normal in base $r$, then we have for all $r$-words $w$:
$$\lim_{n\to\infty} \frac{p_{w,n,\alpha,r}}{n} = \frac{1}{r^{|w|}}$$
So, if $\alpha$ is normal in some base, then each finite sequence of digits of this base, no matter how long, appears infinitely often in the representation of $\alpha$ and with the same frequency "in the long run" as all other sequences of the same length. That is pretty fascinating if you think about it. Imagine the text of your favorite book stored in a computer file and viewed as a sequence of ones and zeros. If $\alpha$ is normal in base 2, then your book will appear infinitely often in the binary representation of $\alpha$, as will any other book, and your favorite songs as well!
One should think that such "magic" numbers are pretty rare or don't exist at all. But the whole purpose of this article is to prove that they are "normal" in the sense that numbers which don't have this strange property are extremely scarce. On the other hand, we don't know many normal numbers. The ones we do know about were "bred" for this purpose, while the numbers we deal with on a daily basis are either obviously not normal, like the rational numbers, or it is unknown whether they are normal. It is, for example, an open question whether $\sqrt{2}$, $\pi$, or $e$ are normal.
By the way, the property described in lemma 3.2 is sometimes used to define normality, and it is in fact equivalent to our definition. However, proving the equivalence requires a lot of technical effort which we'll forego. See [8] if you're interested.
Proving lemma 3.2 is not that hard, though. But instead of a formal proof (which would probably be confusing because of the notation), we'll go through an example which is hopefully illuminating enough to illustrate the general idea.
Consider the base $r = 2$, a number $\alpha$ normal in this base, and the word $w = 11$ consisting of two digits. Furthermore, let $\varepsilon$ be some positive real number. Because $\alpha$ is normal in base 2, it is simply normal in base 4. That means we can find a number $n_1$ such that for $n \ge n_1$ approximately $n/4$ of the first $n$ digits in base 4 are the digit 3. "Approximately" here is supposed to mean that the actual number deviates from $n/4$ by no more than $\varepsilon n$. But that implies that among the first $2n$ digits in the base 2 representation of $\alpha$ we will have $n/4 \pm \varepsilon n$ occurrences of the word 11 (which corresponds to the digit 3 in base 4) that start at an odd position.
And because $\alpha$ is normal in base 2, $2\alpha$ is also simply normal in base 4. This entails that we can find a number $n_2$ which has the same property for $2\alpha$ that $n_1$ has for $\alpha$. And we can certainly arrange for $n_2$ to be at least as big as $n_1$. So, for $n \ge n_2$ we'll again find $n/4 \pm \varepsilon n$ occurrences of the word 11, this time among the first $2n$ digits of the base 2 representation of $2\alpha$. But that's just the base 2 representation of $\alpha$ shifted by one digit, and so these are new occurrences we haven't counted yet. Combined with the ones we already had, we now have $n/2 \pm 2\varepsilon n$ places where 11 is a substring. Apart from the possible deviation by $2\varepsilon n$, that's one quarter of $2n$ base 2 digits, and that's what we needed to show.
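The bookkeeping in this example, namely that every occurrence of 11 is either a digit 3 of $\alpha$ in base 4 or a digit 3 of $2\alpha$ in base 4, can be verified on any bit string. The sketch below uses pseudo-random bits merely as a stand-in for the binary digits of some $\alpha$. This is an assumption made for illustration only, and nothing is claimed about the normality of that sequence. The function names are likewise invented here.

```python
import random

def count_word_11(bits):
    """Occurrences of the word 11 in a bit string, overlaps included."""
    return sum(1 for j in range(len(bits) - 1) if bits[j:j + 2] == "11")

def count_digit_3_base4(bits):
    """Group the bits in pairs (base-4 digits) and count the pairs equal to 11, i.e. digit 3."""
    return sum(1 for j in range(0, len(bits) - 1, 2) if bits[j:j + 2] == "11")

random.seed(0)
# Stand-in for the first 2n binary digits of alpha (here n = 1000); purely illustrative.
bits = "".join(random.choice("01") for _ in range(2000))

aligned = count_digit_3_base4(bits)      # digit 3 among the base-4 digits of alpha
shifted = count_digit_3_base4(bits[1:])  # digit 3 among the base-4 digits of 2 * alpha
print(aligned, shifted, count_word_11(bits))
```

The two partial counts always add up to the total number of occurrences of 11, because every start position is either odd (aligned with the base-4 digits of $\alpha$) or even (aligned with those of $2\alpha$). For longer words, the same splitting uses $|w|$ shifts instead of two.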
The final definition is the following: $\alpha$ is called (absolutely) normal if it is normal in every integer base greater than 1.

The main lemma
The proof that almost all numbers are normal relies on a technical lemma which generalizes a computation from [5, p. 420 f]:
By working with individual summands, it is easy to check that holds for all $k$.
By the binomial theorem, we know that $f_0$ can also be written like this:
It is a tedious (but completely elementary) exercise to compute $f_4$ based on this representation. We get:
We now set $r = s + 1$, $x = 1/\sqrt[s]{r}$, and $y = (r-1)/r$. This implies $sx^s - y = 0$ and two of the four summands in $Q$ vanish. The remaining terms simplify to this:
That's a second degree polynomial in $n$ and we thus know that for some constant $C$ independent of $n$.
If we now replace $f_4(x, y)$ with the term from (3), we get:
Dividing by $(rn)^4$ yields the inequality we're after with $D = C/r^4$.

Almost all numbers are normal.
For the rest of this text, we will concentrate on numbers in the interval $[0, 1)$. We fix some base $r \ge 2$. If we look at a specific sequence of $n$ digits, then the set of numbers starting with this sequence is an interval with Lebesgue measure $1/r^n$. For example, in base 10, the set of numbers starting with the
By (5), we have
and because the series on the right converges, the measure of $S_b(m, \varepsilon)$ will become arbitrarily small if $m$ is just big enough. $M_b(\varepsilon)$ must therefore be a null set as it is contained in all $S_b(m, \varepsilon)$.
Finally, let $M_b$ be the set of all numbers in $[0, 1)$ that are not simply normal in base $r$ because condition (2) is violated by at least the digit $b$. By the definition of a limit, $M_b$ will look like this
and as a countable union of null sets it is itself a null set by lemma 2.2. The set of the elements of $[0, 1)$ which are not simply normal in base $r$ is then also a null set as it is the union of the $r$ sets $M_0$ to $M_{r-1}$. We just proved:
Theorem 5.1. If $r \ge 2$ is an arbitrary base, then almost all numbers are simply normal in this base.
(We can drop the restriction to the interval $[0, 1)$ if we want. The proof obviously works just as well for any interval $[m, m + 1)$ where $m$ is an integer, and $\mathbb{R}$ is the countable union of such intervals.)
We are not quite done yet, but the rest is fairly easy. Let's again fix a base $r$. If we multiply each element of $[0, 1)$ by a factor $r^m$ for some $m > 0$, then the set of products is "spread" over the following intervals:
$$r^m \cdot [0, 1) = [0, 1) \cup [1, 2) \cup \dots \cup [r^m - 1, r^m)$$
But as the set of numbers not simply normal in base $r$ in each of these intervals is a null set (we just proved that), their union is also a null set, again by lemma 2.2.
Another application of lemma 2.2 yields that the set of numbers $\alpha \in [0, 1)$ such that $r^m\alpha$ is not simply normal in base $r$ for at least one $m \in \mathbb{N}$ is a null set. But the same argument also works for the bases $r^2$, $r^3$, and so on. Invoking lemma 2.2 a third time we get:
Corollary 5.2. If $r \ge 2$ is an arbitrary base, then almost all numbers are normal in this base.
Finally, you guessed it, we use lemma 2.2 for the last time, utilizing that there are only countably many bases:
Corollary 5.3. Almost all numbers are absolutely normal.