Statistical independence in mathematics–the key to a Gaussian law

In this manuscript we discuss the notion of (statistical) independence embedded in its historical context. We focus in particular on its appearance and role in number theory, concomitantly exploring the intimate connection between independence and the famous Gaussian law of errors. As we shall see, this at times requires us to go adrift from the celebrated Kolmogorov axioms, which have appeared ultimate ever since their introduction in the 1930s. While these insights are known to many a mathematician, we feel it is time for both a reminder and renewed awareness. Among other things, we present the independence of the coefficients in a binary expansion together with a central limit theorem for the sum-of-digits function, as well as the independence of divisibility by primes and the resulting, famous central limit theorem of Paul Erdős and Mark Kac on the number of distinct prime factors of a number $n \in \mathbb{N}$. We shall also present some of the (modern) developments in the framework of lacunary series that have their origin in a work of Raphaël Salem and Antoni Zygmund.


Introduction
One of the most famous graphs, not only among mathematicians and scientists, is the probability density function of the (standard) normal distribution (see Fig. 1), which adorned the 10 Mark note of the former German currency for many years. Although it already took a central role in a work of Abraham de Moivre (26. May 1667 in Vitry-le-François; 27. November 1754 in London) from 1718, this curve only earned its enduring fame through the work of the famous German mathematician Carl Friedrich Gauß (30. April 1777 in Braunschweig; 23. February 1855 in Göttingen), who used it in the approximation of orbits by ellipses when developing the least squares method, nowadays a standard approach in regression analysis. More precisely, Gauß conceived this method to master the random errors, i.e., those which fluctuate due to the unpredictability or uncertainty inherent in the measuring process, that occur when one tries to measure the orbits of celestial bodies. The strength of this method became apparent when he used it to predict the future location of the newly discovered asteroid Ceres. Ever since, this curve has seemed to be the key to the mysterious world of chance, and the myth persists that wherever this curve appears, randomness is at play.
With this article we seek to address mathematicians and a mathematically educated audience alike. One can say that the goal of this manuscript is threefold. First, for those less familiar with it, we want to undo the fetters that connect chance and the Gaussian curve so one-sidedly. Second, we want to recall the deep and intimate connection between the notion of statistical independence and the Gaussian law of errors beyond classical probability theory, which, thirdly, demonstrates that occasionally one is obliged to step away from its seemingly ultimate form in terms of the Kolmogorov axioms and work with notions having their roots in earlier foundations of probability theory.
To achieve this goal we shall, partially embedded in a historic context, present and discuss several results from mathematics where, once an appropriate form of statistical independence has been established, the Gaussian curve emerges naturally. In more modern language this means that central limit theorems describe the fluctuations of mathematical quantities in different contexts. Our focus shall be on results that nowadays are considered to be part of probabilistic number theory. At the very heart of this development lies the true comprehension and appreciation of independence by the Polish mathematician Mark Kac (3. August 1914 in Kremenez; 26. October 1984 in California). His pioneering works and insights, especially his collaboration with Paul Erdős (26. March 1913 in Budapest; 20. September 1996 in Warsaw), have revolutionized our understanding and shaped the development of probabilistic number theory for many years with lasting influence. We refer the reader to [10, 11, 49, 50] for general literature on the subject.

The classical central limit theorems and independence – a refresher
In this section we start with two fundamental results of probability theory and the notion of independence. These considerations form the starting point for future deliberations.

The notion of independence
Independence is one of the central notions in probability theory. It is hard to imagine today that this concept, for us so seemingly elementary and simple, was used only vaguely and intuitively for hundreds of years, without a formal definition underlying it. Implicitly this concept can be traced back to the works of Jakob Bernoulli (6. January 1655 in Basel; 16. August 1705 in Basel) and evolved in the capable hands of Abraham de Moivre. In his famous oeuvre "The Doctrine of Chances" [15] he wrote: "...if a Fraction expresses the Probability of an Event, and another Fraction the Probability of another Event, and those two Events are independent; the Probability that both those Events will Happen, will be the Product of those Fractions." It is to be noted that, even though this definition matches the modern one, neither the notion "Probability" nor "Event" had been introduced in an axiomatic way. It seems that the first formal definition of independence goes back to the year 1900 and the work [12] of the German mathematician Georg Bohlmann (23. April 1869 in Berlin; 25. April 1928 in Berlin). In fact, long before Andrei Nikolajewitsch Kolmogorov (25. April 1903 in Tambow; 20. October 1987 in Moscow) proposed the axioms that today form the foundation of probability theory, Bohlmann had presented an axiomatization – but without requiring $\sigma$-additivity. For a detailed exposition of the historical development and the work of Bohlmann, we refer the reader to an article of Ulrich Krengel [37].
Roughly speaking, two events are considered to be independent if the occurrence of one does not affect the probability of occurrence of the other; see also Remark 3. (Decades later, Hugo Steinhaus and Mark Kac rediscovered Bohlmann's concept independently, unaware of the previous works [34].) We now continue with the formal definition of independence as it is used today. Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space consisting of a non-empty set $\Omega$ (the sample space), a $\sigma$-algebra $\mathcal{A}$ (the set of events) on $\Omega$, and a probability measure $\mathbb{P}\colon \mathcal{A} \to [0,1]$. We then say that two events $A, B \in \mathcal{A}$ are (statistically) independent if and only if
$$\mathbb{P}[A \cap B] = \mathbb{P}[A]\,\mathbb{P}[B].$$
In other words, two events are independent if their joint probability equals the product of their probabilities. This extends to any collection $(A_i)_{i \in I}$ of events, which is said to be independent if and only if for every $n \in \mathbb{N}$, $n \geq 2$, and all subsets $J \subseteq I$ of cardinality $n$,
$$\mathbb{P}\Big[\bigcap_{i \in J} A_i\Big] = \prod_{i \in J} \mathbb{P}[A_i].$$
It is important to note that in this case we ask for much more than the product rule for the full collection alone, and still much more than pairwise independence. Consequently, we also have to verify much more: the number of conditions to be verified to show that $n$ given events are independent is exactly
$$\sum_{k=2}^{n} \binom{n}{k} = 2^n - n - 1.$$
Having this notion of independence at hand, we define independent random variables. If $X\colon \Omega \to \mathbb{R}$ and $Y\colon \Omega \to \mathbb{R}$ are two random variables, then we say they are independent if and only if for all measurable subsets $A, B \subseteq \mathbb{R}$,
$$\mathbb{P}[X \in A,\, Y \in B] = \mathbb{P}[X \in A]\,\mathbb{P}[Y \in B].$$
We use the standard notation $\{X \in A\}$ for $\{\omega \in \Omega : X(\omega) \in A\}$, $\mathbb{P}[X \in A]$ for $\mathbb{P}[\{X \in A\}]$, and $\mathbb{P}[X \in A,\, Y \in B]$ for $\mathbb{P}[\{X \in A\} \cap \{Y \in B\}]$.
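The count of conditions above can be checked mechanically. The following Python sketch (our illustration, not part of the original argument) enumerates all subsets of size at least 2 and compares the count with the closed form $2^n - n - 1$:

```python
from itertools import combinations

def num_independence_conditions(n):
    """Count the product-rule conditions needed for n events to be
    (mutually) independent: one per subset of size at least 2."""
    return sum(1 for k in range(2, n + 1)
                 for _ in combinations(range(n), k))

# The closed form derived in the text: 2^n - n - 1.
for n in range(2, 10):
    assert num_independence_conditions(n) == 2 ** n - n - 1

print(num_independence_conditions(5))  # 26 conditions for five events
```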
This means that the random variables $X$ and $Y$ are independent if and only if for all measurable subsets $A, B \subseteq \mathbb{R}$ the events $\{X \in A\} \in \mathcal{A}$ and $\{Y \in B\} \in \mathcal{A}$ are independent. Again, a sequence $X_1, X_2, \ldots\colon \Omega \to \mathbb{R}$ of random variables is said to be independent if and only if for every $n \in \mathbb{N}$, $n \geq 2$, any subset $I \subseteq \mathbb{N}$ of cardinality $n$, and all measurable sets $A_i \subseteq \mathbb{R}$, $i \in I$,
$$\mathbb{P}\Big[\bigcap_{i \in I} \{X_i \in A_i\}\Big] = \prod_{i \in I} \mathbb{P}[X_i \in A_i].$$

The central limit theorems of de Moivre–Laplace and Lindeberg
The history of the central limit theorem starts with the work of French mathematician Abraham de Moivre, who, around the year 1730, proved a central limit theorem for standardized sums of independent random variables following a symmetric Bernoulli distribution [16]. For more on this development we refer to the article [20] of Peter Eichelsbacher and Matthias Löwe. For an exhaustive presentation of the history of the central limit theorem we warmly recommend the monograph of Hans Fischer [24]. Let us start with the classical central limit theorem of de Moivre, hence restricting ourselves to the symmetric case $p = \frac{1}{2}$ of the Bernoulli distribution.
Theorem 1 (De Moivre, 1730) Let $X_1, X_2, X_3, \ldots$ be a sequence of independent random variables with a symmetric Bernoulli distribution. Then, for all $a, b \in \mathbb{R}$ with $a < b$, we have
$$\lim_{n \to \infty} \mathbb{P}\bigg[a \leq \frac{\sum_{k=1}^{n} X_k - \frac{n}{2}}{\frac{1}{2}\sqrt{n}} \leq b\bigg] = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$
The theorem of de Moivre, when discussed in school for instance, can be nicely depicted using the Galton board (also known as bean machine). Let us consider the experiment of throwing an ideal and fair coin $n$ times (i.e., heads shows up with probability $1/2$). The single throws are regarded as independent, as none of them influences the others. The number $k$ of heads showing up in that experiment is a number between $0$ and $n$. The probability that we see heads exactly $k$ times is described by a binomial distribution. Now de Moivre's theorem says that, as the number $n$ of tosses tends to infinity, the shape of a suitably standardized histogram approaches the Gaussian curve.
We have already mentioned at the beginning of this section that, under suitable conditions, a central limit theorem may be obtained for general independent random variables, not only those describing or modeling a coin toss.
We formulate Lindeberg's central limit theorem. In what follows, we shall denote by $\mathbb{1}_A$ the indicator function of the set $A$, i.e., $\mathbb{1}_A(x) \in \{0,1\}$ with $\mathbb{1}_A(x) = 1$ if and only if $x \in A$. The expectation of a random variable $X$ with respect to the probability measure $\mathbb{P}$ is defined as $\mathbb{E}[X] := \int X\, d\mathbb{P}$, if this integral is defined. (A random variable $X$ is Bernoulli distributed if and only if $\mathbb{P}(X = 0) + \mathbb{P}(X = 1) = 1$. Here $p = \mathbb{P}(X = 1)$ is the parameter of the Bernoulli distribution, and in the case where $p = \frac{1}{2}$ we call the distribution "symmetric". In his paper de Moivre did not call them Bernoulli random variables, but spoke of the probability distribution of the number of heads in a coin toss.) A random variable $X$ is called centered if and only if $\mathbb{E}[X] = 0$. If $\mathbb{E}[|X|] < \infty$ we define the variance by $\mathrm{Var}[X] := \mathbb{E}\big[(X - \mathbb{E}[X])^2\big]$.

Theorem 2 (Lindeberg CLT, 1922) Let $X_1, X_2, X_3, \ldots$ be a sequence of independent, centered, and square integrable random variables, and set $s_n^2 := \sum_{k=1}^{n} \mathrm{Var}[X_k]$. Assume that for each $\varepsilon \in (0, \infty)$,
$$\frac{1}{s_n^2} \sum_{k=1}^{n} \mathbb{E}\big[X_k^2\, \mathbb{1}_{\{|X_k| > \varepsilon s_n\}}\big] \longrightarrow 0 \quad (n \to \infty) \qquad (\text{Lindeberg condition}).$$
Then, for all $a, b \in \mathbb{R}$ with $a < b$, we have
$$\lim_{n \to \infty} \mathbb{P}\bigg[a \leq \frac{1}{s_n} \sum_{k=1}^{n} X_k \leq b\bigg] = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$

Corollary 1 Let $X_1, X_2, X_3, \ldots$ be a sequence of independent and identically distributed random variables with $\mathbb{E}[X_1] = 0$ and $\mathrm{Var}[X_1] = \sigma^2 \in (0, \infty)$. Then, for all $a, b \in \mathbb{R}$ with $a < b$, we have
$$\lim_{n \to \infty} \mathbb{P}\bigg[a \leq \frac{1}{\sigma\sqrt{n}} \sum_{k=1}^{n} X_k \leq b\bigg] = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$

One thing we immediately notice in the general version of Lindeberg's central limit theorem is its universality with respect to the underlying distribution of the random variables: the particular distribution seems to be irrelevant. On the other hand, in both the central limit theorem of de Moivre and the one of Lindeberg, we require the random variables to be independent. Could it be that independence is the key to a Gaussian law of errors? If so, does this connection go deeper and beyond a purely probabilistic framework? In the remaining parts of this work we want to get to the bottom of these questions.
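To see Corollary 1 at work numerically, one may simulate standardized sums of iid random variables; the following sketch is our own illustration, with the uniform distribution on $[-1,1]$ as an arbitrary choice of iid law and arbitrary sample sizes:

```python
import math
import random

# Monte Carlo sketch of the iid CLT with X_i ~ Uniform(-1, 1),
# which has mean 0 and variance 1/3.
random.seed(0)
n, trials = 100, 10000
sigma = math.sqrt(1.0 / 3.0)

samples = []
for _ in range(trials):
    s = sum(random.uniform(-1.0, 1.0) for _ in range(n))
    samples.append(s / (sigma * math.sqrt(n)))  # standardized sum

def Phi(b):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))

frac = sum(1 for s in samples if s <= 1.0) / trials
print(frac, Phi(1.0))  # the two values should be close
```

The empirical fraction deviates from $\Phi(1)$ by an error of order $1/\sqrt{\text{trials}}$.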

Binary expansion and independence
In this section we will present a first example which a priori is non-probabilistic. It has to do with intervals corresponding to binary expansions of real numbers $x \in [0,1)$ and a corresponding product rule for their lengths.
For simplicity, we start by reminding the reader of the decimal expansion of a number $x \in [0,1)$. One can prove that each number $x \in [0,1)$ has a unique non-terminating decimal expansion (see, e.g., [8]), i.e., there are digits $d_1(x), d_2(x), d_3(x), \ldots$ in $\{0, \ldots, 9\}$ such that
$$x = \frac{d_1(x)}{10} + \frac{d_2(x)}{10^2} + \frac{d_3(x)}{10^3} + \cdots.$$
Analogous to the decimal expansion, each number $x \in [0,1)$ has a binary expansion (also known as dyadic expansion), i.e., there are unique numbers $b_1(x), b_2(x), b_3(x), \ldots$ in the set $\{0,1\}$ such that
$$x = \sum_{k=1}^{\infty} \frac{b_k(x)}{2^k}.$$
For instance, $\frac{3}{8} = \frac{0}{2} + \frac{1}{4} + \frac{1}{8}$, so $b_1(3/8) = 0$, $b_2(3/8) = b_3(3/8) = 1$, and $b_k(3/8) = 0$ for $k \geq 4$. To guarantee uniqueness of the expansion, we agree to write it in such a way that infinitely many of the binary digits are zero. As already indicated by the way we write them, the binary digits are functions of the variable $x$, i.e., $b_k\colon [0,1) \to \{0,1\}$. Sometimes these functions are called Rademacher functions, although Hans Rademacher (3. April 1892 in Wandsbek; 7. February 1969 in Haverford) defined a slightly different version [45]. The value that $b_k$ takes at $x$ not only provides information about the $k$-th binary digit of $x$, but also about $x$ itself. Obviously, if $b_1(x) = 1$, then $x \in [\frac{1}{2}, 1)$. More generally, if we define for each $k \in \mathbb{N}$ and $\varepsilon \in \{0,1\}$ the set $\{b_k = \varepsilon\} := \{x \in [0,1) : b_k(x) = \varepsilon\}$, then this set is a finite union of dyadic intervals of total length $\frac{1}{2}$. These considerations yield the following: if $n \in \mathbb{N}$, $k_1 < \cdots < k_n \in \mathbb{N}$, and $\varepsilon_1, \ldots, \varepsilon_n \in \{0,1\}$, then
$$\lambda\big(\{b_{k_1} = \varepsilon_1, \ldots, b_{k_n} = \varepsilon_n\}\big) = 2^{-n} = \prod_{j=1}^{n} \lambda\big(\{b_{k_j} = \varepsilon_j\}\big),$$
where $\lambda$ denotes the 1-dimensional Lebesgue measure (which in this case simply assigns to an interval its length). This implies that the binary coefficients, as functions of $x \in [0,1)$, are independent; a result seemingly discovered by French mathematician Émile Borel (7. January 1871 in Saint-Affrique; 3. February 1956 in Paris) in 1909 [13]. In particular, the random variables $X_k = b_k$ satisfy the assumptions of de Moivre's theorem (Theorem 1), and so we obtain a central limit theorem for the binary digits $b_k$. Probability in the sense of coin tosses or events has not played any role in our arguments. (Nevertheless, technically the $X_k$'s are bona fide random variables on the probability space $([0,1), \mathcal{B}([0,1)), \lambda)$.)
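The product rule for the binary digits can be verified exactly on a dyadic grid, where the first $m$ digits of $x = i/2^m$ are simply the binary digits of $i$; the following is a small check of our own:

```python
# Exact check of the product rule for binary digits on [0,1):
# on the grid x = i / 2**m the first m digits of x are the bits of i,
# and each digit pattern occurs equally often.
m = 16
N = 2 ** m

def b(x, k):
    """k-th binary digit of x in [0, 1)."""
    return int(x * 2 ** k) % 2

count = sum(1 for i in range(N)
            if b(i / N, 2) == 1 and b(i / N, 5) == 0 and b(i / N, 9) == 1)
print(count / N)  # exactly 1/8 = 2**-3, as the product rule predicts
```

All arithmetic here is on dyadic rationals, so the floating-point computation is exact and the density equals $2^{-3}$ precisely.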

Prime factors and independence
We shall now consider a fundamentally different example of independence in mathematics. Take a sufficiently large natural number $N \in \mathbb{N}$. We note that roughly half of the numbers between $1$ and $N$ are divisible by the prime number $2$, namely $2, 4, 6$, and so on. In the same way, roughly one third of the numbers between $1$ and $N$ are divisible by the prime number $3$, namely $3, 6, 9$, and so on. If we now consider the numbers between $1$ and $N$ which are divisible by $6$, then these again constitute roughly one sixth. However, divisibility by $6$ is equivalent to divisibility by both $2$ and $3$, and for the corresponding fractions of numbers between $1$ and $N$ we can write this as
$$\frac{1}{6} = \frac{1}{2} \cdot \frac{1}{3}.$$
But this reminds us of the multiplication of probabilities, as occurring in the concept of independence! Of course, the same argument applies to divisibility by general distinct primes $p$ and $q$, as well as by any finite number of primes. We can say, in this sense, that divisibility of a number by distinct primes is independent.
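A quick numerical illustration of this product rule, with the counting done by floor division (our sketch; the cutoff $N = 10^6$ is arbitrary):

```python
# Densities of divisibility among 1..N: the density of multiples of 6
# agrees, up to an error of order 1/N, with the product of the
# densities for 2 and 3.
N = 10 ** 6
d2 = (N // 2) / N   # fraction of multiples of 2
d3 = (N // 3) / N   # fraction of multiples of 3
d6 = (N // 6) / N   # fraction of multiples of 6
print(d2, d3, d6, d2 * d3)
```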
Evidently, every second natural number is divisible by $2$, so the numbers with this property constitute one half of all natural numbers. One could thus think that a randomly chosen natural number is divisible by $2$ with probability $\frac{1}{2}$. In the same way, this number would be divisible by $3$ with probability $\frac{1}{3}$, and an analogous statement would hold for divisibility by every natural number.
It turns out that this notion, although intuitive, is incompatible with Kolmogorov's concept of probability: no probability measure on the naturals with the above property exists (as a consequence, it is impossible to define a uniform probability measure on any countably infinite set).
To see this, define, for every pair of numbers $n, k$ with $n \in \mathbb{N}$ and $k \in \{1, \ldots, n\}$, the set $A_{n,k} := \{jn + k : j \in \mathbb{N} \cup \{0\}\}$. For $k \neq n$, $A_{n,k}$ consists of all natural numbers which yield remainder $k$ after division by $n$, while for $k = n$ we have $A_{n,n}$, the set of all natural numbers that are divisible by $n$. We denote by $\mathcal{P}(\mathbb{N})$ the set of all subsets of $\mathbb{N}$.

Lemma 1
Let $\mu$ be a finite measure on the set $\mathcal{P}(\mathbb{N})$ which satisfies
$$\mu(A_{p,k}) = \frac{\mu(\mathbb{N})}{p} \qquad (2)$$
for every prime number $p$ and every $k \in \{1, \ldots, p\}$. Then $\mu(\{m\}) = 0$ for every $m \in \mathbb{N}$, and therefore $\mu(A) = 0$ for all $A \subseteq \mathbb{N}$.
Proof First note that, by (2), for every $m \in \mathbb{N}$ and every prime $p$ there is some $k$ with $m \in A_{p,k}$, so that $\mu(\{m\}) \leq \mu(A_{p,k}) = \mu(\mathbb{N})/p$. Since $\mu(\mathbb{N}) < \infty$ by assumption, and since there are arbitrarily large primes, it follows that $\mu(\{m\}) = 0$. But since $\mu$ is a measure, and thus is $\sigma$-additive, we get $\mu(A) = \sum_{m \in A} \mu(\{m\}) = 0$ for every $A \subseteq \mathbb{N}$. So there exists no non-trivial measure on $\mathcal{P}(\mathbb{N})$ having the desired property (2). But could it be that we have chosen the domain of $\mu$ too large? The next proposition shows that no smaller domain contains all the sets $A_{p,k}$: the $\sigma$-algebra they generate is already all of $\mathcal{P}(\mathbb{N})$.

Proposition 1 We have
$$\sigma\big(\big\{A_{p,k} : p \text{ prime},\ k \in \{1, \ldots, p\}\big\}\big) = \mathcal{P}(\mathbb{N}).$$

Proof We define $\Sigma := \sigma\big(\{A_{p,k} : p \text{ prime},\ k \in \{1, \ldots, p\}\}\big)$. Fix $m \in \mathbb{N}$. For every prime $p$ there is a unique $k(p) \in \{1, \ldots, p\}$ with $m \in A_{p,k(p)}$. If $n \neq m$, then any prime $p > |n - m|$ does not divide $n - m$, and hence $n \notin A_{p,k(p)}$. Consequently $\{m\} = \bigcap_{p \text{ prime}} A_{p,k(p)}$ is a countable intersection of sets in $\Sigma$ and thus belongs to $\Sigma$. Since every subset of $\mathbb{N}$ is a countable union of singletons, we conclude $\Sigma = \mathcal{P}(\mathbb{N})$.

Remark 2
Eq. (2) in Lemma 1 formalizes our earlier intuition that the fraction $\mu(A_{p,p})$ of numbers divisible by $p$ should equal $1/p$, and likewise for the fraction of numbers yielding remainder $1$, remainder $2$, and so on. The lemma shows us that there cannot be a non-trivial finite measure with this property, and therefore we cannot assign meaningful probabilities to those subsets within the framework of Kolmogorov's theory. In contrast to the independence of distinct binary digits of a number in $[0,1)$, we cannot capture the independence of divisibility of a number in $\mathbb{N}$ by distinct primes using Kolmogorov's notion of independence of random variables.

Relative Measures
A possible remedy is a notion related to one of the earlier approaches to probability theory, which goes back at least to Richard von Mises (19. April 1883 in Lviv; 14. July 1953 in Boston) and can be found in early work of Kac and Steinhaus; however, we were unable to trace the original source. In any case, this approach has to a large extent been replaced by Kolmogorov's axiomatization of probability.
One of the central notions of this manuscript shall be referred to as the relative measure; its definition and properties are discussed in the following section.

Definition 1 (Relatively measurable subsets of N and relative measure)
We say that a subset $A \subseteq \mathbb{N}$ is relatively measurable if and only if the limit
$$\lim_{N \to \infty} \frac{|A \cap \{1, \ldots, N\}|}{N}$$
exists. In that case we define the relative measure $\mu_R$ of $A$ as exactly this limit,
$$\mu_R(A) := \lim_{N \to \infty} \frac{|A \cap \{1, \ldots, N\}|}{N}.$$
It is easy to see that the collection of relatively measurable subsets of $\mathbb{N}$ forms an algebra and that $\mu_R$ is a non-negative and (finitely) additive set function on it. Moreover, it is obvious that every finite subset of $\mathbb{N}$ is relatively measurable with relative measure $0$.
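The defining limit can be explored numerically; the following sketch (our own illustration) computes partial densities for the residue class $A_{7,3}$ and for the set of perfect squares:

```python
import math

# Partial densities |A ∩ {1,...,N}| / N for two sets: the residue
# class A_{7,3} (relative measure 1/7) and the perfect squares
# (relative measure 0, since there are only sqrt(N) of them up to N).
def partial_density(indicator, N):
    return sum(1 for n in range(1, N + 1) if indicator(n)) / N

progression = lambda n: n % 7 == 3            # the set A_{7,3}
square = lambda n: math.isqrt(n) ** 2 == n    # the perfect squares

for N in (10 ** 3, 10 ** 4, 10 ** 5):
    print(N, partial_density(progression, N), partial_density(square, N))
```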
The sets $A_{n,k}$, $n \in \mathbb{N}$ and $k \in \{1, \ldots, n\}$, defined in Sect. 2.4 are relatively measurable with $\mu_R(A_{n,k}) = \frac{1}{n}$. It is a direct consequence of Lemma 1 that $\mu_R$ cannot be $\sigma$-additive. Indeed,
$$\sum_{m \in \mathbb{N}} \mu_R(\{m\}) = 0 \neq 1 = \mu_R(\mathbb{N}).$$
On the other hand, we can construct sets which are not relatively measurable.
Example 1 Let $a_1 = 0$ and define, for $k \geq 2$, $a_k := 1$ if $\lfloor \log_2 k \rfloor$ is odd and $a_k := 0$ otherwise; in other words, $a_k = 1$ exactly for $k \in \{2, 3\} \cup \{8, \ldots, 15\} \cup \{32, \ldots, 63\} \cup \cdots$. Consider the level set $A := \{k \in \mathbb{N} : a_k = 1\}$. Then $A$ is not relatively measurable, because
$$2^{-(2m+2)}\, \big|A \cap \{1, \ldots, 2^{2m+2}\}\big| = 2^{-(2m+2)} \cdot 2\,(1 + 2^2 + \cdots + 2^{2m}) \longrightarrow \frac{2}{3},$$
whereas the partial densities along $N = 2^{2m+1}$ converge to $\frac{1}{3}$ as $m \to \infty$. The relative measure allows us to conceive and prove the independence of divisibility by different primes in a formal way. In this regard the notion is superior to a measure in the sense of Kolmogorov. We are now going to prove the independence of $A_{p,p}$ and $A_{q,q}$ for distinct primes $p$ and $q$. By the fundamental theorem of arithmetic a number is divisible by $p$ as well as by $q$ if and only if it is divisible by their product $pq$, and so $A_{p,p} \cap A_{q,q} = A_{pq,pq}$. Therefore, we obtain
$$\mu_R(A_{p,p} \cap A_{q,q}) = \mu_R(A_{pq,pq}) = \frac{1}{pq} = \frac{1}{p} \cdot \frac{1}{q} = \mu_R(A_{p,p})\, \mu_R(A_{q,q}),$$
which is the product rule so characteristic of independence. Similarly, one can show this property for each finite collection of distinct primes $p_1, \ldots, p_m$.
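The oscillation in Example 1 can be observed directly; in the sketch below (our own) we encode the set $A$ via bit lengths, using that $\lfloor \log_2 k \rfloor$ is odd exactly when the bit length of $k$ is even:

```python
# The set A from Example 1: k is in A exactly when floor(log2 k) is
# odd, i.e. when k has an even number of binary digits. The partial
# densities along N = 2^(2m+1) and N = 2^(2m+2) approach different
# limits (1/3 and 2/3), so the defining limit does not exist.
def in_A(k):
    return k.bit_length() % 2 == 0   # bit_length = floor(log2 k) + 1

def density(N):
    return sum(1 for k in range(1, N + 1) if in_A(k)) / N

m = 8
print(density(2 ** (2 * m + 1)))   # near 1/3
print(density(2 ** (2 * m + 2)))   # near 2/3
```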
The following lemma shows that if the indicator function of a subset of the natural numbers is eventually periodic, then the relative measure of that set equals the average over one period. We shall leave the proof to the reader.

Lemma 2 Consider a set $A \subseteq \mathbb{N}$ for which there exist $n_0 \in \mathbb{N} \cup \{0\}$ and $k \in \mathbb{N}$ such that $\mathbb{1}_A(n + k) = \mathbb{1}_A(n)$ for all $n > n_0$;
then $A$ is relatively measurable and
$$\mu_R(A) = \frac{|A \cap \{n_0 + 1, \ldots, n_0 + k\}|}{k}.$$

Remark 3 (Independence and information)
One important property of statistical independence is that knowledge of one event, say $B$, does not provide any information about an independent event $A$: for independent $A, B$ with $\mathbb{P}(B) > 0$ we have $\mathbb{P}(A \mid B) = \mathbb{P}(A)$.
A similar situation occurs with numbers: knowledge about divisibility by one prime does not tell us anything about divisibility by another one. This also holds true for the digits considered earlier: if we know the $k$-th digit of a number $x \in [0,1)$, this does not tell us anything about its $\ell$-th digit for $\ell \neq k$.
Consider now, for every $j \in \mathbb{N}$, the function $\beta_j\colon \mathbb{N} \to \{0,1\}$ defined such that $\beta_j(n)$ is the $j$-th binary digit of $n$, i.e.,
$$n = \sum_{j=1}^{\infty} \beta_j(n)\, 2^{j-1}.$$
To every $j \in \mathbb{N}$ assign the set $B_j := \{n \in \mathbb{N} : \beta_j(n) = 1\}$, i.e., the set of all natural numbers for which the $j$-th binary digit equals $1$. It follows from the definition of binary digits that for each $j \in \mathbb{N}$,
$$B_j = \bigcup_{m \in \mathbb{N} \cup \{0\}} \big\{2^j m + 2^{j-1}, \ldots, 2^j m + 2^j - 1\big\},$$
so that $\mu_R(B_j) = \frac{1}{2}$, which can be proven using Lemma 2.
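A short check of our own of $\mu_R(B_j) = 1/2$ and of the product rule for two digits, with the densities evaluated over an initial segment:

```python
# Relative measure of B_j = {n : j-th binary digit of n equals 1} and
# the product rule for two distinct digits, computed over n <= 2^16.
def beta(n, j):
    """j-th binary digit of n (j = 1 is the least significant digit)."""
    return (n >> (j - 1)) & 1

N = 2 ** 16
d1 = sum(beta(n, 1) for n in range(1, N + 1)) / N
d4 = sum(beta(n, 4) for n in range(1, N + 1)) / N
d14 = sum(beta(n, 1) & beta(n, 4) for n in range(1, N + 1)) / N
print(d1, d4, d14)   # 1/2, 1/2, and their product 1/4
```

Because $N$ is a power of two, each digit pattern occurs equally often and the densities come out exactly.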
Definition 2 Let $(A_j)_{j \in J}$ be a family of relatively measurable subsets of $\mathbb{N}$. We say that $(A_j)_{j \in J}$ is independent if and only if for every $m \in \mathbb{N}$ and every subset $I \subseteq J$ of cardinality $m$,
$$\mu_R\Big(\bigcap_{j \in I} A_j\Big) = \prod_{j \in I} \mu_R(A_j).$$
Summarizing the preceding thoughts, we obtain the following result.

Proposition 2
1. For $n \in \mathbb{N}$ and $k \in \{1, \ldots, n\}$, let $A_{n,k} := \{jn + k : j \in \mathbb{N} \cup \{0\}\}$. Then the family $(A_{p,p})_{p \in \mathbb{N},\, p \text{ prime}}$ is independent.
2. For $j \in \mathbb{N}$, let $B_j := \{n \in \mathbb{N} : \beta_j(n) = 1\}$. Then the family $(B_j)_{j \in \mathbb{N}}$ is independent.
It is quite interesting that results similar to the ones for expansions of real numbers in $[0,1)$ with respect to the Lebesgue measure can be obtained for the expansion of natural numbers with respect to the relative measure on $\mathbb{N}$.

Relatively measurable sequences and their distribution
In this subsection we shall introduce the notion of a relatively measurable sequence and, in broad similarity to the way independence is defined in the sense of Kolmogorov, we introduce the notion of relatively independent sequences $x, y\colon \mathbb{N} \to \mathbb{R}$ and define a distribution function with respect to relative measures. As we shall see, such a distribution function does not possess all the properties that, coming from probability theory, we might expect it to have.

Definition 3 (Relatively measurable sequence)
A sequence $x\colon \mathbb{N} \to \mathbb{R}$ is said to be relatively measurable if and only if the pre-image $x^{-1}(I) = \{n \in \mathbb{N} : x_n \in I\}$ of every interval $I \subseteq \mathbb{R}$ is a relatively measurable subset of $\mathbb{N}$. We shall now introduce what it means for two sequences to be independent with respect to a relative measure. This is again done via a product rule.

Definition 4 (Independent sequences)
Two relatively measurable sequences $x, y\colon \mathbb{N} \to \mathbb{R}$ are said to be $\mu_R$-independent if and only if for any two intervals $I, J \subseteq \mathbb{R}$ we have
$$\mu_R\big(\{n \in \mathbb{N} : x_n \in I\} \cap \{n \in \mathbb{N} : y_n \in J\}\big) = \mu_R\big(\{n \in \mathbb{N} : x_n \in I\}\big)\, \mu_R\big(\{n \in \mathbb{N} : y_n \in J\}\big).$$
This definition can be generalized in an obvious way to any finite number of relatively measurable sequences.
We now turn to the definition of a (relative) distribution function of a relatively measurable sequence.

Definition 5 (Distribution function)
Let $x\colon \mathbb{N} \to \mathbb{R}$ be a relatively measurable sequence. Then the function
$$F_x\colon \mathbb{R} \to [0,1], \qquad F_x(z) := \mu_R\big(\{n \in \mathbb{N} : x_n \leq z\}\big),$$
is called the (relative) distribution function of $x$. By its very definition such a distribution function resembles a classical distribution function as we know it from probability theory. In particular, it is immediately clear that it is non-decreasing. However, in general not all properties we may expect from a relative distribution function have to hold.

Example 2
Consider, for instance, the sequence $x\colon \mathbb{N} \to \mathbb{R}$ given by
$$x_n := \begin{cases} -n, & n \equiv 1 \pmod 4,\\ 0, & n \equiv 2 \pmod 4,\\ 1 + \frac{1}{n}, & n \equiv 3 \pmod 4,\\ n, & n \equiv 0 \pmod 4. \end{cases}$$
Then it is easy to see that $x$ is relatively measurable and that its relative distribution function is given by
$$F_x(z) = \begin{cases} \frac{1}{4}, & z < 0,\\ \frac{1}{2}, & 0 \leq z \leq 1,\\ \frac{3}{4}, & z > 1. \end{cases}$$
Hence, $F_x$ is neither left continuous (at $z = 0$) nor right continuous (at $z = 1$), and we have $\lim_{z \to -\infty} F_x(z) > 0$ and $\lim_{z \to +\infty} F_x(z) < 1$.
Note, however, that for every bounded relatively measurable sequence $x$ we have $\lim_{z \to -\infty} F_x(z) = 0$ and $\lim_{z \to +\infty} F_x(z) = 1$.

Next we introduce and study the notion of an average of a relatively measurable sequence.
Definition 6 (Relative average) Let $x\colon \mathbb{N} \to \mathbb{R}$ be a relatively measurable sequence. Then we define the relative average of $x$ by
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} x_n,$$
whenever this limit exists.
The following theorem shows that the relative average of a relatively measurable and bounded sequence exists and can be written as a Riemann–Stieltjes integral with respect to the relative distribution function,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} x_n = \int_{\mathbb{R}} z\, dF_x(z).$$
The proof proceeds by comparing the averages with Darboux sums of the Riemann–Stieltjes integral over a partition $Z$: one observes that $\limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} x_n$ is bounded above by every upper sum $U(\mathrm{id}, F_x, Z)$, and similarly $\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} x_n$ is bounded below by every lower sum, which then proves the assertion.
It follows from the properties of Riemann–Stieltjes integrals that
$$\int_{\mathbb{R}} z\, dF_x(z) = \int_{\mathbb{R}} z\, f_x(z)\, dz$$
whenever $F_x$ is differentiable on $\mathbb{R}$ with $F_x' = f_x$ outside some at most finite subset of $\mathbb{R}$.

Remark 4
We see that relatively measurable sequences behave in many ways like random variables, and indeed a relatively measurable sequence can be taken as a mathematical model for a "random number". As noted before, this kind of model was put forward by the Austrian mathematician Richard von Mises in the first half of the 20th century. It was, at least among the vast majority of probabilists, replaced by Kolmogorov's approach, mainly because of the potent tools from Lebesgue's measure theory and the accompanying clean and simple concepts and convergence theorems.
Nevertheless there is a certain appeal to the alternative, in particular its slim theoretical foundation. Within this approach one can simply state that a real number is a Cauchy sequence of rational numbers and a random number is a relatively measurable sequence of real numbers.
We now assign to every $\mathbb{Z}$-valued and relatively measurable sequence $x$ a function $\pi_x\colon \mathbb{Z} \to [0,1]$ via
$$\pi_x(k) := \mu_R\big(\{n \in \mathbb{N} : x_n = k\}\big).$$
Then for bounded, $\mathbb{Z}$-valued and relatively measurable sequences we have $\sum_{k \in \mathbb{Z}} \pi_x(k) = 1$ and the well-known convolution formula:

Proposition 3 Let $x, y\colon \mathbb{N} \to \mathbb{R}$ be bounded and relatively measurable sequences taking values in $\mathbb{Z}$. If $x$ and $y$ are $\mu_R$-independent, then $\pi_{x+y} = \pi_x * \pi_y$, where
$$(\pi_x * \pi_y)(k) := \sum_{j \in \mathbb{Z}} \pi_x(j)\, \pi_y(k - j), \qquad k \in \mathbb{Z}.$$

All in all, we can say that relatively measurable sequences behave in many ways like random variables. For instance, the indicator functions of the sets $B_j$ introduced after Remark 3 form an independent family of relatively measurable, bounded, and $\mathbb{Z}$-valued sequences. This means that the partial sums of the indicator functions of the sets $B_j$ satisfy the central limit theorem of de Moivre (Theorem 1), i.e., for any $a, b \in \mathbb{R}$ with $a < b$,
$$\lim_{m \to \infty} \mu_R\bigg(\bigg\{n \in \mathbb{N} : a \leq \frac{\sum_{j=1}^{m} \mathbb{1}_{B_j}(n) - \frac{m}{2}}{\frac{1}{2}\sqrt{m}} \leq b\bigg\}\bigg) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx. \qquad (6)$$
Note again that the set considered above is indeed relatively measurable. To see this, we note that, as was argued before, the sets $B_j$ are all relatively measurable and hence, because the collection of relatively measurable sets forms an algebra, so are their complements $B_j^c$. This immediately implies that the sequences $(\mathbb{1}_{B_j}(n))_{n \in \mathbb{N}}$ and their finite sums are relatively measurable.
Thus, for the binary expansion of natural numbers we have the same central limit theorem as for the binary expansion of real numbers in $[0,1)$. In fact, we can now formulate a quite interesting version of this, which can be found, for example, in [18]. Contrary to almost all numbers in $[0,1)$, every natural number has a finite expansion, and hence it is reasonable to define for $n \in \mathbb{N}$ its sum-of-digits function with respect to the binary expansion,
$$s_2(n) := \sum_{j=1}^{\infty} \beta_j(n),$$
where only finitely many summands are non-zero. The following result describes the Gaussian fluctuations of the sum-of-digits function.

Theorem 2 (Central limit theorem for the sum-of-digits function) For all $b \in \mathbb{R}$,
$$\lim_{N \to \infty} \frac{1}{N}\bigg|\bigg\{n \leq N : s_2(n) \leq \frac{\log_2 N}{2} + \frac{b}{2}\sqrt{\log_2 N}\bigg\}\bigg| = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{b} e^{-x^2/2}\, dx.$$
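For $n$ drawn uniformly from $\{0, \ldots, 2^m - 1\}$, $s_2(n)$ is a sum of $m$ fair digit coins, so the theorem can be probed empirically; the sketch below is our own, the choices $m = 20$ and $b = 1$ are arbitrary, and a discretization error of order $1/\sqrt{m}$ remains:

```python
import math

# Empirical check of the sum-of-digits CLT over n < 2^m, where s_2(n)
# follows a Binomial(m, 1/2) distribution.
m = 20
N = 2 ** m
b_val = 1.0
threshold = m / 2 + b_val * math.sqrt(m) / 2

frac = sum(1 for n in range(N) if bin(n).count("1") <= threshold) / N

def Phi(b):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(b / math.sqrt(2.0)))

print(frac, Phi(b_val))   # close for large m
```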
We recall the following lemma from probability theory.

Lemma 3
Let $F\colon \mathbb{R} \to [0,1]$ be a continuous cumulative distribution function and let $(F_n)_{n \in \mathbb{N}}$ be a sequence of non-decreasing functions $F_n\colon \mathbb{R} \to [0,1]$ with $\lim_{n \to \infty} F_n(x) = F(x)$ for all $x \in \mathbb{R}$. Then $F_n \to F$ uniformly on $\mathbb{R}$.
We are now able to prove the central limit theorem for the sum-of-digits function.

Proof (Proof of Theorem 2)
Let $\varepsilon \in (0,1)$. For $b \in \mathbb{R}$ let us write $\Phi(b) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{b} e^{-x^2/2}\, dx$. It follows from de Moivre's central limit theorem (see Eq. (6)) and Lemma 3 that there exists $m_0 \in \mathbb{N}$ such that for all $m \geq m_0$ and every $b \in \mathbb{R}$ the proportion of $n < 2^m$ whose suitably standardized digit sum is at most $b$ differs from $\Phi(b)$ by less than $\varepsilon/3$; moreover, for each $m \geq m_0$ the analogous estimate holds on every dyadic block of length $2^m$, since the digits beyond the $m$-th only shift the digit sum by a constant. Now let $\ell \in \mathbb{N}$ with $2^{-\ell} < \frac{\varepsilon}{3}$ and $j \in \{1, \ldots, 2^{\ell}\}$. For every $m \geq \ell + m_0$, decomposing $\{n : 2^m \leq n < 2^m + j 2^{m-\ell}\}$ into $j$ dyadic blocks of length $2^{m-\ell}$ yields the same estimate, up to $\varepsilon/3$, for the proportion of such $n$ with standardized digit sum at most $b$; note that this holds in particular for $j = 2^{\ell}$. Now let $N > 2^{m_0} \frac{3}{\varepsilon}$ and let $m = \lfloor \log_2(N) \rfloor$. Then $2^m + (j-1) 2^{m-\ell} \leq N < 2^m + j 2^{m-\ell}$ for some $j \in \{1, \ldots, 2^{\ell}\}$, and combining the estimates for $n < 2^m$ and for the blocks between $2^m$ and $N$ gives the upper bound, where we have used that $2^{-\ell} < \frac{\varepsilon}{3}$ implies $\frac{2^{m-\ell-1}}{N} \leq \frac{2^{m-\ell-1}}{2^m} < \frac{\varepsilon}{3}$. In the same way we get the matching lower bound, which proves the result.

Uniform distribution mod 1 and Weyl's theorem
In this section we address a famous theorem of Hermann Weyl (9. November 1885 in Elmshorn; 8. December 1955 in Zürich). Before we start, let us remind the reader that the fractional part of a number $x \in \mathbb{R}$ is defined as $\{x\} := x - \lfloor x \rfloor \in [0,1)$. If we are given a sequence $x\colon \mathbb{N} \to \mathbb{R}$ and a set $B \subseteq [0,1)$, then we define another set by setting
$$\{x \in B\} := \{n \in \mathbb{N} : \{x_n\} \in B\}.$$
The sequence $x = (x_n)_{n \in \mathbb{N}}$ is said to be uniformly distributed modulo 1 (we simply write mod 1) if and only if for all $a, b \in \mathbb{R}$ with $0 \leq a < b \leq 1$ we have
$$\lim_{N \to \infty} \frac{1}{N}\big|\{n \leq N : \{x_n\} \in [a, b)\}\big| = b - a.$$
In particular, this means that for each uniformly distributed sequence $(x_n)_{n \in \mathbb{N}}$ the sequence $(\{x_n\})_{n \in \mathbb{N}}$ is relatively measurable. Weyl's theorem [51, 52], also known as Weyl's criterion, says that a sequence $(x_n)_{n \in \mathbb{N}}$ of real numbers is uniformly distributed mod 1 if and only if for every $h \in \mathbb{Z} \setminus \{0\}$ the following condition is satisfied,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i h x_n} = 0.$$
In an extended and multivariate version (Theorem 3) this theorem reads as follows: a sequence $(x_n)_{n \in \mathbb{N}}$ of vectors in $\mathbb{R}^m$ is uniformly distributed mod 1 if and only if the corresponding Weyl sums vanish in the limit for every $h \in \mathbb{Z}^m \setminus \{0\}$; equivalently, for every Riemann integrable function $f\colon [0,1]^m \to \mathbb{R}$,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) = \int_{[0,1]^m} f(t)\, dt.$$
An important consequence is that for each $\alpha \in \mathbb{R}$ the sequence $(n\alpha)_{n \in \mathbb{N}}$ is uniformly distributed mod 1 if and only if $\alpha$ is irrational, and that for $\alpha_1, \ldots, \alpha_m \in \mathbb{R}$ the sequences $(\{\alpha_1 n\})_{n \in \mathbb{N}}, \ldots, (\{\alpha_m n\})_{n \in \mathbb{N}}$ are uniformly distributed mod 1 and $\mu_R$-independent if and only if $1, \alpha_1, \ldots, \alpha_m$ are linearly independent over $\mathbb{Q}$.
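Weyl's criterion can be tested numerically; the sketch below (our illustration) evaluates the normalized Weyl sums for an irrational and a rational choice of $\alpha$:

```python
import cmath
import math

# Normalized Weyl sums for x_n = n * alpha: they tend to 0 for
# irrational alpha, while for rational alpha they can stay of order
# one (here h * alpha is an integer, so every summand equals 1).
def weyl_sum(alpha, h, N):
    s = sum(cmath.exp(2j * math.pi * h * n * alpha) for n in range(1, N + 1))
    return abs(s) / N

N = 10 ** 4
print(weyl_sum(math.sqrt(2), 1, N))   # near 0
print(weyl_sum(0.5, 2, N))            # stays at 1
```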

Remark 5
Theorem 3 is also of practical interest, as it provides us with a method for the numerical integration of a Riemann integrable function on $[0,1]^m$. Note that, if we only know that the coordinate sequences are uniformly distributed mod 1 and $\mu_R$-independent, we cannot say anything about the speed of convergence of the sums towards the integral.
The concept of discrepancy of a sequence measures the speed with which a sequence in $[0,1)^m$ approaches the uniform distribution on $[0,1)^m$. Sequences with a "high" speed of convergence are informally called low-discrepancy sequences and give rise to a class of numerical integration algorithms called quasi-Monte Carlo methods. For more information about these sequences and algorithms see [17, 19, 38, 40].
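As a concrete low-discrepancy example one often takes the van der Corput sequence in base 2; the following sketch (our own choice of example, not taken from the sources cited above) uses it for quasi-Monte Carlo integration of $x^2$ over $[0,1]$:

```python
# Quasi-Monte Carlo integration with the van der Corput sequence in
# base 2, the standard first example of a low-discrepancy sequence.
def van_der_corput(n):
    """Radical inverse of n in base 2: reflect the binary digits of n
    about the binary point (1 -> 0.5, 2 -> 0.25, 3 -> 0.75, ...)."""
    q, denom = 0.0, 1.0
    while n:
        denom *= 2.0
        q += (n & 1) / denom
        n >>= 1
    return q

N = 2 ** 12
points = [van_der_corput(n) for n in range(1, N + 1)]
estimate = sum(x * x for x in points) / N   # integral of x^2 on [0,1]
print(abs(estimate - 1.0 / 3.0))            # error of order (log N)/N
```

By the Koksma inequality the error is bounded by the variation of the integrand times the star discrepancy of the point set, hence far smaller than the $1/\sqrt{N}$ rate of plain Monte Carlo.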

Definition 7 (Finitely measurable function)
We say that a function $g\colon I \to \mathbb{R}$ is finitely measurable if and only if the pre-image of each interval $J \subseteq \mathbb{R}$ under $g$ can be written as the union of finitely many subintervals, i.e., there exist $k \in \mathbb{N}$ and subintervals $I_1, \ldots, I_k$ of $I$ such that $g^{-1}(J) = I_1 \cup \cdots \cup I_k$.

Proposition 4 Let $x^{(1)}, \ldots, x^{(m)}$ be sequences of real numbers that are uniformly distributed mod 1 and $\mu_R$-independent, and let $g_1, \ldots, g_m\colon [0,1) \to \mathbb{R}$ be finitely measurable. Then the sequences $\big(g_1(\{x_n^{(1)}\})\big)_{n \in \mathbb{N}}, \ldots, \big(g_m(\{x_n^{(m)}\})\big)_{n \in \mathbb{N}}$ are relatively measurable and $\mu_R$-independent.

The previous result, whose proof is left to the reader, has the following interesting corollary.

Corollary 2 Let $1, \alpha_1, \ldots, \alpha_m \in \mathbb{R}$ be linearly independent over $\mathbb{Q}$. Then the sequences $\big(\cos(2\pi \alpha_1 n)\big)_{n \in \mathbb{N}}, \ldots, \big(\cos(2\pi \alpha_m n)\big)_{n \in \mathbb{N}}$ are relatively measurable and $\mu_R$-independent.
Proof We have already concluded, as a consequence of Weyl's theorem, that the sequences $(\{\alpha_1 n\})_{n \in \mathbb{N}}, \ldots, (\{\alpha_m n\})_{n \in \mathbb{N}}$ are uniformly distributed mod 1 and $\mu_R$-independent. Hence, by Proposition 4 the sequences $\big(\cos(2\pi \{\alpha_1 n\})\big)_{n \in \mathbb{N}}, \ldots, \big(\cos(2\pi \{\alpha_m n\})\big)_{n \in \mathbb{N}}$ are $\mu_R$-independent as well, and thus so are the sequences $\big(\cos(2\pi \alpha_1 n)\big)_{n \in \mathbb{N}}, \ldots, \big(\cos(2\pi \alpha_m n)\big)_{n \in \mathbb{N}}$, since $\cos(2\pi \alpha_j n) = \cos(2\pi \{\alpha_j n\})$.

Proposition 5 Let $x, y\colon \mathbb{N} \to \mathbb{R}$ be bounded and relatively measurable sequences with continuous and increasing distribution functions $F_x$ and $F_y$, respectively. If $x$ and $y$ are $\mu_R$-independent, then the distribution function $F_{x+y}$ of $x + y$ is given by the convolution of $F_x$ and $F_y$, i.e.,
$$F_{x+y}(z) = \int_{\mathbb{R}} F_x(z - t)\, dF_y(t).$$

Proof
It is comparably easy to see that the sequences $(F_x(x_n))_{n \in \mathbb{N}}$ and $(F_y(y_n))_{n \in \mathbb{N}}$ are uniformly distributed mod 1. Proposition 4 implies that they are $\mu_R$-independent. Observe that the restriction of $F_x$ to the closure of $\{t \in \mathbb{R} : F_x(t) \in (0,1)\}$ is continuous and increasing and therefore has an inverse, which we denote by $G_x$. Denote by $G_y$ the corresponding inverse function of $F_y$. Writing $x_n = G_x(F_x(x_n))$ and $y_n = G_y(F_y(y_n))$, one computes $F_{x+y}$ as the stated convolution, where one uses $(*)$ that the sequences $(F_x(x_n))_{n \in \mathbb{N}}$ and $(F_y(y_n))_{n \in \mathbb{N}}$ are uniformly distributed mod 1 and independent.
If we consider, for instance, the sequence $x = \big(\cos(2\pi \alpha n)\big)_{n \in \mathbb{N}}$ with irrational $\alpha$, then, since $(\alpha n)_{n \in \mathbb{N}}$ is uniformly distributed mod 1, the distribution function of $x$ is the arcsine law,
$$F_x(z) = 1 - \frac{\arccos(z)}{\pi}, \qquad z \in [-1, 1].$$
This means that the distribution function of the sequence $\big(\cos(2\pi \alpha_1 n) + \cdots + \cos(2\pi \alpha_m n)\big)_{n \in \mathbb{N}}$ is given by the $m$-fold convolution $F_x^{*m}$. Therefore, we obtain a central limit theorem for partial sums of cosines with linearly independent frequencies, i.e., with $1, \alpha_1, \alpha_2, \ldots$ linearly independent over $\mathbb{Q}$,
$$\lim_{m \to \infty} \mu_R\bigg(\bigg\{n \in \mathbb{N} : a \leq \frac{1}{\sqrt{m/2}} \sum_{j=1}^{m} \cos(2\pi \alpha_j n) \leq b\bigg\}\bigg) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx.$$
The deliberations of the previous subsection can quite effortlessly be lifted to a continuous setting. A continuous version of a relative measure on Lebesgue measurable subsets of $\mathbb{R}$ can be defined as the limit
$$\mu_R(A) := \lim_{T \to \infty} \frac{\lambda(A \cap [0, T])}{T},$$
if it exists. In analogy to the case of sequences, one obtains a continuous version of Weyl's theorem (see also [38, Chap. 9]) and thus the independence of functions of uniformly distributed functions. An example is again given by the cosines with linearly independent frequencies. The original approach to this result is, as we find, more complicated and can be found in [34]. The latter is presented in a more accessible way in [32, Chap. 3].
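The normalization $\sqrt{m/2}$ can be checked empirically through the first two moments of the cosine sums; the sketch below is our own, with the arbitrary choice $\alpha_j = \sqrt{p_j}$ for the first five primes (together with $1$, these are linearly independent over $\mathbb{Q}$):

```python
import math

# First two empirical moments of x_n = sum_j cos(2*pi*alpha_j*n):
# the mean tends to 0 and the variance to m/2 (each cosine sequence
# contributes variance 1/2, cross terms average out), matching the
# normalization sqrt(m/2) in the central limit theorem.
alphas = [math.sqrt(p) for p in (2, 3, 5, 7, 11)]
N = 20000
vals = [sum(math.cos(2 * math.pi * a * n) for a in alphas)
        for n in range(1, N + 1)]

mean = sum(vals) / N
var = sum(v * v for v in vals) / N - mean ** 2
print(mean, var)   # near 0 and near 5/2
```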

The Erdős-Kac Theorem
This section is devoted to a famous theorem of Paul Erdős and Mark Kac. One can say that this result marks the birth of what is today known as probabilistic number theory. The close link between probability theory and number theory illustrated by this theorem can hardly be overrated and turned out to be extremely fruitful. We shall start with the original heuristics of Mark Kac, which led him to conjecture the result he later proved together with Paul Erdős.

Heuristics – Independence & CLT
A guiding idea of Mark Kac was that wherever there is some sort of independence, the Gaussian law of errors is at play. Exactly this maxim underlies the Erdős-Kac theorem. The object of interest is the number of different prime factors of a given number.
Let us consider the following indicator functions: for each prime number $p$ and every $n\in\mathbb{N}$, we define

$$I_p(n) := \begin{cases} 1 & \text{if } p \mid n, \\ 0 & \text{otherwise.} \end{cases}$$

Given a natural number $n\in\mathbb{N}$, we denote by $\omega(n)$ the number of different prime factors of $n$. The indicator functions allow us to express $\omega(n)$ as follows:

$$\omega(n) = \sum_{p \text{ prime}} I_p(n).$$

From Sect. 2.4 we already know that this collection of indicator functions is $\mathcal{R}$-independent. We now want to provide a plausibility argument, and here we follow Mark Kac's original heuristics, that suggests these indicator functions also satisfy Lindeberg's condition. In analogy to the central limit theorem of Lindeberg, this suggests that the properly normalized sum of indicator functions follows a Gaussian law of errors. For this we note first that for all $x\in\mathbb{R}$ with $x \ge 2$ we have

$$\sum_{\substack{p \le x \\ p \text{ prime}}} \frac{1}{p} = \ln\ln x + O(1),$$

which appears as Lemma 1 in [22]. Writing $\omega_m(n) := \sum_{p \le m} I_p(n)$ for the truncated sums, with $c_m := \sum_{p\le m}\frac{1}{p}$ and $d_m := \big(\sum_{p\le m}\frac{1}{p}\big(1-\frac{1}{p}\big)\big)^{1/2}$, this means that

$$\lim_{m\to\infty}\lim_{N\to\infty} \frac{1}{N}\Big|\Big\{ n\in\{1,\dots,N\} : a \le \frac{\omega_m(n) - c_m}{d_m} \le b \Big\}\Big| = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,\mathrm{d}t.$$

If one could show that the two limits may be taken simultaneously, then we would obtain

$$\lim_{N\to\infty} \frac{1}{N}\Big|\Big\{ n\in\{1,\dots,N\} : a \le \frac{\omega_N(n) - c_N}{d_N} \le b \Big\}\Big| = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,\mathrm{d}t.$$

Together with the (proper) asymptotics for $\omega_N(n)$, $c_N$, $d_N$, this would give

$$\lim_{N\to\infty} \frac{1}{N}\Big|\Big\{ n\in\{1,\dots,N\} : a \le \frac{\omega(n) - \ln\ln N}{\sqrt{\ln\ln N}} \le b \Big\}\Big| = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,\mathrm{d}t.$$

Of course, this is merely a heuristic argument, not a proof. In any case, the heuristic and conjecture just presented lead us in the following subsection to the ingenious and famous central limit theorem of Erdős-Kac [22].
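The $\mathcal{R}$-independence behind this heuristic can be checked by hand. The following Python sketch is our own illustration: for distinct primes $p$ and $q$, the relative density of $\{n \le N : p\mid n \text{ and } q\mid n\}$ equals (up to rounding of order $1/N$) the product of the individual densities $1/p$ and $1/q$.

```python
# Check that the density of joint divisibility factorizes, mirroring the
# R-independence of the indicator functions I_p.
N = 1_000_000

def I(p: int, n: int) -> int:
    """Indicator I_p(n): 1 if the prime p divides n, 0 otherwise."""
    return 1 if n % p == 0 else 0

def density(d: int) -> float:
    """Relative density of the multiples of d among 1, ..., N."""
    return (N // d) / N

# For distinct primes p, q we have: p | n and q | n  iff  pq | n.
pairs = [(2, 3), (3, 5), (2, 7), (5, 11), (13, 17)]
max_gap = max(
    abs(density(p * q) - density(p) * density(q)) for p, q in pairs
)
```

Here `max_gap` is tiny, reflecting that the exact densities are $\frac{1}{pq} = \frac{1}{p}\cdot\frac{1}{q}$ in the limit $N\to\infty$.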

The CLT of Erdős-Kac
After having presented the heuristic of Mark Kac, let us tell the anecdote about the origin of the Erdős-Kac theorem as described by Mark Kac himself in his autobiography [33]. When once asked about their famous result, Mark Kac replied the following (see [14] and [33]): "It took what looks now like a miraculous confluence of circumstances to produce our result.... It would not have been enough, certainly not in 1939, to bring a number theorist and a probabilist together. It had to be Erdős and me: Erdős because he was almost unique in his knowledge and understanding of the number theoretic method of Viggo Brun,... and me because I could see independence and the normal law through the eyes of Steinhaus." We will now formulate the central limit theorem of Erdős and Kac.
Theorem 1 (Erdős-Kac, 1940) Let $a, b\in\mathbb{R}$ with $a < b$. Then

$$\lim_{N\to\infty} \frac{1}{N}\Big|\Big\{ n\in\{1,\dots,N\} : a \le \frac{\omega(n) - \ln\ln N}{\sqrt{\ln\ln N}} \le b \Big\}\Big| = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-t^2/2}\,\mathrm{d}t.$$

In other words, for large $N\in\mathbb{N}$ the proportion of natural numbers in the set $\{1,\dots,N\}$ for which the suitably normalized number of different prime factors lies between $a$ and $b$ is close to the Gaussian integral from $a$ to $b$. In short: the suitably normalized number of different prime factors of a large number follows a Gaussian curve.
Providing a formal proof of Theorem 1 would go beyond the scope of this paper. The original argument of Erdős and Kac uses number theoretic methods from sieve theory (more precisely, Brun's sieve). Another proof is due to Alfréd Rényi (20. March 1921 in Budapest; 1. February 1970 in Budapest) and Pál Turán. Theorem 1 means that for large $N\in\mathbb{N}$, if we pick a number $n\in\{1,\dots,N\}$ at random (with respect to the uniform distribution), then the number $\omega(n)$ of different prime factors is typically of order $\ln\ln N$.
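The theorem invites a numerical experiment. The following Python sketch is our own illustration (the cutoff $N = 200{,}000$ and the window $[-2,2]$ are arbitrary choices): it sieves $\omega(n)$ for all $n\le N$ and compares the empirical statistics with the Gaussian prediction. Note that the convergence in the Erdős-Kac theorem is notoriously slow, so only coarse agreement can be expected at this scale.

```python
import math

N = 200_000

# Sieve omega(n), the number of distinct prime factors of n: when index i
# is reached with count 0, no smaller prime divides i, so i is prime, and
# we add 1 to every multiple of i.
omega = [0] * (N + 1)
for i in range(2, N + 1):
    if omega[i] == 0:                     # i is prime
        for multiple in range(i, N + 1, i):
            omega[multiple] += 1

loglog = math.log(math.log(N))
normalized = [(omega[n] - loglog) / math.sqrt(loglog) for n in range(1, N + 1)]

# Typical size: the mean of omega(n) is about lnln N + 0.2615 (Mertens).
mean_omega = sum(omega[1:]) / N

# Proportion falling in [a, b] = [-2, 2]; the Gaussian integral over this
# window is about 0.9545.  At this small scale the empirical proportion
# overshoots, since values omega(n) >= 6 are still very rare.
a, b = -2.0, 2.0
frac = sum(1 for s in normalized if a <= s <= b) / N
```

For instance $\omega(30030) = 6$, since $30030 = 2\cdot3\cdot5\cdot7\cdot11\cdot13$ is the smallest number with six different prime factors.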

Remark 6
Even though Pál Turán had already noticed that the result of Hardy and Ramanujan can be obtained from an inequality for the second moment of $\omega(n)$ together with an application of Chebyshev's inequality [9], one can say that the Erdős-Kac theorem marks the beginning of probabilistic number theory. The work [23] of Paul Erdős and Aurel Wintner (8. April 1903 in Budapest; 15. January 1958 in Baltimore) was another pioneering contribution to this circle of problems.
We close this section with a corollary that gives a different version of the Erdős-Kac theorem, in which $N$ in the $\ln\ln$ terms is replaced by $n$. This looks more natural in our setup, because it directly states that the distribution function of the sequence $\big(\frac{\omega(n) - \ln\ln n}{\sqrt{\ln\ln n}}\big)_{n\in\mathbb{N}}$ is that of the standard normal distribution.
A direct calculation shows that the two formulations of the Erdős-Kac theorem are actually equivalent.

Some complementary considerations – the case of lacunary series
What we have seen so far shows the power of the concept of relative measure in number theory and how it can naturally (in large parts along the lines of classical probability theory) lead us to central limit theorems for number theoretic quantities, even where the axiomatic framework of Kolmogorov is not applicable. On the other hand, we have seen, when studying binary expansions, that Kolmogorov's theory is a powerful tool as well and allows us to obtain information about the Gaussian fluctuations of number theoretic quantities. A common spirit of both, and eventually a key to a Gaussian law, has always been a notion of independence.
In what follows, we complement the previous considerations by showing that lacunary series, for instance those formed with the functions $\cos(2\pi n_k \cdot)\colon [0,1]\to\mathbb{R}$ and a quickly increasing gap sequence $(n_k)_{k\in\mathbb{N}}$, behave in many ways like independent random variables, and that this almost-independence, or weak form of independence, may still lead to fascinating results within the axiomatic theory of Kolmogorov.
Already in Sect. 2.3 on binary expansions we noted that Hans Rademacher introduced in [45] what are known today as Rademacher functions. These functions are defined in the following way: for $k\in\mathbb{N}$ and $t\in[0,1]$,

$$r_k(t) := \operatorname{sign}\big(\sin(2^k \pi t)\big).$$

Rademacher studied the convergence behavior of series

$$\sum_{k=1}^{\infty} a_k r_k(t), \qquad t\in[0,1],\; (a_k)_{k=1}^{\infty}\in\mathbb{R}^{\mathbb{N}}, \qquad (8)$$

and proved that such series converge for almost all $t\in[0,1]$ if

$$\sum_{k=1}^{\infty} a_k^2 < \infty. \qquad (9)$$

The necessity of this square summability condition was obtained by Alexander Khintchine and Andrey Kolmogorov. Raphaël Salem, Antoni Zygmund (who died on 30. May 1992 in Chicago), and others studied the convergence behavior of trigonometric series

$$\sum_{k=1}^{\infty} a_k \cos(2\pi n_k t), \qquad t\in[0,1],\; (a_k)_{k=1}^{\infty}\in\mathbb{R}^{\mathbb{N}},$$

where the sequence $(n_k)_{k=1}^{\infty}$ satisfies the Hadamard gap condition $\frac{n_{k+1}}{n_k} > q > 1$ for all $k\in\mathbb{N}$ (see [7, 36, 44, 53]). For such series one can obtain results similar to those for Rademacher series (8). Kolmogorov proved in [36] that the square summability condition (9) is also sufficient for almost everywhere convergence of lacunary series. The necessity of (9) was shown by Zygmund in [53].

An important analogy between Rademacher series and lacunary series, in particular in view of our article, remained unnoticed for a long time. In Sect. 2.3 we proved that the Rademacher functions (more precisely, a version of them) are independent. In particular, given any sequence $(a_k)_{k=1}^{\infty}$ of real numbers, the functions $a_k r_k$, $k\in\mathbb{N}$, are independent (but no longer identically distributed), and we have for all $k\in\mathbb{N}$ that $\mathbb{E}[a_k r_k] = 0$ and $\operatorname{Var}[a_k r_k] = a_k^2$. Using the notation from Lindeberg's theorem (see Theorem 2), we see that

$$s_n^2 = \sum_{k=1}^{n} \operatorname{Var}[a_k r_k] = \sum_{k=1}^{n} a_k^2.$$

For Lindeberg's condition to be satisfied, we require the right-hand side of the condition to converge to $0$ as $n\to\infty$. A moment's thought, however, reveals that this is the case whenever

$$\lim_{n\to\infty} \max_{1\le k\le n} \frac{|a_k|}{s_n} = 0. \qquad (12)$$

Therefore, under condition (12), we obtain that, for all $t\in\mathbb{R}$,

$$\lim_{n\to\infty} \lambda\Big(\Big\{ x\in[0,1] : \frac{1}{s_n}\sum_{k=1}^{n} a_k r_k(x) \le t \Big\}\Big) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-s^2/2}\,\mathrm{d}s.$$

For sequences with very large gaps, i.e., those satisfying the stronger condition

$$\frac{n_{k+1}}{n_k} \xrightarrow[k\to\infty]{} +\infty,$$

such a central limit theorem had already been obtained in 1939 by Mark Kac in [29].
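This Rademacher central limit theorem can be observed directly. The following Python sketch is our own illustration: it uses the equivalent version $r_k(t) = (-1)^{\lfloor 2^k t\rfloor}$ of the Rademacher functions (which agrees with the sign-of-sine definition outside a set of measure zero) and the hypothetical choice $a_k = 1$, for which condition (12) clearly holds since $\max_k |a_k|/s_n = 1/\sqrt{n}$.

```python
import math
import random

def rademacher(k: int, t: float) -> int:
    """A version of the k-th Rademacher function on [0, 1):
    +1 if the k-th binary digit of t is 0, and -1 otherwise."""
    return 1 if math.floor(2**k * t) % 2 == 0 else -1

random.seed(1)
n = 51                      # odd, so the normalized sum never vanishes
s_n = math.sqrt(n)          # s_n = sqrt(sum of a_k^2) with a_k = 1
M = 20_000                  # Monte Carlo sample size

# Sample t uniformly and form the normalized sums (1/s_n) * sum a_k r_k(t);
# by the CLT these should be approximately standard normal.
samples = []
for _ in range(M):
    t = random.random()
    samples.append(sum(rademacher(k, t) for k in range(1, n + 1)) / s_n)

mean = sum(samples) / M                             # close to 0
var = sum(x * x for x in samples) / M - mean**2     # close to 1
frac_leq_0 = sum(1 for x in samples if x <= 0) / M  # close to 1/2
```

Since `random.random()` returns uniform 53-bit dyadic rationals, the digits entering $r_1,\dots,r_{51}$ are indeed independent fair coin flips.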
Around the same time as Salem and Zygmund, Mark Kac [30] (see also [31, 32] and the references therein) obtained a central limit theorem for functions $f\colon\mathbb{R}\to\mathbb{R}$ of bounded variation on $[0,1]$ satisfying $f(t+1) = f(t)$ for all $t\in\mathbb{R}$ and $\int_0^1 f(t)\,\mathrm{d}t = 0$: the suitably normalized sums $\frac{1}{\sqrt{N}}\sum_{k=1}^{N} f(2^k t)$ follow a Gaussian law of errors with variance

$$\sigma^2 = \int_0^1 f(t)^2\,\mathrm{d}t + 2\sum_{k=1}^{\infty}\int_0^1 f(t)\,f(2^k t)\,\mathrm{d}t. \qquad (13)$$

This already indicates that the functions $f(2^k \cdot)$, $k\in\mathbb{N}$, do not behave like independent random variables. In fact, in that case we would expect something like

$$\sigma^2 = \int_0^1 f(t)^2\,\mathrm{d}t \ne 0$$

rather than condition (13). After further progress had been made by Gapoškin [25] and Takahashi [48], Gapoškin eventually discovered a deep connection between the validity of a central limit theorem and the number of solutions of a certain Diophantine equation [26]; that is, whether a central limit theorem holds or not depends not only on the growth rate of the sequence $(n_k)_{k\in\mathbb{N}}$, but also critically on its number theoretic properties. In 2010, Christoph Aistleitner and István Berkes presented a paper in which they obtained both necessary and sufficient conditions under which a sequence $(f(n_k \cdot))_{k\in\mathbb{N}}$ follows a Gaussian law of errors [1].

Please note that the preceding paragraph is not intended to be exhaustive. Still, it indicates the development of the subject, highlights some fascinating results, and shows how analytic, probabilistic, and number theoretic arguments and properties intertwine.
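The failure of independence can be made concrete. The following Python sketch is our own illustration, with the hypothetical choice $f(t) = \cos(2\pi t) + \cos(4\pi t)$, which has bounded variation, period $1$ and mean zero: only neighbouring terms $f(2^k t)$ and $f(2^{k+1} t)$ correlate (with correlation integral $\tfrac12$), so formula (13) gives $\sigma^2 = 1 + 2\cdot\tfrac12 = 2$, twice the value $\int_0^1 f^2 = 1$ one would expect under independence.

```python
import math
import random

def f(t: float) -> float:
    """Hypothetical example: bounded variation, period 1, mean zero."""
    return math.cos(2 * math.pi * t) + math.cos(4 * math.pi * t)

random.seed(2)
n = 20        # number of terms f(2^1 t), ..., f(2^n t)
M = 50_000    # Monte Carlo sample size

# Estimate the second moment of (1/sqrt(n)) * sum_k f(2^k t) for uniform t.
# Exact value: (n * 1 + 2 * (n - 1) * 1/2) / n = (2n - 1)/n = 1.95 here,
# approaching sigma^2 = 2, not the independent-case value 1.
samples = []
for _ in range(M):
    t = random.random()
    s = sum(f((2**k * t) % 1.0) for k in range(1, n + 1))
    samples.append(s / math.sqrt(n))

var = sum(x * x for x in samples) / M   # each term has mean zero
```

Reducing the argument mod 1 before evaluating $f$ avoids the precision loss of computing cosines of very large arguments.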

Remark 7
The results presented in this final section are not restricted to central limit phenomena. Beyond the normal fluctuations one can also prove laws of the iterated logarithm for lacunary series, and we refer the reader to the works of Erdős and Gál [21], Aistleitner and Fukuyama [4,5], Aistleitner, Berkes, and Tichy [2,3], and the references cited therein. The study of large deviation principles for lacunary sums has recently been initiated by Aistleitner, Gantert, Kabluchko, Prochno, and Ramanan in [6].
Applications" as well as a visiting professorship from Ruhr University Bochum and its Research School PLUS.
We thank Christoph Aistleitner, Jordan Stoyanov and an anonymous referee for valuable comments and suggestions.
Funding Open access funding provided by University of Graz.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.