Turning a coin over instead of tossing it

Given a sequence of numbers $\{p_n\}$ in $[0,1]$, consider the following experiment. First, we flip a fair coin and then, at step $n$, we turn the coin over to the other side with probability $p_n$, $n\ge 2$. What can we say about the distribution of the empirical frequency of heads as $n\to\infty$? We show that a number of phase transitions take place as the turning gets slower (i.e. $p_n$ is getting smaller), leading first to the breakdown of the Central Limit Theorem and then to that of the Law of Large Numbers. It turns out that the critical regime is $p_n=\text{const}/n$. Among the scaling limits, we obtain Uniform, Gaussian, Semicircle and Arcsine laws.
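As a concrete illustration of the experiment, here is a minimal simulation sketch (the $\pm 1$ coding of the two sides and the function names are our choices, not the paper's):

```python
import random

def coin_turning(p, N, rng=random):
    """Simulate N steps of the coin-turning process: the first side is a
    fair coin flip, and at step n >= 2 the coin is turned over with
    probability p(n). Sides are coded +1 (heads) and -1 (tails)."""
    sides = [rng.choice([-1, 1])]
    for n in range(2, N + 1):
        if rng.random() < p(n):
            sides.append(-sides[-1])   # turn the coin over
        else:
            sides.append(sides[-1])    # leave it as it is
    return sides

def heads_frequency(p, N, rng=random):
    """Empirical frequency of heads after N steps."""
    return sum(s == 1 for s in coin_turning(p, N, rng)) / N
```

Running `heads_frequency` for various turning sequences $p_n$ is the quickest way to see the phase transitions described below.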

Remark 1 (Poisson binomial random variable). The number of turns that occurred up to $n$, that is, $\sum_{i=2}^{n} W_i$, is a Poisson binomial random variable. The Poisson binomial distribution has many applications in areas such as reliability, actuarial science, survey sampling and econometrics. Its characteristic function is fairly simple: $\phi(t)=\prod_{i=2}^{n}\bigl(1-p_i+p_i e^{\mathrm{i}t}\bigr)$. See [5] for more on the Poisson binomial distribution; see also [7].
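The law of the number of turns can also be computed exactly by convolving the individual Bernoulli$(p_k)$ laws, which is sometimes more convenient than inverting the characteristic function; a small sketch (function names are ours):

```python
def poisson_binomial_pmf(ps):
    """Exact pmf of the number of successes among independent
    Bernoulli(p) trials with success probabilities ps, computed by
    iterated convolution."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)      # this trial fails
            new[k + 1] += q * p        # this trial succeeds
        pmf = new
    return pmf

# Law of the number of turns up to n = 10 for p_k = 1/k, k = 2..10:
pmf = poisson_binomial_pmf([1 / k for k in range(2, 11)])
```

For equal success probabilities this reduces, as it must, to the binomial distribution.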
The following quantity will play an important role: for $1 \le i < j \le N$, let
$$e_{i,j} := \prod_{k=i+1}^{j} (1-2p_k).$$
For the centred variables $Y_n$, we have $Y_j = Y_i(-1)^{\sum_{k=i+1}^{j} W_k}$, $j>i$, and so, using Corr and Cov for correlation and covariance, respectively, one has
$$\mathrm{Corr}(Y_i,Y_j)=\mathrm{Cov}(Y_i,Y_j)=\mathbb{E}(Y_iY_j)=\prod_{k=i+1}^{j}(1-2p_k)=e_{i,j}. \qquad (1)$$
Corollary 1 (Correlation estimate). Assume that $p_k \to 0$ and let $n_0$ be such that $p_k \le 1/4$ for $k \ge n_0$. For $n_0 \le i < j$,
$$\exp\Bigl\{-2\sum_{k=i+1}^{j} p_k - \sum_{k=i+1}^{j} r_k\Bigr\} \;\le\; e_{i,j} \;\le\; \exp\Bigl\{-2\sum_{k=i+1}^{j} p_k\Bigr\},$$
where $r_k := 2p_k^2 e^{2p_k}$, which tends to zero rapidly. Furthermore, for any given $C > 1$ there exists an $n_0$ such that, for $n_0 \le i < j$,
$$\exp\Bigl\{-2C\sum_{k=i+1}^{j} p_k\Bigr\} \;\le\; e_{i,j} \;\le\; \exp\Bigl\{-2\sum_{k=i+1}^{j} p_k\Bigr\}.$$
Proof. Use the remainder theorem for Taylor series, yielding $-2p_k - r_k \le \log(1-2p_k) \le -2p_k$, that is, $e^{-2p_k - r_k} \le 1-2p_k \le e^{-2p_k}$, and multiply these inequalities over $k = i+1, \dots, j$ to get the first statement.
For the second statement, use that for sufficiently small positive $x$, $1-x \ge e^{-Cx}$. Similarly to (1), if $K = 2m$ is a positive even number and $i_1 < i_2 < \dots < i_K$, then, using the independence of the $W_k$'s, we obtain
$$\mathbb{E}\bigl(Y_{i_1}Y_{i_2}\cdots Y_{i_K}\bigr)=\prod_{l=1}^{m} e_{i_{2l-1},\,i_{2l}}.$$
We also define $S_N = Y_1 + \dots + Y_N$, and note that from symmetry it follows that if $K$ is a positive odd integer, then $\mathbb{E}S_N^K = 0$. We close this section by introducing some frequently used notation.
Notation: In the sequel, $I_\alpha$ and $K_\alpha$ will denote the modified Bessel function of the first kind (or Bessel-I function) and the modified Bessel function of the second kind (or Bessel-K function), respectively.
Writing out these functions explicitly, one has
$$I_\alpha(x)=\sum_{m=0}^{\infty}\frac{1}{m!\,\Gamma(m+\alpha+1)}\Bigl(\frac{x}{2}\Bigr)^{2m+\alpha},\qquad K_\alpha(x)=\frac{\pi}{2}\,\frac{I_{-\alpha}(x)-I_\alpha(x)}{\sin(\alpha\pi)}$$
if $\alpha$ is not an integer (otherwise $K_\alpha$ is defined through the limit), where $\Gamma$ is Euler's gamma function. See e.g. Sections 9-10 in [1], and formula (6.8) in [2].
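As a numerical sanity check on these definitions, the series can be truncated and compared against closed forms such as $I_{1/2}(x)=\sqrt{2/(\pi x)}\,\sinh x$; a rough sketch (the truncation level is arbitrary):

```python
from math import gamma, pi, sin

def bessel_i(alpha, x, terms=40):
    """Modified Bessel function of the first kind, by truncating
    the defining power series."""
    return sum((x / 2) ** (2 * m + alpha) / (gamma(m + 1) * gamma(m + alpha + 1))
               for m in range(terms))

def bessel_k(alpha, x, terms=40):
    """Modified Bessel function of the second kind, for non-integer
    alpha, via K_a = (pi/2) (I_{-a} - I_a) / sin(a*pi)."""
    return pi / 2 * (bessel_i(-alpha, x, terms) - bessel_i(alpha, x, terms)) / sin(alpha * pi)
```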

Supercritical cases
First, if $\sum_n p_n < \infty$, then by the Borel–Cantelli lemma only finitely many turns occur a.s.; therefore the side we see eventually stabilizes, and by the assumption on $X_1$,
$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} X_i = \zeta \quad \text{a.s.},$$
where $\zeta$ is a Bernoulli$(1/2)$ random variable.
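This stabilization is easy to observe numerically: taking, say, $p_n = n^{-2}$ (so that $\sum_n p_n < \infty$), almost every run ends with an empirical frequency near $0$ or $1$. A simulation sketch (the parameters are our choices):

```python
import random

def final_frequency(N, rng):
    """Empirical frequency of heads (+1) after N steps, for p_n = 1/n^2."""
    x = rng.choice([-1, 1])
    heads = 0
    for n in range(1, N + 1):
        if n >= 2 and rng.random() < 1.0 / n ** 2:
            x = -x                     # a (rare) turn
        heads += (x == 1)
    return heads / N

rng = random.Random(0)
freqs = [final_frequency(5000, rng) for _ in range(200)]
# Count how many runs end close to one of the two extremes:
extreme = sum(1 for f in freqs if f < 0.2 or f > 0.8)
```

Since turns after a late time $n$ have probability roughly $1/n$ in total, all but a vanishing fraction of runs are "extreme".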

A simple critical case
Assume that $p_n = 1/n$ for $n \ge 2$. Then $S_N/N$ converges in distribution, as $N \to \infty$, to the Uniform distribution on $[-1,1]$.
Remark 2. The reader can easily check that convergence in distribution cannot be strengthened to an almost sure one.
Proof. Equation (1) gives, by telescoping,
$$e_{i,j}=\prod_{k=i+1}^{j}\Bigl(1-\frac{2}{k}\Bigr)=\frac{(i-1)\,i}{(j-1)\,j},\qquad 2\le i<j.$$
Therefore from (3) we obtain that for even positive $K = 2m$ and $i_1 < \dots < i_K$,
$$\mathbb{E}\bigl(Y_{i_1}\cdots Y_{i_K}\bigr)=\prod_{l=1}^{m}\frac{(i_{2l-1}-1)\,i_{2l-1}}{(i_{2l}-1)\,i_{2l}}.$$
Now recall that $S_N = Y_1+\dots+Y_N$ and that the distribution of $Y_n$, and thus of $S_N$, is symmetric around $0$. Hence the odd moments of $S_N$ vanish: $\mathbb{E}S_N^K=0$ for $K=1,3,5,\dots$ For $K$ even, we can use the multinomial theorem:
$$\mathbb{E}S_N^K = K!\sum_{i_1<\dots<i_K}\prod_{l=1}^{m}\frac{(i_{2l-1}-1)\,i_{2l-1}}{(i_{2l}-1)\,i_{2l}} \;+\; I,$$
where $I$ stands for the sum of those products where not all indices are different. Note that $|Y_i^l|\le 1$ for any $l\ge 1$ (and $Y_i^l \equiv 1$ for $l$ even). Therefore $|I| \le m(N,K)$, where $m(N,K)$ is the number of such products. But $m(N,K)\le N\cdot N^{K-2}=N^{K-1}$, because in each such product some index appears at least twice, so it is determined by indices $i_\ell, i_1, \dots, i_{K-2}$ between $1$ and $N$, not necessarily distinct. Hence $|I| = O(N^{K-1})$.
We can estimate the main sum as follows. Summing first over $i_1$, then over $i_2$, etc., and approximating the sums by integrals, gives
$$K!\sum_{i_1<\dots<i_K}\prod_{l=1}^{m}\frac{(i_{2l-1}-1)\,i_{2l-1}}{(i_{2l}-1)\,i_{2l}} \;\sim\; K!\,N^K\int_{0<x_1<\dots<x_K<1}\prod_{l=1}^{m}\frac{x_{2l-1}^2}{x_{2l}^2}\,\mathrm{d}x \;=\; \frac{N^K}{K+1}.$$
Putting things together, we have obtained that, as $N \to \infty$,
$$\mathbb{E}\bigl(S_N/N\bigr)^K \to \frac{1}{K+1}\quad\text{for even }K,$$
which, together with the vanishing odd moments, means that the moments of $S_N/N$ converge to those of the Uniform distribution on $[-1,1]$. Since the latter is supported on a compact interval, it is determined by its moments, and it follows (see e.g. Section 2, Exercise 3.27 in [3]) that it is the limit of the laws of $S_N/N$.
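The telescoping identity $e_{i,j}=\prod_{k=i+1}^{j}(1-2/k)=(i-1)i/((j-1)j)$ and the limiting second moment $1/3$ are easy to check numerically for $p_n = 1/n$ (our reading of this section); a sketch, with names of our choosing:

```python
import random
from math import prod

def e(i, j, p):
    """Correlation E[Y_i Y_j] = prod_{k=i+1}^{j} (1 - 2 p(k))."""
    return prod(1 - 2 * p(k) for k in range(i + 1, j + 1))

def second_moment(N, trials, rng):
    """Monte Carlo estimate of E[(S_N/N)^2] for p_n = 1/n."""
    total = 0.0
    for _ in range(trials):
        y = rng.choice([-1, 1])
        s = y
        for n in range(2, N + 1):
            if rng.random() < 1.0 / n:
                y = -y
            s += y
        total += (s / N) ** 2
    return total / trials
```

For a Uniform$(-1,1)$ limit the second moment should approach $1/3$.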

General critical case
Fix $a > 0$ and let $p_n = a/n$, $n \ge 2$.
Denote by Beta$(a,a)$ the symmetric Beta distribution with parameter $a > 0$. Just like before, since we are working on a compact interval, we conclude that $S_N/N \to \xi_a$ in distribution, where $\xi_a$ is distributed on $[-1,1]$, has vanishing odd moments, and
$$\mathbb{E}\xi_a^{K}=\prod_{l=1}^{m}\frac{2l-1}{2a+2l-1},\qquad K=2m \text{ even}.$$
The moment generating function of $\xi_a$ can be written in terms of the Bessel-I function as
$$\mathbb{E}\,e^{t\xi_a}=\Gamma\!\Bigl(a+\tfrac12\Bigr)\Bigl(\frac{t}{2}\Bigr)^{\frac12-a} I_{a-\frac12}(t).$$
Let $\zeta_a := (\xi_a+1)/2$; then $\zeta_a$ has the Beta$(a,a)$ distribution, and $\frac{1}{N}\sum_{i=1}^{N} X_i \to \zeta_a$ in distribution, completing the proof.
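A simulation is consistent with the Beta$(a,a)$ limit; the sketch below checks, for $a = 2$, the centred second moment against $1/(2a+1)$ (the product formula in `even_moment` is our reading of the even-moment formula; we cap $a/n$ at $1$, which only matters for the first couple of steps):

```python
import random

def sample_S_over_N(a, N, rng):
    """One sample of S_N/N for p_n = min(a/n, 1)."""
    y = rng.choice([-1, 1])
    s = y
    for n in range(2, N + 1):
        if rng.random() < min(a / n, 1.0):
            y = -y
        s += y
    return s / N

def even_moment(a, m):
    """E[xi_a^(2m)] = prod_{l=1}^{m} (2l-1)/(2a+2l-1) (our reading)."""
    out = 1.0
    for l in range(1, m + 1):
        out *= (2.0 * l - 1.0) / (2.0 * a + 2.0 * l - 1.0)
    return out
```

Note that $a=1$ recovers the uniform moments $1/(2m+1)$ and $a=1/2$ the arcsine moments.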
Concerning the corresponding densities, we have the following explicit formulas.

Sub-critical case
Now fix $\gamma, a > 0$ and let $p_n = a/n^{\gamma}$, $n \ge 2$. Note that $\gamma > 1$ corresponds to the supercritical case studied in Section 2; so from now on assume $0 < \gamma < 1$.
In the proof we will use a standard fact about the method of moments (see [3], Section 2, Exercise 3.15).
Let $A$ ($A > a$) be a given constant. By Corollary 1, there exists an $n_0 = n_0(a, A, \gamma)$ such that the correlation estimates hold with the constant $A$ for indices at least $n_0$. Bounding the sum by the integral, using the fact that $x^{-\gamma}$ is decreasing, we obtain the corresponding two-sided bounds on $e_{i,j}$. Writing $\eta_{a,\gamma,N} := S_N/N^{(1+\gamma)/2}$, it is easy to check that $\sup_N \mathbb{E}\eta_{a,\gamma,N}^2 < \infty$ (see the computation below with $m = 1$), and thus Chebyshev's inequality implies that $\{\eta_{a,\gamma,N}\}$ is a tight sequence of random variables. Hence, it is enough to show that each sub-sequential limit is the same.
Assume that $(N_l)_{l \ge 1}$ is a sub-sequence and $\lim_{l\to\infty} \mathrm{Law}(\eta_{a,\gamma,N_l}) = L$. Because the first $n_0$ terms contribute only $O(n_0) = o(N^{(1+\gamma)/2})$, trivially $L = \lim_{l\to\infty} L_{N_l,A}$, and in fact this limit must be the same for any $A > a$ (and corresponding $n_0 = n_0(a, A, \gamma)$).
Informally, this just means that we can throw away a finite chunk of the sequence of $Y_i$ (at the beginning) without affecting the limit.
Let us denote the even moments of $L$ by $M_{2m} \in [0,\infty]$, $m \ge 1$, while we note again that the odd moments must be zero by symmetry. Also, $M_{N_l,A,K}$ will denote the $K$th moment under $L_{N_l,A}$. We will show below that, for a fixed $A > a$ and $K = 2m$, $m \ge 1$, the two-sided estimate (11) holds. Once (11) is shown, it will follow from the upper estimate and from the relation $L = \lim_{l\to\infty} L_{N_l,A}$ for all $A > a$ that
(12) $\lim_{l\to\infty} M_{N_l,A,K} = M_K$ for all $K \ge 1$ and all $A > a$.
Since (11) holds for any $A > a$, letting $A \downarrow a$ and using (11) and (12), we identify the limiting moments $M_K$ explicitly; this is (13). In summary, we obtain that, for any fixed $A > a$, the moments $M_{N_l,A,K}$ converge to Gaussian moments. At the same time, the normal distribution is uniquely determined by its moments, and therefore convergence towards a normal law is implied by the convergence of all the moments (see e.g. [3], Section 2.3.e). In our case, (13) along with (9) imply $L = \lim_{l\to\infty} L_{N_l,A} = \mathrm{Normal}(0, \sigma^2)$.
Therefore, it only remains to prove (11).
Let us start with the upper estimate in (11). It is easy to see that for $K = 2m$ one has
$$\mathbb{E}\bigl(Y_{n_0}+\dots+Y_N\bigr)^K = K!\sum_{n_0\le i_1<\dots<i_K\le N}\;\prod_{l=1}^{m} e_{i_{2l-1},\,i_{2l}} \;+\; I,$$
where $I$ collects lower order terms, as will be shown below. Using (3) along with (10), we may bound each factor $e_{i_{2l-1},i_{2l}}$ and continue the estimate. By the calculation in the Appendix, the RHS of (14) is of order $N^{K(1+\gamma)/2}$. By the same token, one obtains the matching lower estimate. The reason the remaining terms, collected in $I$, are of lower order is the following. Apart from the already estimated term, in the expansion of $\mathbb{E}(Y_{n_0}+\dots+Y_N)^K$, for $r = 1, 2, \dots, K-1$ we also have to sum up terms of the type
$$\mathbb{E}\bigl(Y_{i_1}^{p_1} Y_{i_2}^{p_2}\cdots Y_{i_r}^{p_r}\bigr),$$
where $n_0 \le i_1 < \dots < i_r \le N$, all $p_j \ge 1$, and $p_1 + p_2 + \dots + p_r = K$.
Since $Y_i = \pm 1$, and thus $Y_i^p = 1$ if $p$ is even and $Y_i^p = Y_i$ if $p$ is odd, it suffices to estimate only the sums $R(r; \ell_1, \dots, \ell_r; N; K; \gamma)$, where the summation is taken over all sets $(i_1, \dots, i_r)$ such that $i_{k+1} \ge i_k + \ell_k$, $1 \le \ell_k \le K$, for all $k$, with $i_1 \ge 1$ and $i_r \le N$. However, since $r \le K-1$, each of the sums $R(r; \ell_1, \dots, \ell_r; N; K; \gamma)$ is at most of order $N^{r(1+\gamma)/2} \le N^{(K-1)(1+\gamma)/2}$, precisely by the same arguments which were used to estimate the sum in (14). The number of these sums can be large, as it is related to the number of integer partitions of $K$, but it depends only on $K$ and does not increase with $N$.
Consequently, for $m \ge 1$ we obtain the required upper estimate, and by a similar computation, the lower one. The proof is complete.
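The variance scaling $\mathrm{Var}(S_N) \asymp N^{1+\gamma}$ in the sub-critical regime can be checked by Monte Carlo: quadrupling $N$ should multiply the variance by roughly $4^{1+\gamma}$. A sketch for $\gamma = 1/2$, $a = 1$ (parameters and tolerances are our choices, generous enough to absorb sampling noise):

```python
import random

def var_S(a, gamma, N, trials, rng):
    """Monte Carlo estimate of Var(S_N) for p_n = min(a/n^gamma, 1)."""
    probs = [min(a / n ** gamma, 1.0) for n in range(2, N + 1)]
    vals = []
    for _ in range(trials):
        y = rng.choice([-1, 1])
        s = y
        for p in probs:
            if rng.random() < p:
                y = -y
            s += y
        vals.append(s)
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / trials

# For gamma = 1/2 the ratio Var(S_{4N}) / Var(S_N) should be close to
# 4**1.5 = 8, i.e. the growth exponent should be close to 1.5.
```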

When does the Law of Large Numbers hold for general sequences {p n }?
A natural question to ask is when S N obeys the Strong (Weak) Law of Large Numbers. The following result gives a partial answer.
For a positive even number $K$, introduce the shorthand $\mu_K$ for the limiting $K$th moment appearing in (15). Note that (16) is the so-called Carleman condition, guaranteeing that the $\mu_K$'s correspond to at most one probability law (see Theorem 3.11, Section 2, in [3]).
Proof. We will use the facts about the method of moments for weak convergence discussed in the proof of Theorem 3, along with the fact that, by (4), the sum in (15) is the leading order term of $\mathbb{E}S_N^K$.
(a) We treat the two assumptions separately.
Under (C1), the statement follows from Theorem 1 in [6]. Under (C2), along the lines of Theorem 6.5 in Section 1 of [3], we note that for $\varepsilon > 0$ one has
$$P\Bigl(\Bigl|\frac{S_N}{N}\Bigr| > \varepsilon\Bigr) \le \frac{\mathbb{E}S_N^K}{\varepsilon^K N^K}$$
by the Markov inequality (recall that $K$ is even). Since, by (4), the expression on the left-hand side of (15) is the leading order term in $\mathbb{E}S_N^K$, by (15) we have $\sum_N P(|S_N/N| > \varepsilon) < \infty$, and thus, by the Borel–Cantelli lemma, $P(|S_N/N| > \varepsilon \text{ i.o.}) = 0$, which implies the statement.
(b) Since the deterministically zero distribution is uniquely determined by its moments, convergence in law to that distribution follows from the convergence of all moments to zero.
Under the second condition in the theorem, all moments of $S_N/N$ converge to zero (the odd moments are zero by symmetry), and thus $S_N/N$ converges to zero in law (and also in probability, since the limit is deterministic).
(c) Assume that the conditions in (c) hold. Since the moments of S N /N converge (the odd moments are zero by symmetry), the corresponding laws are tight and, by the Carleman condition, all subsequential limits are the same. That is, as N → ∞, Law(S N /N) converges to a law with moments given by µ K , and since µ K > 0, the limit cannot be deterministically zero.
Proof. This follows from Theorem 4(a), using condition (C1), as we have seen that the inner sum is of order $N^{1+\gamma}$.

Further heuristic arguments and a conjecture
Consider a sum of $N \ge 1$ variables having the same law with finite variance. As is well known, the two 'extreme' cases for such a sum are the independent one, when the variance grows linearly in $N$ and one gets the Central Limit Theorem, and the fully correlated one, when all the variables are identical and the variance grows like $N^2$. By analogy, recalling that in our model $\mathrm{Var}(Y_i) = 1$, the first crucial question is whether (17) holds. In case (17) fails, one should know whether at least (18) holds.
Indeed, if (17) is true, then $\mathrm{Var}(S_N)$ is of order $N$, and one expects the CLT to hold, that is, the fluctuations of the number of heads around $N/2$ are of order $\sqrt{N}$. This happens when $p_n \equiv p \in (0,1)$. Condition (18) should intuitively be the one that guarantees the Weak LLN.
In light of this, we make the following Conjecture. Note that our condition (C1) for the Strong LLN is more stringent than (18).
In the examples below, the deviations from the Central Limit Theorem are becoming more marked as we go from Ex1 to Ex2 to Ex3.
Therefore the variance is still of order $N$, but the constant has changed. Recall that for $p_n \equiv c \in (0,1)$ one has $\mathrm{Cov}(Y_i, Y_j) = e_{i,j} = \kappa^{j-i}$ with $\kappa := 1-2c$, and, following [4], consider the chain started from the symmetric (Bernoulli$(1/2)$) law for $Y_0$. In this case, since we are dealing with a time-homogeneous Markov chain, it is well known (see [4]) that
$$\frac{\mathrm{Var}(S_N)}{N} \to \sigma^2 = 1 + 2\sum_{k\ge 1}\kappa^k = \frac{1+\kappa}{1-\kappa} = \frac{1-c}{c}.$$
Therefore, only when $c = 1/2$ will the classical CLT hold for the $Y_i$. It is also clear that the limiting normal variance can be arbitrarily large when $c$ is sufficiently small, and thus turns occur very rarely; on the other hand, it can be arbitrarily small if $c$ is sufficiently close to $1$, and thus turns occur very frequently.
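For constant $p_n \equiv c$, the limiting variance per step can be checked directly: summing the geometric covariances $\kappa^{j-i}$, $\kappa = 1-2c$, gives $\mathrm{Var}(S_N)/N \to (1+\kappa)/(1-\kappa) = (1-c)/c$. A Monte Carlo sketch (parameters are our choices):

```python
import random

def var_SN_over_N(c, N, trials, rng):
    """Monte Carlo estimate of Var(S_N)/N for constant turning
    probability p_n = c."""
    vals = []
    for _ in range(trials):
        y = rng.choice([-1, 1])
        s = y
        for _ in range(N - 1):
            if rng.random() < c:
                y = -y
            s += y
        vals.append(s)
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / (trials * N)

# Predicted limit (1 - c)/c: e.g. 3 for c = 1/4, and 1 for c = 1/2.
```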
(Ex1) (CLT breaks down) Consider the case $p_n := a/n^\gamma$ with $0 < \gamma < 1$. Then, by Corollary 1, $e_{i,j} \approx \exp\{-2a\sum_{k=i+1}^{j} k^{-\gamma}\}$, and consequently $\mathrm{Var}(S_N)$ is of order $N^{\gamma+1}$, where the power is strictly between $1$ and $2$. Hence (17) is false. The closer $\gamma$ is to $1$, the more the situation differs from the CLT. However, (18) is true.
Thus, the Law of Large Numbers is still in force, so the proportion is still around $1/2$, but the fluctuations are non-classical (larger than in the CLT).
(Ex2) (LLN breaks down) Consider the case when $p_n = 1/n$. Then $e_{i,j} = (i-1)i/((j-1)j)$, and consequently $\mathrm{Var}(S_N)$ is of order $N^2$; that is, (17) and even (18) are false, causing the Law of Large Numbers to break down, and the proportion is no longer around $1/2$. This means that the correlation is as strong as in the case of identical variables, and the fluctuations are now of the same order $N$ as the size of the sum.
The situation is similar when $p_k = a/k$ with $a > 0$. Instead of concentrating around the $\delta_{1/2}$ distribution, one now obtains all the Beta$(a,a)$ distributions as limits.
(Ex3) (Extreme limit) Consider the case when $\sum_n p_n < \infty$. Then $\liminf_{N\to\infty} E(N,2) > 0$ must hold (hence (17) and even (18) are false), because, by a well-known theorem (as $p_k > 0$), the infinite product $\Pi := \prod_{k\ge 2}(1-2p_k)$ exists and is positive, and so $e_{i,j} \ge \Pi$ for all $i < j$. Then, indeed, as we know, the limit is 'extreme': Beta$(0,0) = \frac12(\delta_0 + \delta_1)$, which is as far away from $\delta_{1/2}$ as possible!

Appendix
Each double sum appearing above (note that each expression is between $0$ and $1$) can be very well approximated by the corresponding integral, since, whenever $\tilde y \le \tilde x$, $|\tilde x - x| \le 1$ and $|\tilde y - y| \le 1$, the ratio of summand to integrand is bounded above by $e^{c_1[y^{-\gamma} + x^{-\gamma}]}$, where $c_1 > 0$ is some constant. Hence, outside of the region where $x$ and $y$ are both smaller than $\sqrt N$, this factor is very close to $1$, while the double sum over that region is at most $N$. Therefore, as $N \to \infty$, the sums may be replaced by integrals. To calculate the inner integral, observe that $y \ge n_0$. On the other hand, one may use the shorthands $a := n_0 + 2l$, $b := i_{2l+3}$, $q := (1+\gamma)l$. We are allowed to do this since, for $|\tilde x - x| \le 1$ and $|\tilde y - y| \le 1$, the ratio
$$\frac{\tilde y^{\,q}\, e^{c[\tilde y^{1-\gamma} - \tilde x^{1-\gamma}]}}{y^q\, e^{c[y^{1-\gamma} - x^{1-\gamma}]}}$$
is bounded above by $\bigl(1 + \frac{2l}{y}\bigr) e^{c_1(x^{-\gamma} + y^{-\gamma})}$, where $c_1 > 0$ is some constant. Since $y \ge N^\gamma$, this factor is very close to $1$; hence the double sum in (21) may likewise be replaced by an integral.