Prime numbers in typical continued fraction expansions

We study, from the viewpoint of metrical number theory and (infinite) ergodic theory, the probabilistic laws governing the occurrence of prime numbers as digits in continued fraction expansions of real numbers.


Introduction
Ever since Gauss [Gau] declared his interest in the intriguing statistical properties of sequences of digits $a_n(x)$, $n \ge 1$, in the continued fraction (CF) expansion of real numbers $x \in I := (0,1]$,
$$x = [a_1(x), a_2(x), \ldots] = \cfrac{1}{a_1(x) + \cfrac{1}{a_2(x) + \cdots}}$$
(and, in particular, mentioned that this led to questions he could not answer), the metrical theory of continued fractions has attracted many mathematicians' attention. In the present paper we will be interested in the prime digits of $x$, i.e. those $a_n(x)$ which happen to belong to the set $\mathbb{P}$ of prime numbers. To single them out, we define, for $x \in I$ and $n \ge 1$,
$$a'_n(x) := 1_{\mathbb{P}}(a_n(x)) \cdot a_n(x) = \begin{cases} a_n(x) & \text{if } a_n(x) \in \mathbb{P}, \\ 0 & \text{otherwise.} \end{cases}$$
(There is hardly any danger of misinterpreting this phonetically perfect symbol as a derivative.) The purpose of this note is to point out that it is in fact possible, with the aid of the prime number theorem and recent work in (infinite) ergodic theory and in the probability theory of dynamical systems, to derive a lot of information about the occurrences and values of prime digits in CF-expansions of (Lebesgue-) typical numbers. Besides stating the theorems themselves, it is also our aim to show some newer, more general results in ergodic theory in action. While many analogues of the following statements have been proven directly for the continued fraction digits, it is now possible to deduce them, as well as the versions for the prime digits, from more general theorems.
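The definition of the prime digits can be made concrete with a small sketch. It computes the CF digits of a rational number in (0, 1] exactly and then zeroes out the non-prime ones. The function names `cf_digits`, `prime_digits` and `is_prime` are illustrative choices, not notation from the paper, and rational inputs (which have finite expansions) are used only because they allow exact arithmetic; the results of the paper concern Lebesgue-typical, hence irrational, numbers.

```python
from fractions import Fraction


def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for small digits)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def cf_digits(x: Fraction) -> list[int]:
    """Digits a_1, a_2, ... of a rational x in (0, 1].

    Each step reads off floor(1/x) and replaces x by the fractional
    part of 1/x; for rational x this stops when the remainder is 0.
    """
    if not (0 < x <= 1):
        raise ValueError("x must lie in (0, 1]")
    digits = []
    while x != 0:
        inv = 1 / x
        a = inv.numerator // inv.denominator  # floor(1/x)
        digits.append(a)
        x = inv - a
    return digits


def prime_digits(digits: list[int]) -> list[int]:
    """Apply a'_n = 1_P(a_n) * a_n: keep prime digits, replace others by 0."""
    return [a if is_prime(a) else 0 for a in digits]
```

For instance, `cf_digits(Fraction(347, 1126))` yields the digits 3, 4, 12, 7, and `prime_digits` keeps 3 and 7 while replacing 4 and 12 by 0.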

Main Results - Pointwise matters
We first consider questions about the pointwise behaviour of the sequence $(a'_n)_{n\ge1}$ on $I$. Throughout, $\lambda$ denotes Lebesgue measure on (the Borel σ-field $\mathcal{B}_I$ of) $I$, and almost everywhere (a.e.) is meant w.r.t. $\lambda$. For the sake of completeness, we also include a few easy basic facts, e.g. that for a.e. $x \in I$, the proportion of those $k \in \{1, \ldots, n\}$ for which $a_k(x)$ is prime converges:

Proposition 2.1 (Asymptotic frequency of prime digits). We have
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} 1_{\mathbb{P}}(a_k) = \frac{1}{\log 2} \sum_{p \in \mathbb{P}} \log\Bigl(1 + \frac{1}{p(p+2)}\Bigr) \quad \text{a.e.}$$
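The limiting frequency of prime digits can be approximated numerically. By the ergodic theorem it equals the Gauss measure of the set of points whose first digit is prime, and with the Gauss density $1/((1+x)\log 2)$ from §4 one has $\mu_G(\{a = k\}) = \log(1 + 1/(k(k+2)))/\log 2$. The sketch below sums this quantity over the primes up to a cutoff; the cutoff `bound` and all function names are assumptions made for illustration, and the neglected tail is of order $1/(\text{bound} \cdot \log \text{bound})$.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes returning all primes <= n (n >= 2 assumed)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Strike out the multiples p*p, p*p + p, ... of p.
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]


def gauss_measure_of_digit(k: int) -> float:
    """mu_G({a = k}) = log(1 + 1/(k(k+2))) / log 2."""
    return math.log1p(1.0 / (k * (k + 2))) / math.log(2)


def prime_digit_frequency(bound: int = 10**6) -> float:
    """Truncated sum of mu_G({a = p}) over primes p <= bound."""
    return sum(gauss_measure_of_digit(p) for p in primes_up_to(bound))
```

Calling `prime_digit_frequency()` gives an approximation of the a.e. proportion of indices $k$ with $a_k$ prime; increasing `bound` only changes the result marginally because the summands decay like $1/p^2$.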
The results to follow can best be understood (and proved) by regarding $(a'_n)$ as a stationary sequence with respect to the Gauss measure (cf. §4 below). The first statement of the next theorem is parallel to the classical Borel-Bernstein theorem (cf. [Bo, Be]); the third and fourth statements are in accordance with [KS2]. As usual, i.o. is short for "infinitely often", i.e. "for infinitely many indices". We denote the iterated logarithms by $\log_1 := \log$ and $\log_{m+1} := \log \circ \log_m$, $m \ge 1$.
Furthermore, we define the maximal entry $M'_n := \max_{1 \le k \le n} a'_k$, $n \ge 1$.

Remark 2.1. The exponent 0.475 in c) comes from estimates for the error term in the prime number theorem and might be improved by future research.
Example 2.1. A straightforward calculation shows that and this remains true if $a'_n$ is replaced by $M'_n$. We thus find that As a consequence of Theorem 2.1 b), observing that the series otherwise.
In particular, A convenient condition for the criterion above is provided by

As in the case of the full digit sequence $(a_n)_{n\ge1}$, the peculiar properties of $(a'_n)_{n\ge1}$ are due to the fact that these functions are not integrable. A general fact for non-integrable non-negative stationary sequences is the non-existence of a non-trivial strong law of large numbers, made precise in a), c) and d) of the next result, where c) is in the spirit of [P3]. However, it is sometimes possible to recover a meaningful limit by trimming, i.e. by removing maximal terms. In the case of $(a_n)_{n\ge1}$, this was first pointed out in [DV]. Assertion b) below gives the proper version for the $(a'_n)_{n\ge1}$.

Theorem 2.2 (Strong laws of large numbers).
a) The prime digits satisfy
b) Subtracting $M'_n$, we obtain a trimmed strong law,
d) But, defining $n(j) := e^{j \log^2 j}$, $j \ge 1$, and $d'_n := n(j) \log_2 n(j) / \log 2$ for $n \in (n(j-1), n(j)]$ gives a normalizing sequence for which

The trimmed law from b) shows that the bad pointwise behaviour described in c) is due to a few exceptionally large individual terms $a'_n$ which, necessarily, have to be of the order of the preceding partial sum $\sum_{k=1}^{n-1} a'_k$. In fact, almost surely, the partial sum will infinitely often be of strictly smaller order than the following term, see statement a) below. We can also ask whether, or to what extent, the terms from the thinner sequence $(a'_n)_{n\ge1}$ come close to the partial sums $(\sum_{k=1}^{n-1} a_k)_{n\ge1}$ of the unrestricted one. The answer is given by the dichotomy rule in statement b) of the next result.
We shall tacitly interpret real sequences $(g_n)_{n\ge0}$ as functions on , and write

Theorem 2.3 (Relative size of digits and partial sums).
a) We have

Generally, for functions
In contrast, comparing to the unrestricted digit sum $\sum_{k=1}^{n-1} a_k$, one has

Generally, for functions
c) Turning to a comparison of partial sums, we find that

Remark 2.2. A broad class of functions which satisfy $g(\eta(t)) \asymp g(t)$ if $\eta(t) \sim t$ as $t \to \infty$ are the regularly varying functions. Recall that a measurable function [BGT] for more information).
Whether or not the integrals diverge can easily be checked for many specific $g$'s: while, for $\gamma > 1$, On the other hand, if we look at primes to some power $\gamma$ we obtain, as a counterpart to Theorem 2.2 b), the following result: where (2.13)

Remark 2.3. It is not proven that a trimming rate slower than the one given in (2.11) is possible. However, by [H] one can deduce that for i.i.d. random variables with the same distribution function and $b_n \asymp \log\log n$ a strong law of large numbers as in (2.12) is no longer possible.
However, if we only ask for convergence in probability, the picture looks much simpler and we refer the reader to Theorem 3.1 in the next section.

Main Results - Distributional matters
The second set of results we present focuses on the distributions of (various functions of) the digits $a'_n$. If $(M, d)$ is a separable metric space with Borel σ-field $\mathcal{B}_M$, a sequence $(\nu_n)_{n\ge1}$ of probability measures on $(M, \mathcal{B}_M)$ converges weakly to the probability measure $\nu$ on $(M, \mathcal{B}_M)$, written $\nu_n \Longrightarrow \nu$, if the integrals of bounded continuous functions $\psi : M \to \mathbb{R}$ converge, i.e. $\int \psi \, d\nu_n \to \int \psi \, d\nu$ as $n \to \infty$. If $R_n : I \to M$, $n \ge 1$, are Borel measurable functions and $\nu$ is a Borel probability on $M$ (or $R$ another random element of $M$, not necessarily defined on $I$, with distribution $\nu$), then $(R_n)_{n\ge1}$ converges in distribution to $\nu$ (or to $R$) under the probability measure $P$ on $\mathcal{B}_I$ if the distributions $P \circ R_n^{-1}$ of the $R_n$ w.r.t. $P$ converge weakly to $\nu$. Explicitly specifying the underlying measure, we denote this by $R_n \overset{P}{\Longrightarrow} R$. For sequences $(R_n)$ defined on an ergodic dynamical system, it is often the case that a distributional limit theorem $R_n \overset{P}{\Longrightarrow} R$ automatically carries over to a large collection of other probability measures: strong distributional convergence, written

We start by giving a counterpart to Theorem 2.4 for weak convergence, where b) is in the spirit of [Khi].
where $S_n^{b_n}$ is defined as in Theorem 2.4, $(d_n)$ is given as in (2.13) and $\lim_{n\to\infty} b_n = \infty$ and $b_n = o(n^{1-\epsilon})$.
Remark 3.1. Indeed, by [KS3] the stronger result of convergence in mean follows for c). It is not proven that, in the situation of c), convergence in probability cannot hold for a lightly trimmed sum, i.e. a sum from which only a finite number of large entries, independent of $n$, is removed. However, it follows from [A] that $\sum_{k=1}^{n} (a'_k)^{\gamma}$, normed by the right norming sequence, converges to a non-degenerate Mittag-Leffler distribution if $\gamma > 1$. On the other hand, by [Kes] it follows that light trimming does not have any influence on distributional convergence if the random variables considered are i.i.d.
As we have seen in the previous section, the maximum $M'_n$ has a large influence on the whole system; in the following we give its distributional limit. We let $\Theta$ denote a positive random variable with $\Pr[\Theta \le y] = e^{-1/y}$, $y > 0$, and get the following counterpart to [P2].
A related classical topic, introduced by Doeblin [D], is the Poissonian nature of occurrences of very large CF-digits. For $l \ge 1$ let $\varphi_l = \varphi_{l,1} := \inf\{k \ge 1 : a_k \ge l\}$, the first position in the CF-expansion at which a digit $\ge l$ shows up, and $\varphi_{l,i+1} := \inf\{k \ge 1 : a_{\varphi_{l,i}+k} \ge l\}$ the distance between the $i$th and $(i+1)$st occurrence. Defining $\Phi_l : I \to [0, \infty]^{\mathbb{N}}$ as $\Phi_l := (\varphi_{l,1}, \varphi_{l,2}, \ldots)$ and letting $\Phi_{\mathrm{Exp}}$ denote an i.i.d. sequence of normalized exponentially distributed random variables, we can express this classical result by stating that
$$\frac{1}{\log 2} \cdot \frac{1}{l} \, \Phi_l \Longrightarrow \Phi_{\mathrm{Exp}} \quad \text{as } l \to \infty.$$
Turning to prime digits, we shall consider the corresponding quantities $\varphi'_{l,i}$ with $\varphi'_{l,0} := 0$ and $\varphi'_{l,i+1} := \inf\{k \ge 1 : a'_{\varphi'_{l,i}+k} \ge l\}$, $i \ge 0$, and the processes $\Phi'_l := (\varphi'_{l,1}, \varphi'_{l,2}, \ldots)$ of distances between consecutive occurrences of prime digits of size at least $l$. In fact, we also provide refined versions of the limit theorem which show that, asymptotically, both the relative size compared to $l$ of such a large prime digit $a'_{\varphi'_{l,i}}$ and its residue class for a given modulus $m$ are stochastically independent of the positions $\varphi'_{l,i}$ at which they occur. (These statements are parallel to Propositions 10.1 and 10.2 of [Z3]. A $(q_1, \ldots, q_d)$-Bernoulli sequence is an i.i.d. sequence of random variables which can assume $d$ different values with respective probabilities $q_1, \ldots, q_d$.)

Theorem 3.3 (Poisson limits for large prime CF-digits). The sequences $\Phi'_l$ of positions at which large prime digits occur satisfy the following.
a) Their distances converge to an i.i.d. sequence of exponential variables,
b) Take any $\vartheta \in (0, 1)$, let $\psi'_{l,i}$ be the indicator function of $\{a'_{\varphi'_{l,i}} \ge l/\vartheta\}$ and set $\Psi'_l := (\psi'_{l,1}, \psi'_{l,2}, \ldots)$, which identifies those prime digits $\ge l$ which are in fact $\ge l/\vartheta$. Then where $(\Phi_{\mathrm{Exp}}, \Upsilon')$ is an independent pair with $\Upsilon'$ a $(\frac{1}{\phi(m)}, \ldots, \frac{1}{\phi(m)})$-Bernoulli sequence. (Here $\phi(m)$ denotes the Euler totient.)
We finally look at the distribution of a function which counts how many $a'_n$ fall into particular sets $A_n$, giving a limit theorem in the spirit of [P1, KS2]. We let $N$ denote a positive random variable with $\Pr[N \le y] = \int_0^y e^{-t^2/2} \, dt / \sqrt{2\pi}$, $y > 0$.
Theorem 3.4 (A CLT for counting primes in CF). Suppose that either as $n \to \infty$.

The Gauss map and the prime digit function
The results announced above express properties of certain stochastic processes derived from the exceptionally well understood dynamical system generated by the ergodic continued fraction map (or Gauss map) $Sx := \frac{1}{x} - \lfloor \frac{1}{x} \rfloor$ on $I$, which, since [Gau], is known to preserve the probability density $h_G(x) := \frac{1}{\log 2} \cdot \frac{1}{1+x}$, $x \in I$. The invariant Gauss measure $\mu_G$ on $\mathcal{B}_I$ defined by the latter, $\mu_G(B) := \int_B h_G(x) \, dx$, is exact (and hence ergodic). As hardly any textbook on ergodic theory fails to point out, iteration of $S$ reveals the continued fraction digits of any $x \in I$, in that $a_n = a \circ S^{n-1}$ for $n \ge 1$ (with $a(x) := \lfloor 1/x \rfloor$). As in classical probability theory, the tail behaviour of the distribution, given by $\mu_G(\{a \ge K\}) = \frac{1}{\log 2} \log\bigl(1 + \frac{1}{K}\bigr) \sim \frac{1}{\log 2} \cdot \frac{1}{K}$ as $K \to \infty$, is the key to fine asymptotic results. However, the study of the CF digit sequence goes beyond standard results, since the random variables $a \circ S^n$ are not independent. Yet, it is well known that they still satisfy a strong form of asymptotic independence or mixing in the following sense: Given any measure preserving transformation $T$ on a probability space $(\Omega, \mathcal{B}, P)$, and a countable measurable partition $\gamma$ (mod $P$), the ψ-mixing coefficients of $\gamma$ are defined as The partition $\gamma$ is said to be continued-fraction (CF-) mixing for the probability preserving system $(\Omega, \mathcal{B}, P, T)$ if it is generating, and if $\psi_\gamma(1) < \infty$ as well as $\psi_\gamma(n) \to 0$ for $n \to \infty$. (Note that $(\psi_\gamma(n))_{n\ge1}$ is non-increasing.) Of course, the nomenclature is due to the fact that
$$\xi \text{ is CF-mixing for } (I, \mathcal{B}_I, \mu_G, S). \tag{4.1}$$
Actually, this system is exponentially CF-mixing, in that there are constants $C > 0$ and $\rho \in (0, 1)$ such that $\psi_\xi(n) \le C \rho^n$ for $n \ge 1$ (which is related to Gauss' famous question mentioned in the introduction, see e.g. [IK] or [Z1]).
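The invariance of the Gauss density can be checked numerically. A density $h$ is invariant under the Gauss map exactly when it is fixed by the transfer operator, i.e. when $h(x) = \sum_{k\ge1} h(y_k)\, y_k^2$ with $y_k = 1/(x+k)$ the preimages of $x$. The sketch below evaluates a truncated version of this sum for $h = h_G$ and also reads off the first digits of a point by iterating the map. The names `gauss_map`, `digit`, `first_digits` and `transfer_operator_on_density`, as well as the truncation parameter `terms`, are illustrative assumptions; floating-point iteration is only reliable for a handful of steps because the map expands errors.

```python
import math

LOG2 = math.log(2)


def gauss_map(x: float) -> float:
    """S(x) = 1/x - floor(1/x) for x in (0, 1]."""
    y = 1.0 / x
    return y - math.floor(y)


def digit(x: float) -> int:
    """Digit function a(x) = floor(1/x)."""
    return math.floor(1.0 / x)


def first_digits(x: float, n: int) -> list[int]:
    """a_1(x), ..., a_n(x), obtained as a(S^{k-1} x) for k = 1, ..., n."""
    digits = []
    for _ in range(n):
        digits.append(digit(x))
        x = gauss_map(x)
    return digits


def gauss_density(x: float) -> float:
    """h_G(x) = 1 / (log 2 * (1 + x))."""
    return 1.0 / (LOG2 * (1.0 + x))


def transfer_operator_on_density(x: float, terms: int = 200_000) -> float:
    """Truncated sum over the inverse branches y_k = 1/(x + k) of
    h_G(y_k) * |dy_k/dx| = h_G(y_k) * y_k**2."""
    total = 0.0
    for k in range(1, terms + 1):
        y = 1.0 / (x + k)
        total += gauss_density(y) * y * y
    return total
```

Since each summand equals $1/(\log 2\,(x+k)(x+k+1))$, the series telescopes to $h_G(x)$, and the truncation error is about $1/(\log 2 \cdot \text{terms})$. As a digit check, `first_digits(math.sqrt(2) - 1, 5)` returns five 2's, matching $\sqrt{2} - 1 = [2, 2, 2, \ldots]$.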
We are going to study occurrences of prime digits by considering the restricted digit function $a' := (1_{\mathbb{P}} \circ a) \cdot a : I \to \{0\} \cup \mathbb{P}$. As in the case of $a$, this function, as a random variable on $(I, \mathcal{B}_I, \mu_G)$, still has infinite expectation. Indeed, the prime number theorem (PNT) enables us to quickly determine the all-important tail asymptotics for the distribution of $a'$. The following lemma is the key to our analysis of the prime digit sequence.
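Lemma 4.1 (Tail behaviour and truncated expectation of $a'$). The distribution of $a'$ (with respect to the Gauss measure) satisfies
$$\mu_G(\{a' \ge K\}) \sim \frac{1}{\log 2} \cdot \frac{1}{K \log K} \quad \text{as } K \to \infty. \tag{4.2}$$
In particular, $a'$ is not integrable, $\int_I a' \, d\mu_G = \infty$. Moreover,

Proof. First, the PNT is easily seen (cf. [HW], Theorem 1.8.8) to imply that
$$p_n \sim n \log n \quad \text{as } n \to \infty, \tag{4.4}$$
where $p_n$ denotes the $n$th prime number. Therefore, as $N \to \infty$.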
Letting $N(K)$ denote the least $n$ with $p_n \ge K$, we have, as and, by PNT, $N(K) \sim K / \log K$. Combining these observations yields (4.2). The second statement is an easy consequence thereof, since Straightforward calculation verifies the assertions about $a'$ and $b'$.
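The tail asymptotics of $a'$ can be compared with actual prime sums. The sketch below approximates $\mu_G(\{a' \ge K\})$ by summing $\mu_G(\{a = p\}) = \log(1 + 1/(p(p+2)))/\log 2$ over the primes $K \le p \le \text{bound}$ and divides by the leading-order term $1/(\log 2 \cdot K \log K)$ suggested by the PNT. The cutoff `bound` and the function names are assumptions for illustration. The ratio tends to 1 only slowly, since the relative correction is of order $1/\log K$.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes returning all primes <= n (n >= 2 assumed)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]


def prime_digit_tail(K: int, bound: int = 2_000_000) -> float:
    """Approximate mu_G({a' >= K}) by the sum of mu_G({a = p})
    over primes K <= p <= bound (the part beyond bound is dropped)."""
    s = sum(math.log1p(1.0 / (p * (p + 2)))
            for p in primes_up_to(bound) if p >= K)
    return s / math.log(2)


def pnt_prediction(K: int) -> float:
    """Leading-order tail 1 / (log 2 * K * log K)."""
    return 1.0 / (math.log(2) * K * math.log(K))


def tail_ratio(K: int, bound: int = 2_000_000) -> float:
    """Ratio of the (truncated) exact tail to the PNT prediction."""
    return prime_digit_tail(K, bound) / pnt_prediction(K)
```

Evaluating `tail_ratio(K)` for a few values of `K` well below `bound` shows the ratio staying somewhat below 1 and creeping upwards as `K` grows, in line with the logarithmic speed of convergence in the PNT.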
Remark 4.1. Several of the results allow for analogues in which prime digits are replaced by digits belonging to other subsets $M$ of the integers for which $\pi_M(n) := \#(M \cap \{1, \ldots, n\})$ is regularly varying with $\sum_{m \in M} \frac{1}{m} = \infty$, like, for example, the set of integers which are the product of exactly $k$ prime numbers, see Theorem 3.5.11 of [J]. (M. Thaler, personal communication.)

Proofs of the results on a.e. convergence
We are now ready for the proofs of our pointwise convergence results. We can always work, without further mention, with the invariant measure $\mu_G$, since it has the same null-sets as $\lambda$.
Proof of Proposition 2.1. This, of course, is just the ergodic theorem,

Below we will repeatedly appeal to the following version of Rényi's Borel-Cantelli Lemma (BCL) (as in Lemma 1 of [ATZ]):

Lemma 5.1 (Rényi's Borel-Cantelli Lemma). Assume that $(E_n)_{n\ge1}$ is a sequence of events in the probability space $(\Omega, \mathcal{B}, P)$ for which there is some $r \in (0, \infty)$ such that This lemma enables us to prove Theorem 2.1.
Proof of Theorem 2.1. a) Note that $\{a'_j > c\} = S^{-(j-1)}\{a' > c\}$ with $\{a' > c\}$ measurable w.r.t. $\xi$. As a consequence of the CF-mixing property (4.1), we see that Rényi's BCL applies to show that By $S$-invariance of $\mu_G$ and Lemma 4.1, we have } is easily seen to belong to the tail-σ-field $\bigcap_{n\ge0} S^{-n} \mathcal{B}_I$ of $S$. The system $(I, \mathcal{B}_I, \mu_G, S)$ being exact, the latter is trivial mod µ 2) is seen by an easy routine argument, as in the proof of Proposition 3.1.8 of [IK].
c) Without loss of generality we first assume that $c_n \le 0.5$ for all $n$. If this does not hold, we can easily switch to a subsequence in which this holds and consider the subsequences separately. By the prime number theorem we have Next, we assume that $c_n > 0.5$. We note that On the other hand, we have by [BHP], p. 562 that there exists $K > 0$ such that Combining this with (5.2) and (5.3) yields the statement of c).
On the other hand, $c(n) < k$ implies $\log k > \sqrt{\log n}$ and hence $\log_2 k > (1/2) \log_2 n$, $\log k > 1$, and $\log k > \log_3 n$.
Using these estimates we see that Taking into account that $\log_3 x$ is a primitive of $1/f(x)$ we get .
Since this estimate holds for infinitely many $n$, we see that

Proof of Theorem 2.2. a) Since $\int_I a' \, d\mu_G = \infty$ by Lemma 4.1, this is immediate from the ergodic theorem.
b) We apply Theorem 1.1 of [AN] to $(I, \mathcal{B}_I, \mu_G, S)$ and $a'$, observing that (in the notation of that paper),
d) Note first that letting $c'_n := d'_n / \log_2(10j)$ for $n \in (n(j-1), n(j)]$ provides us with a non-decreasing sequence satisfying Together with (2.5) and n log 2 n/(log 2 Specializing (5.5), and using (2.5) and (5.4) again, we find that $\longrightarrow 1$ a.e.
sequence of natural numbers tending to infinity, $(c_n)$ a sequence of positive numbers with $c_n \le d_n^{0.475}$ and $\sum_{n=1}^{\infty} 1/(c_n d_n \log(d_n)) = \infty$. Then, for $S_n := \sum_{k=1}^{n} 1_{A_k}$ the following central limit theorem holds: where $a : I \to \mathbb{N}$ is the digit function corresponding to the partition $\xi := \{I_k : k \ge 1\}$, i.e. $a(x) := \lfloor 1/x \rfloor = k$ for $x \in I_k$. The stationary sequence $(a \circ S^n)_{n\ge0}$ on the probability space $(I, \mathcal{B}_I, \mu_G)$ thus obtained exhibits interesting properties since $a$ has infinite expectation, $\int_I a \, d\mu$

d)
This follows immediately from [KS2, Theorem 6a].

Proof of Lemma 2.1. By assumption there is some $\varepsilon \in (0, 1)$ such that the set $M := \{n \ge 1 : (n \log_2 n)/b_n \ge \varepsilon\}$ is infinite. Define $c(x) := \exp(\sqrt{\log x})$ and $f(x) := x \log x \log_2 x$ for $x > 1$, and note that $c(x) < x$ for $x > e$. Suppose that $n \in M$, $n \ge 4$, and $c(n) < k \le n$. Since $k \le n$, we have b Furthermore, by using the estimate of Lemma 4.1 and setting $a'(N) := N/L'(N) \sim \log 2 \cdot N / \log_2 N$ we get that its asymptotic inverse can be written as $b'(N) := (N \log_2 N)/\log 2$, which by the statement of the paper coincides with the norming sequence.
c) Using Theorem 2.1 a), we first note that n≥1 1/(b n log b which by Lemma 2.1 entails $(n \log_2 n)/b_n \to 0$. In view of Theorem 2.1 a), our assumption implies that $M'_n / b_n \to 0$ a.e. Together with statement b) above, these observations prove (2 log x) log 2 g(x log x) dx (x log x) 2 ≍ ∞ c g(y) log 2 g(y) dy y 2 log y .
b) Same argument as in a), this time with $\varphi := g \circ a'$ and $\psi := a$, and replacing $a'$ above by $a(t) := t/L(t) \sim \log 2 \cdot t / \log t$ as $t \to \infty$. b) of Theorem 2.2 together with the Diamond-Vaaler trimmed law, $\log 2 \, \bigl(\sum_{k=1}^{n} a_k - M_n\bigr) / (n \log n) \to 1$ a.e.
and finally by a) of Corollary 2.1.

Proof of Theorem 2.4. a) We have that $\int (a')^{\gamma} \, d\lambda < \infty$ and the statement follows by the ergodic theorem.
b) We may apply [KS1, Theorem 1.7 & erratum]. That Property C is fulfilled with the bounded variation norm $\|\cdot\|_{BV}$ is a standard result. For Property D, we notice that $\|a \cdot 1_{\{a \le \ell\}}\|_{BV} \le 2\ell$ and $\|1_{\{a \le \ell\}}\|_{BV} \le 2$, implying that this property is fulfilled. In order to calculate the norming sequence $(d_n)$ we notice that
$$\mu_G\bigl((a')^{\gamma} > n\bigr) = \mu_G\bigl(a' > n^{1/\gamma}\bigr) \sim \frac{1}{\log 2 \cdot n^{1/\gamma} \log n^{1/\gamma}} = \frac{\gamma}{\log 2 \cdot n^{1/\gamma} \log n} = \frac{L(n)}{n^{1/\gamma}},$$
where $L(n) = \gamma / (\log 2 \cdot \log n)$ is a slowly varying function. Using then [KS1, Theorem 1.7 & erratum] we obtain that (2.12) holds for $(b_n)$ fulfilling $b_n = o(n)$ and $\lim_{n\to\infty} b_n / \log_2 n = \infty$ and for $(d_n)$ fulfilling
$$d_n \sim \frac{1/\gamma}{1 - 1/\gamma} \, n^{\gamma} \, b_n^{1-\gamma} \, \bigl(L^{-\gamma}\bigr)^{\#}\Bigl(\Bigl(\frac{n}{b_n}\Bigr)^{\gamma}\Bigr),$$
and hence $\mu_G(A'_l(j)) \sim \mu_G(A'_l) / \phi(m)$ with $A'_l(j) := A'_l \cap \{a' \equiv j \pmod{m}\}$ a $\xi$-measurable set. Another direct application of Theorem 10.2 b) in [Z3] then completes the proof of our theorem. The result thus established essentially contains (3.1).

Proof of Theorem 3.2. Theorem 3.3 a) contains the statement that $\mu_G(A'_l) \, \varphi'_{l,1}$ converges to a standard exponential law. Using the natural duality $\{M'_n < l\} = \{\varphi'_{l,1} \ge n\}$ this is easily seen to imply (3.1).

Proof of Theorem 3.4. The result follows directly by [KS2, Theorem 3] by considering the sets $A_n = \{a_n \in \mathbb{P} \cap \Gamma_n\}$. The only thing to check is that (6.1)