Endpoint estimates for the maximal function over prime numbers

Given an ergodic dynamical system $(X, \mathcal{B}, \mu, T)$, we prove that for each function $f$ belonging to the Orlicz space $L(\log L)^2(\log \log L)(X, \mu)$, the ergodic averages \[ \frac{1}{\pi(N)} \sum_{p \in \mathbb{P}_N} f\big(T^p x\big) \] converge for $\mu$-almost all $x \in X$, where $\mathbb{P}_N$ is the set of prime numbers not larger than $N$ and $\pi(N) = \# \mathbb{P}_N$.


1. Introduction
Let $(X, \mathcal{B}, \mu, T)$ be an ergodic dynamical system, that is, $(X, \mathcal{B}, \mu)$ is a probability space with a measurable and measure-preserving transformation $T : X \to X$. The classical Birkhoff theorem [2] states that for any function $f \in L^p(X, \mu)$ with $p \in [1, \infty)$, the ergodic averages \[ \frac{1}{N} \sum_{n=1}^{N} f\big(T^n x\big) \] converge for $\mu$-almost all $x \in X$. This classical result, among others, motivates the study of ergodic averages along subsequences of integers. In this article we are interested in the pointwise convergence of the averages \[ A_N f(x) = \frac{1}{\pi(N)} \sum_{p \in \mathbb{P}_N} f\big(T^p x\big), \] where $\mathbb{P}_N$ is the set of prime numbers not larger than $N$ and $\pi(N) = \# \mathbb{P}_N$. The problem of ergodic averages along prime numbers was first studied by Bourgain in [4], where functions in $L^2(X, \mu)$ were covered. It was extended by Wierdl in [22] to all $L^p(X, \mu)$ with $p > 1$; see also [6, Section 9]. However, the endpoint $p = 1$ remained open for more than twenty years. Following the method developed by Buczolich and Mauldin in [7], LaVictoire [13] showed that for each ergodic dynamical system there exists $f \in L^1(X, \mu)$ such that the sequence $(A_N f : N \in \mathbb{N})$ diverges on a set of positive measure. The purpose of this article is to find an Orlicz space close to $L^1(X, \mu)$ in which almost everywhere convergence holds. We show the following theorem (see Theorem 7.4).
Theorem A. For each $f \in L(\log L)^2(\log \log L)(X, \mu)$, the limit \[ \lim_{N \to \infty} A_N f(x) \] exists for $\mu$-almost all $x \in X$.
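The averages $A_N f(x)$ can be evaluated numerically for a concrete ergodic system. The following is a minimal Python sketch assuming the irrational rotation $T y = y + \alpha \bmod 1$ on $[0, 1)$ with Lebesgue measure; the system, the choice $\alpha = \sqrt{2}$, the test function and the helper names `primes_up_to` and `prime_average` are illustrative assumptions, not part of the paper.

```python
import math


def primes_up_to(n):
    """Return all primes p with 2 <= p <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Cross out every multiple of i starting from i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def prime_average(f, x, alpha, N):
    """A_N f(x) = (1 / pi(N)) * sum over primes p <= N of f(T^p x),
    for the example rotation T y = y + alpha (mod 1)."""
    primes = primes_up_to(N)
    return sum(f((x + p * alpha) % 1.0) for p in primes) / len(primes)


if __name__ == "__main__":
    alpha = math.sqrt(2)
    g = lambda y: math.cos(2 * math.pi * y)  # mean zero on [0, 1)
    for N in (10**3, 10**4, 10**5):
        print(N, prime_average(g, 0.0, alpha, N))
```

For this rotation and a continuous mean-zero function such as $\cos(2\pi y)$, the printed averages should be small, in line with Vinogradov's theorem that $\frac{1}{\pi(N)}\sum_{p \le N} e^{2\pi i p \alpha} \to 0$ for irrational $\alpha$. Such smooth examples say nothing about the endpoint question treated here, which concerns rough functions close to $L^1$.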
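In light of the pointwise convergence obtained by Bourgain in [5] (see also [16]), to prove Theorem A it suffices to show the weak maximal ergodic inequality for functions in the Orlicz space $L(\log L)^2(\log \log L)(X, \mu)$. This inequality is deduced from the following restricted weak Orlicz estimate.

Theorem B. There is $C > 0$ such that for any $A \in \mathcal{B}$ and all $0 < \lambda < 1$, \[ \mu\Big(\Big\{x \in X : \sup_{N \in \mathbb{N}} A_N \mathbf{1}_A(x) > \lambda\Big\}\Big) \le C \lambda^{-1} \log^2(e/\lambda)\, \mu(A). \]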
By appealing to the Calderón transference principle (see [8]), Theorem B is deduced from the corresponding result for the integers $\mathbb{Z}$ with the counting measure and the shift operator. To be more precise, for a function $f : \mathbb{Z} \to \mathbb{C}$ we consider the averages $A_N f$ along primes defined in Section 3. Our main result is the following theorem (see Corollary 6.5).
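Theorem C. There is $C > 0$ such that for any subset $F \subset \mathbb{Z}$ of finite cardinality, \[ \Big| \Big\{ x \in \mathbb{Z} : \sup_{N \in \mathbb{N}} A_N \mathbf{1}_F(x) > \lambda \Big\} \Big| \le C \lambda^{-1} \log^2(e/\lambda)\, |F| \] for all $0 < \lambda < 1$.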
Theorem C, together with $\ell^2(\mathbb{Z})$ estimates, is strong enough to imply the maximal inequality on $\ell^p(\mathbb{Z})$ for all $p > 1$, giving an alternative proof of Wierdl's theorem [22].
Let us now give some details about the proof of Theorem C. Without loss of generality, we may restrict the supremum to dyadic numbers. It is more convenient to work with the weighted averages $M_N$ introduced in Section 3. Given $t > 0$, for each $n \in \mathbb{N}$ we decompose the operator $M_{2^n}$ into two parts $A^t_n$ and $B^t_n$ in such a way that the maximal function associated with $(A^t_n)$ has $\ell^{1,\infty}(\mathbb{Z})$ norm $\lesssim t \|f\|_{\ell^1}$, whereas the one corresponding to $(B^t_n)$ has $\ell^2(\mathbb{Z})$ norm $\lesssim \exp(-c\sqrt{t})\, \|f\|_{\ell^2}$. When both estimates are applied to the distribution function $\big|\big\{\sup_{n \in \mathbb{N}} M_{2^n}(\mathbf{1}_F) > \lambda\big\}\big|$, they can be balanced by taking $t \simeq \log^2(e/\lambda)$. This idea originated with Ch. Fefferman [9]; see also Bourgain [3]. Ionescu introduced this technique in a related discrete context; see [11]. The decomposition of $M_{2^n}$ uses the circle method of Hardy and Littlewood. However, to achieve exponential decay of the error term, the approximating multiplier has to contain, because of Page's theorem, the second term of the asymptotic as well. Thus, the possible existence of a Siegel zero entails that, in a neighborhood of the rational point $a/q$, the approximating multiplier $L^{a,q}_{2^n}(\cdot - a/q)$ depends on the rational number $a/q$. We refer to Sections 3 and 5 for details. Thanks to the log-convexity of $\ell^{1,\infty}(\mathbb{Z})$, the weak type estimates are reduced to estimates for a fixed denominator $q$, zero $\beta$ and quadratic character $\chi$ (see Claim 6.2). At this stage we exploit the behavior of the Gauss sums described in Theorem 2.1.
Let us emphasize that under the Generalized Riemann Hypothesis one can obtain a better error estimate in Proposition 3.1, and consequently in Theorem 3.2. However, it is not clear whether one can prove Theorem 6.1 with bounds proportional to $\sqrt{t}\, \|f\|_{\ell^1}$.

The paper is organized as follows. In Section 2, we collect the necessary facts about Dirichlet characters and the zero-free region. Then we evaluate the Gauss sum that appears in the approximating multiplier (Theorem 2.1). Section 3 is devoted to the construction of the approximating multipliers. In Sections 5 and 6, we show the $\ell^2$ and the weak type estimates, respectively. In Section 7, we give two applications of Theorem C. Namely, we show how to deduce the maximal ergodic inequality for functions in $\ell^p(\mathbb{Z})$ (Theorem 7.1). Next we apply the transference principle (Proposition 7.3) and show almost everywhere convergence of the ergodic averages $(A_N f : N \in \mathbb{N})$ for $f \in L(\log L)^2(\log \log L)(X, \mu)$ (Theorem 7.4).
Notation. Throughout the whole article, we write $A \lesssim B$ ($A \gtrsim B$) if there is an absolute constant $C > 0$ such that $A \le CB$ ($A \ge CB$). Moreover, $C$ stands for a large positive constant whose value may vary from occurrence to occurrence. If $A \lesssim B$ and $A \gtrsim B$ hold simultaneously, then we write $A \simeq B$. The set of positive integers and the set of prime numbers are denoted by $\mathbb{N}$ and $\mathbb{P}$, respectively. For

2. Gauss sums
We start by recalling some basic facts from number theory. A general reference here is the book [17].
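A function $\chi : \mathbb{Z} \to \mathbb{C}$ that is $q$-periodic, completely multiplicative, and satisfies $\chi(n) \neq 0$ if and only if $\gcd(n, q) = 1$ is called a Dirichlet character modulo $q$. The simplest example, called the principal character modulo $q$, is defined by $\chi_0(n) = 1$ if $\gcd(n, q) = 1$ and $\chi_0(n) = 0$ otherwise. For each character $\chi$ there is a unique primitive character $\chi^\star$ modulo $q_0$, for some $q_0 \mid q$, such that $\chi(n) = \chi^\star(n)$ whenever $\gcd(n, q) = 1$. A character is quadratic if it takes only the values $-1, 0, 1$, and the value $-1$ at least once. Recall that if $\chi^\star$ is a primitive quadratic character with modulus $q_0$, then either …, and $q_0$ is square-free, or $4 \mid q_0$, $q_0/4 \equiv 2$ or $3 \pmod 4$, and $q_0/4$ is square-free.

Given a Dirichlet character $\chi$ and $s \in \mathbb{C}$ with $\Re s > 1$, we define the Dirichlet $L$-function by the formula \[ L(s, \chi) = \sum_{n=1}^{\infty} \chi(n)\, n^{-s}. \] For a non-principal character $\chi$, $L(\cdot, \chi)$ in fact extends to an analytic function on $\{z \in \mathbb{C} : \Re z > 0\}$. There is an absolute constant $c > 0$ such that if $\chi$ is a Dirichlet character modulo $q$, then the region (1) contains at most one zero of $L(\cdot, \chi)$, which we denote by $\beta_q$. The zero $\beta_q$ is real and the corresponding character is quadratic. A character whose $L$-function has a zero in (1) is called exceptional. Since $L(\beta, \chi) = 0$ implies $L(1 - \beta, \chi) = 0$, we may assume that $\frac{1}{2} \le \beta_q < 1$.

The Gauss sum of a Dirichlet character $\chi$ modulo $q$ is defined as \[ G(\chi, x) = \frac{1}{\varphi(q)} \sum_{a \in A_q} \chi(a)\, e^{2\pi i a x / q}, \] where $A_q = \{1 \le a \le q : \gcd(a, q) = 1\}$ and $\varphi(q) = \# A_q$. Let us recall that for each $\epsilon > 0$ there is $C_\epsilon > 0$ such that $\varphi(q) \ge C_\epsilon\, q^{1 - \epsilon}$. We set $\tau(\chi) = \varphi(q) G(\chi, 1)$. Let us denote by $\mu$ the Möbius function, which for $q = p_1^{\alpha_1} \cdots p_n^{\alpha_n}$, where $p_1, \ldots, p_n$ are distinct primes, is defined by $\mu(q) = (-1)^n$ if $\alpha_1 = \cdots = \alpha_n = 1$ and $\mu(q) = 0$ otherwise, together with $\mu(1) = 1$. The following theorem plays a crucial role in Section 6.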
Theorem 2.1. Let $\chi$ be a quadratic Dirichlet character modulo $q$ induced by $\chi^\star$ with conductor $q_0$. For $x \in \mathbb{Z}$, we set $r = \gcd(q, x)$. Then the identity (4) holds, provided that $q/q_0$ is square-free, $\gcd(q/q_0, q_0) = 1$ and $r \mid q/q_0$. Otherwise the sum equals zero.
Let us observe that the identity (4), together with (2), yields a bound on the Gauss sum valid for any $\epsilon > 0$. Moreover, $G(\chi, a) \neq 0$ entails that $q$ is square-free, or $4 \mid q$ and $q/4$ is square-free.
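A small numerical check of the objects just introduced. Since $\tau(\chi) = \varphi(q) G(\chi, 1)$, the sketch computes the unnormalized sum $\sum_{a \in A_q} \chi(a)\, e^{2\pi i a x / q}$; that $G(\chi, x)$ is this sum divided by $\varphi(q)$, with the exponent $e^{+2\pi i a x/q}$, is our reading of the normalization and should be treated as an assumption. It checks two classical facts: for the Legendre symbol modulo an odd prime $p$, $\tau(\chi) = \sqrt{p}$ if $p \equiv 1 \pmod 4$ and $\tau(\chi) = i\sqrt{p}$ if $p \equiv 3 \pmod 4$; and for the principal character the sum at $x = 1$ is the Ramanujan sum, equal to $\mu(q)$. The function names are hypothetical.

```python
import cmath
import math


def mobius(q):
    """Moebius function: 0 if a square divides q, else (-1)^(number of prime factors)."""
    result, n, p = 1, q, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divides q
                return 0
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


def legendre(a, p):
    """Quadratic character modulo an odd prime p, via Euler's criterion."""
    if a % p == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def tau_sum(chi, q, x=1):
    """Sum over a in A_q of chi(a) * exp(2*pi*i*a*x/q).
    Dividing by phi(q) gives G(chi, x) under the normalization assumed above."""
    return sum(chi(a) * cmath.exp(2j * math.pi * a * x / q)
               for a in range(1, q + 1) if math.gcd(a, q) == 1)


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        print(p, tau_sum(lambda a: legendre(a, p), p))
    for q in (6, 12, 30):
        print(q, mobius(q), tau_sum(lambda a: 1, q).real)
```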

3. Approximating multipliers
Let us denote by $A_N$ the averaging operator over prime numbers acting on functions $f : \mathbb{Z} \to \mathbb{C}$, where $\mathbb{P}_N = [1, N] \cap \mathbb{P}$ and $\pi(N) = \# \mathbb{P}_N$. Since sums over primes are very irregular, it is more convenient to work with the weighted averages $M_N$. By partial summation, we easily obtain the comparison (7) between $A_N$ and $M_N$. To better understand the operators $M_N$, we use the Hardy-Littlewood circle method. Let $\mathcal{F}$ denote the Fourier transform on $\mathbb{R}$, defined for any function $f \in L^1(\mathbb{R})$. To simplify the notation, we denote by $\mathcal{F}^{-1}$ either the inverse Fourier transform on $\mathbb{R}$ or the inverse Fourier transform on the torus $\mathbb{T} \equiv [0, 1)$, depending on the context. Let $m_N$ be the Fourier multiplier corresponding to $M_N$; see (8). Then, for a finitely supported function $f : \mathbb{Z} \to \mathbb{C}$, $M_N f$ is obtained by applying $\mathcal{F}^{-1}$ to the product of $m_N$ and the Fourier transform of $f$.
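The circle method rests on the behavior of $m_N$ near rationals $a/q$ with small denominators. The sketch below assumes, as is standard for such weighted averages but not displayed in the text above, that $M_N f(x) = \frac{1}{N}\sum_{p \in \mathbb{P}_N} f(x - p) \log p$, and it takes $m_N(\xi) = \frac{1}{N}\sum_{p \in \mathbb{P}_N} e^{2\pi i \xi p}\log p$; the sign in the exponent is an assumed convention and does not affect the limits below, which are real. By the prime number theorem in arithmetic progressions, $m_N(a/q) \to \mu(q)/\varphi(q)$ when $\gcd(a, q) = 1$, so the values for $q = 3, 4, 5$ should be close to $-1/2$, $0$ and $-1/4$. The helper names are hypothetical.

```python
import cmath
import math


def primes_up_to(n):
    """Return all primes p with 2 <= p <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def m_N(N, xi):
    """Assumed form of the multiplier: (1/N) * sum_{p <= N} log(p) * exp(2*pi*i*xi*p)."""
    return sum(math.log(p) * cmath.exp(2j * math.pi * xi * p)
               for p in primes_up_to(N)) / N


if __name__ == "__main__":
    N = 10**5
    for xi, expected in ((0.0, 1.0), (1 / 3, -0.5), (1 / 4, 0.0), (1 / 5, -0.25)):
        print(xi, m_N(N, xi), "expected about", expected)  # expected = mu(q)/phi(q)
```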
To simplify the notation we write … For $\beta < 1$, we notice that the operators $M^\beta_N$ are not averaging operators. Moreover, by partial summation and (10), we get … Hence, … Moreover, … Given $q \in \mathbb{N}$ and $a \in A_q$, we set … when there is an exceptional character $\chi_q$ modulo $q$ and $\beta_q$ is the corresponding zero.
Proof. Observe that for a prime $p$, $p \mid q$ if and only if $\gcd(p \bmod q, q) > 1$. Hence, … Then, by partial summation, we obtain … Analogously, for any $\frac{1}{2} \le \beta \le 1$, we can write … By Page's theorem, there is an absolute constant $c > 0$ such that for each … if there is no exceptional character modulo $q$, and … when there is an exceptional character $\chi$ modulo $q$ and $\beta$ is the corresponding zero. Therefore, by (15) and (16), we obtain … Finally, by the prime number theorem, …, and the proposition follows.
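The proof combines the prime number theorem with the distribution of primes in reduced residue classes. As a rough numerical illustration, not part of the argument, the sketch computes $\theta(N; q, a) = \sum_{p \le N,\ p \equiv a \ (\mathrm{mod}\ q)} \log p$ for $a \in A_q$: each class should carry about $N/\varphi(q)$, while the primes dividing $q$ fall outside every reduced class and contribute only $O(\log q)$ in total. The function names are hypothetical.

```python
import math


def primes_up_to(n):
    """Return all primes p with 2 <= p <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def theta_in_classes(N, q):
    """Return {a mod q: theta(N; q, a)} for the reduced residues a in A_q, where
    theta(N; q, a) is the sum of log p over primes p <= N with p = a (mod q)."""
    totals = {a % q: 0.0 for a in range(1, q + 1) if math.gcd(a, q) == 1}
    for p in primes_up_to(N):
        r = p % q
        if r in totals:  # primes dividing q are in no reduced class and are skipped
            totals[r] += math.log(p)
    return totals


if __name__ == "__main__":
    N, q = 10**5, 5
    for a, value in sorted(theta_in_classes(N, q).items()):
        print(a, value / N)  # each ratio should be close to 1/phi(q) = 0.25
```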
Next, we select a smooth function $\eta : \mathbb{R} \to \mathbb{R}$ such that $0 \le \eta \le 1$ and … We may assume that $\eta$ is a convolution of two smooth functions with supports contained in $\big[-\frac{1}{2}, \frac{1}{2}\big]$. For $s \in \mathbb{N}_0$, we set $\eta_s(\xi) = \eta\big(2^{4s} \xi\big)$. We define a family of approximating multipliers $\nu^s_n$ by the formula …, where \[ R_s = \big\{ a/q \in \mathbb{Q} \cap (0, 1] : a \in A_q,\ 2^s \le q < 2^{s+1},\ q \text{ is square-free, or } 4 \mid q \text{ and } q/4 \text{ is square-free} \big\}, \] and $R_0 = \{1\}$. We set $\nu_n = \sum_{s \ge 0} \nu^s_n$.

Theorem 3.2. There are $C, c > 0$ such that for all $n \in \mathbb{N}_0$ and $\xi \in \mathbb{T}$, …, where $m_N$ is defined by (8).
Proof. Let $Q_n = \exp\big(\frac{c}{2}\sqrt{n}\big)$, where the constant $c$ is determined in Proposition 3.1. By Dirichlet's principle, there are coprime integers $a$ and $q$ satisfying $1 \le a \le q \le 2^n Q_n^{-1}$ and such that … Let us first consider the case $1 \le q \le Q_n$. We select $s_1 \in \mathbb{N}_0$ satisfying $2^{s_1 + 1} < \frac{1}{2}\, 2^n Q_n^{-2} \le 2^{s_1 + 2}$. For $s \le s_1$ and $a'/q' \in R_s$ with $a'/q' \neq a/q$, we have … Therefore, by (6) and (11), …, which implies that … For $s > s_1$, by (6) we obtain … If $q$ is square-free, or $4 \mid q$ and $q/4$ is square-free, then there is $s_0 \in \mathbb{N}_0$ such that $a/q \in R_{s_0}$; thus $Q_n \ge 2^{s_0}$. By Proposition 3.1, … Finally, if $q$ satisfies neither of these conditions, then by Proposition 3.1, … It remains to deal with the case $Q_n \le q \le 2^n Q_n^{-1}$. By Vinogradov's inequality (see […]), … Therefore, by (6) and (11), …, which entails that … If $s > s_2$, then by (6) we get …; hence, by (18), …, and the theorem follows.
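Dirichlet's principle at the start of the proof can be made effective by a direct search: for real $\xi$ and an integer $Q \ge 1$ there are integers $a$ and $1 \le q \le Q$ with $|q\xi - a| < 1/Q$. The sketch returns the smallest such $q$, which forces $\gcd(a, q) = 1$; in the proof one would presumably take $Q = \lfloor 2^n Q_n^{-1} \rfloor$. The brute-force search and the name `dirichlet_approx` are illustrative assumptions, not the paper's method; continued fractions would be faster.

```python
import math


def dirichlet_approx(xi, Q):
    """Return (a, q) with 1 <= q <= Q and |q*xi - a| <= 1/Q, choosing the smallest such q.

    Dirichlet's principle guarantees a solution with |q*xi - a| < 1/(Q + 1).
    Minimality of q forces gcd(a, q) = 1: otherwise (a/g, q/g) would also work.
    """
    if Q < 1:
        raise ValueError("Q must be a positive integer")
    for q in range(1, Q + 1):
        a = round(q * xi)  # nearest integer to q*xi
        if abs(q * xi - a) <= 1.0 / Q:
            return a, q
    raise AssertionError("unreachable by Dirichlet's principle")


if __name__ == "__main__":
    xi = math.sqrt(2) - 1
    for Q in (10, 100, 1000):
        a, q = dirichlet_approx(xi, Q)
        print(Q, a, q, abs(xi - a / q), 1 / (q * Q))
```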

E ℓ 1
In this section we prove that the maximal function associated with the kernels $(M^\beta_{2^n} : n \in \mathbb{N}_0)$ has weak $\ell^1(\mathbb{Z})$ norm equidistributed in residue classes. Before embarking on the proof, let us recall two lemmas essential for the argument. \[ \cdots \int e^{2\pi i \xi x}\, \eta_s(\xi)\, d\xi \cdots \] The following theorem is the main result of this section.
Since $\eta_s = \eta_s \eta_{s-1}$, by Young's convolution inequality and Lemma 4.1, we obtain …, where the last inequality is a consequence of $1 \le Q \le 2^{2s}$. Therefore, in view of Lemma 4.2, we immediately get …, which is the desired conclusion.
Essentially the same reasoning as in the proof of Theorem 4.3 leads to the following theorem.

5. $\ell^2$ estimates
We are now in a position to prove the $\ell^2(\mathbb{Z})$ boundedness of the maximal function associated with the multipliers $(\nu^s_n : n \in \mathbb{N})$.

Theorem 5.1. For each $\epsilon > 0$ there is $C > 0$ such that for all $s \in \mathbb{N}_0$ and any finitely supported function $f : \mathbb{Z} \to \mathbb{C}$, …

Proof. We divide the supremum into two parts: $0 \le n < 2^{s+4}$ and $2^{s+4} \le n$. Then the following holds true.
It remains to treat the supremum over $n \ge 2^{s+4}$. For each $\frac{1}{2} \le \beta < 1$ we set $R^\beta_s = \{a/q \in R_s : \beta_q = \beta\}$, and $R^1_s = R_s$. In view of Landau's theorem [17, Corollary 11.9], there are $O(\log s)$ distinct $\beta$'s. Therefore, it suffices to show the following claim.
Let us fix $\frac{1}{2} \le \beta \le 1$. We define … Observe that the functions $x \mapsto I(x, y)$ and $x \mapsto J(x, y)$ are $Q_s$-periodic, where … By Plancherel's theorem, for $u \in \mathbb{Z}_{Q_s}$, we have … $2^{-n}|u| \cdot \big\|\eta_s \hat{f}(\cdot + a/q)\big\|_{L^2}$, because by (11), … Therefore, by the triangle inequality, … Since $R_s$ contains at most $2^{2(s+1)}$ rational numbers, by the Cauchy-Schwarz inequality we get … Observe that

Now, by multiple changes of variables and periodicity, we get
Using Theorem 4.4, we can estimate … Notice that … Since the supports of $\eta_s(\cdot - a/q)$ are disjoint as $a/q$ varies over $R_s$, by (6) we get …, which together with (24) implies (23), and the theorem follows.
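The proof above uses that $R_s$ has at most $2^{2(s+1)}$ elements and that the bumps $\eta_s(\cdot - a/q)$ have disjoint supports. The count, and the separation $|a/q - a'/q'| \ge 1/(qq') > 2^{-2(s+1)}$ between distinct points of $R_s$ on the torus, can be checked directly for small $s$; whether this separation suffices for disjointness depends on the support of $\eta$, which the sketch does not model. It enumerates $R_s$ as defined in Section 3; the function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def is_squarefree(n):
    """True if no square d*d with d >= 2 divides n (n >= 1)."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def R(s):
    """The set R_s of Section 3; for s = 0 it returns [1], matching R_0 = {1}."""
    points = []
    for q in range(2**s, 2 ** (s + 1)):
        if is_squarefree(q) or (q % 4 == 0 and is_squarefree(q // 4)):
            points.extend(Fraction(a, q) for a in range(1, q + 1) if gcd(a, q) == 1)
    return points


def min_torus_gap(points):
    """Smallest distance between distinct points of (0, 1] viewed on the torus T = [0, 1)."""
    pts = sorted(points)
    if len(pts) < 2:
        return Fraction(1)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(pts[0] + 1 - pts[-1])  # wrap-around gap
    return min(gaps)


if __name__ == "__main__":
    for s in range(6):
        pts = R(s)
        bound = Fraction(1, 2 ** (2 * (s + 1)))
        print(s, len(pts), 2 ** (2 * (s + 1)), min_torus_gap(pts) > bound)
```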

Corollary 5.4.
There are $C, c > 0$ such that for each $t > 0$ and any finitely supported function $f : \mathbb{Z} \to \mathbb{C}$, …

Proof. Since …, our assertion follows from Theorem 3.2 and Theorem 5.1. Indeed, by Plancherel's theorem and Theorem 3.2 we get … On the other hand, by Theorem 5.1, …, which concludes the proof.

6. Weak type estimates
In this section we investigate the weak type estimates for the multipliers $(\Pi^t_n : n \ge t)$. Together with the results from Section 5, we then deduce Theorem C.

Theorem 6.1. There is $C > 0$ such that for all $t > 0$ and any finitely supported function $f : \mathbb{Z} \to \mathbb{C}$, …

Proof. Let us fix $2^s \le q < 2^{s+1}$ for some $1 \le s \le \sqrt{t}$, and let $\frac{1}{2} \le \beta \le 1$. Suppose that $\chi$ is a quadratic Dirichlet character modulo $q$ induced by $\chi^\star$ with conductor $q_0$. We claim that the following holds true.

Claim 6.2.
There is $C > 0$ such that for any finitely supported function $f : \mathbb{Z} \to \mathbb{C}$, … The constant $C$ is independent of $q$, $\beta$ and $\chi$.
What is left now is to prove Claim 6.2. Let $r \in \{1, \ldots, q\}$. For $x \equiv r \pmod q$, we have … Hence, by Theorem 4.3, we obtain … Next, by Young's convolution inequality we get … Now, by Theorem 2.1, we can compute …, where in the last inequality we have used Lemma 4.2 together with Lemma 4.1. Since (see e.g. [19]) …, this proves the claim, and the theorem follows.

Theorem 6.3. There is $C > 0$ such that for any subset $F \subset \mathbb{Z}$ of finite cardinality and all $0 < \lambda < 1$, \[ \Big| \Big\{ x \in \mathbb{Z} : \sup_{n \in \mathbb{N}} M_{2^n} \mathbf{1}_F(x) > \lambda \Big\} \Big| \le C \lambda^{-1} \log^2(e/\lambda)\, |F|. \]

Proof. We start by proving the following statement.

Claim 6.4. There are $C, c > 0$ such that for each $t > 0$ there are two sequences of operators $(A^t_n : n \in \mathbb{N})$ and $(B^t_n : n \in \mathbb{N})$ such that $M_{2^n} = A^t_n + B^t_n$, and for any finitely supported function $f : \mathbb{Z} \to \mathbb{C}$ the bounds (26) and (27) hold.

Without loss of generality, we may assume that $f$ is a non-negative finitely supported function on $\mathbb{Z}$. For $1 \le n < t$, we set $A^t_n f = M_{2^n} f$ and $B^t_n f \equiv 0$. Since, by the prime number theorem, …, we have … Hence, by the Hardy-Littlewood theorem, … For $t \le n$, we set … and $B^t_n f = M_{2^n} f - A^t_n f$. In view of Corollary 5.4 and Theorem 6.1, we obtain (27) and (26), respectively, and the claim follows.

Now, the theorem is an easy consequence of Claim 6.4. Indeed, given a subset $F \subset \mathbb{Z}$ of finite cardinality, for any $t > 0$ we can write … Thus, taking $t = (2c)^{-2} \log^2(e/\lambda)$, we get the desired conclusion.
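The final optimization can be written out as follows. This is a sketch that assumes the two bounds of Claim 6.4 take the form described in the introduction, namely $|\{x : \sup_n |A^t_n f(x)| > \lambda\}| \le C t \lambda^{-1} \|f\|_{\ell^1}$ and $\|\sup_n |B^t_n f|\|_{\ell^2} \le C e^{-c\sqrt{t}} \|f\|_{\ell^2}$; the exact shape of (26) and (27) is not reproduced here. Since $\|\mathbf{1}_F\|_{\ell^1} = \|\mathbf{1}_F\|_{\ell^2}^2 = |F|$, splitting the level $\lambda$ in half and applying Chebyshev's inequality to the $\ell^2$ part gives:

```latex
\begin{aligned}
\Big|\Big\{\sup_{n} M_{2^n}\mathbf{1}_F > \lambda\Big\}\Big|
&\le \Big|\Big\{\sup_{n} |A^t_n \mathbf{1}_F| > \tfrac{\lambda}{2}\Big\}\Big|
   + \Big|\Big\{\sup_{n} |B^t_n \mathbf{1}_F| > \tfrac{\lambda}{2}\Big\}\Big| \\
&\le \frac{2Ct}{\lambda}\,|F| + \frac{4}{\lambda^{2}}\,C^{2} e^{-2c\sqrt{t}}\,|F| .
\end{aligned}
\qquad
\text{With } t = (2c)^{-2}\log^{2}(e/\lambda):\quad
2c\sqrt{t} = \log(e/\lambda), \quad e^{-2c\sqrt{t}} = \lambda/e,
\\[4pt]
\Big|\Big\{\sup_{n} M_{2^n}\mathbf{1}_F > \lambda\Big\}\Big|
\le \Big(\frac{C}{2c^{2}}\log^{2}(e/\lambda) + \frac{4C^{2}}{e}\Big)\frac{|F|}{\lambda}
\le C'\,\lambda^{-1}\log^{2}(e/\lambda)\,|F|,
\qquad C' = \frac{C}{2c^{2}} + \frac{4C^{2}}{e},
```

where the last step uses $\log(e/\lambda) \ge 1$ for $0 < \lambda < 1$. The choice of $t$ is exactly the one that makes the exponentially small $\ell^2$ contribution comparable to $\lambda^{-1}|F|$, so the weak $\ell^1$ part, of size $t\lambda^{-1}|F|$, dominates.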
In view of (7), Theorem 6.3 entails the following corollary, which is precisely Theorem C.
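Corollary 6.5. There is $C > 0$ such that for any subset $F \subset \mathbb{Z}$ of finite cardinality and all $0 < \lambda < 1$, \[ \Big| \Big\{ x \in \mathbb{Z} : \sup_{N \in \mathbb{N}} A_N \mathbf{1}_F(x) > \lambda \Big\} \Big| \le C \lambda^{-1} \log^2(e/\lambda)\, |F|. \]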

7. Applications
In this section we show two applications of Theorem 6.3 and Corollary 6.5. First, we prove that the restricted weak Orlicz estimates, together with strong $\ell^2$ bounds, suffice to obtain $\ell^p$ maximal inequalities for all $1 < p \le 2$. Next, we deduce almost everywhere convergence of ergodic averages for functions in an Orlicz space close to $L^1$.

7.1. $\ell^p$ theory.

Theorem 7.1. For each $p \in (1, 2]$ there is $C > 0$ such that for any function $f \in \ell^p(\mathbb{Z})$, …

7.2. Pointwise convergence.

Let $(X, \mathcal{B}, \mu)$ be a probability space with a measurable and measure-preserving transformation $T : X \to X$. We consider the averages \[ A_N f(x) = \frac{1}{\pi(N)} \sum_{p \in \mathbb{P}_N} f\big(T^p x\big). \] With the help of the Calderón transference principle from [8] applied to Corollary 6.5, we deduce the following proposition.

Proposition 7.3.
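There is $C > 0$ such that for any set $A \in \mathcal{B}$ and all $0 < \lambda < 1$, \[ \mu\Big(\Big\{x \in X : \sup_{N \in \mathbb{N}} A_N \mathbf{1}_A(x) > \lambda\Big\}\Big) \le C \lambda^{-1} \log^2(e/\lambda)\, \mu(A). \]

Proof. Fix $A \in \mathcal{B}$ and $x \in X$. For $R > L > 0$, we define a finite subset $F \subset \mathbb{Z}$ by setting … Then, for $0 \le n \le R - N$ and $N \le L$, … Hence, … By Corollary 6.5, … Since $T$ preserves the measure $\mu$, integrating with respect to $x \in X$ we obtain … $= C(R + 1)\lambda^{-1} \log^2(e/\lambda)\, \mu(A)$. We now divide by $R$ and let $R$ tend to infinity to get \[ \mu\Big(\Big\{x \in X : \max_{1 \le N \le L} A_N \mathbf{1}_A(x) > \lambda\Big\}\Big) \le C \lambda^{-1} \log^2(e/\lambda)\, \mu(A). \]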
Finally, letting $L$ tend to infinity, we conclude the proof by the monotone convergence theorem.
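The key identity in the proof, which turns prime averages along an orbit into prime averages of an indicator function on $\mathbb{Z}$, can be checked numerically. The sketch assumes a concrete system (the rotation $T y = y + \alpha \bmod 1$), the hypothetical set $A = [0, 1/3)$, the set $F = \{0 \le m \le R : T^m x \in A\}$, and integer averages taken in the forward direction, $\frac{1}{\pi(N)}\sum_{p \in \mathbb{P}_N} \mathbf{1}_F(n + p)$; the precise definitions of $F$ and of $A_N$ on $\mathbb{Z}$ are not reproduced in the text above, so these are assumptions. It confirms that for $0 \le n \le R - N$ the two prime sums coincide exactly.

```python
import math


def primes_up_to(n):
    """Return all primes p with 2 <= p <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def orbit_point(x, alpha, m):
    """T^m x for the example rotation T y = y + alpha (mod 1)."""
    return (x + m * alpha) % 1.0


def in_A(y):
    """Indicator of the hypothetical set A = [0, 1/3)."""
    return y < 1.0 / 3.0


def transference_check(x, alpha, R, N):
    """Check pi(N) * A_N 1_A(T^n x) == sum_{p <= N} 1_F(n + p) for all 0 <= n <= R - N,
    where F = {0 <= m <= R : T^m x in A}."""
    primes = primes_up_to(N)
    F = {m for m in range(R + 1) if in_A(orbit_point(x, alpha, m))}
    for n in range(R - N + 1):
        along_orbit = sum(in_A(orbit_point(x, alpha, n + p)) for p in primes)
        on_integers = sum((n + p) in F for p in primes)  # n + p <= R, so F suffices
        if along_orbit != on_integers:
            return False
    return True


if __name__ == "__main__":
    print(transference_check(0.1, math.sqrt(2), 500, 100))
```

The restriction $n \le R - N$ matters: for larger $n$ the shifted window leaves $[0, R]$, which is why the proof divides by $R$ and lets $R \to \infty$.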
We are now in a position to show $\mu$-almost everywhere convergence of the ergodic averages $(A_N f : N \in \mathbb{N})$ for a function $f$ from the Orlicz space $L(\log L)^2(\log \log L)(X, \mu)$. Let us recall that $L(\log L)^2(\log \log L)(X, \mu)$ consists of the functions $f$ such that \[ \int_X |f(x)| \big(\log^+ |f(x)|\big)^2 \log^+ \log^+ |f(x)| \, d\mu(x) < \infty, \] where $\log^+ t = \max\{0, \log t\}$. The space $L(\log L)^2(\log \log L)(X, \mu)$ is a Banach space with the norm $\|f\|_{L(\log L)^2(\log \log L)}$ given by …, where $f^*$ is the decreasing rearrangement of $f$, that is …, and $\varphi(t) = \log^2(1 + t) \log\big(1 + \log t\big)$.
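For a simple function on a finite probability space, both the defining integral of $L(\log L)^2(\log \log L)$ and the decreasing rearrangement can be computed directly. The sketch uses the standard definition $f^*(t) = \inf\{s \ge 0 : \mu(\{|f| > s\}) \le t\}$, which we assume agrees with the one intended above; the function is given by its values on atoms of prescribed masses. The function names are hypothetical.

```python
import math


def log_plus(t):
    """log^+ t = max{0, log t}, extended by 0 for t <= 0."""
    return max(0.0, math.log(t)) if t > 0 else 0.0


def orlicz_modular(values, masses):
    """Integral of |f| (log^+|f|)^2 log^+ log^+ |f| d(mu) for the simple function
    taking the value values[i] on an atom of mass masses[i]."""
    total = 0.0
    for v, m in zip(values, masses):
        a = abs(v)
        total += a * log_plus(a) ** 2 * log_plus(log_plus(a)) * m
    return total


def decreasing_rearrangement(values, masses, t):
    """f*(t) = inf{s >= 0 : mu(|f| > s) <= t} for the same simple function."""
    pairs = sorted(((abs(v), m) for v, m in zip(values, masses)), reverse=True)
    accumulated = 0.0
    for a, m in pairs:
        accumulated += m  # total mass of the atoms processed so far
        if t < accumulated:
            return a
    return 0.0


if __name__ == "__main__":
    print(orlicz_modular([math.exp(math.e)], [0.5]))
    print([decreasing_rearrangement([1.0, 3.0, 2.0], [0.25, 0.5, 0.25], t) for t in (0.1, 0.6, 0.8)])
```

For example, $f = e^{e}$ on a set of measure $\frac{1}{2}$ and $0$ elsewhere gives the value $\frac{1}{2} e^{e} e^{2}$, since $\log^+ e^{e} = e$ and $\log^+ e = 1$, while any $f$ with $|f| \le e$ gives $0$ because then $\log^+ \log^+ |f| = 0$.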
Theorem 7.4. There is $C > 0$ such that for each $f \in L(\log L)^2(\log \log L)(X, \mu)$, … In particular, for each $f \in L(\log L)^2(\log \log L)(X, \mu)$, the limit \[ \lim_{N \to \infty} A_N f(x) \] exists for $\mu$-almost all $x \in X$.
Proof. We first prove the following claim.
Since $|f(x)| \le a_j$ for $x \in A_j$, we have … Moreover, if $j > k$, then for $x \in A_j$ and $y \in A_k$ we have $|f(x)| \ge |f(y)|$. Since $\mu(A_j) = 2^{-j}$, we get \[ \cdots\, a_j \mathbf{1}_{[2^{-j-1}, 2^{-j})}(t). \]
On the other hand, by (33) we have $\|f\|_{L(\log L)^2(\log \log L)} = \ldots$, which together with (34) concludes the proof.