Algebraic twists of modular forms and Hecke orbits

We consider the question of the correlation of Fourier coefficients of modular forms with functions of algebraic origin. We establish the absence of correlation in considerable generality (with a power saving of Burgess type) and a corresponding equidistribution property for twisted Hecke orbits. This is done by exploiting the amplification method and the Riemann Hypothesis over finite fields, relying in particular on the ℓ-adic Fourier transform introduced by Deligne and studied by Katz and Laumon.


Introduction and statement of results
This paper concerns a certain type of sums involving Fourier coefficients of modular forms, which we call "algebraic twists". Their study can be naturally motivated either from a point of view coming from analytic number theory, or from geometric considerations involving Hecke orbits on modular curves. We will present them using the first approach, and discuss the geometric application in Section 2.2.
We will be considering either holomorphic cusp forms or Maass forms. Precisely, the statement "f is a cusp form" will mean, unless otherwise indicated, that f is either (1) a non-zero holomorphic cusp form of some even weight $k \geq 2$ (sometimes denoted $k_f$) and some level $N \geq 1$; or (2) a non-zero Maass cusp form of weight 0, level N and Laplace eigenvalue written $1/4 + t_f^2$. In both cases, we assume f has trivial Nebentypus for simplicity.
The statement that a cusp form f of level N is a Hecke eigenform will also, unless otherwise indicated, mean that f is an eigenfunction of the Hecke operators $T_n$ with $(n, N) = 1$.
1.1. Algebraic twists of modular forms. Let f : H → C be a cusp form (as discussed above). We have f(z + 1) = f(z), so that f admits a Fourier expansion at infinity, and we denote the n-th Fourier coefficient of f by $\varrho_f(n)$. Explicitly, if f is holomorphic of weight k, the Fourier expansion takes the form $f(z) = \sum_{n \geq 1} n^{(k-1)/2} \varrho_f(n) e(nz)$, and if f is a Maass form, the Fourier expansion is normalized as in (3.8) below. It follows from Rankin-Selberg theory that the Fourier coefficients $\varrho_f(n)$ are bounded on average, namely (1.1) $\sum_{n \leq x} |\varrho_f(n)|^2 = c_f x + O(x^{3/5})$ for some $c_f > 0$. For individual terms, we have (1.2) $\varrho_f(n) \ll_{\varepsilon,f} n^{7/64+\varepsilon}$ for any ε > 0 by the work of Kim and Sarnak [26], and moreover, if f is holomorphic, it follows from Deligne's proof of the Ramanujan-Petersson conjecture that the $\varrho_f(n)$ are almost bounded, so that $\varrho_f(n) \ll_{\varepsilon,f} n^{\varepsilon}$ for any ε > 0. On the other hand, it is also well-known that the Fourier coefficients oscillate quite substantially, as the estimate (1.3) $\sum_{n \leq x} \varrho_f(n) e(\alpha n) \ll x^{1/2} (\log 2x)$, valid for $x \geq 1$ and α ∈ R, with an implied constant depending on f only, shows (see, e.g., [21, Th. 5.3] and [22, Th. 8.1]).
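To make the normalization concrete, here is a minimal numerical sketch for the weight 12 cusp form Δ of level 1, whose coefficients are the Ramanujan function τ(n), so that the normalized coefficient is τ(n)/n^{11/2}. The choice of Δ and the function names `ramanujan_tau`, `normalized_coefficient` and `twisted_sum` are ours, for illustration only; the code computes the additively twisted sum appearing in (1.3) but makes no claim about the implied constant.

```python
import cmath
import math

def ramanujan_tau(limit):
    """Return a list t with t[n] = tau(n) for 1 <= n <= limit (t[0] is unused).

    Uses Delta = q * prod_{n >= 1} (1 - q^n)^24, truncated at degree limit.
    """
    coeffs = [0] * limit          # coeffs[m] = coefficient of q^m in the product
    coeffs[0] = 1
    for n in range(1, limit):
        for _ in range(24):
            # multiply in place by (1 - q^n), from high degree to low degree
            for m in range(limit - 1, n - 1, -1):
                coeffs[m] -= coeffs[m - n]
    return [0] + coeffs           # shift by one because of the leading factor q

def normalized_coefficient(tau, n):
    """Normalized coefficient tau(n) / n^{(k-1)/2} with k = 12."""
    return tau[n] / n ** 5.5

def twisted_sum(tau, x, alpha):
    """Sum over n <= x of the normalized coefficient times e(alpha * n)."""
    return sum(normalized_coefficient(tau, n) * cmath.exp(2j * math.pi * alpha * n)
               for n in range(1, x + 1))
```

For instance, `abs(twisted_sum(ramanujan_tau(200), 200, 0.3))` can be compared with `200 ** 0.5 * math.log(400)` to get a feeling for the cancellation in (1.3).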
One may ask, more generally, whether the sequence $(\varrho_f(n))_{n \geq 1}$ correlates with another bounded (or essentially bounded) sequence K(n). This may be defined formally as follows: (K(n)) does not correlate with the Fourier coefficients of f if we have (1.4) $\sum_{n \leq x} \varrho_f(n) K(n) \ll x (\log x)^{-A}$ for all $A \geq 1$, the implied constant depending on A. There are many known examples, of which we list only a few particularly interesting ones:
• For K(n) = µ(n), the Möbius function, the non-correlation is an incarnation of the Prime Number Theorem, and is a consequence of the non-vanishing of the Hecke L-function L(f, s) for Re s = 1 when f is primitive; more generally, for K(n) = µ(n)e(nα) where α ∈ R/Z, non-correlation has been obtained recently by Fouvry and Ganguly [14];
• When $K(n) = \varrho_g(n)$ for g any modular form which is orthogonal to f, non-correlation is provided by Rankin-Selberg theory;
• For $K(n) = \varrho_g(n + h)$ with $h \neq 0$ and g any modular form, whether it is orthogonal to f or not, non-correlation follows from the study of shifted-convolution sums, and has crucial importance in many studies of automorphic L-functions.
There are many other weights which occur naturally. We highlight two types here. First, given rational functions $\varphi_1$, $\varphi_2$, say $\varphi_i(X) = R_i(X)/S_i(X) \in \mathbf{Q}(X)$, i = 1, 2, with $R_i, S_i \in \mathbf{Z}[X]$ coprime (in Q[X]), and given a non-trivial Dirichlet character χ (mod p), one can form where inverses are computed modulo p and with the usual convention χ(0) = 0. We will show that (1.4) holds for such weights with an absolute exponent of Burgess type (see Corollary 2.1 below). The proof depends ultimately on the Riemann Hypothesis over finite fields, which is applied in order to estimate exponential sums in 3 variables with square-root cancellation, using Deligne's results [7].
Second, for $m \geq 1$ and $a \in \mathbf{F}_p^{\times}$ let We will also show a bound of the type (1.4) for these rather wild weights.
The precise common feature of these examples is that they arise as linear combinations of Frobenius trace functions of certain ℓ-adic sheaves over the affine line $\mathbf{A}^1_{\mathbf{F}_p}$ (for some prime $\ell \neq p$). We therefore call these functions trace weights, and we will give the precise definition below. To state our main result, it is enough for the moment to know that we can measure the complexity of a trace weight modulo p with a numerical invariant called its conductor cond(K). Our result is, roughly, that when cond(K) remains bounded, K(n) does not correlate with Fourier coefficients of modular forms.
As a last step before stating our main result, we quantify the properties of the test function V that we handle. Given P > 0 and $Q \geq 1$ real numbers, we define:
Definition 1.1 (Condition (V(P, Q))). Let P > 0 and $Q \geq 1$ be real numbers. A smooth compactly supported function V on [0, +∞[ satisfies Condition (V(P, Q)) if (1) the support of V is contained in the dyadic interval [P, 2P]; (2) for all x > 0 and all integers $\nu \geq 0$ we have the inequality $|x^{\nu} V^{(\nu)}(x)| \leq Q^{\nu}$. In particular, $|V(x)| \leq 1$ for all x.

1.2. Good functions and correlating matrices.
To deal with the level of generality we consider, it is beneficial at first to completely forget all the specific properties that K might have, and to proceed abstractly. Therefore we consider the problem of bounding the sum $S_V(f, K; p)$ for K : Z/pZ → C a general function, assuming only that we know that $|K(n)| \leq M$ for some M that we think of as fixed.
For the case of Dirichlet characters, Duke, Friedlander and Iwaniec [10] amplified K(n) = χ(n) among characters with a fixed modulus. Given the absence of structure on K in our situation, this strategy seems difficult to implement. Instead, we use an idea found in [5]: we consider K "fixed", and consider the family of sums $S_V(g, K; p)$ for g varying over a basis of modular cusp forms of level p, viewing f (suitably normalized) as an old form modulo p. Estimating the amplified second moment of $S_V(g, K; p)$ over that family by the Petersson-Kuznetsov formula and the Poisson formula, we ultimately have to confront some sums which we call correlation sums, which we now define.
The matrices γ which arise in our amplification are the reduction modulo p of integral matrices parameterized by various coefficients from the amplifier, and we need the sums C(K; γ) to be as small as possible.
If $\|K\|_{\infty} \leq M$ (or even $\|K\|_2 \leq M$), then the Cauchy-Schwarz inequality and the Parseval formula show that (1.10) $|C(K; \gamma)| \leq M^2 p$.
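The trivial bound (1.10) is easy to test numerically. The sketch below rests on two assumptions of ours, since the displayed definitions are not reproduced above: the Fourier transform is taken with the unitary normalization $\hat{K}(z) = p^{-1/2} \sum_x K(x) e(xz/p)$, and the correlation sum is taken to be $C(K; \gamma) = \sum_z \hat{K}(\gamma \cdot z) \overline{\hat{K}(z)}$, where γ acts by the fractional linear transformation z ↦ (az + b)/(cz + d) and the point z sent to ∞ is omitted. All function names are hypothetical.

```python
import cmath
import math

def e_p(x, p):
    """Additive character e(x / p) for an integer x."""
    return cmath.exp(2j * math.pi * (x % p) / p)

def fourier_transform(K, p):
    """Unitary discrete Fourier transform of K on Z/pZ (assumed normalization)."""
    return [sum(K[x] * e_p(x * z, p) for x in range(p)) / math.sqrt(p)
            for z in range(p)]

def correlation_sum(K, gamma, p):
    """Assumed correlation sum: sum over z of K_hat(gamma.z) * conj(K_hat(z)).

    gamma = (a, b, c, d) represents an invertible matrix modulo the prime p.
    """
    a, b, c, d = gamma
    K_hat = fourier_transform(K, p)
    total = 0j
    for z in range(p):
        denominator = (c * z + d) % p
        if denominator == 0:
            continue                      # gamma sends this z to infinity
        image = (a * z + b) * pow(denominator, -1, p) % p
        total += K_hat[image] * K_hat[z].conjugate()
    return total
```

With these conventions, the identity matrix gives $C(K; 1) = \sum_x |K(x)|^2$ by the Parseval formula, which is the extremal case in (1.10) when $|K(x)| \leq M$ for all x.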
This bound is, unsurprisingly, insufficient. Our method is based on the idea that C(K; γ) should be significantly smaller for most of the γ which occur (even by a factor $p^{-1/2}$, according to the square-root cancellation philosophy) and that we can control the γ where this cancellation does not occur. By this, we mean that the set of these matrices (which we call the set of correlation matrices) is nicely structured and rather small, unless $\hat{K}$ is constant, a situation which means that K(n) is proportional to $e(an/p)$ for some a ∈ Z, in which case we can use (1.3) anyway. In this paper, the structure we obtain is algebraic. To discuss it, we introduce the following notation concerning the algebraic subgroups of $\mathrm{PGL}_2$:
- we denote by $B \subset \mathrm{PGL}_2$ the subgroup of upper-triangular matrices, the stabilizer of $\infty \in \mathbf{P}^1$;
- we denote by $w = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ the Weyl element, so that Bw (resp. wB) is the set of matrices mapping 0 to ∞ (resp. ∞ to 0);
- we denote by $\mathrm{PGL}_{2,\mathrm{par}}$ the subset of matrices in $\mathrm{PGL}_2$ which are parabolic, i.e., which have a single fixed point in $\mathbf{P}^1$;
- given $x \neq y$ in $\mathbf{P}^1$, the pointwise stabilizer of x and y is denoted $T^{x,y}$ (this is a maximal torus), and its normalizer in $\mathrm{PGL}_2$ (or the stabilizer of the set {x, y}) is denoted $N^{x,y}$.
Definition 1.8 (Correlation matrices and good weights). Let p be a prime and $K : \mathbf{F}_p \to \mathbf{C}$ an arbitrary function. Let $M \geq 1$ be such that $\|K\|_2 \leq M$.
(1) We let $G_{K,M} = \{\gamma \in \mathrm{PGL}_2(\mathbf{F}_p) \mid |C(K; \gamma)| > M p^{1/2}\}$ denote the set of M-correlation matrices.
(2) We say that K is (p, M)-good if there exist at most M pairs $(x_i, y_i)$ of distinct elements in $\mathbf{P}^1(\mathbf{F}_p)$ such that
In other words: given $M \geq 1$ and p a prime, a p-periodic weight K is (p, M)-good if the only matrices for which the estimate $|C(K; \gamma)| \leq M p^{1/2}$ fails are either (1) upper-triangular or sending 0 to ∞ or ∞ to 0; or (2) parabolic; or (3) elements which permute two points defined by at most M integral quadratic (or linear) equations. We note that if we fix such data, a "generic" matrix is not of this type.
This notion has little content if M is larger than $p^{1/2}$, but we will already present below some elementary examples of (p, M)-good weights, together with their sets of correlation matrices for M fixed and p arbitrarily large (not surprisingly, all these examples come from trace weights).
Given a (p, M)-good weight K, we next show using counting arguments that the set of matrices γ constructed from the amplifier does not intersect the set of correlating matrices in too large a set, and we eventually obtain our main technical result:
Theorem 1.9 (Bounds for good twists). Let f be a Hecke eigenform, p be a prime number and V a function satisfying (V(P, Q)). Let $M \geq 1$ be given, and let K be a (p, M)-good function modulo p with $\|K\|_{\infty} \leq M$.
There exists $s \geq 1$ absolute such that for any δ < 1/8, where the implied constant depends only on (f, δ).
Remark 1.10. Although it is an elementary step (compare (5.14) and (5.15) in the proof), the beautiful modular interpretation of correlation sums is a key observation for this paper. It gives a group theoretic interpretation and introduces symmetry into sums, the estimation of which might otherwise seem to be hopeless.

1.3. Trace functions of ℓ-adic sheaves. The class of functions to which we apply these general considerations are the trace weights modulo p, which we now define formally. Let p be a prime number and $\ell \neq p$ an auxiliary prime. The weights K(x) modulo p that we consider are the trace functions of suitable constructible sheaves on $\mathbf{A}^1_{\mathbf{F}_p}$ evaluated at $x \in \mathbf{F}_p$. To be precise, we will consider ℓ-adic constructible sheaves on $\mathbf{A}^1_{\mathbf{F}_p}$. The trace function of such a sheaf F takes values in an ℓ-adic field so we also fix an isomorphism $\iota : \bar{\mathbf{Q}}_{\ell} \to \mathbf{C}$, and we consider the weights of the shape  (1), the restriction of F to U is geometrically isotypic when seen as a representation of the geometric fundamental group of U: it is the direct sum of several copies of some (necessarily non-trivial) irreducible representation of the geometric fundamental group of U (see [24, §8.4]).
If F is geometrically irreducible (instead of being geometrically isotypic), the sheaf will be called an irreducible trace sheaf.
We use similar terminology for the trace functions: Definition 1.12 (Trace weight). Let p be a prime number. A p-periodic weight K(n) defined for $n \geq 1$, seen also as a function on $\mathbf{F}_p$, is a trace weight (resp. Fourier trace weight, isotypic trace weight) if there is some trace sheaf (resp. Fourier trace sheaf, resp. isotypic trace sheaf) F on $\mathbf{A}^1_{\mathbf{F}_p}$ such that K is given by (1.13).
We need an invariant to measure the geometric complexity of a trace weight, which may be defined in greater generality.
For an ℓ-adic constructible sheaf F on $\mathbf{A}^1_{\mathbf{F}_p}$, of rank rank(F) with n(F) singularities in $\mathbf{P}^1$, and with $\mathrm{Swan}(F) = \sum_x \mathrm{Swan}_x(F)$ (the (finite) sum being over all singularities of F), we define the (analytic) conductor of F to be (1.14) cond(F) = rank(F) + n(F) + Swan(F).
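As a small illustration of (1.14), the following sketch evaluates the conductor from the three numerical invariants. The class name `SheafInvariants` is ours, and the sample values are the standard local data for three classical sheaves as we understand them (an Artin-Schreier sheaf is lisse on the affine line with Swan conductor 1 at ∞; a Kummer sheaf is tamely ramified at 0 and ∞; the rank 2 Kloosterman sheaf is tame at 0 and has Swan conductor 1 at ∞); they are stated here as assumptions rather than taken from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SheafInvariants:
    """Numerical invariants entering the conductor (1.14)."""
    rank: int
    singularities: int   # number of singular points in P^1
    swan: int            # sum of the Swan conductors at the singular points

    def conductor(self) -> int:
        # cond(F) = rank(F) + n(F) + Swan(F)
        return self.rank + self.singularities + self.swan

# Assumed standard examples (see the lead-in for the hypotheses).
artin_schreier = SheafInvariants(rank=1, singularities=1, swan=1)
kummer = SheafInvariants(rank=1, singularities=2, swan=0)
kloosterman_rank2 = SheafInvariants(rank=2, singularities=2, swan=1)
```

The point of the definition is that all three quantities are bounded independently of p in such families, so the conductor stays bounded as p grows.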
If K(n) is a trace weight modulo p, its conductor is the smallest conductor of a trace sheaf F with trace function K.
With these definitions, our third main result, which together with Theorem 1.9 immediately implies Theorem 1.2, is very simple to state:
Theorem 1.14 (Trace weights are good). Let p be a prime number, $N \geq 1$ and F an isotypic trace sheaf on $\mathbf{A}^1_{\mathbf{F}_p}$, with conductor N. Let K be the corresponding isotypic trace weight. Then K is $(p, aN^s)$-good for some absolute constants $a \geq 1$ and $s \geq 1$.
7) and a wide range of algebraic exponential sums, as well as point-counting functions for families of algebraic varieties over finite fields. From our point of view, the uniform treatment of trace functions is one of the main achievements in this paper. In fact, our results can equally be read as being primarily about trace functions rather than about Fourier coefficients of modular forms. Reviewing the literature, we have, for instance, found several very fine works in analytic number theory that exploit bounds on exponential sums which turn out to be special cases of the correlation sums of Theorem 1.14 (see [15,17,20,32,31]). Other works in preparation confirm the importance of trace weights (see [13]).
(2) Being isotypic is of course not stable under direct sum, but using Jordan-Hölder components, any Fourier trace weight can be written as a sum (with non-negative integral multiplicities) of isotypic trace weights, which allows us to extend many results to general trace weights (see Corollary 1.6).
1.4. The ℓ-adic Fourier transform and the Fourier-Möbius group. We now recall the counterpart of the Fourier transform at the level of sheaves, which was discovered by Deligne and developed especially by Laumon [28]. This plays a crucial role in our work.
Fix a non-trivial additive character ψ of $\mathbf{F}_p$ with values in $\bar{\mathbf{Q}}_{\ell}$. For any Fourier sheaf F on $\mathbf{A}^1$, we denote by $G_{\psi} = \mathrm{FT}_{\psi}(F)(1/2)$ its (normalized) Fourier transform sheaf, where the Tate twist is always defined using the choice of square root of p in $\bar{\mathbf{Q}}_{\ell}$ which maps to $\sqrt{p} > 0$ under the fixed isomorphism ι (which we denote $\sqrt{p}$ or $p^{1/2}$). We will sometimes simply write G, although one must remember that this depends on the choice of the character ψ. Then G is another Fourier sheaf, such that for any $y \in \mathbf{F}_p$ (see [25, Th. 7.3.8, (4)]).
In particular, if K is given by (1.13) and ψ is such that $\iota(\psi(x)) = e(x/p)$ for $x \in \mathbf{F}_p$ (we will call such a ψ the "standard character" relative to ι), then we have (1.15) $\iota((\mathrm{tr}\, G)(\mathbf{F}_p, y)) = -\hat{K}(y)$ for y in Z.
A key ingredient in the proof of Theorem 1.14 is the following geometric analogue of the set of correlation matrices:
Definition 1.16 (Fourier-Möbius group). Let p be a prime number, and let F be an isotypic trace sheaf on $\mathbf{A}^1_{\mathbf{F}_p}$, with Fourier transform G with respect to ψ. The Fourier-Möbius group $G_F$ is the subgroup of $\mathrm{PGL}_2(\mathbf{F}_p)$ defined by
The crucial feature of this definition is that $G_F$ is visibly a group (it is in fact even an algebraic subgroup of $\mathrm{PGL}_{2,\mathbf{F}_p}$, as follows from constructibility of higher-direct image sheaves with compact support, but we do not need this in this paper; it is however required in the sequel [13]). The fundamental step in the proof of Theorem 1.14 is the fact that, for F of conductor M, the set $G_{K,M}$ of correlation matrices is, for p large enough in terms of M, a subset of $G_F$. This will be derived from the Riemann Hypothesis over finite fields in its most general form (see Corollary 9.2).

1.5. Basic examples.
We present here four examples where $G_{K,M}$ can be determined "by hand", though sometimes this may require Weil's results on exponential sums in one variable or even optimal bounds on exponential sums in three variables. This already gives interesting examples of good weights.
(2) Recall that the classical Kloosterman sums are defined by $S(a, b; c) = \sum_{x \in (\mathbf{Z}/c\mathbf{Z})^{\times}} e\left(\frac{ax + b\bar{x}}{c}\right)$ for $c \geq 1$ and integers a and b.
We consider $K(n) = S(1, n; p)/\sqrt{p}$ for $1 \leq n \leq p$. By Weil's bound for Kloosterman sums, we have $|K(n)| \leq 2$ for all n. We get $\hat{K}(v) = 0$ for v = 0 and According to the results of Weil, we have $|C(K; \gamma)| \leq 2p^{1/2}$ provided the rational function is not of the form $\varphi(X)^p - \varphi(X) + t$ for some constant $t \in \mathbf{F}_p$ and $\varphi \in \mathbf{F}_p(X)$ (and of course, in that case the sum is $\geq p - 3$). But φ would have to be constant, and so for $M \geq 3$ and p such that $p - 3 > 3\sqrt{p}$, the set $G_{K,M}$ is the set of γ for which (1.16) is a constant. A moment's thought then shows that Thus K is (p, 3)-good for all $p \geq 17$.
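The bound $|K(n)| \leq 2$ can be checked directly for small primes. The sketch below assumes the standard definition of the Kloosterman sum for a prime modulus, $S(a, b; p) = \sum_{x \in \mathbf{F}_p^{\times}} e((ax + b\bar{x})/p)$; the function names are ours.

```python
import cmath
import math

def kloosterman_sum(a, b, p):
    """Classical Kloosterman sum S(a, b; p) for a prime modulus p."""
    total = 0j
    for x in range(1, p):
        x_inverse = pow(x, -1, p)                 # inverse of x modulo p
        total += cmath.exp(2j * math.pi * ((a * x + b * x_inverse) % p) / p)
    return total

def normalized_kloosterman_weight(p):
    """List K with K[n] = S(1, n; p) / sqrt(p) for 0 <= n < p."""
    return [kloosterman_sum(1, n, p) / math.sqrt(p) for n in range(p)]
```

The sums are real (replace x by −x in the definition), so the imaginary parts produced by this code are only rounding errors.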
(3) Let $K(n) = e(n^2/p)$. For p odd, we get $\hat{K}(v) = \frac{\tau_p}{\sqrt{p}} e\left(-\frac{\bar{4}v^2}{p}\right)$ by completing the square, where $\tau_p$ is the quadratic Gauss sum. Since $|\tau_p|^2 = p$, we find for $\gamma \in \mathrm{PGL}_2(\mathbf{F}_p)$ as above the formula p .
For $p \geq 3$, Weil's theory shows that $|C(K; \gamma)| \leq 2p^{1/2}$ for all γ such that the rational function is not constant and otherwise $|C(K; \gamma)| \geq p - 1$. Thus for $M \geq 2$ and $p \geq 7$ (when $p - 1 > 2p^{1/2}$), the set $G_{K,M}$ is the set of γ for which this function is constant: this requires c = 0 (the second term cannot have a pole), and then we get the conditions b = 0 and $(a/d)^2 = 1$, so that Thus that weight K is (p, 2)-good for all primes $p \geq 7$.
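The completion of the square in example (3) can be verified numerically. In this sketch we assume the unitary normalization $\hat{K}(v) = p^{-1/2} \sum_x K(x) e(vx/p)$ (the result is the same with the opposite sign in the exponential, since only $v^2$ enters) and we take $\tau_p = \sum_{x \in \mathbf{F}_p} e(x^2/p)$; both conventions and all function names are ours.

```python
import cmath
import math

def e_p(x, p):
    """Additive character e(x / p) for an integer x."""
    return cmath.exp(2j * math.pi * (x % p) / p)

def quadratic_gauss_sum(p):
    """tau_p = sum over x in F_p of e(x^2 / p)."""
    return sum(e_p(x * x, p) for x in range(p))

def transform_by_definition(p):
    """K_hat(v) for K(n) = e(n^2 / p), computed directly from the definition."""
    return [sum(e_p(x * x + v * x, p) for x in range(p)) / math.sqrt(p)
            for v in range(p)]

def transform_by_completing_square(p):
    """K_hat(v) = (tau_p / sqrt(p)) * e(-v^2 / (4p)), with 4 inverted modulo p."""
    tau = quadratic_gauss_sum(p)
    inverse_of_4 = pow(4, -1, p)
    return [tau / math.sqrt(p) * e_p(-inverse_of_4 * v * v, p) for v in range(p)]
```

In particular $|\hat{K}(v)| = 1$ for every v, which is why the correlation sum reduces to a one-variable exponential sum to which Weil's theory applies.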
(4) Let K(n) = χ(n) where χ is a non-trivial Dirichlet character modulo p. Then we have is the Gauss sum associated to χ. Then for γ as above, we have Again from Weil's theory, we know that $|C(K; \gamma)| \leq 2p^{1/2}$ unless the rational function is of the form $tP(X)^h$ for some $t \in \mathbf{F}_p$ and $P \in \mathbf{F}_p(X)$, where $h \geq 2$ is the order of χ (and in that case, the sum has modulus $\geq p - 3$). This means that for $M \geq 2$, and $p \geq 11$, the set $G_{K,M}$ is the set of those γ where this condition is true. Looking at the order of the zero or pole at 0, we see that this can only occur if either b = c = 0 (in which case the function is the constant $da^{-1}$) or, in the special case h = 2, when a = d = 0 (and the function is $cb^{-1}X^2$). In other words, for $p \geq 11$ and $M \geq 2$, we have if χ is real-valued. In both cases, these matrices are all in $B(\mathbf{F}_p) \cup B(\mathbf{F}_p)w$, so that the weight χ(n) is (p, 2)-good, for all $p \geq 11$.
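For the real character (the case h = 2), the Fourier transform can be computed explicitly. The sketch below takes χ to be the Legendre symbol modulo an odd prime p, with Gauss sum $\tau(\chi) = \sum_x \chi(x) e(x/p)$, and assumes the unitary normalization $\hat{K}(v) = p^{-1/2} \sum_x K(x) e(vx/p)$; under these assumptions one expects $\hat{K}(v) = \chi(v) \tau(\chi)/\sqrt{p}$. The function names are ours.

```python
import cmath
import math

def e_p(x, p):
    """Additive character e(x / p) for an integer x."""
    return cmath.exp(2j * math.pi * (x % p) / p)

def legendre_symbol(x, p):
    """Quadratic character modulo an odd prime p, with value 0 at multiples of p."""
    residue = pow(x % p, (p - 1) // 2, p)     # Euler's criterion
    return -1 if residue == p - 1 else residue

def gauss_sum(p):
    """tau(chi) = sum over x of chi(x) e(x / p) for the Legendre symbol chi."""
    return sum(legendre_symbol(x, p) * e_p(x, p) for x in range(p))

def transform_of_character(p):
    """K_hat(v) for K = chi, computed from the (assumed) unitary definition."""
    return [sum(legendre_symbol(x, p) * e_p(v * x, p) for x in range(p)) / math.sqrt(p)
            for v in range(p)]
```

Since $|\tau(\chi)|^2 = p$, the transform again has modulus 1 away from v = 0, and the correlation sums become character sums of a rational function.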
1.6. Notation. As usual, |X| denotes the cardinality of a set, and we write $e(z) = e^{2i\pi z}$ for any z ∈ C. If a ∈ Z and $n \geq 1$ are integers and (a, n) = 1, we sometimes write $\bar{a}$ for the inverse of a in $(\mathbf{Z}/n\mathbf{Z})^{\times}$; the modulus n will always be clear from context. We write $\mathbf{F}_p = \mathbf{Z}/p\mathbf{Z}$.
By f(x) ≪ g(x) for x ∈ X, or f = O(g) for x ∈ X, where X is an arbitrary set on which f is defined, we mean synonymously that there exists a constant $C \geq 0$ such that $|f(x)| \leq Cg(x)$ for all x ∈ X. The "implied constant" refers to any value of C for which this holds. It may depend on the set X, which is usually specified explicitly, or clearly determined by the context. We write f(x) ≍ g(x) to mean f ≪ g and g ≪ f. The notation n ∼ N means that the integer n satisfies the inequalities $N < n \leq 2N$.
For a rational number x = a/b with a, b coprime integers, the height of x is defined by H(x) = max(|a|, |b|). We denote the divisor function by d(n).
Concerning sheaves, for $a \neq 0$, we will write $[\times a]^{*}F$ for the pullback of a sheaf F on $\mathbf{P}^1$ under the map $x \mapsto ax$.
For a sheaf F on P 1 /k, where k is an algebraic closure of a finite field, and x ∈ P 1 , we write F(x) for the representation of the inertia group at x on the geometric generic fiber of F, and F x for the stalk of F at x.
For F a sheaf on $\mathbf{P}^1/k$, where now k is a finite field of characteristic p, and for ν an integer or ±1/2, we also write F(ν) for the Tate twist of F, with the normalization of the half-twist as discussed in Section 1.4 using the underlying isomorphism $\iota : \bar{\mathbf{Q}}_{\ell} \to \mathbf{C}$. From context, there should be no confusion between the two possible meanings of the notation F(x).

1.7. Acknowledgments. This paper has benefited from the input of several people, whom we would like to thank: V. Blomer, T. Browning, J. Ellenberg, C. Hall, H. Iwaniec, N. Katz, E. Lindenstrauss, R. Pink, G. Ricotta, P. Sarnak, A. Venkatesh and D. Zywina for input and encouraging comments; P. Nelson for illuminating discussions and a careful reading of the manuscript (yet the errors that may remain are entirely ours); G. Harcos for decisive comments on an earlier version of this paper which have led to a significant improvement.

Some applications
2.1. Characters and Kloosterman sums. We first spell out the examples of the introduction involving the weights (1.6) and (1.7). We give the proof now to illustrate how concise it is given our results, referring to later sections for some details.
Proof. The first case follows directly from Theorem 1.2 if φ 1 and φ 2 satisfy the assumption of Theorem 10.1. Otherwise we have K(n) = e( an+b p ) and the bound follows from (1.3). In the second case, we claim that K tr,s ≪ 1, where the implied constant depends only on (m, φ, Φ), so that Corollary 1.6 applies. Indeed, the triangle inequality shows that we may assume that Φ(U, V ) = U u V v is a non-constant monomial. Let Kℓ m,φ be the hyper-Kloosterman sheaf discussed in §10.3, Kℓ m,φ its dual and consider the sheaf of rank m 2 uv given by with associated trace function We replace F by its semisimplification (without changing notation), and we write where F 2 is the direct sum of the irreducible components of F which are geometrically isomorphic to Artin-Schreier sheaves L ψ , and F 1 is the direct sum of the other components. The trace function K 2 of F 2 is a sum of at most m 2 uv additive characters (times complex numbers of modulus 1) so by §10.3, so each of its isotypic components has conductor similarly bounded. (Compare with Proposition 8.3).

2.2. Distribution of twisted Hecke orbits and horocycles. We present here a geometric consequence of our main result. Let $Y_0(N)$ denote the modular curve $\Gamma_0(N) \backslash \mathbf{H}$. For a prime p coprime to N, we denote by $\tilde{T}_p$ the geometric Hecke operator that acts on complex-valued functions f defined on $Y_0(N)$ by the formula (note that this differs from the usual Hecke operator $T_p = (p + 1)p^{-1/2}\tilde{T}_p$ acting on Maass forms, defined in (3.2)).
As we will also recall more precisely in Section 3, the $L^2$-space has a basis consisting of $\tilde{T}_p$-eigenforms f, which are either constant functions, Maass cusp forms or combinations of Eisenstein series, with eigenvalues $\nu_f(p)$ such that for some absolute constant θ < 1/2 (e.g., one can take θ = 7/64 by the work of Kim and Sarnak [26]). This bound implies the well-known equidistribution of the Hecke orbits $\{\gamma_t \cdot \tau\}$ for a fixed $\tau \in Y_0(N)$, as p tends to infinity. Precisely, let where, for any τ ∈ H, $\delta_{\Gamma_0(N)\tau}$ denotes the Dirac measure at $\Gamma_0(N)\tau \in Y_0(N)$. Then $\mu_{p,\tau} \to \mu$ as p → +∞, in the weak-* sense, where µ is the hyperbolic probability measure on $Y_0(N)$.
Note that all but one point of the Hecke orbit lie on the horocycle at height $\mathrm{Im}(\tau)/p$ in $Y_0(N)$ which is the image of the segment $x + i\,\mathrm{Im}(\tau)/p$ where $0 \leq x \leq 1$, so this can also be considered as a statement on equidistribution of discrete points on such horocycles.
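The remark on horocycles is easy to visualize numerically. The sketch below assumes the usual representatives for the Hecke orbit, namely $\gamma_t \cdot \tau = (\tau + t)/p$ for $t \in \mathbf{F}_p$ and $\gamma_{\infty} \cdot \tau = p\tau$; this parameterization is our assumption, since the displayed formula is not reproduced above. For illustration only, we also reduce points to the standard fundamental domain of $\mathrm{SL}_2(\mathbf{Z})$, which corresponds to level N = 1. The function names are ours.

```python
def hecke_orbit(tau, p):
    """Points of the Hecke orbit of tau in the upper half-plane (assumed representatives).

    The first p points are (tau + t) / p for t = 0, ..., p - 1; the last one is p * tau.
    """
    points = [(tau + t) / p for t in range(p)]
    points.append(p * tau)
    return points

def reduce_to_fundamental_domain(z, max_steps=10_000):
    """Move z into the standard fundamental domain of SL_2(Z) (level N = 1 only)."""
    for _ in range(max_steps):
        z = complex(z.real - round(z.real), z.imag)   # translate so that |Re z| <= 1/2
        if abs(z) >= 1 - 1e-12:
            return z
        z = -1 / z                                    # invert when inside the unit disc
    raise RuntimeError("reduction did not terminate")
```

Plotting `reduce_to_fundamental_domain(z)` for `z` in `hecke_orbit(tau, p)` and increasing p gives a picture of the equidistribution statement, and weighting the point indexed by t with K(t) gives a picture of the twisted measures considered next.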
We can then consider a variant of this question, which is suggested by the natural parameterization of the Hecke orbit by the $\mathbf{F}_p$-rational points of the projective line. Namely, given a complex-valued function $K : \mathbf{F}_p \to \mathbf{C}$ and a point $z \in Y_0(N)$, we define a twisted measure which is now a (finite) signed measure on $Y_0(N)$. We call these "algebraic twists of Hecke orbits", and we ask how they behave when p is large. For instance, K could be a characteristic function of some subset $A_p \subset \mathbf{P}^1(\mathbf{F}_p)$, and we would be attempting to detect whether the subset $A_p$ is somehow biased in such a way that the corresponding fragment of the Hecke orbit always lives in a certain corner of the curve $Y_0(N)$. We will prove that, when $1_{A_p}$ can be expressed or approximated by a linear combination of the constant function 1 and trace weights with bounded conductors, this type of behavior is forbidden. For instance if $A_p$ = (p) is the set of quadratic residues modulo p one has where it is pointed out that it is intimately related to the Burgess bound for short character sums and to subconvexity bounds for Dirichlet L-functions of real characters and twists of modular forms by such characters.
Our result is the following: For each prime p, let $K_p$ be an isotypic trace weight modulo p and Let $\mu_{K_p,I_p,\tau}$ be the signed measure Then, for any given τ ∈ H, and $I_p$ such that $|I_p| \geq p^{1-\delta}$ for some fixed δ < 1/16, the measures $\mu_{K_p,I_p,\tau}$ converge to 0 as p → +∞.
Here is a simple application where we twist the Hecke orbit by putting a multiplicity on the γ t corresponding to the value of a polynomial function on F p .

Corollary 2.3 (Polynomially-twisted Hecke orbits).
Let φ ∈ Z[X] be an arbitrary non-constant polynomial. For any $\tau \in Y_0(N)$ and any interval of length $|I_p| \geq p^{1-\delta}$ for some fixed δ < 1/16, the sequence of measures values of φ has positive density in $\mathbf{F}_p$ for p large, but the limsup of the density $|A_p|/p$ is usually strictly less than 1. The statement means, for instance, that the points of the Hecke orbit of τ parameterized by $A_p$ cannot be made to almost all lie in some fixed "half" of $Y_0(N)$, when φ is fixed.
These results could also be interpreted in terms of equidistribution of weighted p-adic horocycles; similar questions have been studied in different contexts for rather different weights in [36,38,37] (e.g., for short segments of horocycles). Also, as pointed out by P. Sarnak, the result admits an elementary interpretation in terms of representations of p by the quaternary quadratic form det(a, b, c, d) = ad − bc (equivalently in terms of integral matrices of determinant p). Let It is well-known that the non-trivial bound (2.1) implies the equidistribution of with respect to the Haar measure on $\mathrm{SL}_2(\mathbf{R})$ (see [34] for much more general statements). Now, any matrix $\gamma \in M_2^{(p)}(\mathbf{Z})$ defines a non-zero singular matrix modulo p and determines a point z(γ) in $\mathbf{P}^1(\mathbf{F}_p)$, which is defined as the kernel of this matrix (e.g. $z(\gamma_t) = t$). By duality, our results imply the following refinement: for any non-constant polynomial φ ∈ Z[X], the subsets
2.3. Trace weights over the primes. In the paper [13], we build on our results and on further ingredients to prove the following statement:
Theorem 2.4. Let K be an isotypic trace weight modulo p, associated to a sheaf F with conductor M, and such that F is not geometrically isomorphic to a direct sum of copies of a tensor product $L_{\chi(X)} \otimes L_{\psi(X)}$ for some multiplicative character χ and additive character ψ. Then for any $X \geq 1$, we have for any η < 1/48. The implicit constants depend only on η and M. Moreover, the dependency on M is at most polynomial.
These bounds are non-trivial as long as $X \geq p^{3/4+\varepsilon}$ for some ε > 0, and for X = p, we save a factor $\gg_{\varepsilon} p^{1/48-\varepsilon}$ over the trivial bound. In other terms, trace weights of bounded conductor do not correlate with the primes or the Möbius function when X is greater than $p^{3/4+\varepsilon}$.
The condition we impose on F means that K(n) is not proportional to χ(n)ψ(n) for χ a Dirichlet character and ψ an additive character. This is a natural condition, since to deal with such cases is tantamount to proving a quasi-Riemann hypothesis.
This theorem itself has many applications when specialized to various weights. We refer to [13] for these.

Preliminaries concerning automorphic forms
3.1. Review of Kuznetsov formula. We review here the formula of Kuznetsov which expresses averages of products of Fourier coefficients of modular forms in terms of sums of Kloosterman sums. The version we will use here is taken mostly from [2], though we use a slightly different normalization of the Fourier coefficients.
3.1.1. Hecke eigenbases. Let $q \geq 1$ be an integer, $k \geq 2$ an even integer. We denote by $S_k(q)$, $L^2(q)$ and $L^2_0(q) \subset L^2(q)$, respectively, the Hilbert spaces of holomorphic cusp forms of weight k, of Maass forms and of Maass cusp forms of weight k = 0, level q and trivial Nebentypus (which we denote $\chi_0$), with respect to the Petersson norm defined by where $k_g$ is the weight for g holomorphic and $k_g = 0$ if g is a Maass form.
These spaces are endowed with the action of the (commutative) algebra T generated by the Hecke operators {T n | n 1}, where where k g = 0 if g ∈ L 2 (q) and k g = k if g ∈ S k (q) (compare with the geometric operatorT p of Section 2.2). Moreover, the operators {T n | (n, q) = 1} are self-adjoint, and generate a subalgebra denoted T (q) . Therefore, the spaces S k (q) and L 2 0 (q) have an orthonormal basis made of eigenforms of T (q) and such a basis can be chosen to contain all L 2 -normalized Hecke newforms (in the sense of Atkin-Lehner theory). We denote such bases by B k (q) and B(q), respectively, and in the remainder of this paper, we tacitly assume that any basis we select satisfies these properties.
The orthogonal complement to L 2 0 (q) in L 2 (q) is spanned by the Eisenstein spectrum E(q) and the one-dimensional space of constant functions. The space E(q) is continuously spanned by a "basis" of Eisenstein series indexed by some finite set which is usually taken to be the set {a} of cusps of Γ 0 (q). It will be useful for us to employ another basis of Eisenstein series formed of Hecke eigenforms: the adelic reformulation of the theory of modular forms provides a natural spectral expansion of the Eisenstein spectrum in which the Eisenstein series are indexed by a set of parameters of the form where χ ranges over the characters of modulus q and B(χ) is some finite (possibly empty) set depending on χ (specifically, B(χ) corresponds to an orthonormal basis in the space of the principal series representation induced from the pair (χ, χ), but we need not be more precise). With this choice, the spectral expansion for ψ ∈ E(q) can be written where the Eisenstein series E χ,g (t) is itself a function from H to C. When needed, we denote its value at z ∈ H by E χ,g (z, t).
The main advantage of these Eisenstein series is the fact that they are Hecke eigenforms for T (q) : for (n, q) = 1, one has

3.1.2. Multiplicative and boundedness properties of Hecke eigenvalues. Let f be any Hecke eigenform of $T^{(q)}$, and let $\lambda_f(n)$ denote the corresponding eigenvalue for $T_n$, which is real. Then for (mn, q) = 1, we have (3.4) $\lambda_f(m)\lambda_f(n) = \sum_{d \mid (m,n)} \lambda_f(mn/d^2)$. This formula (3.4) is valid for all m, n if f is an eigenform for all of T, with an additional multiplicative factor $\chi_0(d)$ in the sum.
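The Hecke multiplicativity relation can be checked on a concrete eigenform. The sketch below uses the weight 2 newform of level 11 given by the eta product $q \prod_{n \geq 1} (1 - q^n)^2 (1 - q^{11n})^2$; this choice of example, the unnormalized form of the relation $a(m)a(n) = \sum_{d \mid (m,n)} d \cdot a(mn/d^2)$ for (mn, 11) = 1 (equivalent to the normalized relation for $\lambda(n) = a(n)/n^{1/2}$), and the function names are ours.

```python
from math import gcd

def level_11_coefficients(limit):
    """Return a list a with a[n] the n-th coefficient, 1 <= n <= limit, of
    q * prod_{n >= 1} (1 - q^n)^2 (1 - q^{11 n})^2 (a[0] is unused)."""
    coeffs = [0] * limit          # coeffs[m] = coefficient of q^m in the product
    coeffs[0] = 1

    def multiply_by_one_minus_q_power(step):
        # in-place multiplication by (1 - q^step), from high degree to low degree
        for m in range(limit - 1, step - 1, -1):
            coeffs[m] -= coeffs[m - step]

    for n in range(1, limit):
        for step in (n, n, 11 * n, 11 * n):
            multiply_by_one_minus_q_power(step)
    return [None] + coeffs        # shift by one because of the leading factor q

def hecke_relation_holds(a, m, n):
    """Check a(m) a(n) = sum over d | gcd(m, n) of d * a(m n / d^2) (weight 2)."""
    g = gcd(m, n)
    right_hand_side = sum(d * a[m * n // (d * d)]
                          for d in range(1, g + 1) if g % d == 0)
    return a[m] * a[n] == right_hand_side
```

All computations are in exact integer arithmetic, so the relation can be tested without rounding issues for every pair (m, n) with mn below the truncation limit.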
We recall some bounds satisfied by the Hecke eigenvalues. First, if f belongs to $B_k(q)$ (i.e., is holomorphic) or is an Eisenstein series $E_{\chi,f}(t)$, then we have the Ramanujan-Petersson bound for any ε > 0. For f ∈ B(q), this is not known, but we will be able to work with suitable average versions, precisely with the second and fourth-power averages of Fourier coefficients. First, we have uniformly in f, for any $x \geq 1$ and any ε > 0, where the implied constant depends only on ε (see [11, Prop. 19.6]). Secondly, we have for any $x \geq 1$ (see, e.g., [27, (3.3), (3.4)]).

3.1.3. Hecke eigenvalues and Fourier coefficients.
For z = x + iy ∈ H, we write the Fourier expansion of a modular form f as follows: f is the Laplace eigenvalue, and [11, §4]; see also [16, 9.222.2,9.235.2].) When f is a Hecke eigenform, there is a close relationship between the Fourier coefficients of f and its Hecke eigenvalues λ f (n): for (m, q) = 1 and any n 1, we have and moreover, these relations hold for all m, n if f is a newform, with an additional factor χ 0 (d).
In particular, for (m, q) = 1, we have be Bessel transforms. Then for positive integers m, n we have the following trace formula due to Kuznetsov:

3.2. Choice of the test function. For the proof of Theorem 1.9, we will need a function φ in the Kuznetsov formula such that the transforms φ(k) and φ(t) are non-negative for $k \in 2\mathbf{N}_{>0}$ and t ∈ R ∪ (−i/4, i/4). Such φ is obtained as a linear combination of the following explicit functions. For integers $2 \leq b < a$ with b odd and a − b ≡ 0 (mod 2), we take (3.18) In particular, Notice that if we have the freedom to choose a and b very large, we can ensure that the Bessel transforms of $\varphi_{a,b}$ decay faster than any fixed polynomial at infinity.

4. The amplification method
4.1. Strategy of the amplification. We prove Theorem 1.9 using the amplification method; precisely we will embed f in the space of forms of level pN (a technique used very successfully by Iwaniec in various contexts [19,5], as well as by others [4], [3]). The specific implementation of amplification (involving the full spectrum, even for a holomorphic form f) is based on [2].
We consider an automorphic form f of level N , non-zero, which is either a Maass form with Laplace eigenvalue 1/4 + t 2 f , or a holomorphic modular form of even weight k f ≥ 2, and which is an eigenform of all Hecke operators T n with (n, pN ) = 1.
By viewing f as being of level 2 or 3 if N = 1, we can assume that N ≥ 2, which will turn out to be convenient at some point of the later analysis. We will also assume that f is L 2 -normalized with respect to the Petersson inner product (3.1).
Finally, we can also assume that p > N , hence p is coprime with N . The form f is evidently a cusp form with respect to the smaller congruence subgroup Γ 0 (pN ) and the function Let a > b ≥ 2 be even integers, to be chosen later (both will be taken to be large), let φ = φ a,b be the function (3.17) defined in Section 3.2. We define "amplified" second moments of the sums S(g, K; p), where g runs over suitable bases of B(q) and B k f (q). Precisely, for any coefficients (b ℓ ) defined for ℓ ≤ 2L and supported on ℓ ∼ L, and any modular form h, we define an amplifier B(h) by We will also use the notation for χ a Dirichlet character modulo N and g ∈ B(χ). We then let for any even integer k ≥ 2. In all cases, the sum M (L) contains a term involving S(f, K; p) = S V (f, K; p) (we omit the fixed test-function V from the notation for simplicity).
We will show: Proposition 4.1 (Bounds for the amplified moment). Assume that M ≥ 1 is such that K is (p, M )-good. Let V be a smooth compactly supported function satisfying Condition (V (P, Q)). Let (b ℓ ) be arbitrary complex numbers supported on primes ℓ ∼ L, such that |b ℓ | ≤ 2 for all ℓ.
For any ε > 0 there exists k(ε) ≥ 2 such that for any k ≥ k(ε) and any integers a > b > 2 we have The implied constants depend on (ε, a, b, f ), but they are independent of k.
We will prove Proposition 4.1 in Sections 5 and 6, but first we show how to exploit it to prove the main result.

4.2.
From Proposition 4.1 to Theorem 1.9. We assume here Proposition 4.1 and proceed to the proof of the main theorem.
The amplifier we use is due to Venkatesh. We put (note the use of Hecke eigenvalues, and not Fourier coefficients, here). With this choice, the pointwise bound |b ℓ | ≤ 1 is obvious, and on average we get Moreover, for L < p, we have where the implied constant depends on f . Indeed, we have which we bound from below by writing (using the Cauchy-Schwarz inequality, the Prime Number Theorem for the Rankin-Selberg L- Thus by (3.7), we have where the implied constant depends on f .
Remark 4.2. Using ideas of Holowinsky, it is possible to improve this lower bound a bit to B(f ) ≫ f L/ log L (see [14,Proposition 3.1]). A direct application of the prime number theorem for L(f ⊗ f, s) yields an exponential dependency on the parameters of f ; however, with more sophisticated Hoheisel-type arguments (see [30] for instance), this dependency can be made polynomial.
Now we apply Proposition 4.1 for this choice. We recall from (3.19) that we have in the second and third terms of the sum defining M (L), while for k ≥ 2, even, we have under our conditions on a and b.
On the other hand, if L < 1, we have P + Q > (1/2) p^{1/4−ε} , and then the estimate (4.13) is trivial. Thus we obtain Theorem 1.9.

4.3.
Packets of Eisenstein series. The above argument also yields a similar bound for packets of unitary Eisenstein series, i.e., when f is replaced by where χ is a Dirichlet character of modulus N , g ∈ B(χ) and ϕ is some smooth compactly supported function. We have the following: There exists an absolute constant s ≥ 1 such that for any δ < 1/8, where the implied constant depends only on (N, δ, ϕ).
Proof. Let T ≥ 0 be such that the support of ϕ is contained in [−T, T ]. Then we have and we will bound the right-hand side. We therefore get and the same argument used in the previous section leads to

Estimation of the amplified second moment
We begin here the proof of Proposition 4.1. Obviously, we can assume that P ≤ p, Q ≤ p. We start by expanding the squares in B(g) and |S(g, K; p)| 2 , getting and similarly where we used the fact that the Hecke eigenvalues λ g (ℓ 2 ) and λ χ (ℓ 2 , t) which are involved are real for ℓ 2 coprime to pN , because of the absence of Nebentypus. where, for instance, we have By (3.6) and the bound |b ℓ | ≤ 2, we get where the implied constant is independent of f . We can then apply the rapid decay (3.18) of φ̃(t) at infinity and the large sieve inequality of Deshouillers-Iwaniec [9, Theorem 2, (1.29)] to obtain where the implied constant depends only on (ε, N ). The bounds for the holomorphic and Eisenstein portions are similar and in fact slightly simpler, as we can use Deligne's bound on Hecke eigenvalues of holomorphic cusp forms (or unitary Eisenstein series) instead of (3.7) (still using [9, Th. 2, (1.28), (1.30)]). The treatment of M d (L; k) is essentially included in that of the holomorphic contribution.
On the other hand, by (3.15), there is no diagonal contribution for M nd (L), and we write where ∆ q,φ (m, n) is defined in (3.16).

Diagonal terms.
We begin with M 1 (L; k): we have Until this point, V could be arbitrary (provided the sums we wrote made sense). Now assume that V has compact support in [P, 2P ]. Then the sum over m is in fact of length ≪ min(pP/d, pP/e), the implied constant depending on V . But since de = ℓ 1 ℓ 2 with ℓ i ∼ L, we have max(d, e) > L.
Thus, simply using the bound |K(n)| ≤ M and the boundedness of b ℓ , we get:

5.4.
Arranging the off-diagonal terms. Now comes the most important case of M 2 (L) and M 2 (L; k). Their shape is very similar, so we define for an arbitrary function φ. We then have and We first transform these sums by writing where S(en 1 , n 2 ; cpN )K(dn 1 )K(n 2 )H φ (n 1 , n 2 ), Having fixed d, e as above, let C = C(d, e) ≥ 1/2 be a parameter. We decompose further where M 2,C [φ; d, e] denotes the contribution of the terms with c > C, and correspondingly We begin by estimating those, assuming that For our specific choices of φ, we note that we have the upper-bound where the constant implied is absolute. Recalling the definition (3.17), we obtain (5.6) with κ = a − b for φ = φ a,b and with κ = k − 1 for φ = 2πi^{−k} J k−1 , and we note that in the latter case, the constant B is independent of k. Then, summing over c > C and then over (d, e), (ℓ 1 , ℓ 2 ), we obtain: Proposition 5.4. With notation as above, assuming that |K| ≤ M , we have where the implied constant depends on f .
In view of this proposition, we choose for some small parameter δ > 0 which is at our disposal. Then taking k = k(δ) and a = a(δ), b = b(δ) so that k and a − b are large enough, the total contribution M 2,C of the terms M 2,C [φ a,b ] and M 2,C [φ k ] to M (L) and M (L; k) will be bounded by (5.9) M 2,C ≪ p −10 L 2 P 2 M 2 , so it will be negligible.

5.5.
Estimating the off-diagonal terms. It remains to handle the complementary sum (see (5.4)) which is where C is defined by (5.8). In particular, we can assume C ≥ 1, otherwise the above sum is zero. Recall that we factored the product of distinct primes ℓ 1 ℓ 2 (with ℓ i ∼ L) as ℓ 1 ℓ 2 = de. Hence we have three types of factorizations of completely different nature, which we denote as follows:
• Type (L 2 , 1): this is when d = ℓ 1 ℓ 2 and e = 1, so that L 2 < d ≤ 4L 2 ;
• Type (1, L 2 ): this is when d = 1 and e = ℓ 1 ℓ 2 , so that L 2 < e ≤ 4L 2 ;
• Type (L, L): this is when d and e are both ≠ 1 (so d = ℓ 1 and e = ℓ 2 or conversely), so that L < d ≠ e ≤ 2L.
We will also work under the following (harmless) restriction (5.11) 2p δ P < L.
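The three types of factorizations listed above can be enumerated mechanically. The following is a minimal sketch, assuming two distinct primes are given as plain integers; the function name `classify_factorizations` and the string labels are hypothetical and only serve this illustration.

```python
def classify_factorizations(l1, l2):
    """List all (d, e, type) with d * e == l1 * l2, for distinct primes l1, l2.

    The labels mirror the three types of the text; the function itself is
    only an illustration and is not part of the argument.
    """
    if l1 == l2:
        raise ValueError("the primes must be distinct")
    product = l1 * l2
    result = []
    for d in (1, l1, l2, product):
        e = product // d
        if e == 1:
            kind = "(L^2, 1)"   # d = l1 * l2, e = 1
        elif d == 1:
            kind = "(1, L^2)"   # d = 1, e = l1 * l2
        else:
            kind = "(L, L)"     # d and e are the two primes, in some order
        result.append((d, e, kind))
    return result
```

For instance, with the primes 5 and 7 there are four factorizations: one of each of the first two types and two of Type (L, L).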
Note that the last expression is now independent of (x 1 , x 2 ), so that we will be justified to denote this simply by E(c, d, e, n 1 , n 2 ). Opening the Kloosterman sums in (5.13) and changing the order of summation, we see that Our next step is to implement the summation over x 1 and x 2 modulo cN in (5.12): we have We make the following definition: Definition 5.6 (Resonating matrix). For n 1 n 2 ≡ e (mod cN ), the integral matrix γ(c, d, e, n 1 , n 2 ) defined by (5.17) is called a resonating matrix.
Observe that det(γ(c, d, e, n 1 , n 2 )) = de and since de is coprime with p, the reduction of γ(c, d, e, n 1 , n 2 ) modulo p provides a well-defined element in PGL 2 (F p ).
5.6. Estimating the Fourier transform. Our next purpose is to truncate the sum over n 1 , n 2 in (5.16). To do this, we introduce a new parameter: Note that, since 1 ≤ c ≤ C = p δ P (e/d) 1/2 , we have We will use Z to estimate the Fourier transform H φ (n 1 /(cpN ), n 2 /(cpN )). The first bound is given by the following lemma: Lemma 5.7. Let (d, e) be of Type (L, L) or of Type (1, L 2 ). Let H φ and Z be defined by (5.3) and (5.18). Assume that V satisfies (V (P, Q)) and that n 1 n 2 ≠ 0.
Proof. (1) Recalling (5.3) and (3.17), we have We use the uniform estimates for the Bessel function, valid for z > 0 and ν ≥ 0, where the implied constant depends on a and ν (see [12,Chap. VII]). We also remark that Z is the order of magnitude of the variable inside J a (· · · ) in the above formula. Integrating by parts µ times with respect to x and ν times with respect to y, we then get the result indicated.
(2) This is very similar: since we want uniformity with respect to k, we use the integral representation for the Bessel function ( [16, 8.411]). After inserting it in the integral defining the Fourier transform, we find the desired estimates by repeated integrations by parts as before.
where E φ is the subsum of Ẽ φ given by The implied constant depends on (δ, ε, N ), but is independent of k for φ = 2πi^{−k} J k−1 .

5.7.
A more precise evaluation. In the range |n i | ≤ N i , i = 1, 2, we will need a more precise evaluation. We will take some time to prove the following result: Lemma 5.9. Let (d, e) be of Type (L, L) or of Type (1, L 2 ). Let H φ and Z be defined by (5.3) and (5.18). Assume that V satisfies (V (P, Q)) and that n 1 n 2 ≠ 0.
(1) For φ = φ a,b , we have where the implied constant depends on (a, b, N ).
(2) For φ = φ k , we have where the implied constant depends on N .
Proof. We consider the case φ = φ k , the other one being similar. We shall exploit the asymptotic oscillation and decay of the Bessel function J k−1 (z) for large z. More precisely, we use the formula z which is valid uniformly for z > 0 and k ≥ 1 with an absolute implied constant (to see this, use the formula [22, p.227, (B 35)], which holds with an absolute implied constant for z ≥ 1 + (k − 1) 2 , and combine it with the bound |J k−1 (x)| ≤ 1). The contribution of the second term in this expansion to 1 The contribution arising from the first term can be written as a linear combination (with bounded coefficients) of two expressions of the shape We write these in the form where we note that the function is smooth and compactly supported in [0, 1] 2 , and, crucially, the phase In particular, since Z ≫ p −δ (see (5.19)), we obtain a first easy bound We now prove two lemmas in order to deal with the oscillatory integrals (5.22) above, from which we will gain an extra factor Z 1/2 . We use the notation for a function ϕ on R 2 .
Lemma 5.10. Let F (x, y) be a quadratic form and G(x, y) a smooth function, compactly supported on [0, 1], satisfying the inequality where G 0 is some positive constant. Let λ 2 denote the Lebesgue measure on R 2 . Then, for every B > 0, we have G(x, y)e F (x, y) dy .
To simplify the exposition, we suppose that A(x) is a segment of the form ]a(x), 1] with 0 ≤ a(x) ≤ 1 (when it consists of two segments, the proof is similar). Integrating by parts, we get The first term on the right-hand side of (5.25) is ≪ G 0 B −1 . The modulus of the second one is since, on the interval of integration, F (0,1) has a constant sign and F (0,2) is constant. Inserting these estimates in (5.24) and using the equality we complete the proof.
The following lemma gives an upper bound for the constant λ 2 (G(B)) that appears in the previous one.
Proof. By integrating with respect to x first, we can write This set is again a segment, of length at most B/|c 1 |. Integrating over y, we get the desired result.
We return to the study of the integral appearing in (5.22). Here we see easily that Lemma 5.11 applies with Hence, by Lemma 5.10, we deduce for any B > 0. Choosing B = √ Z, we see that the above integral is ≪ QZ −1/2 . It only remains to gather (5.21), (5.22), (5.23) with the bound Z −3/2 ≪ p δ/2 Q/Z to complete the proof of Lemma 5.9.
5.8. Contribution of the non-correlating matrices. From now on, we simply choose δ = ε > 0 in order to finalize the estimates.
We start by separating the terms according as to whether |C(K; γ(c, d, e, n 1 , n 2 ))| M p 1/2 or not, i.e., as to whether the reduction modulo p of the resonating matrix γ(c, d, e, n 1 , n 2 ) is in the set G K,M of M -correlation matrices or not (see (1.11)). Thus we write where * restricts to those (n 1 , n 2 ) such that γ(c, d, e, n 1 , n 2 ) (mod p) ∈ G K,M , and E n φ is the contribution of the remaining terms. Similarly, we write We will treat M n 3 [φ; d, e] slightly differently, depending on whether (d, e) is of Type (L, L) or of Type (1, L 2 ). For T = (L, L) or (1, L 2 ), we write Notice that in both cases we have by (5.11), where the implied constant depends on N . This shows that the total numbers of terms in the sum E φ (c, d, e) (or its subsums E n φ (c, d, e)) is ≪ N 1 N 2 c −1 .
-When (d, e) is of Type (1, L 2 ), we have d = 1 and We now apply Lemma 5.9. Considering the case of φ = φ k , we get If φ = φ a,b , we obtain the same bound without the factor k 3 , but the implied constant then depends also on (a, b).

Contribution of the correlating matrices
To conclude the proof of Proposition 4.1 we evaluate the contribution M c 3 [φ; d, e], corresponding to the resonating matrices whose reduction modulo p is a correlating matrix, i.e., such that
(6.1) $\gamma(c, d, e, n_1, n_2) = \begin{pmatrix} n_1 & (n_1 n_2 - e)/(cN) \\ cdN & d n_2 \end{pmatrix} \pmod{p} \in G_{K,M}$.
The basic idea is that correlating matrices are sparse, which compensates the loss involved in this bound.
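As a sanity check on the shape of the resonating matrix in (6.1), the determinant identity det γ = de can be verified with exact integer arithmetic. The sketch below assumes the entries are arranged as rows (n 1 , (n 1 n 2 − e)/(cN )) and (cdN, dn 2 ); the helper names `resonating_matrix`, `det2` and `reduce_mod` are hypothetical.

```python
def resonating_matrix(c, d, e, n1, n2, N):
    """Integral matrix with rows (n1, (n1*n2 - e)/(c*N)) and (c*d*N, d*n2).

    Requires the congruence n1*n2 = e (mod c*N), so that the upper-right
    entry is an integer.
    """
    if (n1 * n2 - e) % (c * N) != 0:
        raise ValueError("need n1 * n2 congruent to e modulo c * N")
    return ((n1, (n1 * n2 - e) // (c * N)), (c * d * N, d * n2))


def det2(m):
    """Determinant of a 2x2 matrix given as a pair of rows."""
    (a, b), (c, d) = m
    return a * d - b * c


def reduce_mod(m, p):
    """Entrywise reduction modulo p, with representatives in [0, p)."""
    return tuple(tuple(x % p for x in row) for row in m)
```

With c = 1, N = 2, d = 3, e = 5 and (n 1 , n 2 ) = (1, 7), the matrix has rows (1, 1) and (6, 21), and its determinant is 15 = de, which is invertible modulo any prime p not dividing 15.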
Corresponding to Definition 1.8, we write where the superscripts b, p, t, and w denote the subsums of E c φ (c, d, e) where (c, n 1 , n 2 ) are such that the resonating matrix γ = γ(c, d, e, n 1 , n 2 ) is of the corresponding type in Definition 1.8 (in case a matrix belongs to two different types, it is considered to belong to the first in which it belongs in the order b, p, t, w).
We write correspondingly . Most of the subsequent analysis works when d and e are fixed, and we will therefore often write γ(c, d, e, n 1 , n 2 ) = γ(c, n 1 , n 2 ) to simplify notation.
The main tool we use is the fact that, when the coefficients of γ(c, d, e, n 1 , n 2 ) are small enough compared with p, various properties which hold modulo p can be lifted to Z.

Triangular and related matrices. Note that
so that a matrix γ(c, n 1 , n 2 ) can only contribute to E b φ (c, d, e) if p|cN n 1 n 2 . If we impose the condition (6.3) p 3ε LQ < p (which will be strengthened later on), noting the bounds cd ≤ dC ≤ p ε P √(de) ≪ p ε LP, and we see that cdn 1 n 2 N ≡ 0 (mod p) is impossible, hence the sum E b φ (c, d, e) is empty and (6.4) M b 3 [φ; d, e] = 0.
6.2. Parabolic matrices. We now consider E p φ (c, d, e), which is also easily handled. Indeed, a parabolic γ ∈ PGL 2 (F p ) has a unique fixed point in P 1 , and hence any representative γ̃ of γ in GL 2 (F p ) satisfies tr(γ̃) 2 − 4 det(γ̃) = 0. Now if there existed some matrix γ(c, n 1 , n 2 ) which is parabolic modulo p, we would get (n 1 + dn 2 ) 2 = 4de = 4ℓ 1 ℓ 2 (mod p).
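The parabolic condition tr 2 − 4 det = 0 is easy to test on explicit integral matrices. The following sketch assumes p is an odd prime and uses the standard fact that a non-scalar invertible matrix with vanishing discriminant has a single fixed point on the projective line; the function name `is_parabolic_mod_p` is hypothetical.

```python
def is_parabolic_mod_p(m, p):
    """Test whether a 2x2 integer matrix is parabolic modulo an odd prime p.

    Criterion used here: invertible mod p, not scalar mod p, and
    tr^2 - 4*det congruent to 0 mod p.
    """
    (a, b), (c, d) = m
    det = a * d - b * c
    if det % p == 0:
        raise ValueError("matrix is not invertible modulo p")
    is_scalar = b % p == 0 and c % p == 0 and (a - d) % p == 0
    discriminant = (a + d) ** 2 - 4 * det
    return (not is_scalar) and discriminant % p == 0
```

For the matrix with rows (n 1 , ·) and (·, dn 2 ) the trace is n 1 + dn 2 and the determinant is de, so the test reduces to the congruence (n 1 + dn 2 ) 2 ≡ 4de (mod p) displayed above.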
6.3. Toric matrices. We now examine the more delicate case of E t φ (c, d, e). Recall that this is the contribution of matrices whose image in PGL 2 (F p ) belongs to a set of M tori T x i ,y i . We will deal with each torus individually, so we may concentrate on those γ(c, n 1 , n 2 ) which (modulo p) fix x ≠ y in P 1 (F p ). In fact, we can assume that x and y are finite, since otherwise γ would be treated by Section 6.1.
We make the stronger assumption (6.7) p 3ε LQ < p 1/3 to deal with this case. We therefore assume that there exists a resonating matrix γ(c, n 1 , n 2 ) whose image in PGL 2 (F p ) is contained in T x,y (F p ). From (6.3), we saw already that γ (mod p) is not a scalar matrix. Now consider the integral matrix $2\gamma - \mathrm{tr}(\gamma)\mathrm{Id} = \begin{pmatrix} n_1 - d n_2 & 2(n_1 n_2 - e)/(cN) \\ 2cdN & d n_2 - n_1 \end{pmatrix}$ (which has trace 0). The crucial (elementary!) fact is that, since γ is not scalar, an element γ 1 in GL 2 (F p ) has image in T x,y if and only if 2γ 1 − tr(γ 1 )Id is proportional to 2γ − tr(γ)Id (indeed, this is easily checked if x = 0, y = ∞, and the general case follows by conjugation). Hence, if a resonating matrix γ 1 = γ(c 1 , m 1 , m 2 ) has reduction modulo p in T x,y , the matrix Because of (6.7), one sees that these equalities modulo p hold in fact over Z.
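The proportionality criterion for membership in a torus can be checked numerically in the model case x = 0, y = ∞, where the torus consists of diagonal matrices. This is a minimal sketch assuming an odd prime p; the helper names `traceless_part` and `proportional_mod_p` are hypothetical.

```python
def traceless_part(m, p):
    """Entries of 2*m - tr(m)*Id modulo p, flattened row by row."""
    (a, b), (c, d) = m
    t = a + d
    return ((2 * a - t) % p, (2 * b) % p, (2 * c) % p, (2 * d - t) % p)


def proportional_mod_p(u, v, p):
    """True if the vectors u and v are linearly dependent modulo p.

    Linear dependence is tested through the vanishing of all 2x2 minors,
    so the zero vector (scalar matrices) counts as proportional.
    """
    n = len(u)
    return all(
        (u[i] * v[j] - u[j] * v[i]) % p == 0
        for i in range(n)
        for j in range(i + 1, n)
    )
```

Taking γ = diag(1, 2) modulo 7, the diagonal matrix diag(3, 5) passes the test, while the upper triangular matrix with rows (1, 1) and (0, 2), which fixes ∞ but not 0, fails it.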
If u = 0, one checks by simple algebra that this implies is an integral binary quadratic form. Note that all its coefficients are ≪ p A for some A 0 and that similarly By a classical result going back to Estermann (see, e.g., [18,Theorem 3]), the number of integral solutions (x, y) to the equation F (x, y) = 2eu 2 such that |x|, |y| N i is bounded by ≪ p ε for any ε > 0. But when m 1 and m 2 are given solutions, the value of c 1 is uniquely determined from the second equation in (6.8). Hence the number of possible triples (c 1 , m 1 , m 2 ) is bounded by ≪ ε p ε .
Using Lemma 5.9 and (6.2), we then deduce (for a single torus) 1 and similarly, without the factor k 3 , for φ a,b . Hence, multiplying by the number M of tori, we have N, a, b).
6.4. Normalizers of tori. We now finally examine the contribution of G w K,M , i.e., of resonating matrices γ(c, n 1 , n 2 ) whose image in PGL 2 (F p ) is contained in the non-trivial coset of the normalizer of one of the tori T x i ,y i . Again, we may work with a fixed normalizer N x,y , and we can assume that x and y are finite.
Finally, let γ 2 = γ(c 2 , r 2 ) be some other resonating matrix such that γ 2 (mod p) ∈ N x,y − T x,y . We have the same anti-commutation relation (6.10) γ 2 (2γ − tr(γ)Id) = −(2γ − tr(γ)Id)γ 2 (mod p), and writing Looking at the sizes of u, v, w, we see that if we make the stronger assumption (6.11) p 3ε LQ < p 1/4 , this equation is valid over Z. This means that is again an integral binary quadratic form. Arguing as in the previous case, we conclude that under the assumption (6.11), the total number of resonating matrices γ(c, n 1 , n 2 ) for c ≤ C, |n i | ≤ N i , i = 1, 2, associated to N x,y − T x,y is ≪ p ε . We deduce then as before the bounds for one normalizer, and therefore for any ε > 0, where the implied constants depend on (a, b, N, ε).
Finally, we observe that if (5.11) does not hold, the above bound remains valid by Lemma 5.1 and (5.1), and this concludes the proof of Proposition 4.1.

Distribution of twisted Hecke orbits and horocycles
In this section we prove the results of Section 2.2, using the main estimate of Theorem 1.9 as the basic tool.
Proof of Theorem 2.2. Let K = K p be an isotypic trace weight and I = I p ⊂ [1, p] an interval. We have to show that if |I| ≥ p 3/4+ε for some ε > 0, we have the limit as p → +∞, for all ϕ continuous and compactly supported on Y 0 (N ) and all τ ∈ Y 0 (N ). By the spectral decomposition theorem for Y 0 (N ), it is sufficient to prove the result for ϕ either the constant function 1, or a Maass Hecke eigenform, or a packet of Eisenstein series.
Let ϕ = f be a Maass cusp form with Fourier expansion We can assume, by linearity, that f is an eigenfunction of the involution z → −z̄, so that there exists ε f = ±1 with for all n ∈ Z. We now derive the basic identity relating Hecke orbits with the twisted sums of Fourier coefficients: we have where K̂ is the unitarily-normalized Fourier transform modulo p, as before. Hence, by (7.1), we get x)e(−xRe (τ )).
By Lemma 8.1, Proposition 8.2 and Proposition 8.4, the functions [−x] * K , x ∈ F p are isotypic trace weights whose conductor is bounded solely in terms of cond(K). Therefore we would like to apply Theorem 1.9.
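For readers who want to experiment with the unitarily-normalized Fourier transform modulo p and with additive shifts of a weight, here is a small numerical sketch. The sign convention in the exponential and the function names are assumptions made for this illustration only; the opposite sign would work equally well for the two checks below.

```python
import cmath
import math


def fourier_transform_mod_p(K, p):
    """Assumed convention: K_hat(n) = p^(-1/2) * sum_x K(x) * e(n*x/p)."""
    return [
        sum(K[x] * cmath.exp(2j * math.pi * n * x / p) for x in range(p))
        / math.sqrt(p)
        for n in range(p)
    ]


def l2_norm_squared(values):
    """Sum of squared absolute values over Z/pZ."""
    return sum(abs(v) ** 2 for v in values)


def shift(K, h, p):
    """The function x -> K(x + h) on Z/pZ."""
    return [K[(x + h) % p] for x in range(p)]
```

With this normalization the transform preserves the L 2 norm (Plancherel), and shifting K by h multiplies K̂(n) by the phase e(−nh/p), so shifts do not change the size of the transform.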
The functions V and W above do not satisfy Condition (V (P, Q)), but it is standard to reduce to this situation. First, we truncate the large values of n, observing that since where the implied constant depends on t (see (3.9)), the contribution of the terms with n ≥ p 1+ε to both sums is ≪ τ,t f exp(−p ε/2 ), for any ε > 0. Then, by means of a smooth dyadic partition of the remaining interval, we are reduced to bounding O(log p) sums of the shape for functions Ṽ , depending on τ and t f , which satisfy Condition (V (P, Q)) with parameters Q ≪ t f ,ε 1.
But since ̺ χ,g (ϕ, 0)(z) is independent of the real part Re (z), its contribution to µ K,I,τ (E χ,g (ϕ)) is equal to as long as |I| ≥ p η with η > 1/2. The exact same argument yields the case ϕ = 1, and this concludes the proof of Theorem 2.2.
Proof of Corollary 2.3. We now consider a non-constant polynomial φ of degree deg φ ≥ 1. The probability measure (2.3) satisfies 1 |I| where K(t) = |{x ∈ F p | φ(x) = t}| − 1 for t ∈ F p . By §10.2, K is a Fourier trace weight (not necessarily isotypic), whose Fourier transform is therefore also a Fourier trace weight, given by $\hat{K}(n) = \frac{1}{\sqrt{p}} \sum_{x \in \mathbf{F}_p} e\Bigl(\frac{n\varphi(x)}{p}\Bigr)$ for (n, p) = 1. By Proposition 8.3, we can express K̂ as a sum of at most deg(φ) weights K̂ i which are irreducible trace weights with conductors bounded by M . The contribution from the terms K̂ i is then treated by the previous proof.
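The relation between the fibre-counting weight K(t) = |{x : φ(x) = t}| − 1 and exponential sums along φ can be confirmed numerically for small primes. In the sketch below the polynomial is given by a list of integer coefficients (lowest degree first), and all function names are hypothetical. The identity checked is the unnormalized one, which holds because the full sum of e(nt/p) over t vanishes when p does not divide n.

```python
import cmath
import math


def eval_poly(coeffs, x, p):
    """Value of the polynomial with coefficients coeffs (lowest degree first) at x, modulo p."""
    return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p


def fibre_count_weight(coeffs, p):
    """K(t) = #{x in F_p : phi(x) = t} - 1, as a list indexed by t."""
    counts = [0] * p
    for x in range(p):
        counts[eval_poly(coeffs, x, p)] += 1
    return [count - 1 for count in counts]


def e_p(a, p):
    """The additive character e(a/p) = exp(2*pi*i*a/p)."""
    return cmath.exp(2j * math.pi * (a % p) / p)


def sum_over_values(K, n, p):
    """sum_t K(t) * e(n*t/p)."""
    return sum(K[t] * e_p(n * t, p) for t in range(p))


def sum_along_polynomial(coeffs, n, p):
    """sum_x e(n*phi(x)/p)."""
    return sum(e_p(n * eval_poly(coeffs, x, p), p) for x in range(p))
```

For example, with φ(x) = x 2 + 1 and p = 11, the two sums agree for every n coprime to p, and the values K(t) sum to zero.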

Trace weights
We now come to the setting of Section 1.3. For an isotypic trace weight K(n), we will see that the cohomological theory of algebraic exponential sums and the Riemann Hypothesis over finite fields provide interpretations of the sums C(K; γ), from which it can be shown that trace weights are good.
In this section, we present some preliminary results. In the next one, we give many different examples of trace weights (isotypic or not), and compute upper bounds for the conductor of the associated sheaves. We then use the cohomological theory to prove Theorem 1.14.
First we recall the following notation for trace functions: for a finite field k, an algebraic variety X/k, a constructible ℓ-adic sheaf F on X, a finite extension k ′ /k, and a point x ∈ X(k ′ ), we define (tr F)(k ′ , x) = tr(Fr k ′ | F x̄ ), the trace of the geometric Frobenius automorphism of k ′ acting on the stalk of F at a geometric point x̄ over x (seen as a finite-dimensional representation of the Galois group of k ′ ; see [25, 7.3.7]). Now let p be a prime number, and let ℓ ≠ p be another auxiliary prime. Let be a fixed isomorphism, and let F be an ℓ-adic constructible Fourier sheaf on A 1 Fp (in the sense of Katz [25,Def. 8.2.1.2]). Recall that we consider the weights . We also consider the (Tate-twisted) Fourier transform G = FT ψ (F)(1/2) with respect to an additive ℓ-adic character ψ of F p . It satisfies for any finite extension k/F p and v ∈ k = A 1 (k) (see [25,Th. 7
(2) Suppose that F is pointwise ι-pure of weight 0, i.e., that it is a trace sheaf. Then
- G is pointwise ι-pure of weight 0;
- at the points v ∈ A 1 where G = FT ψ (F) is not lisse, it is pointwise mixed of weights ≤ 0, i.e., for any finite field k with v ∈ k, the eigenvalues of the Frobenius of k acting on the stalk of G at a geometric point v̄ over v are |k|-Weil numbers of weight at most 0.
(3) If F is geometrically isotypic (resp. geometrically irreducible) then the Fourier transform G is also geometrically isotypic (resp. geometrically irreducible).
We defined the conductor of a sheaf in Definition 1.13. An important fact is that this invariant also controls the conductor of the Fourier transform, and that it controls the dimension of cohomology groups which enter into the Grothendieck-Lefschetz trace formula. We state suitable versions of these results: Proposition 8.2. Let p be a prime number and ℓ ≠ p an auxiliary prime.
(2) For F 1 and F 2 lisse ℓ-adic sheaves on an open subset U ⊂ A 1 , we have Note that (8.2) can probably be improved, but this bound will be enough for us. Proof.
(1) It is clear that the rank, number of singularities, and the maximum Swan conductor are the same for G and any sheaf γ * G, since γ is an automorphism of P 1 . Thus we can assume γ = 1. We first bound the number of singularities where λ runs over the breaks of F(∞), and x over the singularities of F in A 1 . The first term is Swan ∞ (F), so that the rank of G is bounded by Thus it only remains to estimate the Swan conductors Swan x (G) at each singularity. We do this using the local description of the Fourier transform, due to Laumon [28], separately for 0, ∞ and points in G m . As for s ∞ , by a further result of Laumon [25,Th. 7.5.4 (1)], the contribution s ∞ is equal to the similar contribution of breaks > 1 to the Swan conductor Swan ∞ (F). Hence (2) We use the Euler-Poincaré formula: for a lisse ℓ-adic sheaf M on an affine curve U ⊂ P 1 over F p , we have We apply this formula to For the second term, we note simply that rank(M)(−χ c (U ×F p )) mr 1 r 2 .
For the last term, we bound the Swan conductor at x ∈ P 1 − U of F 1 ⊗ F 2 in terms of those of the factors. The existence of such a bound is a well-known result: if λ 1 (resp. λ 2 ) is the largest break of F 1 (resp. F 2 ) at x, then all breaks of F 1 ⊗ F 2 at x are at most max(λ 1 , λ 2 ) ≤ max(Swan x (F 1 ), Swan x (F 2 )) (see [24,Lemma 1.3]), and hence and Swan(F 1 ⊗ F 2 ) ≤ r 1 r 2 (Swan(F 1 ) + Swan(F 2 )). Adding this to the previous contribution, we get dim H 1 c (U × F̄ p , M) ≤ r 1 r 2 (1 + m + cond(F 1 ) + cond(F 2 )), as claimed.
We can also explain here how to deal with Fourier trace weights which are not necessarily isotypic. There exist at most M isotypic trace sheaves F i modulo p, each with conductor ≤ M , such that for all x ∈ F p . In particular, for any s ≥ 1, the trace weight
Proof. Let U ↪ A 1 be an open dense subset, defined over F p , such that F is lisse on U , and let denote the ℓ-adic representation corresponding to this restriction. Let be the semisimplification of this representation, where ϱ i is an irreducible representation of π 1 (U, η̄). We denote by F̃ i the corresponding lisse sheaf on U , and let F i = j * F̃ i . Then each F i is a Fourier sheaf modulo p, with conductor ≤ M , and we have (8.8) (tr F)(k, x) = Σ i∈I (tr F i )(k, x) for any finite extension k/F p and x ∈ k. Indeed, this holds by definition for x ∈ U (k), and this extends to all x by properties of middle-extension sheaves. Each ϱ i is arithmetically irreducible, and there are two possibilities concerning its restriction ϱ g i to π 1 (U × F̄ p , η̄): (1) either ϱ g i is isotypic, and hence F i is an isotypic trace sheaf; or (2) there exists an integer m ≥ 2, and a representation τ i of the proper normal subgroup H = π 1 (U × F p m , η̄) such that ϱ i = Ind π 1 (U,η̄) H τ i (see, e.g., [35,Prop. 8.1]). We claim that in this second case, the trace function of F i is identically zero on F p , which finishes the proof since we can then drop F i from the decomposition (8.8).
To check the claim, note that the formula for the character of an induced representation shows that tr ϱ i (g) = 0 for any g ∉ H. Hence the trace function vanishes on U (F p ), since the Frobenius elements associated to x ∈ U (F p ) relative to F p are not in H. This extends to x ∈ (A 1 − U )(F p ) by a similar argument applied to the representations of the inertia group on the stalks at a geometric point above x, which are similarly induced, and where the Frobenius classes also do not belong to the subgroup from which the stalk of F i is induced (we thank N. Katz for explaining this).
The following is relevant to Theorem 2.2.
Proposition 8.4. Let p be a prime number, ℓ ≠ p an auxiliary prime. Let F be an ℓ-adic Fourier trace sheaf modulo p with conductor ≤ N . Let K(n) be the corresponding Fourier trace weight. Then, for any x ∈ F p , [+x] * K(n) = K(x + n) defines a Fourier trace weight associated to the sheaf and we have cond(F (x) ) = cond(F) ≤ N for all x ∈ F p .
Proof. It is clear that F (x) has the right trace function and that it is a Fourier trace sheaf, with the same conductor as F.
Finally, we state a well-known criterion for geometric isomorphism of sheaves, which says that two irreducible middle-extension sheaves are geometrically isomorphic if their trace functions are equal on A 1 (F p ) "up to a constant depending on the definition field". Precisely: Proposition 8.5 (Geometric isomorphism criterion). Let k be a finite field, and let F 1 and F 2 be geometrically irreducible ℓ-adic sheaves, lisse on a non-empty open set U/k and pointwise pure of weight 0. Then F 1 is geometrically isomorphic to F 2 if and only if there exists α ∈ Q̄ × ℓ such that for all finite extensions k 1 /k, we have for all x ∈ U (k 1 ).
In particular, if F 1 and F 2 are irreducible Fourier sheaves, they are geometrically isomorphic if and only if there exists α ∈ Q̄ × ℓ such that for all finite extensions k 1 /k, we have (8.10) (tr F 1 )(k 1 , x) = α [k 1 :k] (tr F 2 )(k 1 , x) for all x ∈ k 1 . This is a well-known fact; it is basically an instance of what is called "Clifford theory" in representation theory.
Here is a last definition. If F is a Fourier sheaf on A 1 /k, we write D(F) for the middle-extension dual of F, i.e., given a dense open set j : U ↪ A 1 where F is lisse, we have D(F) = j * ((j * F) ′ ), where the prime denotes the lisse sheaf on U associated to the contragredient of the representation of the fundamental group of U which corresponds to j * F (see [25, 7.3.1]). If F is pointwise pure of weight 0, it is known that (8.11) $\iota((\mathrm{tr}\, D(\mathcal{F}))(k', x)) = \overline{\iota((\mathrm{tr}\, \mathcal{F})(k', x))}$ for all finite extensions k ′ /k and all x ∈ k ′ .

Application of the Riemann Hypothesis
We can now prove that correlation sums of trace weights are small, except for matrices in the Fourier-Möbius group. This is the crucial argument that relies on the Riemann Hypothesis over finite fields.
Theorem 9.1 (Cohomological bound for correlation sums). Let p be a prime number, ℓ ≠ p another prime. Let F be an isotypic trace sheaf on A 1 Fp and let K denote its trace function. Let G be the Fourier transform of F computed with respect to some non-trivial additive character ψ, and denote by U the largest open subset of A 1 where G is lisse. We have The bounds (9.2) are certainly not sharp, but they show that the result is completely effective and explicit. Now fix a non-constant rational fraction, φ(T ) = R(T )/S(T ), R(T ), S(T ) ∈ Z[T ]. Assuming that p is large enough (greater than the degrees of R, S and all their coefficients), the sheaf Kℓ m,φ = φ * Kℓ m satisfies (tr Kℓ m,φ )(F p , a) = (−1) m−1 Kl m (φ(a); p) for a ∈ F p − φ −1 ({0, ∞}). The following result is the main input to the proof of the second part of Corollary 2.1.
Proof. Deligne has shown that Kℓ m has rank m, is lisse on G m , and is tame at 0 and totally wild at ∞ with Swan conductor 1, so that cond(Kℓ m ) = m + 3 (see, e.g., [24, 11.0.2]). It follows therefore that Kℓ m,φ is of rank m, is lisse outside of the set φ −1 ({0, ∞}), is tame at the zeros of φ and wild at its poles. At a pole x ∈ φ −1 (∞) of order d x , the map φ is generically étale, and hence we know that Swan x (φ * Kℓ m ) = d x Swan ∞ (Kℓ m ) = d x by [24, 1.13.1]. Finally, Katz has shown that Kℓ m is geometrically Lie-irreducible (see [24,Thm. 11.1]), i.e., that its restriction to any finite-index subgroup of the fundamental group of G m is geometrically irreducible. Since φ is non-constant, this shows that Kℓ m,φ is also irreducible.
11. Examples of determination of G F
Theorem 1.14 solves completely the question of showing that isotypic trace weights are good, reducing it to an estimation of the conductor of the associated sheaf. However, we find it instructive to determine G F as precisely as possible for interesting families of weights, as was already done in Section 1.5 in simple cases. This gives illustrations of the various possibilities, and would be a first step in trying to improve the generic exponent 1/8. Since we won't need these results for this paper, we leave the proof to the reader as an exercise in the theory of the ℓ-adic Fourier transform (proximity with [24,25] is strongly advised).

Mixed characters. Let
be a sheaf corresponding to mixed characters, where either φ 1 is not a polynomial of order 1, or η is non-trivial of order h ≥ 2 and φ 2 is not of the form tφ 3 (X) h for some t ∈ F × p , and φ 3 ∈ F p (X). Then one can show that G F is contained either in B (the stabilizer of ∞) or in N 0,∞ , the normalizer of the diagonal torus. For F = L ψ(X −1 ) , we have G F = 1.
If φ has degree 2 and 0 is not the unique critical value of φ, then one finds that G F is a subgroup of diagonal matrices of order bounded by deg(φ) − 1.