1 Introduction

The concept of families is a very fruitful one in number theory and in particular in the context of automorphic forms. It allows us to study asymptotic properties and has recently been put on some formal ground in [29, 32]. On the conceptual side it dampens irregularities of individual members (that may exist or whose non-existence we are unable to prove) and allows statistical concepts and deformation techniques to investigate properties within an ensemble. On the methodological side it enables us to use strong analytic tools such as various types of trace formulae.

One of the key conjectures in the field of automorphic forms is the Ramanujan conjecture: cuspidal automorphic representations of the group \(\textrm{GL}(n)\) over a number field F are tempered (see [4] for a survey). Even for \(n = 2\) this appears to be far out of reach, and as a substitute one considers two types of approximations. On the one hand one can measure the worst case scenario, i.e. the largest distance from the tempered spectrum of an individual member in a family. On the other hand one can try to bound the number of members in a family violating the conjecture relative to the amount by which they violate the conjecture. This is a density result, a familiar concept from the theory of Dirichlet L-functions: although the Riemann hypothesis is far out of reach, we have good bounds for the number \(N(\sigma ,T, Q)\) of zeros with real part \(\geqslant \sigma \) and height \(\leqslant T\) of Dirichlet L-functions with conductor \(q \leqslant Q\) (see e.g. [18, Section 10]). The arithmetic reformulation of this is the Bombieri-Vinogradov theorem which roughly states that primes \(\leqslant x\) are equidistributed in “almost all” residue classes modulo \(q \leqslant x^{1/2 + o(1)}\) (similarly, “almost all” short intervals contain primes). In many applications this serves as a good substitute for the Riemann hypothesis.

In this note we want to consider the automorphic analogue for the family of automorphic forms for the group \(\Gamma _0(q) \subseteq \textrm{SL}_n({\mathbb {Z}})\) of matrices whose lowest row is congruent to \((0, \ldots , 0, *)\) modulo q. This is a very natural family as it contains precisely the automorphic forms of conductor dividing q [19]. Let us fix a place v of \(\mathbb {Q}\), and for an automorphic form \(\pi \) let us denote by \(\mu _{\pi }(v) = (\mu _{\pi }(v, 1), \ldots , \mu _{\pi }(v, n))\) its local spectral parameter (each entry viewed modulo \(\frac{2\pi i}{\log p} {\mathbb {Z}}\) if \(v = p\) is a prime). Write

$$\begin{aligned} \sigma _{\pi }(v) = \max _j|\Re \mu _{\pi }(v, j)|. \end{aligned}$$
(1.1)

The representation \(\pi \) is tempered at v if \(\sigma _{\pi }(v) = 0\), and the size of \(\sigma _{\pi }(v)\) measures how far \(\pi \) is from being tempered at v. An example of a non-tempered representation is the trivial representation which satisfies \(\sigma _{\text {triv}}(v) = (n-1)/2\) for every v. For a finite family \({\mathcal {F}}\) of automorphic representations for \(\textrm{GL}(n)\) and \(\sigma \geqslant 0\) we define

$$\begin{aligned} N_{ v}(\sigma , {\mathcal {F}}) = |\{\pi \in {\mathcal {F}} \mid \sigma _{\pi }(v) \geqslant \sigma \}|. \end{aligned}$$

We have trivially \(N_{ v}(0, {\mathcal {F}}) = |{\mathcal {F}}|\), and if the trivial representation is contained in \({\mathcal {F}}\), we have \(N_{ v}((n-1)/2, {\mathcal {F}}) \geqslant 1\). One may hope to be able to interpolate linearly between these two extreme cases:

$$\begin{aligned} N_{ v}(\sigma , {\mathcal {F}}) \ll _{v, \varepsilon } |{\mathcal {F}}|^{1 - \frac{\sigma }{a}+ \varepsilon } \end{aligned}$$
(1.2)

for arbitrarily small \(\varepsilon > 0\) with

$$\begin{aligned} a = (n-1)/2. \end{aligned}$$
(1.3)

This is precisely Sarnak’s density hypothesis [28, p. 465] stated there in the context of groups G of real rank 1, the principal congruence subgroup \(\Gamma (q) = \{\gamma \in G({\mathbb {Z}}) \mid \gamma \equiv \text {id} \, (\text {mod } q)\}\) and \(v = \infty \). For families of large level, Sarnak’s density hypothesis has recently attracted interest in the context of lifting matrices modulo q [30] and the almost diameter of Ramanujan complexes, and for families with growing infinitesimal character in the context of Golden Gates and quantum computing [26, 31]. In each of these cases it is not a spectral gap that is needed, but a certain kind of density result.

The shape of the bound (1.2) with (1.3) bears a certain similarity to the convexity bound for L-functions in the Selberg class in the critical strip. Unlike the convexity bound for L-functions, (1.2) with (1.3) is in general a very deep result that is completely open for general groups and families. On the other hand, it is a priori not impossible to even obtain “subconvexity”, i.e. a proof of (1.2) with a constant \(a < (n-1)/2\), if the trivial representation is not in \({\mathcal {F}}\). The Arthur-Selberg trace formula is usually not sensitive to whether the trivial representation is counted or not, but the Kuznetsov formula can be a versatile tool if no residual spectrum is involved.

For the group \(\textrm{GL}(2)\) there exist strong density results for many automorphic families, for instance by Sarnak [27], Iwaniec [17], Huxley [16], Blomer–Buttcane–Raulf [6], also in number field versions [7, 8] and for general real rank 1 groups [15, 33]. Various results are also available for \( \textrm{GL}(3)\), see e.g. [3, 5, 6]. For higher rank groups the deep analysis of the Arthur-Selberg trace formula of Matz–Templier [25] and Finis–Matz [10] provides as by-products some density results for the family of Maaß forms of Laplace eigenvalue up to height T and fixed level. The value of a is however much larger than (1.3) for \(n > 2\) (at least quadratic in n), so that even the “convexity bound” cannot be obtained.

In the present paper we consider the family \({\mathcal {F}}_I(q)\) of cuspidal automorphic representations generated by Maaß forms for the group \(\Gamma _0(q) \subseteq \textrm{SL}_n({\mathbb {Z}})\) for a large prime q and Laplace eigenvalue \(\lambda \) in a fixed interval I. If I is not too small, we have \(|{\mathcal {F}}_I(q)| \asymp _I q^{n-1}\). For this family and any place \(v \not = q\) of \(\mathbb {Q}\), we go beyond the density hypothesis and obtain strong “subconvexity” with a value of

$$\begin{aligned} a = (n-1)/4, \end{aligned}$$

which is halfway between (1.3) and the Ramanujan conjecture.
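Concretely, if I is not too small, so that \(|{\mathcal {F}}_I(q)| \asymp _I q^{n-1}\), the bound (1.2) with \(a = (n-1)/4\) reads

$$\begin{aligned} N_v(\sigma , {\mathcal {F}}_I(q)) \ll |{\mathcal {F}}_I(q)|^{1 - \frac{4\sigma }{n-1} + \varepsilon } \ll q^{n-1-4\sigma + \varepsilon '}, \end{aligned}$$

which is exactly the exponent appearing in the following theorem.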

Theorem 1

Let \(n \geqslant 3\), q a prime, v be a place of \(\mathbb {Q}\) different from q, \(I \subseteq [0, \infty )\) a fixed interval, \(\varepsilon > 0\), and \(\sigma \geqslant 0\). Then

$$\begin{aligned} N_v(\sigma , {\mathcal {F}}_I(q)) \ll _{I, v, n, \varepsilon } q^{n-1 - 4\sigma + \varepsilon }. \end{aligned}$$

Of course, by [23] we know that \(N_v(\sigma , {\mathcal {F}}_I(q)) = 0\) for \(\sigma \geqslant 1/2 - 1/(n^2 + 1)\), but for \(0< \sigma < 1/2 - 1/(n^2 + 1)\) we obtain a substantial power saving. In fact, even the rather generic (though highly non-trivial) Jacquet-Shalika bounds \(N_v(\sigma , {\mathcal {F}}_I(q)) = 0\) for \(\sigma > 1/2\) [20] show for \(n \geqslant 3\) that cuspidal representations are always fairly far away from the trivial representation. Unfortunately it is not clear how to combine the Luo–Rudnick–Sarnak approach with the present techniques.

The theorem remains true for \(n=2\) (by a slightly different proof), but it is known in this case (see [17] for \(v=\infty \), and the proof for finite v is similar) and recovers Selberg’s 3/16 bound for exceptional eigenvalues. For \(n=3\) and \(v=\infty \) this is [5, Theorem 4]. As mentioned above, for larger n Theorem 1 is completely new. As we shall outline below, it appears to be the limit of what is available by any trace formula approach; even in the case \(n=2\) nothing better is known.

The proof is based on a careful analysis of the arithmetic side of the Kuznetsov formula with a test function on the spectral side that blows up on exceptional Langlands parameters at v (and therefore increases the complexity on the arithmetic side). We denote by \(\lambda _{\pi }(m)\) the m-th Hecke eigenvalue of \(\pi \in {\mathcal {F}}_I(q)\).

Theorem 2

Keep the assumptions and notation of Theorem 1. Let \(m\in \mathbb {N}\) be coprime to q and \(Z \geqslant 1\). Then

$$\begin{aligned} \sum _{\pi \in {\mathcal {F}}_I(q)} |\lambda _{\pi }(m)|^2 Z^{2\sigma _{\pi }(\infty )} \ll _{I, n, \varepsilon } q^{n-1+\varepsilon } \end{aligned}$$

uniformly in \(mZ \ll q^2\) for a sufficiently small implied constant (depending on I and n).

We shall see in Lemma 4 below that \(|\lambda _{\pi }(p^{\nu })|^2\) is often as big as \(p^{2\nu \sigma _{\pi }(p)}\) for a prime p and \(\nu \in \mathbb {N}\), so that the “test function” \(|\lambda _{\pi }(m)|^2 Z^{2\sigma _{\pi }(\infty )} \) treats finite places and the infinite place essentially on equal footing.

Let us roughly sketch how one may hope to arrive at Theorem 2. Since the Laplacian eigenvalue is fixed, the Whittaker transforms in the Kuznetsov formula play no major role, and the battle is decided on the level of Kloosterman sums. Very roughly, the Kuznetsov formula takes the shape

$$\begin{aligned} \begin{aligned}&\frac{1}{| {\mathcal {F}}_I(q)| } \sum _{\pi \in {\mathcal {F}}_I(q)} |\lambda _{\pi }(m)|^2Z^{2\sigma _{\pi }(\infty )}\\ {}&\,\,\, ``\approx \hbox {''} \,\,\,1 + \sum _{\text {id} \not = w \in W}\, \sum _{\begin{array}{c} q \mid c_1, \ldots , q \mid c_{n-1}\\ c_1, \ldots , c_{n-1} \ll mZ \end{array}} \frac{S_{q, w}(M, M, c)}{c_1 \cdots c_{n-1}} \end{aligned} \end{aligned}$$
(1.4)

where W is the Weyl group of permutation matrices, \(M = (m, 1, \ldots , 1) \in {\mathbb {Z}}^{n-1}\) and \(S_{q, w}(M, M, c)\) is a certain generalized Kloosterman sum, defined in (4.2) below, associated with the Weyl element w and moduli \(c = (c_1, \ldots , c_{n-1})\). If \(mZ \ll q\), then the off-diagonal term vanishes completely and we are done. We will use this observation in Theorem 4 below. This range of mZ recovers the “convexity bound” with the value (1.3). For larger values of mZ and stronger density results we must deal with the Kloosterman sums appearing in the off-diagonal term and improve on the trivial bound \(|S_{q, w}(*, *, c)| \ll (c_1 \cdots c_{n-1})^{1+\varepsilon }\), see (4.7) below. To obtain such bounds for general groups is a famous open problem. In an ideal world we would have complete Weil-type square root cancellation \(|S_{q, w}(*, *, c)| \ll (c_1 \cdots c_{n-1})^{1/2+\varepsilon }\) (at least under certain coprimality assumptions) which allows us to take mZ as large as \(q^2\). This square-root cancellation implies the statement of Theorem 2 and a density hypothesis halfway between the trivial representation and the Ramanujan conjecture. (Additional square root cancellation in the \(c_1, \ldots , c_{n-1}\) sum would give the full Ramanujan conjecture, but this is of course not in the cards).
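To make the numerology explicit, here is the (purely heuristic) count behind the last two claims, ignoring the precise summation conditions attached to the individual Weyl elements: under the assumed square root cancellation, the off-diagonal term in (1.4) is at most about

$$\begin{aligned} \sum _{\begin{array}{c} q \mid c_1, \ldots , c_{n-1}\\ c_1, \ldots , c_{n-1} \ll mZ \end{array}} \frac{(c_1 \cdots c_{n-1})^{1/2+\varepsilon }}{c_1 \cdots c_{n-1}} \ll \prod _{j=1}^{n-1} \sum _{\begin{array}{c} c_j \ll mZ\\ q \mid c_j \end{array}} c_j^{-1/2+\varepsilon } \ll \Big ( \frac{(mZ)^{1/2 + \varepsilon }}{q}\Big )^{n-1}, \end{aligned}$$

which is \(O(q^{\varepsilon '})\) precisely in the range \(mZ \ll q^2\).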

Interestingly, this heuristic sketch turns out to be quite far from the truth. Square root cancellation for the size of Kloosterman sums may fail badly, so that we have to arrive at Theorem 2 by a rather different analysis. The key lemma is the following, which seems to be the first explicit analysis of general \(\textrm{GL}(n)\) Kloosterman sums beyond hyper-Kloosterman sums [12] associated to the Weyl element \(w = \left( {\begin{matrix} &{} 1\\ I_{n-1} &{} \end{matrix}}\right) \), where \(I_{n}\) denotes the n-by-n identity matrix.

Theorem 3

Let q be a prime and let \(M, N \in {\mathbb {Z}}^{n-1}\) with entries coprime to q (in particular non-zero). Let \(n \geqslant 3\) and let \(w \in W\). Then \(S_{q, w}(M, N, (q, \ldots , q)) = 0\) unless

$$\begin{aligned} w = w_{*} := \left( {\begin{matrix} &{} &{} 1\\ {} &{} I_{n-2} &{}\\ 1 &{}&{} \end{matrix}}\right) \end{aligned}$$
(1.5)

in which case \(S_{q, w}(M, N, (q, \ldots , q)) = q^{n-2}.\)

Note that this is in sharp contrast to the case \(n=2\), where a Kloosterman sum to prime modulus q has no closed evaluation. The key point is that by multiplicativity the Kloosterman sums in (1.4) contain \(S_{q, w}(*, *, (q, \ldots , q))\) as a large chunk. The critical case is the term corresponding to \(w = w_{*}\) where the Kloosterman sum is much bigger than the square root of the product of the moduli (if \(n > 3\)). Luckily in this case the remaining piece with moduli \((c_1/q, \ldots , c_{n-1}/q)\) comes with additional savings since the Weyl element \(w_{*}\) imposes certain relations among the \(c_j\). That the critical Weyl element is not the long Weyl element, but rather the permutation \(1 \leftrightarrow n\) that is relatively “close” to the identity, may also be quite surprising in this context. This analysis is sensitive to q being prime. While the general technique can be applied in a rather broad context (see e.g. [2, 24]), the estimation in Theorem 3 is tailored to the particular setup of \(\Gamma _0(q)\), q prime.
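To quantify the comparison just made: for the moduli \(c = (q, \ldots , q)\) the trivial bound (4.7) gives \(q^{n-1+\varepsilon }\), square root cancellation would predict \(q^{(n-1)/2 + \varepsilon }\), whereas Theorem 3 shows that the sum for \(w = w_{*}\) equals \(q^{n-2}\), which exceeds \(q^{(n-1)/2}\) as soon as \(n > 3\).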

Theorem 2 and variations of it have other applications of which we mention here only one, namely a large sieve inequality.

Theorem 4

Let q be prime and \((\alpha (m))\) any sequence of complex numbers. Then

$$\begin{aligned} \sum _{\pi \in {\mathcal {F}}_I(q)} \Big | \sum _{\begin{array}{c} m \leqslant x\\ (m, q) = 1 \end{array}} \alpha (m) \lambda _{\pi }(m) \Big |^2 \ll _{I, n, \varepsilon } q^{n-1+\varepsilon } \sum _{\begin{array}{c} m \leqslant x\\ (m , q) = 1 \end{array}} |\alpha (m)|^2 \end{aligned}$$

uniformly in \(x \ll q\) for a sufficiently small implied constant (in terms of I and n).

This result holds (with literally the same proof) for all \(q \in \mathbb {N}\). For comparison, Venkatesh [35, Theorem 1] obtained this with \(x \leqslant q^{1/(2n-2)}\). A simple corollary is the following best-possible bound for a second moment of L-functions on the critical line:

Corollary 5

For q prime and \(t \in {\mathbb {R}}\) we have

$$\begin{aligned} \sum _{\pi \in {\mathcal {F}}_I(q)} |L(1/2 + it, \pi )|^2 \ll _{I, t, n, \varepsilon } q^{n-1 + \varepsilon }. \end{aligned}$$

The author would like to thank Farrell Brumley for encouragement and numerous discussions on the subject.

2 Basic notation

Let \(U \subseteq \textrm{GL}_n\) be the subgroup of unipotent upper triangular matrices. The Haar measure on \(U({\mathbb {R}})\) is given by \(\textrm{d}x = \prod _{1 \leqslant i < j \leqslant n} \textrm{d}x_{ij}\). As before let W be the Weyl group of permutation matrices; we identify a permutation matrix \(w = (w_{ij}) \in W\) with the permutation \( i \mapsto j\) for \(w_{ij} = 1\). For \(w \in W\) we define

$$\begin{aligned} U_w = w^{-1} U^{\top } w \cap U \end{aligned}$$

and we continue to write \(\textrm{d}x\) for the induced measure on the subgroup \(U_{w}({\mathbb {R}})\). (Here \(^{\top }\) denotes the transpose, so that \(U^{\top }\) is the set of unipotent lower triangular matrices.)

As

$$\begin{aligned} w^{-1} (x_{ij}) w = (x_{w^{-1}(i), w^{-1}(j)}), \end{aligned}$$
(2.1)

the group \(U_w\) has entries at (i, j) with \(i< j\) exactly when \(w^{-1}(i) > w^{-1}(j)\) (since \(U^{\top }\) consists of lower triangular matrices). Equivalently, \(U_w\) has entries at

$$\begin{aligned} (w(i), w(j)) \text { for } 1 \leqslant j< i \leqslant n \text { whenever } w(i) < w(j). \end{aligned}$$
(2.2)
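For example, for \(w = w_{*}\) as in (1.5) (so that \(w_{*}^{-1} = w_{*}\) interchanges 1 and n), the group \(U_{w_{*}}\) has entries precisely at the positions \((1, 2), \ldots , (1, n)\) and \((2, n), \ldots , (n-1, n)\), i.e. in the first row and in the last column. This special shape is used in the proof of Lemma 3 below.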

Let \(V \subseteq \textrm{GL}_n\) be the group of diagonal matrices with entries \(\pm 1\).

For \(N \in {\mathbb {Z}}^{n-1}\) we define a character \(\theta _N : U({\mathbb {R}})/U({\mathbb {Z}}) \rightarrow S^1\) by

$$\begin{aligned} \theta _N(x) = e(N_{n-1} x_{12} + \ldots + N_{1} x_{n-1, n}) . \end{aligned}$$
(2.3)

For \(v \in V\) we write \(\theta _N^v(x) = \theta _N(v^{-1} x v)\) (note that \(v^{-1} U v = U\)). If \(N = (1, \ldots , 1)\), we drop it from the notation of the character.

Let \(T \subseteq \textrm{GL}_n\) be the diagonal torus. We embed \(y = (y_1, \ldots , y_{n-1}) \in \mathbb {G}_m^{n-1}\) into T as

$$\begin{aligned} \iota (y) = \text {diag}(y_{n-1} \cdots y_1, \ldots , y_2y_1, y_1, 1). \end{aligned}$$
(2.4)

We multiply two elements \(y, y' \in \mathbb {G}_m^{n-1}\) componentwise, written \(y \cdot y'\), so that \(\iota \) is a homomorphism. We denote the image of \({\mathbb {R}}_{>0}^{n-1}\) in T by \({\tilde{T}}({\mathbb {R}})\). Then \({\mathcal {H}} = U({\mathbb {R}}) {\tilde{T}}({\mathbb {R}})\) is the generalized upper half plane in the sense of [14, Chapter 1]. We identify \({\mathcal {H}}\) with \(\textrm{GL}_n({\mathbb {R}})/\textrm{O}_n({\mathbb {R}}) \textrm{Z}^{+}\) where \(\textrm{Z}^{+} \cong {\mathbb {R}}_{>0}\) is the subgroup of diagonal scalar matrices with positive entries. For \(g = x y k \alpha \in \textrm{GL}_n({\mathbb {R}})\) with \(x \in U({\mathbb {R}})\), \(y \in {\tilde{T}}({\mathbb {R}})\), \(k \in \textrm{O}_n({\mathbb {R}})\), \(\alpha \in \textrm{Z}^+\), we write \(\textrm{y}(g) = \iota ^{-1}y \in {\mathbb {R}}_{>0}^{n-1}\) for the \((n-1)\)-tuple of Iwasawa y-coordinates. In particular, for \(g = \text {diag}(y_1, \ldots , y_n)\) with positive \(y_j\) we have

$$\begin{aligned} \textrm{y}(g) = \Big (\frac{y_{n-1}}{y_n}, \ldots , \frac{y_1}{y_2}\Big ) \in {\mathbb {R}}_{>0}^{n-1}. \end{aligned}$$
(2.5)

For \(w \in W\), \(y \in {\mathbb {R}}_{>0}^{n-1}\) we write

$$\begin{aligned} \textrm{y}(w\iota (y)^{-1} w^{-1}) = {}^wy = ({}^wy_1, \ldots , {}^wy_{n-1}) \end{aligned}$$

for the Iwasawa y-coordinates of \(w\iota (y)^{-1} w^{-1}\). Explicitly, combining (2.1) with \(w^{-1}\) in place of w, (2.4) and (2.5), we have

$$\begin{aligned} {}^wy = \Big ( \frac{y_1 \cdots y_{n - w(n-j+1)}}{y_1 \cdots y_{n- w(n-j)}}\Big )_{1 \leqslant j \leqslant n-1}. \end{aligned}$$
(2.6)
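For instance, for \(w = w_{*}\) as in (1.5), formula (2.6) specializes to

$$\begin{aligned} {}^{w_{*}}y_1 = y_2 \cdots y_{n-1}, \quad {}^{w_{*}}y_j = y_j^{-1} \ (2 \leqslant j \leqslant n-2), \quad {}^{w_{*}}y_{n-1} = y_1 \cdots y_{n-2}, \end{aligned}$$

so that in particular \({}^{w_{*}}(y_1, 1, \ldots , 1) = (1, \ldots , 1, y_1)\).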

For \(\alpha \in \mathbb {C}^{n-1}\), \(y \in {\mathbb {R}}_{>0}^{n-1}\) we write \(y^{\alpha } = y_1^{\alpha _1} \cdots y_{n-1}^{\alpha _{n-1}}\in \mathbb {C}\). Let

$$\begin{aligned} \eta = (\eta _1, \ldots , \eta _{n-1}) = \Big (\frac{1}{2}j(n-j)\Big )_{1 \leqslant j \leqslant n-1}. \end{aligned}$$
(2.7)

We define a measure on \({\mathbb {R}}_{>0}^{n-1}\) by \(\textrm{d}^{*}y = y^{-2\eta } \frac{dy_1}{y_1} \cdots \frac{dy_{n-1}}{y_{n-1}}\) and correspondingly an inner product by

$$\begin{aligned} \langle f, g \rangle = \int _{{\mathbb {R}}_{>0}^{n-1}} f(y) {\bar{g}}(y) \textrm{d}^{*}y. \end{aligned}$$

We denote the push forward of \(\textrm{d}^{*}y\) to \({\tilde{T}}({\mathbb {R}})\) by \(\iota \) also by \(\textrm{d}^{*}y\). Then \(\textrm{d}x \, \textrm{d}^{*}y\) is a left \(\textrm{GL}_n({\mathbb {R}})\) invariant measure on \({\mathcal {H}}\).

We define a different embedding of \({\mathbb {R}}_{>0}^{n-1}\) into \(T({\mathbb {R}})\) by

$$\begin{aligned} c = (c_1, \ldots , c_{n-1}) \mapsto c^{*} = \text {diag}(1/c_{n-1}, c_{n-1}/c_{n-2}, \ldots , c_2/c_1, c_1). \end{aligned}$$

From (2.5), it is useful to observe that

$$\begin{aligned} \textrm{y}(c^{*}) = \Big (\frac{c_{j-1}c_{j+1}}{c_j^2}\Big )_{1 \leqslant j \leqslant n-1} \end{aligned}$$
(2.8)

where \(c_0 = c_n = 1\), and a simple computation shows

$$\begin{aligned} \textrm{y}(c^{*})^{\eta } = (c_1 \cdots c_{n-1})^{-1}. \end{aligned}$$
(2.9)
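Indeed, with the conventions \(c_0 = c_n = 1\) and \(\eta _0 = \eta _n = 0\), the computation behind (2.9) is a discrete summation by parts:

$$\begin{aligned} \log \textrm{y}(c^{*})^{\eta } = \sum _{j=1}^{n-1} \eta _j \big (\log c_{j+1} - 2\log c_j + \log c_{j-1}\big ) = \sum _{j=1}^{n-1} \log c_j \big (\eta _{j+1} - 2\eta _j + \eta _{j-1}\big ) = -\sum _{j=1}^{n-1} \log c_j, \end{aligned}$$

since the second difference of \(j \mapsto \frac{1}{2} j(n-j)\) is identically \(-1\).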

3 Auxiliary results

As the Iwasawa decomposition (with the compact group on the right) is the Gram-Schmidt orthogonalization of rows starting with the last row, we can compute \(\textrm{y}(g)\) explicitly. For \(1 \leqslant j \leqslant n\) let \(\Delta _j = \Delta _j(g)\) be the volume of the parallelepiped spanned by the last j rows of g. Then

$$\begin{aligned} g \equiv \left( {\begin{matrix} \Delta _n/\Delta _{n-1} &{} *&{}\cdots &{} *\\ &{} \Delta _{n-1}/\Delta _{n-2} &{} \cdots &{} *\\ &{} &{} \ddots &{} \vdots \\ &{}&{}&{}\Delta _1\end{matrix}}\right) \quad (\text {mod } O_n({\mathbb {R}})), \end{aligned}$$

so that by (2.5) we have

$$\begin{aligned} \textrm{y} (g) = \Bigl (\frac{\Delta _{j+1}(g) \Delta _{j-1}(g)}{\Delta _{j}(g)^2}\Big )_{1 \leqslant j \leqslant n-1} \end{aligned}$$
(3.1)

with the convention \(\Delta _{0}(g) = 1\). By [11, Corollary 4.2 and p. 11] (or by hand) we confirm the inversion formula for the following \((n-1)\)-by-\((n-1)\) tridiagonal Toeplitz matrix

$$\begin{aligned} \left( {\begin{matrix} -2 &{} 1 &{} &{}\\ 1 &{} -2 &{} 1&{} \\ &{} \ddots &{} \ddots &{} \ddots \\ {} &{} &{} 1 &{}-2 &{}1\\ {} &{} &{} &{} 1 &{} -2 \end{matrix}}\right) ^{-1} = \Big (-s(i, j)\Big )_{ij}, \quad s(i, j) = \frac{1}{n} {\left\{ \begin{array}{ll} i(n-j), &{} i \leqslant j, \\ j(n-i), &{} i > j. \end{array}\right. }\nonumber \\ \end{aligned}$$
(3.2)
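For instance, for \(n = 4\) formula (3.2) states

$$\begin{aligned} \left( {\begin{matrix} -2 &{} 1 &{} \\ 1 &{} -2 &{} 1\\ &{} 1 &{} -2 \end{matrix}}\right) ^{-1} = -\frac{1}{4} \left( {\begin{matrix} 3 &{} 2 &{} 1\\ 2 &{} 4 &{} 2\\ 1 &{} 2 &{} 3 \end{matrix}}\right) , \end{aligned}$$

as one verifies directly.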

Therefore, given \(\textrm{y}(g) = (Y_1, \ldots , Y_{n-1})\) and \(\Delta _n(g) = |\det (g)|\) we can solve (3.1) explicitly for \(\Delta _1, \ldots , \Delta _{n-1} > 0\) getting

$$\begin{aligned} \Delta _j(g) = |\det (g)|^{j/n} \prod _{i=1}^{n-1} Y_i^{-s(i, j)}. \end{aligned}$$
(3.3)
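One checks that (3.3) indeed solves (3.1): for fixed i the function \(j \mapsto s(i, j)\) is piecewise linear with a single kink at \(j = i\), so that with the conventions \(s(i, 0) = s(i, n) = 0\) we have \(s(i, j+1) - 2s(i, j) + s(i, j-1) = -\delta _{ij}\), and hence

$$\begin{aligned} \frac{\Delta _{j+1}(g)\Delta _{j-1}(g)}{\Delta _j(g)^2} = \prod _{i=1}^{n-1} Y_i^{2s(i,j) - s(i, j+1) - s(i, j-1)} = Y_j \end{aligned}$$

(the powers of \(|\det (g)|\) cancel).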

Our first lemma will be used to bound the moduli c on the arithmetic side of the Kuznetsov formula.

Lemma 1

Let \(w\in W\), \(x \in U_w({\mathbb {R}})\), \(y, c, B \in {\mathbb {R}}_{>0}^{n-1}\). Write \(\textrm{y}\big (\iota (B)c^{*}w x\iota (y) \big ) = Y \in {\mathbb {R}}_{>0}^{n-1}\) and \(A = \iota (B) c^{*}\). Then

$$\begin{aligned} \begin{aligned}&c_j \ll _{y, Y} \prod _{i=1}^{n-1} B_i^{s(i,j)}\quad \text {and} \quad 1 \leqslant \Delta _j(wx) \ll _{y, Y} \prod _{i=1}^{n-1} \textrm{y}(A)_i^{s(i,j)} \end{aligned} \end{aligned}$$

for \(1 \leqslant j \leqslant n-1\).

Proof

We have

$$\begin{aligned} \Delta _j\big (\iota (B)c^{*}w x\iota (y) \big ) = \Delta _j(wx \iota (y) ) c_j \prod _{i=1}^j B_1 \cdots B_{i-1} \end{aligned}$$
(3.4)

since the diagonal matrix \(\iota (B)c^{*}\) multiplies the rows of \(w x\iota (y)\) by the corresponding diagonal entries. Clearly \(\Delta _j(wx) \geqslant 1\) since one of the minors is always 1, and clearly

$$\begin{aligned} |\det (\iota (B)c^{*}w x\iota (y) )| = |\det (\iota (B) \iota (y))| \asymp _y B_1^{n-1} \cdots B_{n-2}^2 B_{n-1}. \end{aligned}$$
(3.5)

From (3.4), and from (3.3) in combination with (3.5), we therefore obtain

$$\begin{aligned} \begin{aligned} c_j \leqslant c_j \Delta _j(wx)&\asymp _{\,y}\, c_j \Delta _j(wx\iota (y)) = \Delta _j\big (\iota (B)c^{*}w x\iota (y) \big ) \prod _{i=1}^{j-1} B_i^{-(j-i)}\\&\asymp _{\,Y, y} \, \prod _{i=1}^{n-1} B_i ^{(n-i)j/n} \prod _{i=1}^{j-1} B_i^{-(j-i)} = \prod _{i=1}^{n-1} B_i^{s(i, j)}. \end{aligned} \end{aligned}$$
(3.6)

This shows the first statement of the lemma, and the proof of the second is completed by observing that (2.8) and (3.2) imply

$$\begin{aligned} \frac{1}{c_j} = \prod _{i=1}^{n-1} \textrm{y}(c^{*})_i^{s(i, j)} \end{aligned}$$

for \(1 \leqslant j \leqslant n-1\), so that (3.6) implies \( \Delta _j(wx) \asymp _{y, Y} \prod _{i=1}^{n-1} (\textrm{y}(c^{*})_iB_i)^{s(i, j)}\) as desired. \(\square \)

We shall see in a moment that the only Weyl elements contributing to the Kuznetsov formula are of the form

$$\begin{aligned} w = \left( {\begin{matrix} &{} &{} I_{d_1}\\ &{} \iddots &{}\\ I_{d_r} &{} &{} \end{matrix}}\right) \end{aligned}$$
(3.7)

with identity matrices \(I_{d_j}\) of dimension \(d_j\) (i.e. \(d_1 + \ldots +d_r = n\)), so without loss of generality we restrict our attention to such matrices. The following technical result computes the Jacobi determinant for a certain change of variables.

Lemma 2

Let \(N \in \mathbb {N}^{n-1}\), \(w\in W\) of the form (3.7). For \(x \in U_w({\mathbb {R}})\) define \(x' = \iota (N) x \iota (N)^{-1} \in U_w({\mathbb {R}})\). Then

$$\begin{aligned} \frac{\textrm{d}x'}{\textrm{d}x} = ({}^wN)^{\eta } N^{\eta } \end{aligned}$$

where the left hand side denotes the Jacobi determinant \(\det Dx'(x)\).

Proof

Since \(\iota (N)(x_{ij}) \iota (N)^{-1} = (x_{ij} N_1 \cdots N_{n-i}(N_1 \cdots N_{n-j})^{-1})_{ij}\) and recalling (2.2) and (2.6), we have to show

$$\begin{aligned} \prod _{\begin{array}{c} 1 \leqslant j< i \leqslant n\\ w(i) < w(j) \end{array}} N_{n- w(j) + 1} \cdots N_{n - w(i)} = \prod _{j=1}^{n-1} \Big ( N_j\frac{N_1 \cdots N_{n - w(n-j+1)}}{N_1 \cdots N_{n- w(n-j)}}\Big )^{\eta _j}\qquad \end{aligned}$$
(3.8)

for an arbitrary w as in (3.7). We use induction on r and write \(w' = \left( \begin{matrix} &{} w\\ I_d &{} \end{matrix}\right) \), so that \(n+d - w'(j) = n - w(j)\) for all \(1 \leqslant j \leqslant n\). We call L(w) the left hand side of (3.8) and R(w) the right hand side. We consider first the quotient \(L(w')/L(w)\). The pairs \(1 \leqslant j < i \leqslant n\) cancel, and for \(i > n\) only \(j \leqslant n\) satisfy the summation condition \(w'(i) < w'(j)\). We conclude

$$\begin{aligned} \frac{L(w')}{L(w)}= & {} \prod \limits _{j=1}^n \prod \limits _{i=n+1}^{n+d} N_{n+d- w'(j) + 1} \cdots N_{n +d- w'(i)} \nonumber \\ {}= & {} \prod \limits _{j=1}^{n-1} N_j^{dj} \prod \limits _{j=n}^{n+d-1} N_j^{n(n+d-j)}. \end{aligned}$$
(3.9)

On the other hand,

$$\begin{aligned} R(w)= & {} \prod _{j=1}^{n-1}N_j^{\eta _j} \prod _{i = 1}^n (N_1 \cdots N_{n- w(i)})^{\eta _{n-i+1} - \eta _{n-i}} \\ {}= & {} \prod _{j=1}^{n-1}N_j^{\frac{j(n-j)}{2}} \times \prod _{i = 1}^n (N_1 \cdots N_{n- w(i)})^{\frac{2i-n-1}{2}}, \end{aligned}$$

so \(R(w')/ R(w)\) equals

$$\begin{aligned} \begin{aligned}&\prod _{j=1}^{n-1} N_j^{\frac{j(n+d - j)}{2} - \frac{j(n-j)}{2}} \prod _{j=n}^{n+d-1} N_j^{\frac{j(n+d - j)}{2} }\\ {}&\times \prod _{i = 1}^n (N_1 \cdots N_{n- w(i)})^{-\frac{d}{2}}\prod _{i = n+1}^{n+d} (N_1 \cdots N_{n+d- w'(i)})^{\frac{2i-n-d-1}{2}}\\&= \prod _{j=1}^{n-1} N_j^{\frac{dj}{2}} \prod _{j=n}^{n+d-1} N_j^{\frac{j(n+d - j)}{2} }\\ {}&\times \prod _{j=1}^{n-1} N_j^{-\frac{(n-j)d}{2}} \prod _{j=1}^{n-1} N_j^{\sum _{i=n+1}^{n+d} \frac{2i-n-d-1}{2}} \prod _{j=n}^{n+d-1} N_j^ {\sum _{i=n+1}^{2n+d-j} \frac{2i-n-d-1}{2}} \end{aligned} \end{aligned}$$

which is easily seen to equal the right hand side of (3.9). Since trivially \(L(I_{d}) = R(I_{d}) = 1\), the induction is complete. \(\square \)
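As a quick illustration of Lemma 2, take \(n = 3\) and \(w = w_{*}\), so that \(U_w = U\). Conjugation by \(\iota (N) = \text {diag}(N_2 N_1, N_1, 1)\) multiplies the entries \(x_{12}, x_{13}, x_{23}\) by \(N_2\), \(N_1 N_2\), \(N_1\) respectively, so the Jacobi determinant equals \(N_1^2 N_2^2\); this agrees with \(({}^{w_{*}}N)^{\eta } N^{\eta } = (N_2 N_1)(N_1 N_2)\), since \(\eta = (1, 1)\) and \({}^{w_{*}}N = (N_2, N_1)\) by (2.6).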

Lemma 3

Let \(B \in {\mathbb {R}}_{>0}^{n-1}\), \(w = w_{*}\in W\) as in (1.5). Then

$$\begin{aligned} \textrm{vol}\{x \in U_w({\mathbb {R}}) \mid \Delta _j(wx) \leqslant B_j, 1 \leqslant j \leqslant n-1\} \ll _{\varepsilon } (B_1 \cdots B_{n-1})^{1+\varepsilon } \end{aligned}$$

for any \(\varepsilon > 0\).

Proof

We can assume without loss of generality that \(B_j \geqslant 1\), otherwise the volume is 0 as seen in the proof of Lemma 1. For \(x \in U_w({\mathbb {R}})\) we have

$$\begin{aligned} w_{*} x = \left( {\begin{matrix} &{} &{} &{} &{} 1\\ &{} 1 &{} &{} &{} x_{2n}\\ &{} &{} \ddots &{} &{} \vdots \\ &{} &{} &{} 1 &{} x_{n-1, n}\\ 1 &{} x_{12} &{} \cdots &{} x_{1, n-1} &{} x_{1n} \end{matrix}}\right) , \end{aligned}$$

so that by considering the lower right minors we obtain in particular the inequalities

$$\begin{aligned} |x_{1 n}| \leqslant B_1, \quad \Big |-x_{1, n} + \sum _{i = n+1-j}^{n-1} x_{1i} x_{i,n} \Big | \leqslant B_{j}, \quad j = 2, \ldots , n-1, \end{aligned}$$

and we also have \(|x_{ij}| \leqslant b := 1+ \max (B_1, \ldots , B_{n-1})\). If \(I \subseteq {\mathbb {R}}\) is any interval of length \(|I| \geqslant 1\), then

$$\begin{aligned} \begin{aligned}&\text {vol}\{(x, y) \in [-b, b]^2 : xy \in I\}\\ {}&\leqslant \int _{|y| \leqslant b} \min \Big (\frac{|I|}{|y|} , 2b\Big )\textrm{d}y \leqslant 4|I| + \int _{|I|/b \leqslant |y| \leqslant b} \frac{|I|}{|y|} \textrm{d}y\\&\leqslant 4 |I| (1 + \log b ). \end{aligned} \end{aligned}$$

Thus if \(|x_{1n}| \leqslant B_1\) is fixed, the volume of \((x_{1, n-1}, x_{n-1, n})\) is \(O(B_2 \log b)\), and if these are fixed, the volume of \((x_{1, n-2}, x_{n-2, n})\) is \(O(B_3 \log b)\), etc. Inductively we obtain the desired bound. \(\square \)

Most likely the statement holds for all w, but the proof is particularly simple for \(w_{*}\) which is all we need.

4 Kloosterman sums

Properties of Kloosterman sums for \(\textrm{SL}_n({\mathbb {Z}})\) have been obtained and summarized in [12]. They generalize in an obvious way to the congruence subgroup \(\Gamma _0(q)\). The Bruhat decomposition gives \(\textrm{GL}_n( \mathbb {Q}) = \bigcup _{w\in W} G_w(\mathbb {Q})\) with \(G_w := U T w U_w \) as a disjoint union. Let \(N, M, c \in {\mathbb {Z}}^{n-1}\), \(w \in W\), \(v \in V\). Then provided that

$$\begin{aligned} \theta _M(c^{*} w x w^{-1} (c^{*})^{-1}) = \theta _N^v(x) \end{aligned}$$
(4.1)

for all \(x \in w^{-1}U(\mathbb {Q}) w \cap U(\mathbb {Q})\) [this set is a “complement” of \(U_w\) in U], the Kloosterman sum

$$\begin{aligned} S^v_{q, w}(M, N, c) = \sum _{x c^{*} w y \in U({\mathbb {Z}})\backslash G_w(\mathbb {Q}) \cap \Gamma _0(q)/U_w({\mathbb {Z}}) } \theta _M(x )\theta _N^v(y) \end{aligned}$$
(4.2)

is well-defined, see [12, Proposition 1.3]. If (4.1) is not met, we define \(S^v_{q, w}(M, N, c) = 0\). If \(v = \text {id}\), we drop it from the notation. By [12, p. 175], the Kloosterman sum is non-zero only if w is of the form (3.7). If \(\gamma = x_1 c^{*} w x_2\in \Gamma _0(q)\) is a matrix occurring in the sum on the right hand side of (4.2), then any minor of \(\gamma \), and hence of \(c^{*} w\), obtained by deleting at least the first row and the last column is divisible by q. Hence if w is of the form (3.7), then the summation condition in (4.2) can only be met if

$$\begin{aligned} q \mid c_1, \quad q\mid c_2, \quad \ldots , \quad q \mid c_{n -d_1}. \end{aligned}$$
(4.3)

Observing that

$$\begin{aligned} \theta _{M}(x) = \theta (\iota (M) x \iota (M)^{-1}) \end{aligned}$$
(4.4)

and recalling (2.8), we see that (4.1) is equivalent to

$$\begin{aligned} M_{n-i} \frac{c_{n-i+1} c_{n-i-1}}{c_{n-i}^2} = \frac{v_{w(i) + 1}}{v_{w(i)}} N_{n-w(i)} \end{aligned}$$
(4.5)

for all \(1 \leqslant i \leqslant n-1\) satisfying \(w(i) + 1 = w(i+1)\) with the above convention \(c_0 = c_n = 1\) and \(v = \text {diag}(v_1, \ldots , v_n)\). If w is of the form (3.7), these are precisely the \(i \not \in \{d_1, d_1 + d_2, \ldots , d_1 + d_2 + \ldots + d_{r-1}\}\). If \(w = \text {id}\), then \(xc^{*} w y = xc^{*} y\) can only be in \(\Gamma _0(q)\) if \(c_1 = \ldots = c_{n-1} = 1\), in which case we conclude from (4.5) that \(M_j = \pm N_j\).
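For example, for \(w = w_{*}\) as in (1.5) the relevant indices are \(i = 2, \ldots , n-2\) with \(w(i) = i\), so that for \(n \geqslant 4\) condition (4.1) amounts to the relations

$$\begin{aligned} M_j \, c_{j+1} c_{j-1} = \pm N_j \, c_j^2, \quad 2 \leqslant j \leqslant n-2, \end{aligned}$$

among the moduli (for \(n = 3\) there is no such condition); these are the relations alluded to in the introduction.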

Kloosterman sums for \(\textrm{SL}_n({\mathbb {Z}})\) enjoy certain multiplicativity properties in the moduli, cf. [12, Proposition 2.4]. We state only one particular case. Let q be prime, suppose that \((c_1\cdots c_{n-1}, q) = 1\) and write \(qc = (qc_1, \ldots , qc_{n-1})\). Suppose that \(w(1) = n\) and \(w(n) = 1\). Then

$$\begin{aligned}&S^v_{q, w}(M, N, qc) \nonumber \\ {}&\quad = S^v_{q, w}(M, N' , (q, \ldots , q)) S^v_{1, w}(M, ({\bar{q}}N_1, N_2, \ldots , N_{n-2}, {\bar{q}}N_{n-1}) , c) \end{aligned}$$
(4.6)

with

$$\begin{aligned} \begin{aligned}&N_{n-i}' \equiv N_{n-i} c_{n - w(i)} c_{n - w(i+1) + 1} \overline{c_{n - w(i)+1} c_{n - w(i+1) } } \, (\text {mod } q).\\ \end{aligned} \end{aligned}$$

By [9, Theorem 0.3(i)] we have the trivial bound

$$\begin{aligned} |S^v_{q, w}(M, N, c) | \leqslant | U({\mathbb {Z}})\backslash G_w(\mathbb {Q}) \cap \textrm{SL}_n( {\mathbb {Z}})/U_w({\mathbb {Z}})| \ll (c_1 \cdot \ldots \cdot c_{n-1})^{1+\varepsilon }. \end{aligned}$$
(4.7)

We now give the proof of Theorem 3 from the introduction, which is the first non-trivial bound for a \(\textrm{GL}_n\) Kloosterman sum other than a hyper-Kloosterman sum. For \(n=3\), the statement is essentially contained in [5, Lemma 6(c)]. We wish to compute \(S_{q, w}(M, N, (q, \ldots , q))\), where \(M, N \in {\mathbb {Z}}^{n-1}\) have entries coprime to q.

As mentioned before, we can assume that w is of the form (3.7), otherwise the Kloosterman sum vanishes by definition. Next assume that \(d_1 > 1\) in (3.7). Applying (4.5) with \(i=1\) we obtain \(M_{n-1} = \pm N_{d_1 - 1} q\), a contradiction. In the same way we exclude the case \(d_r > 1\). For w of the form (3.7) with \(d_1 = d_r = 1\) and \(c^{*} = \text {diag}(1/q, 1, \ldots , 1, q)\), we recall the definition (4.2) and consider \(\gamma = x c^{*} w y \in G_w(\mathbb {Q}) \cap \Gamma _0(q)\) with uniquely determined \(x \in U({\mathbb {Z}})\backslash U(\mathbb {Q})\), \(y \in U_w(\mathbb {Q})/U_w({\mathbb {Z}})\). A system of representatives for \(U({\mathbb {Z}})\backslash U(\mathbb {Q})\) consists of matrices with rational entries in [0, 1) above the diagonal, and similarly we choose a system of representatives of \(U_w(\mathbb {Q})/U_w({\mathbb {Z}})\) where all relevant entries are restricted to [0, 1). We now determine those representatives x, y that satisfy \(xc^{*} w y \in \Gamma _0(q)\). We have

$$\begin{aligned} \gamma = x c^{*} w y = x \, \text {diag}(1/q, 1, \ldots , 1, q)\, w y, \end{aligned}$$

whose last row equals \(q\, (1, y_{12}, \ldots , y_{1n})\) since x is unipotent upper triangular and \(w(n) = 1\).

Since \(\gamma \in \Gamma _0(q)\), we must have \(y_{12}, \ldots , y_{1, n-1} \in {\mathbb {Z}},\) hence by our choice of representatives

$$\begin{aligned} y_{12} = \ldots = y_{1, n-1}= 0. \end{aligned}$$

Next we consider the (\(n-1\))-st row

$$\begin{aligned} (\underbrace{qx_{n-1, n}}_{\in {\mathbb {Z}}}, \underbrace{\ldots }_{d_{r-1} \text { entries}}, *, \cdots , *, y_{w(n-1), n} + qy_{1n}x_{n-1, n}) \in {\mathbb {Z}}^n \end{aligned}$$

of \(\gamma \), where the stars are the same as the stars in the \((n-1)\)-st row of \(c^{*}wy\). We conclude that all star-ed entries in the \((n-1)\)-st row of \(c^{*} w y\) must be integral, hence 0. We continue with the \((n-2)\)-nd row of \(\gamma \). By the same argument we first have \(qx_{n-2, n} \in {\mathbb {Z}}\) and then also that the star-ed entry in the \((n-2)\)-nd row of x is integral (hence 0), as well as all star-ed entries in the \((n-2)\)-nd row of \(c^{*} w y\). Continuing in this way, all star-ed entries must vanish. In other words, \(x \, \text {diag}(1/q, 1, \ldots , 1, q)\, w y \in \Gamma _0(q)\) with \(x \in U(\mathbb {Q})\), \(y \in U_w(\mathbb {Q})\) and all relevant entries in [0, 1) implies

$$\begin{aligned} x = \left( \begin{matrix} 1 &{} &{}\cdots &{}x_{1}/q\\ &{} \ddots &{} &{} \vdots \\ &{}&{} 1&{} x_{n-1}/q \\ &{} &{} &{}1 \end{matrix}\right) , \quad y = \left( \begin{matrix} 1 &{} &{}\cdots &{}y_{1}/q\\ &{} \ddots &{} &{} \vdots \\ &{}&{} 1&{} y_{n-1}/q \\ &{} &{} &{}1 \end{matrix}\right) \end{aligned}$$

with \(x_i, y_i \in \{0, \ldots , q-1\}\). For these x, y we compute that \(x c^{*} w y\) has first column \((x_1, \ldots , x_{n-1}, q)^{\top }\), last column

$$\begin{aligned} \Big (\frac{1 + x_1 y_1}{q}, \ \frac{y_{w(2)} + x_2 y_1}{q}, \ \ldots , \ \frac{y_{w(n-1)} + x_{n-1} y_1}{q}, \ y_1\Big )^{\top }, \end{aligned}$$

and agrees with w in the columns \(2, \ldots , n-1\).

Obviously, this matrix is in \(\Gamma _0(q)\) if and only if the \(n-1\) congruences

$$\begin{aligned} x_1y_1 + 1 \equiv 0 \, (\text {mod } q), \quad x_i y_1 + y_{w(i)} \equiv 0 \, (\text {mod } q), \quad 2 \leqslant i \leqslant n-1 \end{aligned}$$

are satisfied. This can be solved easily, and we obtain the explicit expression

$$\begin{aligned} S^v_{q, w}(M, N, (q, \ldots , q)) = \sum _{\begin{array}{c} x_1, \ldots , x_{n-1} \, (\text {mod } q)\\ (x_1, q) = 1 \end{array}} e\left( \frac{M_1 x_{n-1} \pm N_1 {\bar{x}}_1 x_{w^{-1}(n-1) } }{q}\right) . \end{aligned}$$

If \((M_1N_1, q) = 1\), the sum vanishes unless \(n-1 = w^{-1}(n-1)\). The latter case happens for w of the form (3.7) with \(d_1 = d_r = 1\), if and only if \(w = w_{*}\) and then the Kloosterman sum equals \(q^{n-2}\). \(\square \)

5 Automorphic forms and Whittaker functions

We denote by \(\{\varpi \}\) an orthonormal basis of right \(\textrm{O}_n({\mathbb {R}})\textrm{Z}^+\)-invariant automorphic forms for the group \(\Gamma _0(q)\), cuspidal or Eisenstein series. The space \(L^2(\Gamma _0(q)\backslash {\mathcal {H}})\) is equipped with the standard inner product \(\langle f, g \rangle = \int _{\Gamma _0(q)\backslash {\mathcal {H}}} f(xy) {\bar{g}}(xy) \textrm{d}x\, \textrm{d}^{*}y\). We denote by \(\int _{(q)} \textrm{d}\varpi \) a combined sum/integral over the complete spectrum of \(L^2(\Gamma _0(q)\backslash {\mathcal {H}})\). The relevant spectral decomposition is a special case of Langlands’ general theory, see e.g. [1] for a convenient summary in adelic language. All \(\varpi \) belong to representations of level \(q' \mid q\) (cf. [19, Théorème]) and we assume that \(\{\varpi \}\) contains all cuspidal newvectors of level \(q' \mid q\). The underlying representation is denoted by \(\pi \), so \(\varpi \in V_{\pi }\). For notational simplicity let us denote the local archimedean Langlands parameter \(\mu _{\pi }(\infty )\) simply by \(\mu = (\mu _1, \ldots , \mu _n)\); it satisfies

$$\begin{aligned} \mu _1 + \ldots + \mu _n = 0, \quad \{\mu _1, \ldots , \mu _n\} = \{-{\bar{\mu }}_1, \ldots ,- {\bar{\mu }}_n\}. \end{aligned}$$
(5.1)

For a (not necessarily cuspidal) automorphic form \(\varpi \) and \(N \in \mathbb {N}^{n-1}\) we define its N-th Fourier coefficient \(A_{\varpi }(N)\) by

$$\begin{aligned} \int _{U({\mathbb {Z}})\backslash U({\mathbb {R}})} \varpi (xy) \theta _N(-x) \textrm{d}x = \frac{A_{\varpi }(N)}{N^{\eta }} W_{\mu }(N\cdot \textrm{y}(y)) \end{aligned}$$
(5.2)

where \(y \in {\tilde{T}}({\mathbb {R}})\) and \(W_{\mu } : {\mathbb {R}}^{n-1}_{>0} \rightarrow \mathbb {C}\) is the standard (spherical) Whittaker function, cf. e.g. [34, Section 2].

If \(\varpi \) is a cuspidal newform and \((m, q) = 1\), the \((m, 1, \ldots , 1)\)-th Fourier coefficient is proportional to the m-th Hecke eigenvalue \(\lambda _{\pi }(m)\) (or \(\lambda _{{\tilde{\pi }}}(m)\) depending on normalization), and by Rankin–Selberg theory we obtain

$$\begin{aligned}{} & {} |A_{\varpi }((m, 1, \ldots , 1))|^2 {\asymp }_{\mu }\, \frac{|\lambda _{\pi }(m)|^2 }{[\textrm{SL}_n({\mathbb {Z}}) : \Gamma _0(q)] L(1, \pi , \text {Ad}) } \nonumber \\ {}{} & {} \quad \gg _{\mu } \, |\lambda _{\pi }(m) |^2 q^{-(n-1) -\varepsilon } \end{aligned}$$
(5.3)

if \(\varpi \) is \(L^2\)-normalized, cf. e.g. [35, Proposition 1]. Here we used the upper bound [22, Theorem 2] for the L-value (the residue of the Rankin–Selberg L-function) on the edge of the critical strip.

The following easy, but important lemma shows that \(\lambda _{\pi }(p^{\nu })\) is (perhaps not always, but sufficiently often) as big as \(p^{\nu \sigma _{\pi }(p)}\) with \(\sigma _{\pi }(p)\) as in (1.1).

Lemma 4

For a prime \(p \not \mid q\) and \(\nu > n\) we have

$$\begin{aligned} \max _{0 \leqslant j \leqslant n-1} | \lambda _{\pi }(p^{\nu -j})| \geqslant (2 p^{\sigma _{\pi }(p)})^{1-n} p^{\nu \sigma _{\pi }(p)}. \end{aligned}$$

Proof

The following argument is taken from [21, Lemma 3]. We have an identity of power series

$$\begin{aligned} \sum _{\nu = 0}^{\infty } \lambda _{\pi }(p^{\nu }) x^{\nu } = \prod _{j=1}^n (1 - p^{\mu _{\pi }(p, j)} x)^{-1}. \end{aligned}$$

Without loss of generality let \(\mu _{\pi }(p, 1)\) have the largest real part, i.e. \(\Re \mu _{\pi }(p, 1) = \sigma _{\pi }(p)\). Then

$$\begin{aligned} \sum _{\nu = 0}^{\infty } p^{\nu \mu _{\pi }(p, 1)} x^{\nu } = \prod _{j=2}^n (1 - p^{\mu _{\pi }(p, j)} x) \sum _{\nu = 0}^{\infty } \lambda _{\pi }(p^{\nu }) x^{\nu } . \end{aligned}$$

Comparing the coefficient of \(x^{\nu }\) on both sides, we see that \(p^{\nu \mu _{\pi }(p, 1)}\) is a linear combination of \(\lambda _{\pi }(p^{\nu }), \lambda _{\pi }(p^{\nu -1}), \ldots , \lambda _{\pi }(p^{\nu -n+1})\) whose coefficient in degree j is, up to sign, the elementary symmetric polynomial in \(p^{\mu _{\pi }(p, 2)}, \ldots , p^{\mu _{\pi }(p, n)}\) and hence of absolute value at most \(\left( {\begin{array}{c}n-1\\ j\end{array}}\right) p^{j \sigma _{\pi }(p)}\). Therefore \(p^{\nu \sigma _{\pi }(p)} = |p^{\nu \mu _{\pi }(p, 1)}| \leqslant (1 + p^{\sigma _{\pi }(p)})^{n-1} \max _{0 \leqslant j \leqslant n-1} |\lambda _{\pi }(p^{\nu - j})| \leqslant (2p^{\sigma _{\pi }(p)})^{n-1} \max _{0 \leqslant j \leqslant n-1} |\lambda _{\pi }(p^{\nu - j})|\), and the lemma follows. \(\square \)

We need an archimedean analogue of this result, which is a bit more technical. Roughly speaking, the growth of \(W_{\mu }\) near the origin should capture the size of \(\sigma _{\pi }(\infty )\) in the same way as the growth of \(\lambda _{\pi }(p^{\nu })\) captures the size of \(\sigma _{\pi }(p)\), but this is harder to see as the Mellin transform of \(W_{\mu }\) is not perfectly understood and the location of poles is subtle. We start by summarizing some properties. As in [34, (3.1), (3.2)] we consider the re-normalized Whittaker function

$$\begin{aligned} W_{\mu }^{*}(y) = \pi ^{(n-1)n(n+1)/12} y^{-\eta /2} W_{2\mu }\Big ((\sqrt{y_1}/\pi , \ldots , \sqrt{y_{n-1}}/\pi )\Big ). \end{aligned}$$
(5.4)

The corresponding Mellin transform \({\widehat{W}}^{*}_{\mu }(s) = \int _{{\mathbb {R}}^{n-1}_{>0}} W^{*}_{\mu }(y) y^{s } \frac{dy_1}{y_1} \cdots \frac{dy_{n-1}}{y_{n-1}} \) is meromorphic in \(\mu \) and \(s \in \mathbb {C}^{n-1}\) [13]. Explicitly, we have [34, (3.7)]

$$\begin{aligned} \begin{aligned}&{\widehat{W}}^{*}_{\mu }(s_1) = \Gamma (s_1 + \mu _1)\Gamma (s_1+\mu _2), \quad n=2,\\&{\widehat{W}}^{*}_{\mu }(s_1, s_2) = \frac{\Gamma (s_1 + \mu _1)\Gamma (s_1+\mu _2)\Gamma (s_1 + \mu _3)\Gamma (s_2-\mu _1)\Gamma (s_2 - \mu _2)\Gamma (s_2-\mu _3)}{\Gamma (s_1+s_2)}, \\ {}&\qquad \qquad \qquad \qquad n=3, \end{aligned} \end{aligned}$$

but in general there do not seem to be such simple formulae. For \(\Re s_2, \ldots , \Re s_{n-1}\) sufficiently large and \(\Re s_1 > \sigma _{\pi }(\infty )\), the function \({\widehat{W}}^{*}_{\mu }(s)\) is holomorphic by [34, Theorem 3.1]. If in addition \(\mu _1, \ldots , \mu _n\) are pairwise distinct, then \({\widehat{W}}^{*}_{\mu }(s)\) has simple poles at \(s_1 = -\mu _j\), \(1 \leqslant j \leqslant n\), with residue

$$\begin{aligned} {\widehat{W}}^{*}_{\mu ^{(j)}}(s^{(j)})\prod _{\begin{array}{c} 1 \leqslant k \leqslant n\\ k \not = j \end{array}} \Gamma ( \mu _k - \mu _j) \end{aligned}$$

where

$$\begin{aligned} s^{(j)}= & {} (s_2, \ldots , s_{n-1}) + \left( \frac{n-2}{n-1}, \ldots , \frac{1}{n-1}\right) \mu _j ,\\{} & {} \mu ^{(j)} = (\mu _1, \ldots , \mu _{j-1} , \mu _{j+1}, \ldots , \mu _{n}) + \frac{\mu _j}{n-1} \cdot {{\textbf {1}}}, \end{aligned}$$

see [34, Theorem 3.2]. These statements are proved by a recursion formula [34, (3.5)] of the form

$$\begin{aligned} {\widehat{W}}^{*}_{\mu }(s) =\! \int \!\cdots \!\int \! {\widehat{W}}_{\nu }^{*}\Big (-t_1 - \frac{\alpha _1 + \alpha _2}{n-2}, \underbrace{*, \ldots , *}_{n-4}\Big ) \Gamma (t_1 + s_1) (*) \frac{\textrm{d}t_1 \cdots \textrm{d}t_{n-3}}{(2\pi i)^{n-3}} \end{aligned}$$

where \(\alpha _1, \alpha _2\) are any two elements from the multi-set \(\{\mu _1, \ldots , \mu _{n}\}\) and \(\nu - \frac{\alpha _1 + \alpha _2}{n-2} \cdot {{\textbf {1}}} \in \mathbb {C}^{n-2}\) is the \((n-2)\)-tuple of the remaining \(\mu _j\); moreover, \((*)\) is independent of \(s_1\) and holomorphic in \(t_1\) in a wide vertical strip if \(\Re s_2, \ldots , \Re s_{n-1}\) are sufficiently large, and the other \(n-4\) arguments of \({\widehat{W}}_{\nu }^{*} \) are independent of \(t_1\) and \(s_1\). Inductively, starting from the explicit formula for \(n=2\) and \(n=3\), we see that in any fixed vertical strip for \(s_1\) and for \(\Re s_2, \ldots , \Re s_{n-1}\) sufficiently large, the only poles can occur at \(s_1 = -\mu _j - k\) for \(1 \leqslant j \leqslant n\), \(k \in \mathbb {N}_0\). We conclude that

$$\begin{aligned} {\widehat{W}}^{\dagger }_{\mu }(s) := {\widehat{W}}^{*}_{\mu }(s) \prod _{j=1}^n (s_1 + \mu _j) \end{aligned}$$

is holomorphic for \(\Re s_1 > \sigma _{\pi }(\infty ) - 1\) (for sufficiently large \(\Re s_2, \ldots , \Re s_{n-1}\)) and

$$\begin{aligned} {\widehat{W}}^{\dagger }_{\mu }(-\mu _j, s_2, \ldots , s_{n-1}) = {\widehat{W}}^{*}_{\mu ^{(j)}}(s^{(j)})\prod _{\begin{array}{c} 1 \leqslant k \leqslant n\\ k \not = j \end{array}} \Gamma (1 + \mu _k - \mu _j) . \end{aligned}$$
(5.5)

For this statement the assumption that the \(\mu _j\) are pairwise distinct can be dropped by holomorphic continuation (note that by the Luo–Rudnick–Sarnak bounds or even the Jacquet-Shalika bounds \(|\Re \mu _j| < 1/2\) the gamma factors on the right hand side are always defined).
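As a consistency check in the case \(n = 3\): computing the residue of the explicit expression for \({\widehat{W}}^{*}_{\mu }(s_1, s_2)\) above at \(s_1 = -\mu _1\) gives \(\Gamma (\mu _2 - \mu _1)\Gamma (\mu _3 - \mu _1)\Gamma (s_2 - \mu _2)\Gamma (s_2 - \mu _3)\), which matches the residue formula since \({\widehat{W}}^{*}_{\mu ^{(1)}}(s^{(1)}) = \Gamma \big (s_2 + \tfrac{\mu _1}{2} + \mu _2 + \tfrac{\mu _1}{2}\big )\Gamma \big (s_2 + \tfrac{\mu _1}{2} + \mu _3 + \tfrac{\mu _1}{2}\big ) = \Gamma (s_2 - \mu _3)\Gamma (s_2 - \mu _2)\) by \(\mu _1 + \mu _2 + \mu _3 = 0\).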

For \(\beta \in \mathbb {C}\) let \({\mathcal {D}}_{\beta } = -y\partial _{y} + \beta \). This is a commutative family of differential operators that under Mellin transformation correspond to multiplication with \(s + \beta \). In the proof of Lemma 6 below we will need the following technical but elementary lemma.

Lemma 5

Let \( \alpha \geqslant 0\), \(c_0, c_1, c_2 > 0\), \(\beta \in \mathbb {C}\). Let \(I = [a, b]\subseteq ( 0, 1)\) be an interval with \((1+c_0)a \leqslant b \leqslant 2a\) and \(w : I \rightarrow \mathbb {C}\) a smooth function satisfying

$$\begin{aligned} |{\mathcal {D}}_{\beta } w(y) | \geqslant c_1 y^{-\alpha }, \quad | \partial _y ({\mathcal {D}}_{\beta } w)(y) | \leqslant c_2 \Vert {\mathcal {D}}_{\beta } w \Vert y^{-1} \end{aligned}$$
(5.6)

for \(y \in I\). Then there exist constants \(c_0', c_1', c_2'> 0\) depending only on \(c_0, c_1, c_2, \beta \) (but not on a, b) and an interval \(I' = [a', b'] \subseteq I\) with \((b' - a') \geqslant c_0'(b-a)\) such that

$$\begin{aligned} |w(y)| \geqslant c_1' y^{-\alpha }, \quad |w'(y)| \leqslant c'_2 \Vert w_{I'} \Vert y^{-1} \end{aligned}$$
(5.7)

for \(y \in I'\).

Proof

Let \({\tilde{w}}(y) = w(y) y^{-\beta }\), so that

$$\begin{aligned} y^{1+\beta }{\tilde{w}}'(y) = -{\mathcal {D}}_{\beta }w(y), \quad y^{1+\beta }{\tilde{w}}''(y) = - \partial _y ({\mathcal {D}}_{\beta }w) (y) + \frac{1+\beta }{y} {\mathcal {D}}_{\beta }w(y). \end{aligned}$$

Then (5.6) implies

$$\begin{aligned} |y {\tilde{w}}'(y)| \geqslant c_1 y^{-{\tilde{\alpha }}}, \quad |{\tilde{w}}''(y)| \leqslant {\tilde{c}}_2 \Vert {\tilde{w}}' \Vert y^{-1} \end{aligned}$$

for \({\tilde{c}}_2 = 2^{1+|\Re \beta |} c_2 + |1+\beta |\) and \({\tilde{\alpha }} = \alpha + \Re \beta \). Let \(y_0 \in I\) be a point where \(|{\tilde{w}}'|\) attains its maximum \(\Vert {\tilde{w}}' \Vert \). Changing \({\tilde{w}}\) by a fourth root of unity if necessary, we can assume that

$$\begin{aligned} \Re {\tilde{w}}'(y_0) \geqslant \frac{1}{\sqrt{2}} \max \big ( c_1 y_0^{-{\tilde{\alpha }} - 1}, \Vert {\tilde{w}}' \Vert \big ). \end{aligned}$$

The condition \(|{\tilde{w}}''(y)| \leqslant {\tilde{c}}_2 \Vert {\tilde{w}}' \Vert y^{-1}\) implies that the slightly weaker inequality

$$\begin{aligned} \Re {\tilde{w}}'(y) \geqslant \frac{1}{2\sqrt{2}} \max \big ( c_1 y_0^{-{\tilde{\alpha }} - 1}, \Vert {\tilde{w}}' \Vert \big ) \asymp \Re {\tilde{w}}'(y_0) \end{aligned}$$

holds on some non-empty sub-interval \(I_0 = [a_0, b_0] \subseteq I\) containing \(y_0\), where \((b_0 - a_0) \geqslant (b-a)/\sqrt{8} {\tilde{c}}_2\). Distinguishing the cases \(\Re {\tilde{w}}(a_0) > - c_3 y_0 \Re {\tilde{w}}'(y_0)\) and \(\Re {\tilde{w}}(a_0) \leqslant - c_3 y_0 \Re {\tilde{w}}'(y_0)\) for \(c_3 = (b_0-a_0)/\sqrt{8}y_0 \geqslant c_0 /32 {\tilde{c}}_2\), we confirm in both cases

$$\begin{aligned} \begin{aligned}&\max (|\Re {\tilde{w}}(a_0)|, |\Re {\tilde{w}}(b_0)|) \\ {}&\quad \geqslant \min \Big (c_3 y_0 \Re {\tilde{w}}'(y_0), \Big (-c_3 y_0 + \frac{b_0-a_0}{2\sqrt{2}}\Big )\Re {\tilde{w}}'(y_0)\Big ) = c_3 y_0 \Re {\tilde{w}}'(y_0). \end{aligned} \end{aligned}$$

Thus there exists a non-empty subinterval \(I' = [a', b'] \subseteq I_0\) of length \(\gg (b_0 - a_0)\) such that \(|{\tilde{w}}(y)| \geqslant \frac{1}{10} c_3 c_1 y^{-{\tilde{\alpha }}}\) on \(I'\). Changing back to w, we obtain (5.7). \(\square \)

We are now prepared for the following analogue of Lemma 4. For a function E on \( {\mathbb {R}}^{n-1}_{>0}\) and \(X\in {\mathbb {R}}_{>0}^{n-1}\) define

$$\begin{aligned} E^{(X)}(y_1, \ldots , y_{n-1}) = E(X_1y_1, \ldots , X_{n-1}y_{n-1}). \end{aligned}$$
(5.8)

Lemma 6

Assume that \(\mu \) varies in some compact set \(\Omega \), and let \(Z \geqslant 1\). There exist \(r \in \mathbb {N}\) and a compact set \(S\subseteq {\mathbb {R}}_{>0}^{n-1}\) both depending only on \(\Omega \) (not on Z) and a finite collection of (measurable) functions \(E_1, \ldots , E_r : {\mathbb {R}}_{>0}^{n-1}\rightarrow {\mathbb {R}}\) depending on \(\Omega \) and Z that are uniformly bounded (independent of Z) and supported in S such that

$$\begin{aligned} \sum _{j=1}^r |\langle E_j^{(Z, 1,\ldots , 1)}, W_{\mu }\rangle |^2 \gg _{\Omega } Z^{2\eta _1 +2 \sigma _{\pi }(\infty )} \end{aligned}$$

for \(\mu \in \Omega \) and \(\eta \) as in (2.7).

Proof

For \(Z \ll 1\) this is [5, Lemma 1]. For convenience we repeat the short argument in a slightly modified fashion that we will need later. There exists \(Z_0 > 0\) (depending only on \(\Omega \)) such that for each \(\mu \in \Omega \) we can choose an open set \(S_{\mu } \subseteq {\mathbb {R}}_{>0}^{n-1}\) such that \(|\Re W_{\mu }(y)| > 2Z_0\) for all \(y \in S_{\mu }\) or \(|\Im W_{\mu }(y)| > 2Z_0\) for all \(y \in S_{\mu }\). Next choose open neighbourhoods \(U_{\mu }\) about \(\mu \) such that \(|\Re W_{\mu ^{*}}(y)| > Z_0\) for all \(y \in S_{\mu }\) and all \(\mu ^{*} \in U_{\mu }\) or \(|\Im W_{\mu ^{*}}(y)| > Z_0\) for all \(y \in S_{\mu }\) and all \(\mu ^{*} \in U_{\mu }\). By compactness we pick a finite collection of such neighbourhoods \(U_{\mu _1}, \ldots , U_{\mu _r}\) covering \(\Omega \), and define the corresponding \(E_j\) to be real-valued functions with support on \(S_{\mu _j}\) and non-vanishing on the interior \(\mathring{S}_{\mu _j}\).

Now suppose that Z is sufficiently large (in terms of \(\Omega \)). We try to mimic the proof of Lemma 4. Assume (without loss of generality by (5.1) and Weyl group symmetry) that \(\Re (-\mu _1) = \sigma _{\pi }(\infty )\), and with the notation as above let

$$\begin{aligned} \widehat{\texttt{W}}_{\mu }(s) := \frac{{\widehat{W}}^{\dagger }_{\mu }(s) }{s_1 + \mu _1} = {\widehat{W}}^{*}_{\mu }(s) \prod _{j=2}^n (s_1 + \mu _j). \end{aligned}$$

Taking inverse Mellin transforms, we obtain

$$\begin{aligned} \texttt{W}_{\mu }(y) = {\mathcal {D}}_{\mu _2} \cdots {\mathcal {D}}_{\mu _n} W_{\mu }^{*}(y) \end{aligned}$$

where the differential operators are applied to the first variable \(y_1\). On the other hand, by Mellin inversion and (5.5) we have the asymptotic expansion

$$\begin{aligned} y_1^j \partial _{y_1}^j\texttt{W}_{\mu }(y) = \mu _1^jy_1^{\mu _1}W^{**}_{\mu }(y_2, \ldots , y_{n-1}) + O_{y_2, \ldots , y_{n-1}, \mu }\big (y_1^{\Re \mu _1 + 1/2}\big ) \end{aligned}$$

for \(y_1 \rightarrow 0\) and \(j \in \{0, 1\}\) where

$$\begin{aligned} W^{**}_{\mu }(y_2, \ldots , y_{n-1}) = W^{*}_{\mu ^{(1)}}(y_2, \ldots , y_{n-1}) \prod _{j=2}^{n-1} y_j^{\frac{n-j}{n-1}\mu _1} \prod _{k=2}^n\Gamma (1 + \mu _k - \mu _1). \end{aligned}$$

Whenever \(|W^{**}_{\mu }(y_2, \ldots , y_{n-1}) | \geqslant Z^{-1/2}\), say, and \(y_2, \ldots , y_{n-1} \asymp 1\) (with implied constants depending only on \(\Omega \)), we can apply Lemma 5 repeatedly to \(\texttt{W}_{\mu }(y)\) with \(\beta = \mu _2, \ldots , \mu _n\) to obtain two constants \(1/2< \gamma _1< \gamma _2 < 1\) with

$$\begin{aligned} |W_{\mu }^{*}(y) | \gg y_1^{-\sigma _{\pi }(\infty )} |W^{**}_{\mu }(y_2, \ldots , y_{n-1})| \end{aligned}$$

for \(y_1 \in [\gamma _1/Z^2, \gamma _2/Z^2]\). By the same argument as in the beginning of the proof, we can now choose a finite collection of functions \(E_j^{**} : {\mathbb {R}}_{>0}^{n-2} \rightarrow \mathbb {C}\) depending on \(\Omega \) (but not on Z, provided that Z is sufficiently large in terms of \(\Omega \)) such that \(\sum _j |\langle E^{**}_j, W^{**}_{\mu } \rangle |^2 \gg 1\) for \(\mu \in \Omega \), the inner product being restricted to the last \(n-2\) coordinates. Next define \(E^{*}_j(y_1, \ldots , y_n) = \delta _{\gamma _1 \leqslant y_1 \leqslant \gamma _2} E^{**}_j(y_2, \ldots , y_{n-1})\), so that

$$\begin{aligned} \sum _j \Bigl |\int _{{\mathbb {R}}_{>0}^{n-1}} E^{*}_j(Z^2 y_1, y_2, \ldots , y_{n-1}) \overline{W^{*}_{\mu }(y)} \frac{dy_1}{y_1} \cdots \frac{dy_{n-1}}{y_{n-1}} \Bigr |^2 \gg Z^{4\sigma _{\pi }(\infty )}. \end{aligned}$$

Finally changing variables \(y_j \leftarrow y_j^{1/2}/ \pi \) as in (5.4), we obtain

$$\begin{aligned} \begin{aligned} Z^{4\sigma _{\pi }(\infty )}&\ll \sum _j \Bigl |\int _{{\mathbb {R}}_{>0}^{n-1}} y^{-\eta } E_j^{*}(Z^2\pi ^2 y_1^2, \pi ^2 y_2^2, \ldots , \pi ^2 y_{n-1}^2)\\ {}&\qquad \quad \overline{W_{2\mu }(y)} \frac{dy_1}{y_1} \cdots \frac{dy_{n-1}}{y_{n-1}} \Bigr |^2\\&= Z^{-2\eta _1} \sum _j \bigl |\langle E^{(Z, 1, \ldots , 1)}_j, W_{2\mu }\rangle \big |^2 \end{aligned} \end{aligned}$$

upon defining \(E_j(y_1, \ldots , y_{n-1}) = y^{\eta } E_j^{*}(\pi ^2 y_1^2, \pi ^2 y_2^2, \ldots , \pi ^2 y_{n-1}^2)\). Re-normalizing \(\mu \) and \(\sigma _{\pi }(\infty )\) by division by 2, we obtain the lemma. \(\square \)

6 Poincaré series and the Kuznetsov formula

Let E be a fixed compactly supported (measurable) function on \({\mathbb {R}}_{>0}^{n-1}\), \(X \in {\mathbb {R}}_{>0}^{n-1}\) a “parameter” and define the right \(\textrm{O}_n({\mathbb {R}})\textrm{Z}^+\) invariant function \(F^{(X)} : \textrm{GL}_n({\mathbb {R}}) \rightarrow \mathbb {C}\) by

$$\begin{aligned} F^{(X)}(xyk\alpha ) = \theta (x) E^{(X)}(\textrm{y}(y)) \end{aligned}$$
(6.1)

for \(x \in U({\mathbb {R}})\), \(y \in {\tilde{T}}({\mathbb {R}})\), \(k \in \textrm{O}_n({\mathbb {R}})\), \(\alpha \in \textrm{Z}^{+} \) and \(\theta = \theta _{(1, \ldots , 1)}\) as in (2.3), \(E^{(X)}\) as in (5.8). For \(N \in \mathbb {N}^{n-1}\) we consider the Poincaré series

$$\begin{aligned} P^{(X)}_{N}(xy) = \sum _{\gamma \in U({\mathbb {Z}}) \backslash \Gamma _0(q)} F^{(X)}(\iota (N)\gamma xy). \end{aligned}$$

Note that \(F^{(X)}(\iota (N)xy) = \theta _{N}(x) E( X\cdot N \cdot \textrm{y}(y))\), cf. (4.4). Let \(N, M \in \mathbb {N}^{n-1}\). By [12, Theorem A] (with \(\rho = \text {triv}\), \(\nu _1 = \ldots = \nu _{n-1} = 0\)) we have

$$\begin{aligned} \begin{aligned}&\int _{U({\mathbb {Z}})\backslash U({\mathbb {R}})} P^{(X)}_{M}(xy) \theta _N(-x) \textrm{d}x\\&= \sum _{w\in W} \sum _{v \in V} \sum _{c \in \mathbb {N}^{n-1}} S_{q, w}^v( M, N, c) \int _{U_w({\mathbb {R}}) } F^{(X)}(\iota (M)c^{*}w xy )\theta _N^v(-x) \textrm{d}x. \end{aligned} \end{aligned}$$
(6.2)

For fixed y and fixed compact support of E, it follows from the two bounds in Lemma 1 with M in place of B that the c-sum runs over a finite set (depending on M, y and the support of F), and the \(U_w({\mathbb {R}})\)-integral runs over a compact domain (again depending on M, y and the support of F). In particular the right hand side is absolutely convergent (and the assumption \(\Re \nu _j > 2/n\) in [12, Theorem A] can be dropped; Friedberg works more generally with bounded E rather than compactly supported E). Without loss of generality we can assume that w is of the form (3.7).

Now let \(\varpi \) be a not necessarily cuspidal automorphic form occurring in the spectrum of \(L^2(\Gamma _0(q)\backslash {\mathcal {H}})\). By unfolding, (5.2) and a change of variables \(y \leftarrow \iota (N)y\), we have

$$\begin{aligned} \langle \varpi , P^{(X)}_{N}\rangle&= \int _{{\tilde{T}}({\mathbb {R}})} \int _{U({\mathbb {Z}})\backslash U({\mathbb {R}})} \varpi (xy) \theta _{ N}(-x) \overline{E^{(X)}(N \cdot \textrm{y}( y))} \textrm{d}x\, \textrm{d}^{*} y \\ {}&= N^{\eta } A_{\varpi }( N) \langle W_{\mu }, E^{(X)}\rangle \end{aligned}$$

where as before \(\mu = \mu _{\pi }(\infty )\). By Parseval we obtain

$$\begin{aligned} \langle P^{(X)}_{M}, P^{(X)}_{N} \rangle = N^{\eta } M^{\eta } \int _{(q)} \overline{A_{\varpi }(M)} A_{\varpi }(N) |\langle W_{\mu }, E^{(X)}\rangle |^2 \textrm{d}\varpi . \end{aligned}$$

On the other hand, by unfolding and (6.2) we can express \(\langle P^{(X)}_{M}, P^{(X)}_{N} \rangle \) as

$$\begin{aligned} \begin{aligned}&\int _{{\tilde{T}}({\mathbb {R}})} \int _{U({\mathbb {Z}})\backslash U({\mathbb {R}})} P^{(X)}_{M}(xy) \theta _{N}(-x) \overline{E^{(X)}(N \cdot \textrm{y}( y))} \textrm{d}x\, \textrm{d}^{*}y\\&= \sum _{w\in W} \sum _{v \in V} \sum _{c \in \mathbb {N}^{n-1}} S_{q, w}^v( M, N, c) \int _{{\tilde{T}}({\mathbb {R}})} \int _{U_w({\mathbb {R}}) } F^{(X)}(\iota (M)c^{*}w xy )\\ {}&\quad \theta _N^v(-x) \overline{E(X \cdot N \cdot \textrm{y}( y))} \textrm{d}x\, \textrm{d}^{*}y. \end{aligned} \end{aligned}$$

Let

$$\begin{aligned} A = \iota (X \cdot M)c^{*} w \iota (X \cdot N)^{-1} w^{-1} = \iota \big ( X \cdot M \cdot {}^w(X \cdot N) \big )c^{*} \in T({\mathbb {R}}),\qquad \end{aligned}$$
(6.3)

so that \(\textrm{y}(A)^{\eta } c_1 \cdots c_{n-1} = \big ( X \cdot M \cdot {}^w(X \cdot N) \big )^{\eta }\) by (2.9). We change variables \(y \leftarrow \iota (X \cdot N)y\), \(x \leftarrow \iota (X \cdot N)x \iota (X \cdot N)^{-1}\). By Lemma 2 we obtain

$$\begin{aligned} \begin{aligned}&\sum _{w\in W} \sum _{v \in V} \sum _{c \in \mathbb {N}^{n-1}} S_{q, w}^v( M, N, c) \frac{(X \cdot M)^{\eta } (X \cdot N)^{\eta }}{c_1 \cdots c_{n-1} \textrm{y}(A)^{\eta }} \\&\times \int _{{\tilde{T}}({\mathbb {R}})} \int _{U_w({\mathbb {R}}) } F^{(X)}(\iota (X)^{-1} A w xy ) \theta ^v(-x) \overline{E( \textrm{y}( y))} \textrm{d}x\, \textrm{d}^{*}y. \end{aligned} \end{aligned}$$

We conclude the following Kuznetsov-type formula.

Lemma 7

Let \(M, N \in \mathbb {N}^{n-1}\), \(X \in {\mathbb {R}}_{> 0}^{n-1}\), E a compactly supported function on \({\mathbb {R}}_{> 0}^{n-1}\) and define \(F^{(X)}\) as in (6.1). Then

$$\begin{aligned} \begin{aligned}&\int _{(q)} \overline{A_{\varpi }(M)} A_{\varpi }(N) |\langle W_{\mu }, E^{(X)}\rangle |^2 \textrm{d}\varpi \\&= \sum _{w\in W} \sum _{v \in V} \sum _{c \in \mathbb {N}^{n-1}} \frac{S_{q, w}^v(M, N, c) }{c_1 \cdots c_{n-1}} \frac{X^{2\eta }}{ \textrm{y}(A)^{\eta }} \\ {}&\int _{{\tilde{T}}({\mathbb {R}})} \int _{U_w({\mathbb {R}}) } F^{(X)}( \iota (X)^{-1} A w xy ) \theta ^v(-x) \overline{E( \textrm{y}( y))} \textrm{d}x\, \textrm{d}^{*}y \end{aligned} \end{aligned}$$
(6.4)

with A as in (6.3).

As mentioned before, the Kloosterman sum \(S_{q, w}^v(M, N, c)\) vanishes unless w is of the form (3.7), in which case we have the additional conditions (4.3), as well as (4.5) for \(i \not \in \{d_1, \ldots , d_1 + \ldots + d_{r-1}\}\). The c-sum is restricted by Lemma 1 and the support of E.

7 Proofs of Theorems 1, 2, 4

We start with the proof of Theorem 2. We specialize Lemma 7 to

$$\begin{aligned} M = N = (m,1, \ldots , 1), \quad X = (Z, 1, \ldots , 1) \end{aligned}$$

with \((m, q) = 1\). We need to bound the spectral side from below and the Kloosterman side from above. By (5.3) and positivity we have

$$\begin{aligned}{} & {} \sum _{\pi \in {\mathcal {F}}_I(q)} |\lambda _{\pi }(m)|^2 Z^{2\eta _1 + 2\sigma _{\pi }(\infty )} \nonumber \\ {}{} & {} \ll _I q^{n-1+\varepsilon } \int _{(q)} |A_{\varpi }(M)|^2 Z^{2\eta _1 + 2\sigma _{\pi }(\infty )} \delta _{\lambda _{\varpi } \in I} \, \textrm{d}\varpi . \end{aligned}$$
(7.1)

By Lemma 6 there is a finite set of compactly supported functions \(E_j\) such that

$$\begin{aligned} Z^{2\eta _1 + 2\sigma _{\varpi }(\infty )} \delta _{\lambda _{\varpi } \in I} \ll _I \sum _j |\langle W_{\mu _{\varpi }}, E^{(X)}_j\rangle |^2 . \end{aligned}$$

Thus in order to bound the left hand side of (7.1) it suffices to consider the right hand side of (6.4) for a fixed \(E^{(X)} = E^{(X)}_j\), and we are left with bounding

$$\begin{aligned} \begin{aligned} q^{n-1+\varepsilon } \sum _{w\in W} \sum _{v \in V}&\sum _{c \in \mathbb {N}^{n-1}} \frac{S_{q, w}^v(M, N, c) }{c_1 \cdots c_{n-1}} \frac{X^{2\eta }}{ \textrm{y}(A)^{\eta }} \\&\times \int _{{\tilde{T}}({\mathbb {R}})} \int _{U_w({\mathbb {R}}) } F^{(X)}( \iota (X)^{-1} A w xy ) \theta ^v(-x) \overline{E( \textrm{y}( y))} \textrm{d}x\, \textrm{d}^{*}y. \end{aligned} \end{aligned}$$

For \(w = \text {id}\) we have \(c_1 = \ldots =c_{n-1} = 1\) and hence \(A = I_n\), and the contribution is \( O(q^{n-1+\varepsilon }Z^{2\eta _1})\).

Let us now consider the remaining w of the form (3.7). First we bound the moduli \(c_j\). To this end we apply Lemma 1 with \(B = X \cdot M \cdot {}^w(X \cdot N)\), so that by (2.6) we obtain \(B_1 = B_{n-d_1} = mZ\), \(B_{n-d_1+1} = 1/(mZ)\) if \(d_1 > 1\) and \(B_j=1\) for all other indices. This gives

$$\begin{aligned} c_j \ll (mZ)^{s(1, j) + s(n-d_1, j) - s(n - d_1 + 1, j)} = {\left\{ \begin{array}{ll} mZ, &{} j \leqslant n - d_1,\\ 1, &{} j > n - d_1. \end{array}\right. } \end{aligned}$$
(7.2)

We assume that \(mZ \ll q^2\) with a sufficiently small implied constant, so that \(c_j < q^2\) for all j. We may also assume that q is sufficiently large, otherwise there is nothing to prove. Now suppose that \(d_1 > 1\) (but \(d_1 < n\), since \(w \not = \text {id}\)). Then by (4.5) with \(i = d_1 - 1\) we have \(c_{n-d_1 + 2} c_{n- d_1} = \pm c_{n- d_1 + 1}^2 m\). Comparing the q-adic valuations of both sides (recall that \((m, q) = 1\)) and using (4.3), we conclude that both \(c_{n-d_1 + 1}\) and \(c_{n- d_1 + 2}\) are divisible by q, which contradicts (7.2) for q sufficiently large. Hence \(d_1 = 1\), and we see from (4.3) that all \(c_j\) are divisible by q. We write \(c_j = qc_j'\). By (4.6) we obtain

$$\begin{aligned} S_{q, w}^v(M, M, c) = S^v_{q, w}(*, *, (q, \ldots , q)) S_{1, w}^v(M, ({\bar{q}}m, 1, \ldots , 1, {\bar{q}}), c') \end{aligned}$$
(7.3)

where \(*\) is coprime to q. By Theorem 3 the first factor on the right hand side vanishes unless \(w = w_{*}\); in other words, only the trivial Weyl element and \(w_{*}\) survive.

(As an aside: if we only wanted to prove Sarnak's original density hypothesis with an exponent \(n- 1 - 2\sigma + \varepsilon \) in Theorem 1, then upon choosing \(mZ \ll q\) with a sufficiently small constant, the contributions of all Weyl elements except the identity would vanish and no further analysis would be necessary. That in the stronger set-up \(mZ \ll q^2\) only \(w_{*}\) needs to be considered is an artefact of q being prime.)

Our next aim is to estimate

$$\begin{aligned} \begin{aligned}&\Big | \int _{{\tilde{T}}({\mathbb {R}})} \int _{U_{w_{*}}({\mathbb {R}}) } F^{(X)}(\iota (X)^{-1}A w_{*} xy ) \theta ^v(-x) \overline{E( \textrm{y}( y))} \textrm{d}x\, \textrm{d}^{*}y\Big | \\&\leqslant \int _{{\tilde{T}}({\mathbb {R}})} \int _{U_{w_{*}}({\mathbb {R}}) } |E(\textrm{y}(A w_{*} xy) ) E( \textrm{y}( y))| \textrm{d}x\, \textrm{d}^{*}y. \end{aligned} \end{aligned}$$

By Lemma 1 and then Lemma 3 the right hand side is bounded by

$$\begin{aligned} \begin{aligned}&\ll _{\,E} \,\textrm{vol}\Big \{x \in U_{w_*}({\mathbb {R}}) \mid \Delta _j(w_{*}x) \ll _E \prod _{i=1}^{n-1} \textrm{y}(A)_i^{s(i, j)}, 1 \leqslant j \leqslant n-1\Big \} \\&\ll _{\,E} \,\prod _{i=1}^{n-1}\prod _{j=1}^{n-1} \textrm{y}(A)_i^{s(i, j)(1+\varepsilon )} = \textrm{y}(A)^{\eta (1+\varepsilon )} \end{aligned} \end{aligned}$$
(7.4)

since \(\sum _{i} s(i, j) = \eta _j\) by (3.2) and (2.7).

Summarizing the previous estimates (and changing the value of \(\varepsilon \)) and applying Theorem 3 to the first factor in (7.3), we obtain

$$\begin{aligned} \begin{aligned}&\sum _{\pi \in {\mathcal {F}}_I(q)} |\lambda _{\pi }(m)|^2 Z^{2\eta _1 + 2\sigma _{\pi }(\infty )} \ll _I Z^{2\eta _1} q^{n-1+\varepsilon } \\&\qquad \times \Big (1 + q^{n-2} \sum _{v \in V} \sum _{c'_1, \ldots , c_{n-1}' \ll mZ/q} \frac{|S^v_{1, w_{*}}(M, ({\bar{q}}m, 1, \ldots , 1, {\bar{q}}), c')|}{q^{n-1} c_1' \cdots c_{n-1}'}\Big ). \end{aligned} \end{aligned}$$

For the Weyl element \(w_{*}\) the consistency relations (4.5) impose serious restrictions on the moduli \(c_1', \ldots , c_{n-1}'\). Applying (4.5) with \(i = 2, \ldots , n-2\) yields \((c'_i)^2 = c'_{i-1}c'_{i+1}\). If \(n \geqslant 4\), then \(c'_{2}\) determines \(c_1'\) and \(c_3'\) up to a divisor function, and inductively also \(c_4', \ldots , c_{n-1}'\). Using the trivial bound (4.7), we finally obtain (again changing the value of \(\varepsilon \))

$$\begin{aligned} \sum _{\pi \in {\mathcal {F}}_I(q)} |\lambda _{\pi }(m)|^2 Z^{ 2\sigma _{\pi }(\infty )} \ll _I q^{n-1+\varepsilon } \Big (1 + \frac{q^{n-2}}{q^{n-1}} \sum _{ c_{2}' \ll mZ/q} 1 \Big ) \ll q^{n-1+\varepsilon } \end{aligned}$$

provided \(mZ \ll q^2\). In the case \(n=3\) we quote from [5, (4.2) with \(N=1\)] the average Weil-type bound

$$\begin{aligned} \sum _{c_1', c_2' \leqslant X} |S^v_{1, w_{*}}((m, 1), ({\bar{q}}m, {\bar{q}}), c')| \ll X^3 (Xm)^{\varepsilon } \end{aligned}$$

to obtain again

$$\begin{aligned} \sum _{\pi \in {\mathcal {F}}_I(q)} |\lambda _{\pi }(m)|^2 Z^{ 2\sigma _{\pi }(\infty )} \ll _I q^{2+\varepsilon } \Big (1 + \frac{1}{q} \cdot \frac{mZ}{q} \Big ) \ll q^{2+\varepsilon } \end{aligned}$$

for \(mZ \ll q^2\). This completes the proof. \(\square \)
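
As a purely illustrative aside (not part of the proof), the rigidity used in the case \(n \geqslant 4\) above can be checked numerically: the chain \((c'_i)^2 = c'_{i-1}c'_{i+1}\) determines the whole tuple from \(c_1'\) and \(c_2'\), and for fixed \(c_2'\) the number of admissible tuples is at most the number of divisors of \((c_2')^2\). The following Python sketch verifies exactly this combinatorial statement by brute force; the parameters n and C are ad hoc choices for the experiment.

```python
# Sanity check (illustrative only): for n >= 4 the relations
# (c_i')^2 = c_{i-1}' c_{i+1}', i = 2, ..., n-2, determine the whole tuple
# (c_1', ..., c_{n-1}') from (c_1', c_2'), and for fixed c_2' the number of
# admissible tuples is at most d((c_2')^2), the divisor function of (c_2')^2.

from itertools import product

def chains(n, C):
    """All tuples (c_1', ..., c_{n-1}') with entries in [1, C] satisfying
    (c_i')^2 = c_{i-1}' c_{i+1}' for i = 2, ..., n-2."""
    found = []
    for c1, c2 in product(range(1, C + 1), repeat=2):
        c = [c1, c2]
        for _ in range(2, n - 1):            # build c_{i+1}' = (c_i')^2 / c_{i-1}'
            sq, prev = c[-1] ** 2, c[-2]
            if sq % prev or sq // prev > C:  # not integral or out of range
                break
            c.append(sq // prev)
        else:
            found.append(tuple(c))
    return found

def d(m):                                    # naive divisor function
    return sum(1 for k in range(1, m + 1) if m % k == 0)

if __name__ == "__main__":
    n, C = 5, 60                             # ad hoc experiment parameters
    per_c2 = {}
    for tup in chains(n, C):
        per_c2[tup[1]] = per_c2.get(tup[1], 0) + 1
    # For every fixed c_2' the number of chains is bounded by d((c_2')^2).
    assert all(cnt <= d(c2 ** 2) for c2, cnt in per_c2.items())
    print("maximal number of chains for a single c_2':", max(per_c2.values()))
```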

The proof of Theorem 4 is a simple variation. Again by positivity and (5.3) we have

$$\begin{aligned} \begin{aligned} \sum _{\pi \in {\mathcal {F}}_I(q)}&\Big | \sum _{\begin{array}{c} m \leqslant x\\ (m, q) = 1 \end{array}} \alpha (m) \lambda _{\pi }(m) \Big |^2 \\ {}&\ll q^{n-1+\varepsilon } \int _{(q)} \Big | \sum _{\begin{array}{c} m \leqslant x\\ (m, q) = 1 \end{array}} \alpha (m)A_{\varpi }(M) \Big |^2 \delta _{\lambda _{\varpi } \in I} \, \textrm{d}\varpi \\&= q^{n-1+\varepsilon } \sum _{\begin{array}{c} m_1, m_2 \leqslant x\\ (m_1m_2, q) = 1 \end{array}}\alpha (m_1) \overline{\alpha (m_2)} \int _{(q)} A_{\varpi }(M_1) \overline{A_{\varpi }(M_2)} \delta _{\lambda _{\varpi } \in I} \, \textrm{d}\varpi \end{aligned} \end{aligned}$$

where \(M = (m, 1, \ldots , 1)\), \(M_1 = (m_1, 1, \ldots , 1)\), \(M_2 = (m_2, 1, \ldots , 1)\). We detect the condition \(\delta _{\lambda _{\varpi } \in I}\) by a finite collection of test functions \(E_j\) with \(Z = 1\) as in the previous proof and apply Lemma 7. For \(w \not = \text {id}\) the analogue of (7.2) is

$$\begin{aligned} c_j \ll m_2^{s(1, j) } m_1^{s(n-d_1, j) - s(n - d_1 + 1, j)} \leqslant x \end{aligned}$$

which contradicts (4.3) for \(x \ll q\) (with a sufficiently small implied constant) since \(d_1 \not = n\). So only the trivial Weyl element survives, and we obtain the desired bound. \(\square \)

Corollary 5 follows easily from Theorem 4 by observing that an approximate functional equation has length \(q^{1/2}\) (see [18, Section 5]): for all but O(1) cuspidal representations \(\pi \in {\mathcal {F}}_I(q)\) (and \(\varepsilon < 1/2\)) we have

$$\begin{aligned} |L(1/2 + it, \pi )|^2 \ll _{I, t, n, \varepsilon } q^{\varepsilon } \sum _{ 2^j = M \leqslant q^{1/2 + \varepsilon } } \frac{1}{M} \Bigl | \sum _{ M \leqslant m \leqslant 2M } \lambda _{\pi }(m) \Bigr |^2 \end{aligned}$$

and the desired bound follows directly from Theorem 4. Note that the shape of the ramified coefficients (i.e. those with \(q \mid m\)) is irrelevant, so that the condition \((m, q) = 1\) in Theorem 4 is no restriction.
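
For orientation, the elementary mechanism behind the dyadic shape of the bound above is worth making explicit: splitting a sum over \(m \leqslant x\) into \(O(\log x)\) dyadic blocks and applying Cauchy-Schwarz in the block index bounds the full squared sum by \(\log x\) times the sum of the squared block sums. The following Python sketch checks this inequality on randomly generated placeholder coefficients (these are not Hecke eigenvalues, and the smooth weights of the approximate functional equation are ignored).

```python
# Illustration only: dyadic decomposition plus Cauchy-Schwarz in the block
# index, the elementary step behind bounds of the shape
# |sum_{m<=x} a_m|^2  <=  (#blocks) * sum_{dyadic M} |sum_{M<=m<2M} a_m|^2.
import math
import random

def dyadic_cauchy_schwarz_check(x, trials=200, seed=0):
    rng = random.Random(seed)
    J = math.floor(math.log2(x)) + 1          # dyadic blocks [2^j, 2^{j+1}) cover [1, x]
    for _ in range(trials):
        a = [rng.uniform(-1.0, 1.0) for _ in range(x)]   # a[m-1] plays the role of a_m
        full = abs(sum(a)) ** 2
        blocks = sum(
            abs(sum(a[2 ** j - 1 : min(2 ** (j + 1) - 1, x)])) ** 2
            for j in range(J)
        )
        assert full <= J * blocks + 1e-9      # Cauchy-Schwarz in the block index
    return True

print(dyadic_cauchy_schwarz_check(1000))
```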

Finally we derive Theorem 1 from Theorem 2. Let us first assume that \(v = p\not = q\) is a fixed prime. We choose \(\nu _0\) maximal so that \( p^{\nu _0} \ll q^2\) with an implied constant that is admissible for Theorem 2. Since every \(\pi \) with \(\sigma _{\pi }(p) \geqslant \sigma \) contributes at least 1 to the sum \(\sum _{\pi } p^{2\nu _0 (\sigma _{\pi }(p) - \sigma )}\), we conclude from Theorem 2 with \(Z = 1\), \(m = p^{\nu }\) and Lemma 4 that

$$\begin{aligned} N_p(\sigma , {\mathcal {F}}_I(q)) &\leqslant \sum _{\pi \in {\mathcal {F}}_I(q)} \frac{p^{2\nu _0 \sigma _{\pi }(p)}}{p^{2\nu _0 \sigma }} \ll \frac{1}{q^{4\sigma }} \sum _{\nu _0 - n \leqslant \nu \leqslant \nu _0} \sum _{\pi \in {\mathcal {F}}_I(q)} |\lambda _{\pi }(p^{\nu })|^2 \\ &\ll q^{n-1-4\sigma + \varepsilon }. \end{aligned}$$

For \(v = \infty \), Theorem 1 follows directly from Theorem 2 with \(m=1\) and \(Z\) chosen as large as possible subject to \(Z \ll q^2\) (again with a sufficiently small implied constant). \(\square \)
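
As a final illustrative remark (again not part of the proof), the counting step spelled out before the last display is a Markov-type inequality: every \(\pi \) with \(\sigma _{\pi }(p) \geqslant \sigma \) contributes at least 1 to the nonnegative weighted sum \(\sum _{\pi } p^{2\nu _0(\sigma _{\pi }(p)-\sigma )}\). The following toy Python check uses made-up spectral parameters; nothing in it refers to actual automorphic data.

```python
# Toy check (made-up data): counting parameters above a threshold sigma is
# dominated by the exponential weights p^{2*nu0*(s - sigma)}, since every
# term with s >= sigma is >= 1 and all terms are positive.
import random

def markov_count(values, p, nu0, sigma):
    exact = sum(1 for s in values if s >= sigma)
    weighted = sum(p ** (2 * nu0 * (s - sigma)) for s in values)
    return exact, weighted

rng = random.Random(1)
toy_sigmas = [rng.uniform(0.0, 1.0) for _ in range(500)]   # placeholder values of sigma_pi(p)
exact, weighted = markov_count(toy_sigmas, p=2, nu0=5, sigma=0.4)
assert exact <= weighted
print(exact, round(weighted, 1))
```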