1 Introduction

1.1 Classical Kloosterman sums

Kloosterman sums are among the most universal exponential sums in number theory, algebra and automorphic forms. The classical Kloosterman sum for parameters \(m, m'\in \mathbb {Z}\) and a modulus \(c \in \mathbb {N}\) is

$$\begin{aligned} S(m, m', c) =\underset{d\, (\text {mod }c)}{\left. \sum \right. ^{*}} e\Big (\frac{md + m'{\bar{d}}}{c}\Big ) \end{aligned}$$
(1.1)

where \(d{\bar{d}} \equiv 1\) (mod c) and the asterisk indicates that the sum is over \((d, c) = 1\). Kloosterman [10] introduced this type of exponential sum in his thesis as a crucial tool to study the representation of integers by positive quaternary quadratic forms. Since then they have been ubiquitous in number theory, for instance as the finite Fourier transform of exponential sums containing modular inverses, as Fourier coefficients of classical Poincaré series, in various instances of delta-symbol methods and perhaps most prominently in the relative trace formula of Petersson-Kuznetsov type (to which Poincaré series are a precursor). One of their key properties is the Weil bound [16] (complemented by Salié for powerful moduli [13])

$$\begin{aligned} |S(m, m', c)| \le \tau (c) c^{1/2} (m, m', c)^{1/2} \end{aligned}$$
(1.2)

where \(\tau \) denotes the divisor function. This essentially exhibits square root cancellation relative to the trivial bound \(|S(m, m', c)| \le \phi (c) \) where \(\phi \) is Euler’s \(\phi \)-function.
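As a quick numerical illustration of (1.1) and (1.2), the following Python sketch evaluates \(S(m, m', c)\) by brute force and compares it with the Weil bound. It is only a sanity check for small moduli, not an efficient algorithm, and the helper names are ours.

```python
import cmath
import math


def kloosterman(m, mp, c):
    """Classical Kloosterman sum S(m, m', c) of (1.1), evaluated numerically."""
    if c == 1:
        return 1.0  # d = 0 is the only residue mod 1 and contributes e(0) = 1
    total = 0j
    for d in range(1, c):
        if math.gcd(d, c) == 1:
            dbar = pow(d, -1, c)  # modular inverse (Python >= 3.8)
            total += cmath.exp(2j * cmath.pi * (m * d + mp * dbar) / c)
    # S(m, m', c) is real: d -> -d sends each term to its complex conjugate.
    return total.real


def divisor_count(c):
    return sum(1 for k in range(1, c + 1) if c % k == 0)


def weil_bound(m, mp, c):
    """Right-hand side of (1.2): tau(c) * c^(1/2) * gcd(m, m', c)^(1/2)."""
    g = math.gcd(math.gcd(m, mp), c)
    return divisor_count(c) * math.sqrt(c) * math.sqrt(g)


# Largest ratio |S(m, m', c)| / (Weil bound) over a small box of parameters.
worst = max(abs(kloosterman(m, mp, c)) / weil_bound(m, mp, c)
            for c in range(1, 61) for m in range(5) for mp in range(5))
print(f"largest ratio: {worst:.3f}")
```

The printed ratio should not exceed 1. Two classical special cases also make convenient checks: \(S(0, 0, c) = \phi (c)\) and \(S(1, 0, c) = \mu (c)\), the Ramanujan sum evaluated at 1.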

We proceed to describe the more general set-up of Kloosterman sums. Let G be a reductive group, T a maximal torus, and U the standard maximal unipotent subgroup. Let \(N = N_G(T)\) be the normalizer of T in G, \(W = N/T\) the Weyl group and \(\omega : N \rightarrow W\) the quotient map. For \(w\in W\), we define \(U_w = U \cap w^{-1} U^{\top } w\) and \({\bar{U}}_w = U \cap w^{-1} U w\). For \(n \in N(\mathbb {Q}_p)\) and a finite index subgroup \(\Gamma \subseteq G(\mathbb {Z}_p)\) we define

$$\begin{aligned} C(n) = U(\mathbb {Q}_p) nU(\mathbb {Q}_p) \cap \Gamma , \quad X(n) = U(\mathbb {Z}_p)\backslash C(n)/U_{\omega (n)}(\mathbb {Z}_p) \end{aligned}$$

and the projection maps

$$\begin{aligned} u: X(n) \rightarrow U(\mathbb {Z}_p)\backslash U(\mathbb {Q}_p), \quad u': X(n) \rightarrow U_{\omega (n)}(\mathbb {Q}_p)/U_{\omega (n)}(\mathbb {Z}_p). \end{aligned}$$

Let \(\psi , \psi '\) be two characters on \(U(\mathbb {Q}_p)\) that are trivial on \(U(\mathbb {Z}_p)\). The (local) Kloosterman sum is then defined to be

$$\begin{aligned} \text {Kl}_p(\psi , \psi ', n) = \sum _{x = u(x) n u'(x) \in X(n)} \psi (u(x)) \psi '(u'(x)). \end{aligned}$$
(1.3)

In this paper we will take \(\Gamma = G(\mathbb {Z}_p)\), but we may think of \(\Gamma \) as a more general “congruence subgroup”. By the Chinese remainder theorem, global Kloosterman sums factor into products of local ones (with suitably twisted characters), so it suffices to study local Kloosterman sums. There is some flexibility in the definition; for instance, some authors put \(U_{\omega (n)}\) on the left and U on the right.
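In the classical case (1.1), the reduction to prime power moduli is the twisted multiplicativity \(S(m, m', rs) = S(\bar{s} m, \bar{s} m', r)\, S(\bar{r} m, \bar{r} m', s)\) for coprime r, s, where \(s\bar{s} \equiv 1\) (mod r) and \(r\bar{r} \equiv 1\) (mod s); it follows from \(1/(rs) \equiv \bar{s}/r + \bar{r}/s\) (mod 1). A minimal numerical check in Python (the function names are ours):

```python
import cmath
import math


def kloosterman(m, mp, c):
    """S(m, m', c) as in (1.1); for c = 1 the single residue d = 0 contributes 1."""
    total = 0j
    for d in range(c):
        if math.gcd(d, c) == 1:
            dbar = pow(d, -1, c) if c > 1 else 0
            total += cmath.exp(2j * cmath.pi * (m * d + mp * dbar) / c)
    return total


def kloosterman_crt(m, mp, r, s):
    """S(m, m', rs) for coprime r, s, assembled from one sum mod r and one mod s.

    Uses 1/(rs) = sbar/r + rbar/s (mod 1), where s*sbar = 1 (mod r), r*rbar = 1 (mod s).
    """
    sbar = pow(s, -1, r) if r > 1 else 0
    rbar = pow(r, -1, s) if s > 1 else 0
    return kloosterman(sbar * m, sbar * mp, r) * kloosterman(rbar * m, rbar * mp, s)


print(abs(kloosterman(2, 5, 9 * 8) - kloosterman_crt(2, 5, 9, 8)) < 1e-8)  # prints True
```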

Example 1

For \(G = \textrm{GL}(2)\), \(n = \left( {\begin{matrix} &{} -1/c\\ c &{} \end{matrix}}\right) \) (so that \(U_{\omega (n)} = U\)) with \(c = p^r\), say, \(\psi \left( \left( {\begin{matrix} 1 &{} x\\ {} &{} 1\end{matrix}}\right) \right) = \psi _0(mx)\), \(\psi '\left( \left( {\begin{matrix} 1 &{} y\\ {} &{} 1\end{matrix}}\right) \right) = \psi _0(m'y)\) for the standard additive character \(\psi _0\) on \(\mathbb {Q}_p/\mathbb {Z}_p\) and \(m, m' \in \mathbb {Z}\), we have

$$\begin{aligned} \left( \begin{matrix}1 &{} x\\ {} &{} 1\end{matrix}\right) n \left( \begin{matrix}1 &{} y\\ {} &{} 1\end{matrix}\right) = \left( \begin{matrix}cx &{} -1/c + cxy\\ c&{} cy\end{matrix}\right) \in \textrm{GL}_2(\mathbb {Z}_p) \end{aligned}$$

if and only if \(x, y \in p^{-r}\mathbb {Z}_p/\mathbb {Z}_p\), \(xy - p^{-2r} \in p^{-r}\mathbb {Z}_p\), and we recover (1.1).
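The computation in Example 1 can be replayed numerically. The Python sketch below assumes \(\psi _0(t) = e(t)\) on \(\mathbb {Z}[1/p]/\mathbb {Z}\) (the normalisation under which (1.1) is recovered), runs over representatives \(x = a/p^r\), \(y = b/p^r\), keeps the pairs for which all four entries of the product matrix are p-integral, and compares the resulting sum with a direct evaluation of (1.1).

```python
import cmath
import math
from fractions import Fraction


def e(x):
    """e(x) = exp(2 pi i x) for rational x; only x mod 1 matters."""
    return cmath.exp(2j * cmath.pi * float(x % 1))


def kloosterman(m, mp, c):
    """S(m, m', c) from (1.1), for c >= 2."""
    return sum(e(Fraction(m * d + mp * pow(d, -1, c), c))
               for d in range(1, c) if math.gcd(d, c) == 1)


def example1(m, mp, p, r):
    """Sum psi_0(m x) psi_0(m' y) over x, y in p^{-r}Z_p/Z_p with u(x) n u(y) in GL_2(Z_p).

    Assumes psi_0(t) = e(t) on Z[1/p]; returns the sum and the number of terms.
    """
    c = p ** r
    total, count = 0j, 0
    for a in range(c):
        for b in range(c):
            x, y = Fraction(a, c), Fraction(b, c)
            entries = [c * x, Fraction(-1, c) + c * x * y, Fraction(c), c * y]
            # All denominators are powers of p, so "in Z_p" means "is an integer";
            # the determinant is 1, so integrality of the entries suffices.
            if all(t.denominator == 1 for t in entries):
                total += e(m * x + mp * y)
                count += 1
    return total, count
```

The number of surviving pairs is \(\phi (p^r)\), one for each d in (1.1), and the two sums agree up to rounding.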

Example 2

For \(G = \textrm{GL}(3)\) and \(n = \left( {\begin{matrix} &{}&{} -1/c_1\\ {} &{}c_1/c_2 &{} \\ c_2&{}&{}\end{matrix}}\right) \) in the big Bruhat cell, the explicit form of the corresponding Kloosterman sum is already much more complicated. Using Plücker coordinates, Bump, Friedberg and Goldfeld [4, Section 4] derived for two general characters \(\psi _{n_1, n_2}\) and \(\psi _{m_1, m_2}\) of 3-by-3 upper triangular unipotent matrices the explicit formula for the global Kloosterman sum

$$\begin{aligned} \sum _{\begin{array}{c} B_1, C_1 \, (\text {mod }c_1)\\ B_2, C_2\, (\text {mod } c_2)\\ (B_j, C_j, c_j) =1\\ c_1c_2 \mid c_1C_2 + B_1B_2 + C_1c_2 \end{array}} e\Big (\frac{m_1B_1 + n_1(Y_1c_2 - Z_1B_2)}{c_1} + \frac{m_2B_2 + n_2(Y_2c_1 - Z_2B_1)}{c_2}\Big ) \end{aligned}$$

where \(Y_1, Y_2, Z_1, Z_2\) are chosen such that \(Y_jB_j + Z_jC_j \equiv 1\, (\text {mod } c_j)\) for \(j = 1, 2\).

We see from these examples that Kloosterman sums, while defined rather naturally in terms of the Bruhat decomposition, turn out to be extremely complicated exponential sums. We are well equipped with deep technology from algebraic geometry and p-adic analysis to bound general exponential sums in a relatively sharp fashion, but it is unfortunately not clear a priori how to apply these to general Kloosterman sums given by (1.3). In fact, even determining the size of X(n) and hence the “trivial” bound for (1.3) (ignoring cancellation in the characters) is a deep result of Dąbrowski and Reeder [6], see Lemma 2 below.

One case is classical: for \(G = \textrm{GL}(n)\) and the Weyl element \(\left( {\begin{matrix} &{} 1 \\ I_{n-1} &{} \end{matrix}}\right) \) (“Voronoi element”) the Kloosterman sum (1.3) becomes a hyper-Kloosterman sum [8, Theorem B] in the sense of Deligne and hence its size is well-understood. In addition we have good bounds for Kloosterman sums on \(\textrm{GL}(3)\) [4, 5, 15], \(\textrm{Sp}(4)\) [11] and for the long Weyl element on \(\textrm{GL}(4)\) [9, Appendix]. In all other cases, the best we know is the “trivial” bound of Dąbrowski-Reeder [6]. A typical strategy, employed for instance in [3, 4, 9], is to use Plücker coordinates to understand the Bruhat decomposition of \(\textrm{GL}(n)\) in an explicit fashion. Unfortunately, for large n the Plücker relations become extremely complicated, and it seems that this path is not very promising in general.

In this paper, we investigate Kloosterman sums for the general linear group \(G = \textrm{GL}(n+1)\) for arbitrary \(n \geqslant 2\). (The notation \(n+1\) in place of n is slightly more convenient.) We parametrize the unipotent upper triangular matrices U of dimension \(n+1\) as

$$\begin{aligned} u = \left( {\begin{matrix} 1 &{}u_{11} &{} u_{12} &{} {\cdot \,\cdot \,\cdot }&{} u_{1n}\\ {} &{} 1 &{} u_{22} &{} {\cdot \,\cdot \,\cdot } &{} u_{2n}\\ &{}&{} \ddots &{}&{} \vdots \\ {} &{}&{}&{}1 &{} u_{nn}\\ {} &{}&{}&{}&{} 1\end{matrix}}\right) . \end{aligned}$$

We consider two characters \(\psi , \psi '\) of \(U(\mathbb {Q}_p)/U(\mathbb {Z}_p)\), defined by

$$\begin{aligned} \psi (u) = e\Big (\sum _{j=1}^n \psi _j u_{jj}\Big ), \quad \psi '(u) = e\Big (\sum _{j=1}^n \psi '_j u_{jj}\Big ) \end{aligned}$$
(1.4)

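for \(\psi _j, \psi '_j \in \mathbb {Z}\), \(1\le j \le n\). The Weyl group of G consists of \((n+1)!\) permutation matrices, but only Weyl elements of the form

$$\begin{aligned} w = \left( {\begin{matrix} &{} &{} I_{d_1}\\ &{} {\cdot }^{{\cdot }^{{\cdot }}} &{} \\ I_{d_k} &{} &{} \end{matrix}}\right) , \qquad d_1 + \cdots + d_k = n+1, \end{aligned}$$

with identity matrices \(I_{d_j}\) of dimension \(d_j\) lead to well-defined objects (see [8, p. 175]), which reduces the number of relevant Weyl elements to \(2^n\), the number of compositions of \(n+1\). We consider two of them, namely the long Weyl element

$$\begin{aligned} w_l = \left( {\begin{matrix} &{} &{} &{} (-1)^n\\ &{} &{} {\cdot }^{{\cdot }^{{\cdot }}} &{} \\ &{} -1 &{} &{} \\ 1 &{} &{} &{} \end{matrix}}\right) \end{aligned}$$

(the anti-diagonal matrix whose entry in row i is \((-1)^{n+1-i}\)) and, with a particular application in mind (to be described in Subsection 1.4),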

$$\begin{aligned} w_{*} = \left( {\begin{matrix} &{} &{} \pm 1\\ &{} -I_{n-1}&{} \\ 1 &{} &{} \end{matrix}}\right) . \end{aligned}$$

The representatives of the Weyl elements \(w_l\) and \(w_{*}\) chosen here have the property that, for each k, the \((k\times k)\)-minor formed by the bottom k rows and the columns containing their nonzero entries is positive. The actual choice of the representatives is unimportant to the theory, but this particular choice makes the computations in later sections more convenient.
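The positivity convention can be checked mechanically. In the Python sketch below we take \(w_l\) to be the anti-diagonal matrix whose entry in row i is \((-1)^{n+1-i}\); this is our reading of the convention (it matches the signs of \(N_{i(n+2-i)}\) in Lemma 3), not a quotation. For \(w_{*}\) the sign in the top right corner is left as a parameter, and the check suggests that positivity of the full determinant forces it to be \((-1)^n\).

```python
from fractions import Fraction


def det(mat):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in mat]
    k = len(a)
    result = Fraction(1)
    for col in range(k):
        pivot = next((r for r in range(col, k) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for s in range(col, k):
                a[r][s] -= factor * a[col][s]
    return result


def w_long(n):
    """Anti-diagonal (n+1)x(n+1) matrix with entry (-1)^(n+1-i) in row i (our sign reading)."""
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):  # 0-based row i is row i + 1 in the text
        w[i][n - i] = (-1) ** (n - i)
    return w


def w_star(n, corner):
    """w_*: `corner` in the top right, -I_{n-1} in the middle, 1 in the bottom left."""
    w = [[0] * (n + 1) for _ in range(n + 1)]
    w[0][n] = corner
    for i in range(1, n):
        w[i][i] = -1
    w[n][0] = 1
    return w


def bottom_minors_positive(w):
    """Is every minor on the bottom k rows (and the columns of their nonzero entries) > 0?"""
    size = len(w)
    for k in range(1, size + 1):
        rows = w[size - k:]
        cols = sorted({s for row in rows for s, x in enumerate(row) if x != 0})
        if det([[row[s] for s in cols] for row in rows]) <= 0:
            return False
    return True


for n in range(2, 7):
    print(n, bottom_minors_positive(w_long(n)),
          bottom_minors_positive(w_star(n, (-1) ** n)),
          bottom_minors_positive(w_star(n, -(-1) ** n)))
```

Each line should report True, True, False. For \(n = 2\) the two representatives coincide, as they should, since \(w_{*} = w_l\) in \(\textrm{GL}(3)\).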

We write a typical modulus as \(C = (p^{r_1}, \ldots , p^{r_n})\) with \(r_j \in \mathbb {N}_0\), which we embed into the torus T as \(C^{*} = \text {diag}(p^{-r_1}, p^{r_1 - r_{2}}, \ldots , p^{r_{n-1} - r_n}, p^{r_n})\). The first main achievement of this paper is an explicit and reasonably compact expression of these Kloosterman sums as exponential sums, which we describe in the following two subsections. We have tried to present it in a user-friendly way, which may ease further investigations.

1.2 Kloosterman sums for \(G = \textrm{GL}(n+1)\) for the long Weyl element

Our stratification starts with the following decomposition. Let a modulus C be given as above with an exponent vector \(r = (r_1, \ldots , r_n)\). Let

$$\begin{aligned} \mathcal M_{w_l}(r):= \Big \{\underline{m} = (m_{ij})_{1\le i \le j\le n} \in \mathbb {N}_0^{n(n+1)/2} \mid \sum _{i\le k \le j} m_{ij} = r_k, \; 1\le k \le n\Big \}. \end{aligned}$$
(1.5)

For \(\underline{m} \in \mathcal M_{w_l}(r)\) we define

$$\begin{aligned} \mathcal C_{w_l}(\underline{m}):= \Big \{\underline{c} = (c_{ij})_{1\le i \le j \le n} \mid c_{ij} \in \mathbb {Z}/{\textstyle \left( {\prod _{k=j}^n p^{m_{ik}}}\right) } \mathbb {Z}, \; (c_{ij},p^{m_{ij}}) = 1\Big \} \end{aligned}$$

as well as the partial Kloosterman sum \(\textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_l)\) as

$$\begin{aligned}{} & {} \displaystyle \sum _{{\underline{c}} \in {\mathcal {C}}_{w_l}({\underline{m}})} e\Bigg (\displaystyle \sum _{j=1}^n \displaystyle \sum _{i=1}^j\psi _j c_{i(j-1)} \overline{c_{ij}} \prod _{k=1}^{i-1} p^{m_{k(j-1)}} \prod _{k=1}^i p^{-m_{kj}}\nonumber \\{} & {} \qquad + \sum _{i=1}^n \displaystyle \sum _{j=1}^i\psi '_i c_{(n+1-i)(n+1-j)} \overline{c_{(n+2-i)(n+1-j)}} \prod _{k=1}^{j-1} p^{m_{(n+2-i)(n+1-k)}}\nonumber \\{} & {} \qquad \qquad \qquad \times \displaystyle \prod _{k=1}^j p^{-m_{(n+1-i)(n+1-k)}}\Bigg ). \end{aligned}$$
(1.6)

Here we employ the convention that \(c_{ij} = 1\) and \(m_{ij} = 0\) for \(i>j\). Moreover, for \(1 \le i \le j \le n\) we set \(\overline{c_{ij}} = 0\) when \(m_{ij}=0\), and of course a bar means the modular inverse. It is not obvious from the definition, but will follow from the proof, that this expression is well-defined (i.e. independent of the system of representatives chosen for the \(c_{ij}\)).

As an example, the case \(n=3\) (i.e. \(G=\textrm{GL}(4)\)) is spelled out explicitly in (8.1) in the appendix, where in addition all Weyl elements for \(\textrm{GL}(4)\) are analyzed.

We are now ready to state our first main result.

Theorem 1

For \(C = (p^{r_1}, \ldots , p^{r_n})\) we have with the above notation

$$\begin{aligned} \textrm{Kl}_{p}( \psi , \psi ', C^{*}w_l) = \sum _{{\underline{m}} \in {\mathcal {M}}_{w_l}(r)} \textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_l) \end{aligned}$$

where \(\textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_l)\) is defined in (1.6).
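Theorem 1 lends itself to a brute-force check for small parameters. The Python sketch below enumerates \(\mathcal M_{w_l}(r)\) from (1.5) and \(\mathcal C_{w_l}(\underline{m})\), and sums the phases of (1.6) in exact rational arithmetic. Two points are our own choices: \(\overline{c_{ij}}\) is computed as an inverse modulo a power of p exceeding every denominator that occurs (harmless if (1.6) is well-defined, as asserted above), and the conventions \(c_{ij} = 1\), \(m_{ij} = 0\) for \(i > j\) and \(\overline{c_{ij}} = 0\) for \(m_{ij} = 0\) are implemented literally. The cost grows exponentially, so only tiny cases are feasible.

```python
import cmath
from fractions import Fraction
from itertools import product


def e(x):
    return cmath.exp(2j * cmath.pi * float(x % 1))


def strata_long(r):
    """Yield the elements of M_{w_l}(r) in (1.5) as dicts {(i, j): m_ij}."""
    n = len(r)
    idx = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    bounds = [min(r[i - 1:j]) for (i, j) in idx]  # m_ij <= r_k whenever i <= k <= j
    for vals in product(*(range(b + 1) for b in bounds)):
        m = dict(zip(idx, vals))
        if all(sum(v for (i, j), v in m.items() if i <= k <= j) == r[k - 1]
               for k in range(1, n + 1)):
            yield m


def partial_kl_long(m, psi, psip, p):
    """Kl_p(m, psi, psi', w_l) of (1.6); psi[j - 1] stands for psi_j."""
    n = len(psi)
    inv_mod = p ** (sum(m.values()) + 1)  # exceeds every p-power denominator in (1.6)

    def M(i, j):  # convention: m_ij = 0 for i > j
        return m[(i, j)] if i <= j else 0

    def C(c, i, j):  # convention: c_ij = 1 for i > j
        return c[(i, j)] if i <= j else 1

    def Cbar(c, i, j):  # modular inverse, set to 0 when m_ij = 0
        if i > j:
            return 1
        return pow(c[(i, j)], -1, inv_mod) if M(i, j) > 0 else 0

    idx = list(m)
    ranges = []
    for (i, j) in idx:  # c_ij mod prod_{k=j}^n p^{m_ik}, coprime to p^{m_ij}
        modulus = p ** sum(M(i, k) for k in range(j, n + 1))
        ranges.append([x for x in range(modulus) if M(i, j) == 0 or x % p])

    total = 0j
    for vals in product(*ranges):
        c = dict(zip(idx, vals))
        phase = Fraction(0)
        for j in range(1, n + 1):
            for i in range(1, j + 1):
                ex = sum(M(k, j - 1) for k in range(1, i)) - sum(M(k, j) for k in range(1, i + 1))
                phase += psi[j - 1] * C(c, i, j - 1) * Cbar(c, i, j) * Fraction(p) ** ex
        for i in range(1, n + 1):
            for j in range(1, i + 1):
                ex = (sum(M(n + 2 - i, n + 1 - k) for k in range(1, j))
                      - sum(M(n + 1 - i, n + 1 - k) for k in range(1, j + 1)))
                phase += (psip[i - 1] * C(c, n + 1 - i, n + 1 - j)
                          * Cbar(c, n + 2 - i, n + 1 - j) * Fraction(p) ** ex)
        total += e(phase)
    return total


def kl_long(r, psi, psip, p):
    """Right-hand side of Theorem 1."""
    return sum(partial_kl_long(m, psi, psip, p) for m in strata_long(r))
```

For \(n = 1\) the sum reduces to the classical \(S(\psi _1, \psi '_1, p^{r_1})\). For \(n = 2\), \(r = (1, 1)\) and \(p \not \mid \psi _j\psi _j'\), a hand computation with (1.6) shows that the strata \((m_{11}, m_{12}, m_{22}) = (1, 0, 1)\) and \((0, 1, 0)\) contribute 1 and p, so kl_long((1, 1), (1, 2), (1, 1), 5) should be numerically close to \(6 = p + 1\), in line with the value \((-1)^n(p+1)\) quoted after Corollary 3.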

Remarks: The exact formula of Theorem 1 has a number of nice features.

First of all, we can read off the “trivial bound” of Dąbrowski-Reeder directly by counting the number of terms in the summation:

$$\begin{aligned} \#{\mathcal {C}}_{w_l}({\underline{m}}) \le \prod _{1 \le i \le j \le n} \prod _{k = j}^n p^{m_{ik}} = \prod _{1 \le i \le j \le n} p^{(j-i+1)m_{ij}} = \prod _{1 \le k \le n} p^{r_k} \end{aligned}$$
(1.7)

for \({\underline{m}} \in {\mathcal {M}}_{w_l}(r)\) as defined in (1.5). We are now in a position to exploit cancellation to obtain non-trivial bounds in Corollary 1 below.
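The count (1.7) can be made exact: each \(c_{ij}\) with \(m_{ij} \ge 1\) loses a factor \(1 - p^{-1}\), so \(\#{\mathcal {C}}_{w_l}({\underline{m}}) = p^{r_1 + \cdots + r_n}(1-p^{-1})^{\kappa ({\underline{m}})}\), where \(\kappa ({\underline{m}})\) is the number of nonzero \(m_{ij}\), which has the shape of Lemma 2 below. The following Python sketch (function names ours) verifies this, together with the middle identity in (1.7), for small exponent vectors.

```python
from itertools import product


def strata_long(r):
    """Yield the elements of M_{w_l}(r) in (1.5) as dicts {(i, j): m_ij}."""
    n = len(r)
    idx = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    bounds = [min(r[i - 1:j]) for (i, j) in idx]
    for vals in product(*(range(b + 1) for b in bounds)):
        m = dict(zip(idx, vals))
        if all(sum(v for (i, j), v in m.items() if i <= k <= j) == r[k - 1]
               for k in range(1, n + 1)):
            yield m


def size_C_long(m, n, p):
    """#C_{w_l}(m): c_ij runs mod prod_{k=j}^n p^{m_ik}, coprime to p when m_ij >= 1."""
    size = 1
    for (i, j), mij in m.items():
        modulus = p ** sum(m[(i, k)] for k in range(j, n + 1))
        size *= modulus - modulus // p if mij >= 1 else modulus
    return size


def check_trivial_bound(r, p):
    """Check (1.7) stratum by stratum; return the total number of terms."""
    n = len(r)
    total = 0
    for m in strata_long(r):
        kappa = sum(1 for v in m.values() if v > 0)
        # middle identity in (1.7): sum_{i <= j} (j - i + 1) m_ij = r_1 + ... + r_n
        assert sum((j - i + 1) * v for (i, j), v in m.items()) == sum(r)
        # exact count p^{sum r} (1 - 1/p)^kappa, written with integers
        assert size_C_long(m, n, p) == p ** (sum(r) - kappa) * (p - 1) ** kappa
        total += size_C_long(m, n, p)
    return total
```

For instance, for \(r = (1, 1)\) the two strata contribute \((p-1)^2\) and \(p(p-1)\) terms, so the total number of terms is \((p-1)(2p-1)\), i.e. 10 for \(p = 3\).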

It is also structurally quite interesting because it consists of nested \(\textrm{GL}(2)\) Kloosterman sums. This is somewhat reminiscent of the archimedean case where the (non-degenerate) \(\textrm{GL}(n)\)-Whittaker function can be expressed as nested integrals of Bessel K-functions (i.e. \(\textrm{GL}(2)\) Whittaker functions), see [15].

The exact formula can also be used for certain exact evaluations (see below), it can be generalized to congruence subgroups, and it may ultimately serve as an entry point for deeper tools of analytic number theory, such as Poisson summation.

Corollary 1

With the notation as above, there exists a \(\delta = \delta _n> 0\) such that

$$\begin{aligned} \textrm{Kl}_{p}( \psi , \psi ', C^{*}w_l) \ll \big (\max _{1 \le j \le n} |\psi _j|^{-1/2}_p \big ) \Big (\prod _{1 \le k \le n} p^{r_k}\Big )^{1-\delta }. \end{aligned}$$

Xinchen Miao kindly informed us that he obtained independently and by a slightly different method a similar bound in [12, Theorem 5.1].

In Subsection 5.3 we give a quick argument showing that one can choose \(\delta \gg 1/n^2\). It is an interesting question whether \(\delta \) can be chosen independently of n. In this particular situation we do not know the answer, but for general Weyl elements the answer is negative, as we will see in the next subsection.

1.3 Kloosterman sums for \(G = \textrm{GL}(n+1)\) for \(w_{*}\)

We now establish similar results for the Weyl element \(w_{*}\). For this Weyl element we have relatively severe restrictions on the moduli \(C = (p^{r_1}, \ldots , p^{r_n})\), see [8, Proposition 1.3]. For instance, if \(\psi , \psi '\) satisfy \(p \not \mid \psi _j\psi _j'\) and \(n \geqslant 3\), then the exponents \(r_k\) need to form an arithmetic progression.

In the present situation we define

$$\begin{aligned} \mathcal M_{w_{*}}(r):= \Big \{\underline{m} = (m_{ij})_{\begin{array}{c} 1\le i \le j\le n\\ i = 1 \text { or } j = n \end{array}} \in \mathbb {N}_0^{2n-1} \mid \sum _{i\le k \le j} m_{ij} = r_k, \; 1\le k \le n\Big \}. \end{aligned}$$

For \(\underline{m} \in \mathcal M_{w_{*}}(r)\) we define

$$\begin{aligned} \mathcal C_{w_{*}}(\underline{m}) \!=\! \left\{ {\underline{c} = (c_{ij})_{\begin{array}{c} 1\le i \le j \le n\\ i = 1 \text { or } j = n \end{array}}}\; \bigg |\; {\begin{array}{l} c_{1j} \in \mathbb {Z}/{\textstyle \prod _{k=j}^n} p^{m_{1k}}\mathbb {Z},\; 1\le j \le n,\\ c_{in}\in \mathbb {Z}/{\textstyle \prod _{k=2}^i} p^{m_{kn}}\mathbb {Z},\; 2\le i \le n,\end{array} \; (c_{ij},p^{m_{ij}})=1}\right\} \end{aligned}$$

as well as the partial Kloosterman sum \(\textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_{*})\) as

$$\begin{aligned}&\sum _{{\underline{c}} \in {\mathcal {C}}_{w_{*}}({\underline{m}})} e\Bigg (\psi _1 \frac{\overline{c_{11}}}{ p^{m_{11}}} + \sum _{j=2}^n \psi _j \Big (\frac{c_{1(j-1)}\overline{c_{1j}}}{ p^{m_{1j}}} + \frac{c_{(j+1)n}\overline{c_{jn}}}{ p^{-m_{1(j-1)} + m_{1j} + m_{jn}}} \Big )\nonumber \\&\quad + \psi '_1 \frac{c_{2n}}{ p^{m_{2n}}} + \psi '_n \Big ( \frac{c_{1n}\overline{c_{nn}}}{ p^{m_{1n}}} + \frac{c_{1(n-1)}}{ p^{-m_{nn}+m_{1(n-1)}+m_{1n}}}\Big )\Bigg ). \end{aligned}$$
(1.8)

Again we employ the convention that \(c_{ij} = 1\) and \(m_{ij} = 0\) for \(i>j\). Moreover, for \(1 \le i \le j \le n\) we set \(\overline{c_{ij}} = 0\) when \(m_{ij}=0\). For the case \(n=3\), we refer to (8.2).

Theorem 2

For \(C = (p^{r_1}, \ldots , p^{r_n})\) we have with the above notation

$$\begin{aligned} \textrm{Kl}_{p}( \psi , \psi ', C^{*}w_{*}) = \sum _{{\underline{m}} \in {\mathcal {M}}_{w_{*}}(r)} \textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_{*}) \end{aligned}$$

whenever the Kloosterman sum on the left hand side is well-defined.
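The same brute-force approach applies to (1.8). In the Python sketch below only \(\psi '_1\) and \(\psi '_n\) enter, exactly as in (1.8), and \(\overline{c_{ij}}\) is taken as an inverse modulo a power of p exceeding all denominators (our choice). A hand computation with (1.8) for \(C = (p, p)\) and \(C = (p, p, p)\) with \(p \not \mid \psi _j\psi _j'\) gives \(p + 1\) and \(p^2 + p\), the values predicted by Corollary 3 below, so these make natural test cases.

```python
import cmath
from fractions import Fraction
from itertools import product


def e(x):
    return cmath.exp(2j * cmath.pi * float(x % 1))


def strata_star(r):
    """Yield M_{w_*}(r): indices (i, j) with i = 1 or j = n, as dicts {(i, j): m_ij}."""
    n = len(r)
    idx = [(1, j) for j in range(1, n + 1)] + [(i, n) for i in range(2, n + 1)]
    bounds = [min(r[i - 1:j]) for (i, j) in idx]
    for vals in product(*(range(b + 1) for b in bounds)):
        m = dict(zip(idx, vals))
        if all(sum(v for (i, j), v in m.items() if i <= k <= j) == r[k - 1]
               for k in range(1, n + 1)):
            yield m


def partial_kl_star(m, psi, psip, p):
    """Kl_p(m, psi, psi', w_*) of (1.8) for n >= 2; psi[j - 1] stands for psi_j."""
    n = len(psi)
    inv_mod = p ** (sum(m.values()) + 1)

    def pw(k):
        return Fraction(p) ** k

    def C(c, i, j):  # convention: c_ij = 1 for i > j
        return c[(i, j)] if i <= j else 1

    def Cbar(c, i, j):  # modular inverse, set to 0 when m_ij = 0
        return pow(c[(i, j)], -1, inv_mod) if m[(i, j)] > 0 else 0

    idx = list(m)
    ranges = []
    for (i, j) in idx:
        if i == 1:  # c_1j mod prod_{k=j}^n p^{m_1k}
            modulus = p ** sum(m[(1, k)] for k in range(j, n + 1))
        else:       # c_in mod prod_{k=2}^i p^{m_kn}
            modulus = p ** sum(m[(k, n)] for k in range(2, i + 1))
        ranges.append([x for x in range(modulus) if m[(i, j)] == 0 or x % p])

    total = 0j
    for vals in product(*ranges):
        c = dict(zip(idx, vals))
        phase = psi[0] * Cbar(c, 1, 1) * pw(-m[(1, 1)])
        for j in range(2, n + 1):
            phase += psi[j - 1] * (
                C(c, 1, j - 1) * Cbar(c, 1, j) * pw(-m[(1, j)])
                + C(c, j + 1, n) * Cbar(c, j, n) * pw(m[(1, j - 1)] - m[(1, j)] - m[(j, n)]))
        phase += psip[0] * C(c, 2, n) * pw(-m[(2, n)])
        phase += psip[n - 1] * (
            C(c, 1, n) * Cbar(c, n, n) * pw(-m[(1, n)])
            + C(c, 1, n - 1) * pw(m[(n, n)] - m[(1, n - 1)] - m[(1, n)]))
        total += e(phase)
    return total


def kl_star(r, psi, psip, p):
    """Right-hand side of Theorem 2."""
    return sum(partial_kl_star(m, psi, psip, p) for m in strata_star(r))
```

For \(n = 2\) the phases in (1.8) coincide term by term with those in (1.6), reflecting \(w_{*} = w_l\) in \(\textrm{GL}(3)\). Larger cases can be compared with Corollary 3, although the running time grows quickly with n and p.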

Corollary 2

There exists a \(\delta = \delta _n> 0\) such that

$$\begin{aligned} \textrm{Kl}_{p}( \psi , \psi ', C^{*}w_{*}) \ll \big (\max _{1 \le j \le n} |\psi _j|^{-1/2}_p \big ) \Big (\prod _{1 \le k \le n} p^{r_k}\Big )^{1-\delta }. \end{aligned}$$

Our proof will show that one can take \(\delta \gg 1/n\). This should be seen in light of the following exact evaluation.

Corollary 3

For \(C = (p, \ldots , p)\) and characters \(\psi , \psi '\) with \(p \not \mid \psi _j\psi _j'\) for \(1 \le j \le n\) we have \(\textrm{Kl}_{p}( \psi , \psi ', C^{*}w_*) = p^{n-1} + p^{n-2}\).

This shows that Kloosterman sums can be fairly large. In particular, the saving \(\delta _n \asymp 1/n\) in Corollary 2 is asymptotically best possible up to the value of the constant. Corollary 3 is a variation of [2, Theorem 3], but proved by a completely different argument, since [2, Theorem 3] is special to the congruence subgroup \(\Gamma _0(p)\). A similar, but more involved computation shows for instance

$$\begin{aligned} \textrm{Kl}_{p}(\psi , \psi ', C^{*}w_l) = (-1)^n (p+1) \end{aligned}$$

for \(C = (p, \ldots , p)\) and characters \(\psi , \psi '\) with \(p \not \mid \psi _j\psi _j'\) for \(1 \le j \le n\) (generalizing [4, Property 4.10] for \(n=2\)).
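The optimality claim can be made explicit. For \(C = (p, \ldots , p)\) and \(p \not \mid \psi _j\psi _j'\) we have \(\max _{j} |\psi _j|_p^{-1/2} = 1\) and, by Corollary 3,

$$\begin{aligned} \prod _{1 \le k \le n} p^{r_k} = p^n, \qquad \textrm{Kl}_{p}( \psi , \psi ', C^{*}w_*) = p^{n-1} + p^{n-2} \geqslant p^{n-1} = \big (p^{n}\big )^{1-1/n}. \end{aligned}$$

Assuming that the implied constant in Corollary 2 depends only on n, the resulting inequality \(p^{n-1} \ll p^{n(1-\delta _n)}\) can hold for all large primes p only if \(\delta _n \le 1/n\).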

1.4 An application

Kloosterman sums come up most prominently in the Kuznetsov formula. Up until now, in higher rank one could at best employ the trivial bound. With non-trivial bounds at hand, we can now refine the main result of [1].

The Ramanujan conjecture for the group \(\textrm{GL}(n)\) states that cuspidal automorphic representations are tempered. This is far out of reach, but one may hope to prove that it holds in reasonable families in the following quantitative average sense: automorphic forms that are “far away” from being tempered should occur “rarely”. With this in mind, let \({\mathcal {F}} ={\mathcal {F}}_{\Gamma (q)}(M)\) be the finite family of cusp forms \(\varpi \) (eigenforms for the unramified Hecke algebra) for the principal congruence subgroup \(\Gamma (q) \subseteq \textrm{SL}_n(\mathbb {Z})\) with bounded spectral parameter \(\Vert \mu _{\varpi , \infty }\Vert \le M\) (see [1] for more details). For a place v of \(\mathbb {Q}\) and \(\sigma \geqslant 0\) define

$$\begin{aligned} {\mathcal {N}}_v(\sigma , {\mathcal {F}}) = \#\{ \varpi \in {\mathcal {F}} \mid \sigma _{\varpi ,v} \geqslant \sigma \} \end{aligned}$$

where \(\sigma _{\varpi ,v} = \max _j |\Re \mu _{\varpi ,v}(j)|\). The trivial bound is \({\mathcal {N}}_v(\sigma , {\mathcal {F}}) \le {\mathcal {N}}_v(0, {\mathcal {F}}) = \#{\mathcal {F}}\). On the other hand, the trivial representation has \(\sigma _{\text {triv}, v} = \frac{1}{2}(n-1)\). One might hope that a trace formula can interpolate linearly between these two extremes:

$$\begin{aligned} {\mathcal {N}}_v(\sigma , {\mathcal {F}}) \ll _{v, \varepsilon , n} M^{O(1)} [\textrm{SL}_n(\mathbb {Z}): \Gamma (q)]^{1 - \frac{2\sigma }{n-1} + \varepsilon } \end{aligned}$$

where \(M^{O(1)}[\textrm{SL}_n(\mathbb {Z}): \Gamma (q)]\) should be thought of as a proxy for \(\#{\mathcal {F}}\). This is a version of Sarnak’s density conjecture [14] and was proved in [1] for squarefree q. Of course, the trivial representation is not cuspidal, and so one may hope to do even better, but even for \(n=2\) this is a very hard problem when the Selberg trace formula is employed. On the other hand, the Kuznetsov formula is better suited in this situation, since the residual spectrum is a priori excluded. We use this observation to go beyond Sarnak’s density conjecture in the scenario described above for prime moduli (the primality assumption is used before (7.1)).

Proposition 4

Let \(M > 0\), \(n \geqslant 5\). Let q be a large prime and let \({\mathcal {F}}={\mathcal {F}}_{\Gamma (q)}(M)\) be the set of cuspidal automorphic forms for \(\Gamma (q) \subseteq \textrm{SL}_n(\mathbb {Z})\) with archimedean spectral parameter \(\Vert \mu \Vert \le M\). Fix a place v of \(\mathbb {Q}\). There exist constants \(K, \delta \) depending only on n, such that

$$\begin{aligned} {\mathcal {N}}_v(\sigma , {\mathcal {F}}) \ll _{v, n} M^K [\textrm{SL}_n(\mathbb {Z}): \Gamma (q)]^{1 - \frac{(2+\delta )\sigma }{n-1}} \end{aligned}$$

for \(\sigma \geqslant 0\).

2 The stratification of Dąbrowski and Reeder

Here we translate results of [6] into our conventions. Nothing in this section is new, but it may be convenient for the reader to have the relevant results compiled in one place.

Let G be a simply connected Chevalley group over \(\mathbb {Q}_p\) with Lie algebra \(\mathfrak g\). Let W be the Weyl group of G, and let \(\Phi \) and \(\Phi ^+\) denote the set of roots and the set of positive roots of \(\mathfrak g\) respectively. We fix a set of simple roots \(\Delta \) of \(\mathfrak g\). For each positive root \(\beta \in \Phi ^+\), there is a natural homomorphism \(\phi _\beta : {\text {SL}}(2) \rightarrow G\). For \(t\in \mathbb {Q}_p\) we write

$$\begin{aligned} x_\beta (t)&= \phi _\beta \begin{pmatrix} 1 &{} t\\ {} &{} 1\end{pmatrix},&x_{-\beta }(t)&= \phi _\beta \begin{pmatrix} 1\\ t &{} 1\end{pmatrix}. \end{aligned}$$

Through the canonical bijection \(\beta \longleftrightarrow {\check{\beta }}\) between the set of roots \(\Phi \) and the set of coroots \({\check{\Phi }}\), we have

$$\begin{aligned} {\check{\beta }}(c) = \phi _\beta \begin{pmatrix} c \\ {} &{} c^{-1} \end{pmatrix} \end{aligned}$$

for \(c\in \mathbb {Q}_p^\times \). For \(m\in \mathbb {Z}\), we write \(m{\check{\beta }}\) for \({\check{\beta }}(p^m)\). This induces a natural embedding

$$\begin{aligned} {\check{X}}:= {\text {Hom}}(\mathbb {G}_m, T) \hookrightarrow T \end{aligned}$$

from the set of cocharacters of T into the maximal torus T. If \(\lambda = -\sum _{\beta \in \Delta } r_\beta {\check{\beta }} \in {\check{X}}\) for \(r_\beta \in \mathbb {N}_0\), we define the height of \(\lambda \) to be

$$\begin{aligned} {\text {ht}}(\lambda ):= \sum _{\beta \in \Delta } r_\beta . \end{aligned}$$
(2.1)

For \(w\in W\), we write

$$\begin{aligned} R(w):= \left\{ {\beta \in \Phi ^+}\; \bigg |\; {w\beta \in -\Phi ^+}\right\} . \end{aligned}$$

If \(w = s_{\beta _1}\ldots s_{\beta _l}\) is a reduced expression of w as a product of simple reflections, then

$$\begin{aligned} R(w^{-1}) = \left\{ {\gamma _j:= s_{\beta _1}\ldots s_{\beta _{j-1}} \beta _j}\; \bigg |\; {j = 1,\ldots , l}\right\} . \end{aligned}$$
(2.2)

For \(a\in \mathbb {Q}_p\) we set

$$\begin{aligned} \mu (a)&:= \max \left\{ {0, -v_p(a)}\right\} ,&a^*&:= {\left\{ \begin{array}{ll} 0 &{} \text { if } a \in \mathbb {Z}_p,\\ p^{2v_p(a)} a^{-1} &{} \text { if } a\not \in \mathbb {Z}_p.\end{array}\right. } \end{aligned}$$

In particular, if \(a \not \in \mathbb {Z}_p\), say \(a = cp^{-m}\) with \(m\ge 1\) and \(c \in \mathbb {Z}_p^\times \), then \(a^* = c^{-1} p^{-m}\). Hence the map \(a \mapsto a^*\) preserves \(\mu \). For \(\beta \in \Delta \) we define

$$\begin{aligned} b_\beta (a):= x_\beta (a^*) (-\mu (a){\check{\beta }}) \overline{s_\beta } x_\beta (a), \end{aligned}$$
(2.3)

where \(\overline{s_\beta }\) is some fixed representative of the simple reflection \(s_\beta \) in \(N = N_G(T)\). When no confusion can arise, we simply write \(s_\beta \) for \(\overline{s_\beta }\). We observe that \(b_\beta (a) \in G(\mathbb {Z}_p)\) for all \(a\in \mathbb {Q}_p\). Indeed, we have

$$\begin{aligned} b_\beta (a)&= \phi _\beta \begin{pmatrix} &{} -1\\ 1 &{} a\end{pmatrix} \text { for } a \in \mathbb {Z}_p,&b_\beta (a)&= \phi _\beta \begin{pmatrix} c^{-1} &{} \\ p^m &{} c \end{pmatrix} \text { for } a = cp^{-m} \not \in \mathbb {Z}_p. \end{aligned}$$
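The integrality of \(b_\beta (a)\) is easy to test numerically inside \(\textrm{SL}(2)\), i.e. identifying \(\phi _\beta \) with the identity. The Python sketch below works with rational numbers a viewed in \(\mathbb {Q}_p\), computes \(\mu (a)\) and \(a^* = c^{-1}p^{-m}\) for \(a = cp^{-m} \not \in \mathbb {Z}_p\) (as stated after the definition of \(a^*\)), multiplies out (2.3), and checks that the result has p-integral entries and determinant 1 and agrees with the two closed forms above. The helper names are ours.

```python
from fractions import Fraction


def vp(a, p):
    """p-adic valuation of a nonzero rational number."""
    a = Fraction(a)
    num, den, v = a.numerator, a.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def mu(a, p):
    return 0 if a == 0 else max(0, -vp(a, p))


def star(a, p):
    """a* = 0 for a in Z_p; for a = c p^{-m} with c a p-adic unit, a* = c^{-1} p^{-m}."""
    if a == 0 or vp(a, p) >= 0:
        return Fraction(0)
    m = -vp(a, p)
    c = Fraction(a) * Fraction(p) ** m  # a p-adic unit
    return 1 / (c * Fraction(p) ** m)


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def x(t):
    return [[Fraction(1), Fraction(t)], [Fraction(0), Fraction(1)]]


def b_beta(a, p):
    """b_beta(a) = x(a*) (-mu(a) beta-check) s x(a) from (2.3), computed inside SL(2)."""
    a = Fraction(a)
    q = Fraction(p) ** mu(a, p)
    torus = [[1 / q, Fraction(0)], [Fraction(0), q]]  # beta-check(p^{-mu(a)})
    s = [[Fraction(0), Fraction(-1)], [Fraction(1), Fraction(0)]]
    return matmul(matmul(matmul(x(star(a, p)), torus), s), x(a))


def is_p_integral(t, p):
    return t == 0 or vp(t, p) >= 0
```

For example, with \(p = 3\) and \(a = 2/9\) one has \(c = 2\), \(m = 2\), and b_beta(a, 3) returns the matrix with rows \((1/2, 0)\) and \((9, 2)\).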

Let \(l \in \mathbb {N}\), let \(w = s_{\beta _1}\ldots s_{\beta _l}\) be a reduced representation of w as a product of simple reflections, and write \(\underline{\beta }= (\beta _1,\ldots , \beta _l)\). For \(\underline{a} = (a_1,\ldots , a_l) \in \mathbb {Q}_p^l\), let \(b(\underline{a}):= b_{\underline{\beta }}(\underline{a})\) denote the image of \(b_{\beta _1}(a_1) \ldots b_{\beta _l}(a_l)\) in \(U(\mathbb {Z}_p)\backslash G(\mathbb {Q}_p)\). By [6, Proposition 2.1], the map \(b_{\underline{\beta }}:\mathbb {Q}_p^l \rightarrow U(\mathbb {Z}_p)\backslash G(\mathbb {Q}_p)\) is injective, and its image is contained in \(U(\mathbb {Z}_p) \backslash \left( {BwU \cap G(\mathbb {Z}_p)}\right) \), where \(B = TU\) is the standard Borel subgroup of G.

We remark that the map \(b_{\underline{\beta }}\) really depends on the choice of the reduced representation of w. Moreover, \(b_{\underline{\beta }}\) also depends on the choice of the representative \(\overline{s_\beta }\) of \(s_\beta \). If we fix a representative of w in \(G(\mathbb {Z}_p)\), then by convention we choose the representatives such that the toral part of the Bruhat decomposition of \(b_{\underline{\beta }}(\underline{a})\) has positive entries.

For \(\underline{m} = (m_1,\ldots , m_l) \in \mathbb {N}_0^l\), we set

$$\begin{aligned} Y_{\underline{\beta }} (\underline{m}) = \left\{ {\underline{a} = (a_1,\ldots , a_l)\in \mathbb {Q}_p^l}\; \bigg |\; {\mu (a_i) = m_i, \; i = 1,\ldots , l}\right\} . \end{aligned}$$

Let \(\beta \in \Delta \), and

$$\begin{aligned} u = \prod _{\beta \in \Phi ^+} x_\beta (a_\beta ) \in U(\mathbb {Q}_p). \end{aligned}$$

The \(\beta \)-coordinate function \(f_\beta \) is defined by \(f_\beta (u):= a_\beta \). Note that this does not depend on the order of the product. For \(a \in \mathbb {Q}_p\) and \(u\in U(\mathbb {Q}_p)\), we define

$$\begin{aligned} R_\beta ^a(u):= b_\beta (a) u b_\beta (a+f_\beta (u))^{-1}. \end{aligned}$$

Then \(R_\beta ^a(U(\mathbb {Z}_p)) \subseteq U(\mathbb {Z}_p)\). For a sequence \(\underline{\beta }= (\beta _1,\ldots , \beta _l)\) of simple roots, we can construct a right action \(* = *_{\underline{\beta }}: \mathbb {Q}_p^l \times U(\mathbb {Z}_p) \rightarrow \mathbb {Q}_p^l\), \(\underline{a} \mapsto \underline{a} * u =: \underline{a}'\) as follows:

$$\begin{aligned} a'_l&= a_l + f_{\beta _l} (u),\\ a'_j&= a_j + f_{\beta _j} \left( {R_{\beta _{j+1}}^{a_{j+1}}R_{\beta _{j+2}}^{a_{j+2}}\ldots R_{\beta _l}^{a_l}(u)}\right) ,&& 1\le j \le l-1. \end{aligned}$$

Then \(b_{\underline{\beta }}: \mathbb {Q}_p^l \rightarrow U(\mathbb {Z}_p)\backslash G(\mathbb {Q}_p)\) is right \(U(\mathbb {Z}_p)\)-equivariant with respect to \(*_{\underline{\beta }}\), that is, we have \(b_{\underline{\beta }}(\underline{a} *_{\underline{\beta }} u) = b_{\underline{\beta }}(\underline{a})u\) for \(u\in U(\mathbb {Z}_p)\) and \(\underline{a} \in \mathbb {Q}_p^l\). It follows that \(b_{\underline{\beta }}\) induces a map

$$\begin{aligned} \underline{b}_{\underline{\beta }}: \mathbb {Q}_p^l / U_w(\mathbb {Z}_p) \rightarrow U(\mathbb {Z}_p) \backslash \left( {BwB \cap G(\mathbb {Z}_p)}\right) / U_w(\mathbb {Z}_p), \end{aligned}$$

where \(\mathbb {Q}_p^l/U_w(\mathbb {Z}_p)\) denotes the set of \(U_w(\mathbb {Z}_p)\)-orbits with respect to the right action \(*_{\underline{\beta }}\).

For any cocharacter \(\lambda \in {\check{X}}\), we define

$$\begin{aligned} Y_\lambda = \coprod \Big \{Y_{\underline{\beta }} (\underline{m}) \mid \underline{m} \in \mathbb {N}_0^l, \lambda = -\sum _{j=1}^l m_j {\check{\gamma }}_j\Big \}, \end{aligned}$$

where \(\gamma _j\) is defined in (2.2). Now we are able to state the main result in this section.

Lemma 1

([6, Proposition 3.3]) Let \(\lambda \in {\check{X}}\), \(n = \lambda w \in N(\mathbb {Q}_p)\), and let \(w = s_{\beta _1}\ldots s_{\beta _l}\) be a reduced representation of w as a product of simple reflections. Write \(\underline{\beta }= (\beta _1,\ldots ,\beta _l)\). Then \(\underline{b}_{\underline{\beta }}\) gives a bijection between \(Y_\lambda / U_w(\mathbb {Z}_p)\) and the Kloosterman set X(n).

The following lemma gives the trivial bound for Kloosterman sums.

Lemma 2

([6, Proposition 3.4]) Assume the settings above. For \(\underline{m} \in \mathbb {N}_0^l\) we have

$$\begin{aligned} \#\left( {Y_{\underline{\beta }}(\underline{m})/U_w(\mathbb {Z}_p)}\right) = p^{{\text {ht}}(\lambda )} \left( {1-p^{-1}}\right) ^{\kappa (\underline{m})}, \end{aligned}$$

where \(\kappa (\underline{m})\) is the number of nonzero entries in \(\underline{m}\) and \({\text {ht}}(\lambda )\) was defined in (2.1).

3 Proof of Theorem 1

We apply the results from Sect. 2 to our case. Let \(G = {\text {GL}}(n+1)\). A set of simple roots of G is given by \(\Delta = \left\{ {\alpha _1,\ldots ,\alpha _n}\right\} \), where in usual notation \(\alpha _i:= e_i - e_{i+1}\). Using this root basis, the set of positive roots of G is given by

$$\begin{aligned} \Phi ^+ = \left\{ {\alpha _{ij}:= \alpha _i+\alpha _{i+1}+\ldots +\alpha _j}\; \bigg |\; {1\le i \le j \le n}\right\} . \end{aligned}$$

Throughout this section, we use the following reduced representation of \(w_l\):

$$\begin{aligned} w_l = (s_{\alpha _1}\ldots s_{\alpha _n})(s_{\alpha _1}\ldots s_{\alpha _{n-1}})\ldots (s_{\alpha _1}s_{\alpha _2}) s_{\alpha _1}. \end{aligned}$$
(3.1)

Recall the definition of \(\gamma _j\) in (2.2). For the reduced representation (3.1) of \(w_l\), we have

$$\begin{aligned} \underline{\gamma }= \left( {\alpha _{11},\alpha _{12},\ldots ,\alpha _{1n},\alpha _{22},\ldots ,\alpha _{2n},\ldots ,\alpha _{nn}}\right) . \end{aligned}$$
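The sequence \(\underline{\gamma }\) can be recomputed from (2.2) by letting simple reflections act on \(\mathbb {Z}^{n+1}\), with \(\alpha _{ij} = e_i - e_{j+1}\) and \(s_{\alpha _i}\) swapping the coordinates i and \(i+1\). The following Python sketch (names ours) checks, for small n, that the reduced word (3.1) produces \(\alpha _{11}, \ldots , \alpha _{1n}, \alpha _{22}, \ldots , \alpha _{2n}, \ldots , \alpha _{nn}\) in this order, so that \(R(w_l^{-1})\) is all of \(\Phi ^+\).

```python
def simple_reflection(vec, i):
    """Action of s_{alpha_i} on Z^{n+1}: swap the coordinates i and i + 1 (1-based)."""
    v = list(vec)
    v[i - 1], v[i] = v[i], v[i - 1]
    return tuple(v)


def alpha(i, j, n):
    """alpha_{ij} = alpha_i + ... + alpha_j = e_i - e_{j+1}, as a vector in Z^{n+1}."""
    v = [0] * (n + 1)
    v[i - 1], v[j] = 1, -1
    return tuple(v)


def gammas(word, n):
    """gamma_j = s_{beta_1} ... s_{beta_{j-1}} beta_j as in (2.2); word lists the indices of the betas."""
    out = []
    for j, b in enumerate(word):
        v = alpha(b, b, n)
        for prev in reversed(word[:j]):  # s_{beta_{j-1}} acts first, s_{beta_1} last
            v = simple_reflection(v, prev)
        out.append(v)
    return out


def long_word(n):
    """Indices of the reduced word (3.1): (s_1 ... s_n)(s_1 ... s_{n-1}) ... (s_1)."""
    return [i for top in range(n, 0, -1) for i in range(1, top + 1)]


n = 4
expected = [alpha(i, j, n) for i in range(1, n + 1) for j in range(i, n + 1)]
print(gammas(long_word(n), n) == expected)  # prints True
```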

Now we give a characterisation for \(b(\underline{a})\), for \(\underline{a} = (a_{11},\ldots , a_{1n}, a_{22},\ldots ,a_{nn}) \in \mathbb {Q}_p^{ n(n+1)/2}\). Note that every \(a_{ij}\in \mathbb {Q}_p\) can be written uniquely as \(a_{ij} = c_{ij} p^{-m_{ij}}\) with \(m_{ij} \ge 0\), \(c_{ij}\in \mathbb {Z}_p\), and \((c_{ij},p^{m_{ij}}) = 1\).

Lemma 3

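Let \(\underline{a} = (a_{11},\ldots ,a_{nn}) \in \mathbb {Q}_p^{n(n+1)/2}\). Write \(a_{ij} = c_{ij} p^{-m_{ij}}\), with \(m_{ij} \ge 0\), \(c_{ij}\in \mathbb {Z}_p\), and \((c_{ij},p^{m_{ij}}) = 1\). Then \(b(\underline{a})\) has a Bruhat decomposition \(b(\underline{a}) = LNR\), where

$$\begin{aligned} L = \left( {\begin{matrix} 1 &{} L_{12} &{} {\cdot \,\cdot \,\cdot } &{} L_{1(n+1)}\\ {} &{} 1 &{} \ddots &{} \vdots \\ {} &{}&{} \ddots &{} L_{n(n+1)}\\ {} &{}&{}&{} 1\end{matrix}}\right) , \quad N = \left( {\begin{matrix} &{}&{}&{} N_{1(n+1)}\\ &{}&{} N_{2n} &{} \\ &{} {\cdot }^{{\cdot }^{{\cdot }}} &{}&{} \\ N_{(n+1)1} &{}&{}&{} \end{matrix}}\right) , \quad R = \left( {\begin{matrix} 1 &{} R_{12} &{} {\cdot \,\cdot \,\cdot } &{} R_{1(n+1)}\\ {} &{} 1 &{} \ddots &{} \vdots \\ {} &{}&{} \ddots &{} R_{n(n+1)}\\ {} &{}&{}&{} 1\end{matrix}}\right) , \end{aligned}$$

with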

$$\begin{aligned} N_{i(n+2-i)}= & {} (-1)^{(n+1-i)} \frac{\prod _{k=1}^{i-1} p^{m_{k(i-1)}}}{\prod _{k=1}^n p^{m_{ik}}}, \quad (1\le i \le n+1), \\ L_{ij}= & {} \sum _{1\le \delta _1<\ldots< \delta _{j-i} \le j-1} L_{ij}\left( {\delta _1,\ldots ,\delta _{j-i}}\right) , \quad (1\le i< j \le n+1), \\ L_{ij}\left( {\delta _1,\ldots ,\delta _{j-i}}\right)= & {} \frac{\prod _{k=1}^{j-i} c_{\delta _k(i-2+k)} \prod _{k=1}^{j-i} \prod _{t=\delta _{k-1}+1}^{\delta _k-1} p^{m_{t(i-2+k)}}}{\prod _{k=1}^{j-i} c_{\delta _k(i-1+k)} \prod _{k=1}^{\delta _{j-i}} p^{m_{k(j-1)}}}, \\ R_{ij}= & {} \sum _{n+1-i \le \partial _1 \le \cdots \le \partial _{j-i} \le n} R_{ij}\left( {\partial _1,\ldots ,\partial _{j-i}}\right) , \quad (1\le i < j \le n+1), \\ R_{ij} \left( {\partial _1,\ldots ,\partial _{j-i}}\right)= & {} \frac{\prod _{k=1}^{j-i} c_{(n+2-i-k)\partial _k} \prod _{k=\partial _1+1}^n p^{m_{(n+2-i)k}}}{\prod _{k=1}^{j-i} c_{(n+3-i-k)\partial _k} \prod _{k=1}^{j-i} \prod _{t=\partial _k}^{\partial _{k+1}} p^{m_{(n+2-i-k)t}}}. \end{aligned}$$

To interpret the formula above, we set \(c_{ij}:= 1\) and \(m_{ij}:=0\) if the condition \(1\le i \le j \le n\) is not satisfied, and \(\delta _0:= 0\), \(\partial _{j-i+1}:= n\). As a convention, when \(m_{ij} = 0\) for \(1 \le i \le j \le n\), we define \(c_{ij}^{-1}:= 0\) as a formal symbol.

Remarks: The hard part is to find these explicit formulae for the Bruhat decomposition. Once the formulae are given, the proof is a straightforward inductive verification by simply matching terms on both sides of the matrix equation. The indices for N look overly complicated, but are chosen in analogy with the ones in Lemma 5 below. Our application of Lemma 3 to the proof of Theorem 1 will only require the values of \(L_{ij}\) and \(R_{ij}\) on the first off-diagonal, i.e. for \(j-i = 1\) for which the formulae simplify substantially.

Proof

First we justify the definition \(c_{ij}^{-1}:= 0\) as a formal symbol when \(m_{ij} = 0\). We recall the definition of \(b_\beta (a)\) for \(\beta \in \Delta \), \(a\in \mathbb {Q}_p\). We write a as a product \(a=cp^{-m}\), with \(m\ge 0\), \(c\in \mathbb {Z}_p\), and \((c,p^m) = 1\). When \(m\ge 1\), we have

$$\begin{aligned} b_{\beta }(cp^{-m}) = \phi _\beta \left( {\begin{pmatrix} 1 &{} c^{-1}p^{-m}\\ {} &{} 1\end{pmatrix} \begin{pmatrix} &{} -p^{-m}\\ p^m \end{pmatrix} \begin{pmatrix} 1 &{} c p^{-m}\\ {} &{} 1\end{pmatrix}}\right) . \end{aligned}$$
(3.2)

Meanwhile, when \(m=0\) we have

$$\begin{aligned} b_{\beta }(c) = \phi _\beta \left( {\begin{pmatrix} 1 &{} 0\\ {} &{} 1\end{pmatrix} \begin{pmatrix} &{} -1\\ 1 \end{pmatrix} \begin{pmatrix} 1 &{} c\\ {} &{} 1\end{pmatrix}}\right) . \end{aligned}$$

Hence (3.2) also holds for \(m=0\) if we treat \(c^{-1}:= 0\) as a formal symbol. As \(b_{\underline{\beta }}(\underline{a})\) is a product of such matrices, our convention is justified.

Now we prove the actual formula by induction. For easier manipulation, we assume \(m_{ij}\ge 1\) for all \(1\le i \le j \le n\); when some \(m_{ij} = 0\), we apply the convention \(c_{ij}^{-1}:= 0\) to the result. When \(n=1\), the formula reads

$$\begin{aligned} b_{\alpha _1}(a_{11}) = \begin{pmatrix} 1 &{} c_{11}^{-1} p^{-m_{11}}\\ {} &{} 1\end{pmatrix} \begin{pmatrix} &{} -p^{-m_{11}}\\ p^{m_{11}} \end{pmatrix} \begin{pmatrix} 1 &{} c_{11} p^{-m_{11}}\\ {} &{} 1\end{pmatrix}, \end{aligned}$$

which is precisely (2.3). For the general case, let

$$\begin{aligned} \underline{\beta }_{[n]}:= \left( {\alpha _1,\ldots ,\alpha _n,\alpha _1,\ldots ,\alpha _{n-1},\ldots ,\alpha _1,\alpha _2,\alpha _1}\right) \end{aligned}$$

denote the reduced representation (3.1). By induction, we have a Bruhat decomposition

$$\begin{aligned} b_{\underline{\beta }_{[n-1]}}(a_{22},\ldots ,a_{nn}) = \begin{pmatrix} L'\\ {} &{} 1\end{pmatrix} \begin{pmatrix} N'\\ {} &{} 1\end{pmatrix} \begin{pmatrix} R'\\ {} &{} 1\end{pmatrix}, \end{aligned}$$

where the entries \(L'\), \(N'\), \(R'\) are given by the formulae above, with indices ij replaced with \((i+1)(j+1)\). By a slight abuse of notation, we shall denote the matrices above also by \(L'\), \(N'\), \(R'\) respectively. On the other hand, it is straightforward to compute that

$$\begin{aligned} \Upsilon := b_{(\alpha _1,\ldots ,\alpha _n)}\left( {a_{11},\ldots ,a_{1n}}\right) = \left( {\begin{array}{ccccc} \frac{1}{c_{11}}\\ p^{m_{11}} &{} \frac{c_{11}}{c_{12}}\\ {} &{} p^{m_{12}} &{} \frac{c_{12}}{c_{13}}\\ {} &{}&{} \ddots &{} \ddots \\ {} &{}&{}&{} p^{m_{1n}} &{} c_{1n}\end{array}}\right) . \end{aligned}$$
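For experimentation, the matrix \(\Upsilon \) can be reproduced numerically. The sketch assumes (our reading, not spelled out in this section) that \(b_{(\alpha _1,\ldots ,\alpha _n)}\) is the ordered product \(\phi _{\alpha _1}(X_1)\cdots \phi _{\alpha _n}(X_n)\), where \(X_k\) is the lower triangular matrix with diagonal \(c_{1k}^{-1}, c_{1k}\) and lower-left entry \(p^{m_{1k}}\) obtained from (3.2), and \(\phi _{\alpha _k}\) places a \(2\times 2\) block in rows and columns \(k, k+1\). The helper names and sample values are hypothetical.

```python
from fractions import Fraction

def identity(size):
    return [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def embed(block, k, size):
    """Place a 2x2 block in rows/columns k, k+1 of the identity (hypothetical phi_alpha)."""
    out = identity(size)
    for r in range(2):
        for s in range(2):
            out[k + r][k + s] = Fraction(block[r][s])
    return out

def upsilon(cs, ms, p):
    """Ordered product of the embedded 2x2 factors for (alpha_1, ..., alpha_n)."""
    size = len(cs) + 1
    result = identity(size)
    for k, (c, m) in enumerate(zip(cs, ms)):
        block = [[Fraction(1, c), 0], [Fraction(p) ** m, c]]
        result = matmul(result, embed(block, k, size))
    return result

# Sample data: n = 3, (c_11, c_12, c_13) = (2, 3, 4), (m_11, m_12, m_13) = (1, 2, 1), p = 5.
U = upsilon([2, 3, 4], [1, 2, 1], 5)
```

The result is lower bidiagonal with diagonal \(1/c_{11}, c_{11}/c_{12}, c_{12}/c_{13}, c_{13}\) and subdiagonal \(p^{m_{11}}, p^{m_{12}}, p^{m_{13}}\), in line with the displayed \(\Upsilon \).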

So it remains to show that

$$\begin{aligned} LNR = \Upsilon L' N' R' = b_{\underline{\beta }_{[n]}}(\underline{a}) \end{aligned}$$
(3.3)

is indeed a Bruhat decomposition of \(b_{\underline{\beta }_{[n]}}(\underline{a})\). This is a straightforward brute force computation. For convenience, we provide the details. We expand

$$\begin{aligned} (LNR)_{ij}&= \sum _{k,\ell } L_{ik} N_{k\ell } R_{\ell j},&(\Upsilon L' N' R')_{ij}&= \sum _{r,k,\ell } \Upsilon _{ir} L'_{rk} N'_{k\ell } R'_{\ell j}. \end{aligned}$$

As each row of N has exactly one nonzero entry, we can collapse the sum and write

$$\begin{aligned} (LNR)_{ij} = \sum _k L_{ik} N_{k(n+2-k)} R_{(n+2-k)j} = \sum _k \sum _{\delta ,\partial } L_{ik}(\delta ) N_{k(n+2-k)} R_{(n+2-k)j}(\partial ). \end{aligned}$$
(3.4)

By the same argument, we write

$$\begin{aligned} (\Upsilon L'N'R')_{ij}= & {} \frac{c_{1(i-1)}}{c_{1i}} \left( {\sum _{k=1}^n L'_{ik} N'_{k(n+1-k)} R'_{(n+1-k)j} + L'_{i(n+1)} R'_{(n+1)j}}\right) \nonumber \\{} & {} + p^{m_{1(i-1)}} \left( {\sum _{k=1}^n L'_{(i-1)k} N'_{k(n+1-k)} R'_{(n+1-k)j} \!+\! L'_{(i-1)(n+1)} R'_{(n+1)j}}\right) .\nonumber \\ \end{aligned}$$
(3.5)

Since

$$\begin{aligned} L'_{i(n+1)} R'_{(n+1)j} = {\left\{ \begin{array}{ll} 1 &{} \text { if } i=j=n+1,\\ 0 &{} \text { otherwise,}\end{array}\right. } \end{aligned}$$

it follows that for \(1\le i,j \le n\) we have

$$\begin{aligned} \begin{aligned} (\Upsilon L'N'R')_{ij}&= \;\frac{c_{1(i-1)}}{c_{1i}} \sum _{k=1}^n L'_{ik} N'_{k(n+1-k)} R'_{(n+1-k)j}\\&\quad + p^{m_{1(i-1)}}\sum _{k=1}^n L'_{(i-1)k} N'_{k(n+1-k)} R'_{(n+1-k)j}\\&= \;\frac{c_{1(i-1)}}{c_{1i}} \sum _{k=1}^n \sum _{\delta ', \partial '} L'_{ik}(\delta ') N'_{k(n+1-k)} R'_{(n+1-k)j}(\partial ')\\&\quad + p^{m_{1(i-1)}}\sum _{k=1}^n \sum _{\delta '', \partial ''} L'_{(i-1)k}(\delta '') N'_{k(n+1-k)} R'_{(n+1-k)j}(\partial ''), \end{aligned} \end{aligned}$$

where

$$\begin{aligned} L'_{ij}(\delta )&= \frac{\prod _{k=1}^{j-i} c_{(\delta _k+1)(i-1+k)} \prod _{k=1}^{j-i} \prod _{t=\delta _{k-1}+1}^{\delta _k-1} p^{m_{(t+1)(i-1+k)}}}{\prod _{k=1}^{j-i} c_{(\delta _k+1)(i+k)} \prod _{k=1}^{\delta _{j-i}} p^{m_{(k+1)j}}},&R'_{ij}(\partial )&= R_{ij}(\partial ). \end{aligned}$$

It is then straightforward to verify that

$$\begin{aligned}{} & {} L_{ik}(1,\delta _2,\ldots ,\delta _{k-i}) N_{k(n+2-k)} R_{(n+2-k)j}(\partial )\\{} & {} \quad = \frac{c_{1(i-1)}}{c_{1i}} L'_{i(k-1)}(\delta _2-1,\ldots ,\delta _{k-i}-1) N'_{(k-1)(n+2-k)} R'_{(n+2-k)j}(\partial ), \end{aligned}$$

and

$$\begin{aligned}{} & {} L_{ik}(\delta _1,\ldots ,\delta _{k-i}) N_{k(n+2-k)} R_{(n+2-k)j}(\partial ) \\{} & {} \quad = p^{m_{1(i-1)}} L'_{(i-1)(k-1)}(\delta _1-1,\ldots ,\delta _{k-i}-1) N'_{(k-1)(n+2-k)} R'_{(n+2-k)j}(\partial ) \end{aligned}$$

if \(\delta _1\ge 2\). Matching the terms with (3.4) yields \((LNR)_{ij} = (\Upsilon L'N'R')_{ij}\) for \(1\le i,j\le n\).

Now consider the case where \(1\le i \le n\), \(j=n+1\). From (3.5), we deduce that

$$\begin{aligned} \begin{aligned} (\Upsilon L' N' R')_{i(n+1)} = 0, \quad 1\le i \le n. \end{aligned} \end{aligned}$$

It remains to show that \((LNR)_{i(n+1)} = 0\) for \(1\le i \le n\). By straightforward computation, we have

$$\begin{aligned}{} & {} L_{ik}(\delta _1,\ldots ,\delta _{k-i}) N_{k(n+2-k)} R_{(n+2-k)j}(\partial _1,\ldots ,\partial _{k-1})\\{} & {} \quad = -L_{i(k+1)}(\delta _1,\ldots ,\delta _{k-i+1}) N_{(k+1)(n+1-k)} R_{(n+1-k)j}(\partial '_1,\ldots ,\partial '_k), \end{aligned}$$

where \(\partial '_1 = k\), \(\partial '_\ell = \max \left\{ {k,\partial _{\ell -1}}\right\} \) for \(2\le \ell \le k\), and \(\delta _{k-i+1} = k + \sum _{\ell =1}^{k-1} \partial _\ell - \sum _{\ell =2}^k \partial '_\ell \). Putting this back into (3.4) yields \((LNR)_{i(n+1)} = 0\) as desired.

Now consider the case where \(i=n+1\), \(1\le j \le n\). Then (3.4) and (3.5) say

$$\begin{aligned} \begin{aligned} (\Upsilon L' N' R')_{(n+1)j}&= p^{m_{1n}} \sum _{k=1}^n L'_{nk} N'_{k(n+1-k)} R'_{(n+1-k)j} = p^{m_{1(i-1)}} N'_{n1} R'_{1j}.\\ (LNR)_{(n+1)j}&= \sum _{k=1}^{n+1} L_{(n+1)k} N_{k(n+2-k)} R_{(n+2-k)j} = N_{(n+1)1} R_{1j}. \end{aligned} \end{aligned}$$

Since \(R_{1j}(\partial ) = R'_{1j}(\partial )\) and \(N_{(n+1)1} = p^{m_{1n}} N'_{n1}\), it follows that \((\Upsilon L' N' R')_{(n+1)j} = (LNR)_{(n+1)j}\).

Finally, for \(i=j=n+1\), we have

$$\begin{aligned} \begin{aligned} (LNR)_{(n+1)(n+1)} = N_{(n+1)1} R_{1(n+1)} = c_{1n} = (\Upsilon L'N'R')_{(n+1)(n+1)}. \end{aligned} \end{aligned}$$

So (3.3) holds, finishing the proof. \(\square \)

Lemma 4

Assume the settings above. For \(\underline{m} \in \mathbb {N}_0^{n(n+1)/2}\), a complete system of coset representatives for \(Y_{\underline{\beta }}(\underline{m})/U(\mathbb {Z}_p)\) is given by

$$\begin{aligned} \left\{ {(c_{ij}p^{-m_{ij}})_{1\le i\le j\le n}}\; \bigg |\; { c_{ij}\, \Big ({\textrm{mod }} \prod _{k=j}^n p^{m_{ik}}\Big ), (c_{ij},p^{m_{ij}})=1}\right\} . \end{aligned}$$

Remark: The shape of the system is not completely obvious (to us), but once it is given, the verification is somewhat lengthy but straightforward.
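To make the index ranges in Lemma 4 concrete, the hypothetical helper below enumerates the tuples \((c_{ij})\) for given exponents \(\underline{m}\) and a prime p, taking each \(c_{ij}\) among the residues modulo \(\prod _{k=j}^n p^{m_{ik}}\) that are coprime to \(p^{m_{ij}}\). It implements only the parametrisation, not the inequivalence argument; the size of its output can be compared with the count from Lemma 2.

```python
from itertools import product
from math import gcd

def representatives(m, n, p):
    """Enumerate the tuples (c_ij) of Lemma 4 for exponents m[(i, j)], 1 <= i <= j <= n."""
    index = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    ranges = []
    for (i, j) in index:
        modulus = p ** sum(m[(i, k)] for k in range(j, n + 1))   # prod_{k=j}^n p^{m_ik}
        # residues modulo the modulus with (c, p^{m_ij}) = 1
        ranges.append([c for c in range(modulus) if gcd(c, p ** m[(i, j)]) == 1])
    return [dict(zip(index, cs)) for cs in product(*ranges)]

# Example: n = 2, m_11 = 1, m_12 = 0, m_22 = 1, p = 3.
reps = representatives({(1, 1): 1, (1, 2): 0, (2, 2): 1}, 2, 3)
print(len(reps))
```

In the example, \(c_{11}\) and \(c_{22}\) each run over the two units modulo 3, while \(c_{12}\) has modulus 1 and contributes a single choice.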

Proof

From Lemma 2 we already know the number of coset representatives needed. So it remains to show that all these coset representatives are inequivalent under the right action of \(U(\mathbb {Z}_p)\). Again we argue inductively. The case \(n=1\) is straightforward to verify. Indeed, suppose we have

$$\begin{aligned} \begin{pmatrix} 1 &{} c'_{11}p^{-m_{11}}\\ {} &{} 1\end{pmatrix} = \begin{pmatrix} 1 &{} c_{11}p^{-m_{11}}\\ {} &{} 1\end{pmatrix} \begin{pmatrix} 1 &{} u_{11}\\ {} &{} 1\end{pmatrix} \end{aligned}$$

for some \(u_{11}\in \mathbb {Z}_p\). This actually says

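$$\begin{aligned} c'_{11}p^{-m_{11}} = c_{11} p^{-m_{11}} + u_{11}, \end{aligned}$$

that is, \(c'_{11} \equiv c_{11}\) (mod \(p^{m_{11}}\)). Since \(c_{11}\) and \(c'_{11}\) are taken from the same system of residues modulo \(p^{m_{11}}\), this implies \(c'_{11} = c_{11}\) and \(u_{11}=0\), as desired.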

Now we consider the general case. For \(n=r\), we set

$$\begin{aligned} R:= b_{\underline{\beta }}(c_{11}p^{-m_{11}},\ldots ,c_{rr}p^{-m_{rr}}), \quad R':= b_{\underline{\beta }}(c'_{11}p^{-m_{11}},\ldots ,c'_{rr}p^{-m_{rr}}), \end{aligned}$$

and

$$\begin{aligned} u = \left( {{\begin{matrix} 1 &{} u_{11} &{} {\cdot \,\cdot \,\cdot } &{} u_{1r}\\ {} &{} 1 &{} {\cdot \,\cdot \,\cdot } &{} u_{2r}\\ {} &{}&{} \ddots &{} \vdots \\ {} &{}&{}&{} 1\end{matrix}}}\right) , \end{aligned}$$

such that

$$\begin{aligned} R' = Ru. \end{aligned}$$
(3.6)

Removing the final column and the final row of the matrices yields the problem for \(n=r-1\) (with a renaming of variables \(c_{ij}\mapsto c_{(i-1)(j-1)}, c'_{ij}\mapsto c'_{(i-1)(j-1)}\)). By induction, we deduce that the first r rows and columns of R and \(R'\) are identical, and \(u_{ij} = 0\) for all \(1\le i \le j\le r-1\). Using Lemma 3, we deduce that \(c_{ij} = c'_{ij}\) for \(2\le i\le j\le r\).

It remains to consider the final columns of the matrices. The \((1,r+1)\)-th entry of (3.6) reads

$$\begin{aligned} c'_{1r} \prod _{j=1}^r p^{-m_{jr}} = c_{1r} \prod _{j=1}^r p^{-m_{jr}} + \sum _{j=2}^{r+1} u_{(r+2-j)r} c_{jr} \prod _{k=j}^r p^{-m_{kr}}, \end{aligned}$$
(3.7)

where again we set \(c_{(r+1)r}:= 1\). From (3.7), we deduce that \(c'_{1r} = c_{1r}\). It then follows that

$$\begin{aligned} u_{rr}= & {} -R_{1r}^{-1} \left( {u_{(r-1)r} R_{1(r-1)} + \cdots + u_{2r}R_{12} + u_{1r}}\right) \nonumber \\= & {} -\sum _{j=1}^{r-1} u_{jr} \frac{c_{(r+2-j)r}}{c_{2r}} \prod _{k=2}^{r+1-j} p^{m_{kr}}. \end{aligned}$$
(3.8)

Now we turn to the \((2,r+1)\)-th entry of (3.6). It reads

$$\begin{aligned} R'_{2(r+1)} = R_{2(r+1)} + \sum _{j=2}^r u_{jr} R_{2j}. \end{aligned}$$

From Lemma 3, we see that there is exactly one term in \(R_{2(r+1)}\) and \(R'_{2(r+1)}\) that depends on \(c_{1j}\) for some \(j\le r-1\). Since \(c_{ij} = c'_{ij}\) for \(2\le i\le j \le r\), we can remove all the other terms within \(R_{2(r+1)}\) and \(R'_{2(r+1)}\), and obtain

$$\begin{aligned} \frac{c'_{1(r-1)} p^{m_{rr}-m_{1(r-1)}-m_{1r}}}{\prod _{k=2}^{r-1} p^{m_{k(r-1)}}} = \frac{c_{1(r-1)} p^{m_{rr}-m_{1(r-1)}-m_{1r}}}{\prod _{k=2}^{r-1} p^{m_{k(r-1)}}}+ \sum _{j=2}^r u_{jr} R_{2j}. \end{aligned}$$

To show that \(c'_{1(r-1)} = c_{1(r-1)}\), it suffices to prove that

$$\begin{aligned} \sum _{j=2}^r u_{jr} R_{2j} \in p^{m_{rr}} \prod _{k=2}^{r-1} p^{-m_{k(r-1)}} \mathbb {Z}_p. \end{aligned}$$

Using (3.8), we rewrite

$$\begin{aligned} \sum _{j=2}^r u_{jr} R_{2j} = \sum _{j=1}^{r-1} u_{jr} \left( {R_{2j} - \frac{R_{1j} R_{2r}}{R_{1r}}}\right) . \end{aligned}$$
(3.9)

We expand

$$\begin{aligned} R_{2j} = \sum _{r-1\le \partial _1 \le \cdots \le \partial _{j-2} \le r} R_{2j}(\partial ), \end{aligned}$$

where

$$\begin{aligned} R_{2j}(\partial ) =\frac{c_{(r-1)\partial _1} \ldots c_{(r+2-j)\partial _{j-2}}}{c_{r\partial _1}\ldots c_{(r+3-j)\partial _{j-2}}} \frac{1}{p^M} {\left\{ \begin{array}{ll} p^{m_{rr}}, &{}\partial _1 \not = r,\\ 1, &{} \partial _1 = r.\end{array}\right. } \end{aligned}$$

with

$$\begin{aligned} M= & {} m_{(r-1)\partial _1} + \cdots + m_{(r-1)\partial _2} + m_{(r-2)\partial _2} + \cdots + m_{(r-2)\partial _3} + \cdots \\{} & {} +m_{(r+2-j) \partial _{j-2}} + \cdots + m_{(r+2-j)r}. \end{aligned}$$

For \(0\le l \le j-2\), we write \(\partial _{[j]}(l) = (\partial _1,\ldots ,\partial _{j-2})\), with \(\partial _k = r-1\) if \(k\le l\), and \(\partial _k = r\) otherwise. From (3.8), it is easy to check that

$$\begin{aligned} \frac{R_{1j}R_{2r}(\partial _{[r]}(k))}{R_{1r}} \in p^{m_{rr}} \prod _{k=2}^{r-1} p^{-m_{k(r-1)}} \mathbb {Z}_p. \end{aligned}$$

for \(k\ge j-1\). On the other hand, we verify that

$$\begin{aligned} R_{2j}(\partial _{[j]}(k)) = \frac{R_{1j}R_{2r}(\partial _{[r]}(k))}{R_{1r}} \end{aligned}$$

for \(0\le k \le j-2\). So we conclude that

$$\begin{aligned} R_{2j} - \frac{R_{1j}R_{2r}}{R_{1r}} \in p^{m_{rr}} \prod _{k=2}^{r-1} p^{-m_{k(r-1)}} \mathbb {Z}_p \end{aligned}$$

for \(1\le j \le r-1\). The claim then follows from (3.9).

By similar arguments, we proceed inductively and show that

$$\begin{aligned} \sum _{j=i}^r u_{jr} R_{ij} \in \prod _{k=r+2-i}^r p^{m_{(r+2-i)k}} \prod _{k=2}^{r+1-i} p^{-m_{k(r+1-i)}} \mathbb {Z}_p \end{aligned}$$

for \(3\le i \le r\), and thus \(c'_{1j} = c_{1j}\) for \(1\le j \le r-2\). This finishes the proof of the statement. \(\square \)

Combining the previous computations, we complete the proof of Theorem 1, noting that \(\lambda = -\sum _{j=1}^n r_j \check{\alpha }_j \in \check{X}\), where \(r_j \geqslant 0\) are the components of \(r = (r_1, \ldots , r_n)\). Lemma 4 gives the summation conditions in (1.6), and Lemma 3 gives the shape of the exponential involving the two characters, as in (1.4).

4 Proof of Theorem 2

The proof of Theorem 2 is similar. We omit the analogous straightforward verification and just write down the relevant formulae. We fix a reduced representation of \(w_{*}\) as follows

$$\begin{aligned} w_{*} = s_{\alpha _1} s_{\alpha _2} \ldots s_{\alpha _{n-1}} s_{\alpha _n} s_{\alpha _{n-1}} \ldots s_{\alpha _2} s_{\alpha _1}. \end{aligned}$$

Recall the definition of \(\gamma _j\) in (2.2). For the reduced representation of \(w_*\), we have

$$\begin{aligned} \underline{\gamma }= \left( {\alpha _{11},\alpha _{12},\ldots ,\alpha _{1n},\alpha _{nn},\ldots ,\alpha _{2n}}\right) . \end{aligned}$$

Now we give a characterisation for \(b(\underline{a})\), for \(\underline{a} = (a_{11},\ldots , a_{1n}, a_{nn},\ldots ,a_{2n}) \in \mathbb {Q}_p^{2n-1}\). Again, every \(a_{ij}\in \mathbb {Q}_p\) can be written uniquely as \(a_{ij} = c_{ij} p^{-m_{ij}}\), with \(m_{ij} \ge 0\), \(c_{ij}\in \mathbb {Z}_p\), and \((c_{ij},p^{m_{ij}}) = 1\).

Lemma 5

Let \(\underline{a} = (a_{11},\ldots ,a_{1n},a_{nn},\ldots ,a_{2n}) \in \mathbb {Q}_p^{2n-1}\). Write \(a_{ij} = c_{ij} p^{-m_{ij}}\), with \(m_{ij} \ge 0\), \(c_{ij}\in \mathbb {Z}_p\), and \((c_{ij},p^{m_{ij}}) = 1\). Then \(b(\underline{a})\) has a Bruhat decomposition \(b(\underline{a}) = LNR\), where

$$\begin{aligned} L= & {} \left( {\begin{matrix} 1 &{}L_{12} &{} L_{13} &{}{\cdot \,\cdot \,\cdot } &{} L_{1(n+1)}\\ {} &{} 1 &{} L_{23} &{} {\cdot \,\cdot \,\cdot } &{} L_{2(n+1)}\\ &{}&{} \ddots &{}&{} \vdots \\ {} &{}&{}&{}1 &{} L_{n(n+1)}\\ {} &{}&{}&{}&{} 1\end{matrix}}\right) , \quad N = \left( {\begin{matrix} &{}&{}&{}&{} N_{1(n+1)}\\ {} &{}N_{22}\\ {} &{}&{}\ddots \\ &{}&{}&{}N_{nn}\\ N_{(n+1)1}\end{matrix}}\right) ,\\ R= & {} \left( {\begin{matrix} 1 &{}R_{12} &{} R_{13} &{} {\cdot \,\cdot \,\cdot } &{} R_{1(n+1)}\\ &{} 1 &{} R_{23} &{} {\cdot \,\cdot \,\cdot } &{} R_{2(n+1)}\\ &{}&{} \ddots &{}&{} \vdots \\ {} &{}&{}&{}1 &{} R_{n(n+1)}\\ {} &{}&{}&{}&{} 1\end{matrix}}\right) , \end{aligned}$$

where

$$\begin{aligned}&N_{1(n+1)} = (-1)^n \prod _{j=1}^n p^{-m_{1j}}, N_{ii} \!=\! - p^{m_{1(i-1)}-m_{in}} \quad (2\le i \le n), N_{(n+1)1} \!=\! \prod _{j=1}^n p^{m_{jn}}, \\&L_{1j} = (-1)^j c_{1(j-1)}^{-1} \prod _{k=1}^{j-1} p^{-m_{1k}} \quad (2\le j\le n), L_{1(n+1)} = c_{11}^{-1} c_{2n}^{-1} \prod _{k=1}^n p^{-m_{kn}}, \\&L_{ij}= (-1)^{j-i+1} \left( {\frac{c_{1(i-1)}}{c_{1(j-1)}} \prod _{k=i}^{j-1} p^{-m_{1k}} + \frac{c_{1i} c_{(i+1)n}}{c_{1(j-1)} c_{in}} p^{m_{1(i-1)}-m_{in}} \prod _{k=i}^{j-1} p^{-m_{1k}}}\right) \\&\quad ( 2\le i < j \le n), \\&L_{i(n+1)} = \frac{c_{1(i-1)}}{c_{1i}c_{(i+1)n}} p^{-m_{1n}} \prod _{k=i+1}^n p^{-m_{kn}} + c_{in}^{-1} p^{m_{1(i-1)}-m_{1n}} \prod _{k=i}^n p^{-m_{in}} \quad (2\le i \le n), \\&R_{1j} = c_{jn} \prod _{k=2}^j p^{-m_{kn}}\quad (2\le j \le n), R_{1(n+1)} = c_{1n} \prod _{k=1}^n p^{-m_{kn}}, \\&R_{i(n+1)} = (-1)^{n-i} \left( {\frac{c_{1i} c_{(i+1)n}}{c_{in}} \prod _{k=i}^n p^{-m_{1k}} + c_{1(i-1)} p^{m_{in}} \prod _{k=i-1}^n p^{-m_{1k}}}\right) \quad (2\!\le \! i \!\le \! n). \end{aligned}$$

To interpret the formula above, we set \(c_{ij}:= 1\) and \(m_{ij}:=0\) if the condition \(1\le i \le j \le n\) is not satisfied. As a convention, when \(m_{ij} = 0\) for \(1 \le i \le j \le n\), we define \(c_{ij}^{-1}:= 0\) as a formal symbol.

Proof

Similar to the proof of Lemma 3. \(\square \)

Lemma 6

Assume the settings above. For \(\underline{m} \in \mathbb {N}_0^{2n-1}\), a complete system of coset representatives for \(Y_{\underline{\beta }}(\underline{m})/U(\mathbb {Z}_p)\) is given by

$$\begin{aligned} \begin{aligned} \left\{ {(c_{ij}p^{-m_{ij}})_{\begin{array}{c} i=1, 1\le j \le n\\ 2\le i\le n, j=n \end{array}}}\; \bigg |\; {\begin{array}{l} c_{1j} \left( {\textrm{mod}\,\textstyle \prod _{k=j}^n p^{m_{1k}}}\right) ,\; 1\le j \le n,\\ c_{in} \left( {\textrm{mod}\,\textstyle \prod _{k=2}^i p^{m_{kn}}}\right) ,\; 2\le i \le n\end{array},\; (c_{ij},p^{m_{ij}})=1}\right\} . \end{aligned} \end{aligned}$$

Proof

Similar to the proof of Lemma 4. \(\square \)

Combining the above results, we complete the proof of Theorem 2.

5 Non-trivial bounds for Kloosterman sums

5.1 General preparation

In this section we prove Corollaries 1 and 2. We first prove Corollary 1. The idea is that the partial Kloosterman sum \(\textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_l)\) defined in (1.6) with

$$\begin{aligned} \sum _{i \le k \le j} m_{ij} = r_k, \quad 1 \le k \le n, \end{aligned}$$
(5.1)

is a nested sum of classical \(\textrm{GL}(2)\) Kloosterman sums, for which we have Weil’s bound (1.2) available. We start with a simple lemma.

Lemma 7

Let \(\gamma _1, \gamma _2 \geqslant 0\), \(b_1, b_2 \in \mathbb {Z}\) with \(\min (b_1, b_2) < 0\). Then

$$\begin{aligned} \sum _{c_1 = 1}^{p^{\gamma _1}} \sum _{c_2 = 1}^{p^{\gamma _2}} |c_1 p^{b_1} + c_2 p^{b_2}|^{-1/2}_p \ll p^{\gamma _1+\gamma _2 + \frac{1}{2}\min (b_1, b_2)}. \end{aligned}$$

Proof

The sum on the left hand side equals

$$\begin{aligned} \sum _{\delta _1 = 0}^{\gamma _1} \sum _{\delta _2 = 0}^{\gamma _2} \sum _{\begin{array}{c} c_1 = 1\\ (c_1, p) = 1 \end{array}}^{p^{\gamma _1 - \delta _1}} \sum _{\begin{array}{c} c_2 = 1\\ (c_2, p) = 1 \end{array}}^{p^{\gamma _2 - \delta _2}} |c_1 p^{b_1 + \delta _1} + c_2 p^{b_2 + \delta _2}|^{-1/2}_p. \end{aligned}$$
(5.2)

Suppose for notational simplicity that \(b_2 \le b_1\) (the other case is completely analogous). Let us first assume that \(b_1 + \delta _1 \not = b_2 + \delta _2\). Then the two inner sums are bounded by

$$\begin{aligned} p^{\gamma _1 - \delta _1 + \gamma _2 - \delta _2 + \frac{1}{2}\min (\delta _1 + b_1, \delta _2 + b_2)} \le p^{\gamma _1 + \gamma _2 + \frac{1}{2} b_2 - \delta _1 - \frac{1}{2} \delta _2} \end{aligned}$$

where the second inequality can be seen by distinguishing the cases \(\delta _2 + b_2 \le \delta _1 + b_1\) and \(\delta _2 + b_2 > \delta _1 + b_1\).

Let us now assume \(b_1 + \delta _1 = b_2 + \delta _2\). Then the inner two sums are at most

$$\begin{aligned} \begin{aligned}&p^{\frac{1}{2}(b_2 + \delta _2)} \sum _{\delta \le \max (\gamma _2 - \delta _2, \gamma _1 - \delta _1)} p^{\delta /2} \underset{p^{\delta } \mid c_1 + c_2}{\sum _{ c_1 = 1 }^{p^{\gamma _1 - \delta _1}} \sum _{ c_2 = 1 }^{p^{\gamma _2 - \delta _2}}} 1\\&\quad \le p^{\frac{1}{2}(b_2 + \delta _2)} \sum _{\delta \le \max (\gamma _2 - \delta _2, \gamma _1 - \delta _1)} p^{\delta /2} \Big (p^{\min (\gamma _1 - \delta _1, \gamma _2 - \delta _2)} + \frac{p^{\gamma _1 - \delta _1 + \gamma _2 - \delta _2}}{p^{\delta }}\Big ) \\&\quad \ll p^{\frac{1}{2}(b_2 + \delta _2)}\big (p^{\frac{1}{2}(\gamma _1 + \gamma _2 - \delta _1 - 2\delta _2) } + p^{\gamma _1 + \gamma _2 - \delta _1 - \delta _2}\big ). \end{aligned} \end{aligned}$$

(For \(p = 2\) the \(\delta \)-sum runs up to \(\max (\gamma _2 - \delta _2, \gamma _1 - \delta _1) + 1\).) Thus in all cases we bound (5.2) by

$$\begin{aligned} \ll \sum _{\delta _1 = 0}^{\gamma _1} \sum _{\delta _2 = 0}^{\gamma _2} p^{\gamma _1 + \gamma _2 + \frac{1}{2}\min (b_1, b_2) - \frac{1}{2}(\delta _1 + \delta _2)} \ll p^{\gamma _1 + \gamma _2 + \frac{1}{2}\min (b_1, b_2)}, \end{aligned}$$

and the lemma follows. \(\square \)
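The left-hand side of Lemma 7 is easy to evaluate by brute force for small parameters, which gives a feel for the implied constant. The sketch below (function names are ours) computes \(|x|_p^{-1/2} = p^{v_p(x)/2}\) exactly from the p-adic valuation of the rational number \(x = c_1p^{b_1} + c_2p^{b_2}\), which is never zero since \(c_1, c_2 > 0\). In the two tiny cases used for testing, every term has valuation \(\min (b_1, b_2)\), so both sides agree exactly.

```python
from fractions import Fraction
from math import isclose

def vp(x: Fraction, p: int) -> int:
    """p-adic valuation of a nonzero rational number."""
    v, num, den = 0, abs(x.numerator), x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def lemma7_lhs(p, g1, g2, b1, b2):
    """Sum of |c1 p^b1 + c2 p^b2|_p^{-1/2} over 1 <= c1 <= p^g1, 1 <= c2 <= p^g2."""
    total = 0.0
    for c1 in range(1, p ** g1 + 1):
        for c2 in range(1, p ** g2 + 1):
            x = c1 * Fraction(p) ** b1 + c2 * Fraction(p) ** b2
            total += p ** (vp(x, p) / 2)      # |x|_p^{-1/2} = p^{v_p(x)/2}
    return total

def lemma7_rhs(p, g1, g2, b1, b2):
    """The main term p^{g1 + g2 + min(b1, b2)/2} on the right of Lemma 7."""
    return p ** (g1 + g2 + min(b1, b2) / 2)

# Ratio of the two sides for a slightly larger sample.
print(lemma7_lhs(3, 2, 2, -1, -2) / lemma7_rhs(3, 2, 2, -1, -2))
```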

We return to the partial Kloosterman sum (1.6) for the long Weyl element. Let

$$\begin{aligned} C_{ij} = p^{m_{ij} + \cdots + m_{in}} \end{aligned}$$
(5.3)

be the modulus of the \(c_{ij}\)-sum, for any \(1 \le i \le j \le n\). For \(j < i\) we put \(C_{ij} = 1\).

Let us fix one variable \(c_{ij}\). Then the \(c_{ij}\)-sum in (1.6) is given by

$$\begin{aligned} \Sigma _{ij}:= \sum _{\begin{array}{c} 1 \le c_{ij} \le C_{ij}\\ (c_{ij}, p^{m_{ij}}) = 1 \end{array}} e(c_{ij} A + {\bar{c}}_{ij}B) \end{aligned}$$
(5.4)

where

$$\begin{aligned} \begin{aligned} A = A_{ij}&= \psi _{j+1} {\bar{c}}_{i, j+1} p^{a_1} + \psi '_{n+1-i}{\bar{c}}_{i+1, j} p^{a_2}, \\ B = B_{ij}&= \psi _j c_{i, j-1} p^{b_1} +\psi '_{n+2-i} c_{i-1, j} p^{b_2} \end{aligned} \end{aligned}$$

with

$$\begin{aligned} \begin{aligned} a_1 = a_1(i, j)&= m_{1j} + \cdots +m_{i-1, j} - m_{1, j+1} - \cdots - m_{i, j+1},\\ a_2 = a_2(i, j)&= m_{i+1, n} + \cdots +m_{i+1, j+1} - m_{i, n} - \cdots - m_{i, j},\\ b_1 = b_1(i, j)&= m_{1,j-1} + \cdots +m_{i-1, j-1} - m_{1, j} - \cdots - m_{i, j},\\ b_2 = b_2(i, j)&= m_{i, n} + \cdots +m_{i, j+1} - m_{i-1, n} - \cdots - m_{i-1, j}. \end{aligned} \end{aligned}$$
(5.5)

Here we apply the following conventions, in this order: if \(m_{ij} = 0\) for some \(1 \le i \le j \le n\), we put \(\overline{c_{ij}} = 0\). If \(j < i\), we put \(m_{ij} = 0\), \(c_{ij} = 1\) and \(C_{ij} = 1\). If none of the above cases applies and \(i \le 0\) or \(j > n\), we put \(c_{ij} = \overline{c_{ij}}= m_{ij} = 0\) and \(C_{ij} = 1\).

Let us assume

$$\begin{aligned} m_{ij} \not = 0. \end{aligned}$$

Let \(v_p(A) = -\alpha \), \(v_p(B) = -\beta \). Assume without loss of generality that \(\alpha \geqslant \beta \); the other case is analogous. If \(\alpha \le 0\), then trivially \(|\Sigma _{ij}| \le C_{ij}\). If \(\alpha > 0\), we extend the range of summation to avoid issues of well-definedness and obtain, by Weil’s bound,

$$\begin{aligned} \begin{aligned} |\Sigma _{ij}|&= \Big |p^{-\alpha }\sum _{\begin{array}{c} 1 \le c_{ij} \le C_{ij}p^{\alpha }\\ (c_{ij}, p)= 1 \end{array}} e\Big (\frac{c_{ij} A C_{ij}p^{\alpha } + {\bar{c}}_{ij}B C_{ij} p^{\alpha }}{C_{ij} p^{\alpha }}\Big )\Big | \\&\quad \le 2 p^{-\alpha } (C_{ij}p^{\alpha })^{1/2} (C_{ij}, C_{ij}p^{\alpha -\beta }, C_{ij}p^{\alpha })^{1/2} = 2 C_{ij} p^{-\alpha /2}. \end{aligned} \end{aligned}$$

We conclude in all cases (still assuming \(m_{ij} \not = 0\))

$$\begin{aligned} |\Sigma _{ij}| \le 2C_{ij} \min (1, |A_{ij}|^{-1/2}_p, |B_{ij}|^{-1/2}_p). \end{aligned}$$
(5.6)

Note that this uses no specific information about A and B and holds for any sum of the type (5.4).
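Weil's bound (1.2), which is the input for (5.6), can be spot-checked numerically for small moduli. This brute-force sketch is only an illustration (function names are ours); it relies on Python 3.8+ for the modular inverse `pow(d, -1, c)` and skips the degenerate modulus \(c = 1\).

```python
import cmath
from math import gcd, sqrt

def kloosterman(m, mp, c):
    """Classical Kloosterman sum S(m, m', c) from (1.1), by brute force (c >= 2)."""
    total = 0j
    for d in range(c):
        if gcd(d, c) == 1:
            dbar = pow(d, -1, c)                  # d * dbar = 1 (mod c)
            total += cmath.exp(2j * cmath.pi * (m * d + mp * dbar) / c)
    return total

def tau(c):
    """Number of divisors of c."""
    return sum(1 for k in range(1, c + 1) if c % k == 0)

def weil_ok(m, mp, c):
    """Check (1.2): |S(m, m', c)| <= tau(c) c^{1/2} (m, m', c)^{1/2}, with a rounding margin."""
    g = gcd(gcd(m, mp), c)
    return abs(kloosterman(m, mp, c)) <= tau(c) * sqrt(c) * sqrt(g) + 1e-9

print(all(weil_ok(m, mp, c) for c in range(2, 40) for m in range(6) for mp in range(6)))
```

For prime moduli in this range (for instance \(c = 37\)) the bound is far below the trivial bound \(\phi (c)\), so the check is not vacuous.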

From this and the previous lemma we see that

$$\begin{aligned} \begin{aligned} \sum _{c_{i, j-1}} \sum _{c_{i-1, j}} |\Sigma _{ij}|&\ll \big (\max _{1 \le j \le n}|\psi _j|_p^{-1/2}\big ) C_{i, j-1} C_{i-1, j} C_{ij} p^{\frac{1}{2}\min (0, b_1(i, j))} \end{aligned} \end{aligned}$$

if \(i \not = 1\) (in which case \(i-1 > 0\)). Note that this continues to hold for \(i = j\) by our general conventions. If \(i = 1\) (in which case \(c_{i-1, j} = 0\)), a similar but simpler argument confirms the bound, too. Here we dropped potential savings in the exponents \(a_1, a_2, b_2\).

If \(m_{ij} = 0\) we simply estimate trivially, and therefore obtain in all cases the bound

$$\begin{aligned} \begin{aligned} \sum _{c_{i, j-1}} \sum _{c_{i-1, j}} |\Sigma _{ij}|&\ll \big (\max _{1 \le j \le n}|\psi _j|_p^{-1/2}\big ) C_{i, j-1} C_{i-1, j} C_{ij} p^{\frac{1}{2}b_{*}(i, j)} \end{aligned} \end{aligned}$$
(5.7)

where

$$\begin{aligned} b_*(i, j):= \delta _{m_{ij} \not = 0}\min (0, b_1(i,j)). \end{aligned}$$

5.2 A soft argument

With a view towards possible generalizations we first demonstrate a soft argument. We define an ordering on the set of indices (ij), \(1 \le i \le j \le n\) as follows

$$\begin{aligned} (1, n)< (1, n-1)< \cdots< (1, 1)< (2, n)< \cdots< (2, 2)< \ldots < (n, n). \end{aligned}$$
(5.8)

Let

$$\begin{aligned} \mu _{ij} = \max _{(\alpha , \beta ) < (i, j)} m_{\alpha \beta }. \end{aligned}$$

Then (5.7) implies

$$\begin{aligned} \sum _{\begin{array}{c} c_{\nu \mu }\\ (\nu , \mu ) \not = (i, j) \end{array}} \Big | \sum _{c_{ij}} (...) \Big | \ll \big (\max _{1 \le j \le n} |\psi _j|^{-1/2}_p \big ) \Big (\prod _{1 \le i \le j \le n} C_{ij}\Big ) p^{-m_{ij}/2+ O(\mu _{ij})} \end{aligned}$$

(which is trivially true if \(m_{ij} = 0\)) for any (ij). Choosing the index pair (ij) suitably, we conclude

$$\begin{aligned} \textrm{Kl}_p({\underline{m}}, \psi , \psi ', w_l) \ll \big (\max _{1 \le j \le n} |\psi _j|^{-1/2}_p \big ) p^{r_1 + \cdots + r_n -\delta _0 \max _{i, j} m_{ij}} \end{aligned}$$

for some \(\delta _0 > 0\) (depending on n), from which we easily obtain the statement of Corollary 1, observing that the number of \({\underline{m}}\) for a given vector r is \(O(p^{\varepsilon (r_1 + \cdots + r_n)})\) for every \(\varepsilon > 0\).

5.3 A refined argument

The previous argument uses cancellation in only one index pair \((i, j)\). It is very flexible and requires only the ordering (5.8), but no further computations. On the other hand, it gives only a small value of \(\delta _0\) (exponentially decreasing in n). A more refined argument runs as follows. We partition the index pairs into four classes according to the parities of i and j and obtain

$$\begin{aligned} \textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_l) \ll \Big (\prod _{1 \le i \le j \le n} C_{ij}\Big ) \Big (\prod _{\begin{array}{c} 1 \le i \le j \le n\\ i \equiv i_0 \, (\text {mod } 2)\\ j \equiv j_0 \, (\text {mod } 2) \end{array}} p^{\frac{1}{2}b_{*}(i, j)}\Big ) \end{aligned}$$

for \(i_0, j_0 \in \{0, 1\}\). Recall that

$$\begin{aligned} \prod _{1 \le i \le j \le n} C_{ij} = \prod _{1 \le i \le j \le n} p^{(j-i+1)m_{ij}}, \end{aligned}$$

cf. (5.3) and also (1.7). Taking geometric means, we get

$$\begin{aligned} \textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_l) \ll \Big (\prod _{1 \le i \le j \le n} C_{ij}\Big ) \Big (\prod _{ 1 \le i \le j \le n} p^{\frac{1}{8}b_{*}(i, j)}\Big ). \end{aligned}$$
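The identity for \(\prod C_{ij}\) used here can be checked mechanically together with (5.1): the exponent of \(\prod C_{ij}\) is \(\sum (j-i+1)m_{ij}\), and since the pair \((i, j)\) contributes to \(r_k\) exactly for the \(j-i+1\) indices \(i \le k \le j\), this also equals \(r_1 + \cdots + r_n\). The sketch below (hypothetical helper, random sample data) computes all three quantities.

```python
import random

def check_C_product(n, m):
    """Return log_p prod C_ij, sum (j-i+1) m_ij and r_1 + ... + r_n for exponents m."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    # log_p C_ij = m_ij + ... + m_in, cf. (5.3)
    log_C = sum(sum(m[(i, k)] for k in range(j, n + 1)) for (i, j) in pairs)
    weighted = sum((j - i + 1) * m[(i, j)] for (i, j) in pairs)
    # r_k from (5.1): sum of m_ij over all pairs with i <= k <= j
    r = [sum(m[(i, j)] for (i, j) in pairs if i <= k <= j) for k in range(1, n + 1)]
    return log_C, weighted, sum(r)

random.seed(1)
sample = {(i, j): random.randint(0, 5) for i in range(1, 5) for j in range(i, 5)}
print(check_C_product(4, sample))
```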

We now observe that

$$\begin{aligned} \sum _{1 \le i \le j \le n} (n+1-j)b_1(i, j) = -\sum _{1 \le i \le j \le n} (j-i+1)m_{ij} \end{aligned}$$

and

$$\begin{aligned} \sum _{i_0 = 1}^i b_{*}(i_0, j) \le b_1(i, j). \end{aligned}$$

Taken together, this implies

$$\begin{aligned}{} & {} \sum _{1 \le i \le j \le n} n^2 b_{*}(i, j) \le \sum _{1 \le i \le j \le n} (n+1-j)(n+1-i) b_{*}(i, j)\\{} & {} \quad \le -\sum _{1 \le i \le j \le n} (j-i+1)m_{ij}, \end{aligned}$$

and so

$$\begin{aligned} \textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_l) \ll \Big (\prod _{1 \le i \le j \le n} C_{ij}\Big )^{1 - \frac{1}{8n^2}}. \end{aligned}$$
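The identity \(\sum (n+1-j)b_1(i, j) = -\sum (j-i+1)m_{ij}\) used above is linear in the \(m_{ij}\): comparing coefficients, \(m_{ab}\) receives \((b-a+1)(n-b) - (b-a+1)(n+1-b) = -(b-a+1)\). A randomized check is nevertheless a cheap safeguard against index slips. The sketch implements \(b_1\) from (5.5) with the convention that \(m_{ij} = 0\) whenever \(1 \le i \le j \le n\) fails; the helper names are ours.

```python
import random

def m_get(m, i, j, n):
    """m_ij, with the value 0 outside the range 1 <= i <= j <= n."""
    if 1 <= i <= j <= n:
        return m[(i, j)]
    return 0

def b1(m, i, j, n):
    """Exponent b_1(i, j) from (5.5)."""
    return (sum(m_get(m, k, j - 1, n) for k in range(1, i))
            - sum(m_get(m, k, j, n) for k in range(1, i + 1)))

def identity_holds(n, m):
    """Compare sum (n+1-j) b_1(i, j) with -sum (j-i+1) m_ij."""
    pairs = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)]
    lhs = sum((n + 1 - j) * b1(m, i, j, n) for (i, j) in pairs)
    rhs = -sum((j - i + 1) * m[(i, j)] for (i, j) in pairs)
    return lhs == rhs

random.seed(0)
sample = {(i, j): random.randint(0, 9) for i in range(1, 6) for j in range(i, 6)}
print(identity_holds(5, sample))
```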

5.4 The element \(w_{*}\)

The proof of Corollary 2 is similar. We apply again a soft argument and use the ordering

$$\begin{aligned} (1, n)< (1, n-1)< \cdots< (1, 1)< (2, n)< (3, n)< \cdots < (n, n). \end{aligned}$$

Analyzing (1.8), we see that \(\Sigma _{ij}\) is of the shape (5.4) with

$$\begin{aligned} \begin{aligned} A_{1n}&= \psi '_n\overline{c_{nn}} p^{-m_{1n}}, \quad B_{1n} = \psi _n c_{1(n-1)}p^{-m_{1n}},\\ A_{1(n-1)}&= \psi '_n p^{m_{nn}- m_{1(n-1)} + m_{1n}} + \psi _n\overline{c_{1n}} p^{-m_{1n}}, \quad B_{1(n-1)} = \psi _{n-1}c_{1(n-2)} p^{-m_{1(n-1)}},\\ A_{1 j}&= \psi _{j+1}\overline{c_{1(j+1)}} p^{-m_{1( j+1)}},\quad B_{1j} = \psi _j c_{1(j-1)} p^{-m_{1j}}, \quad 1 \le j \le n-2,\\ A_{2n}&= \psi '_1p^{-m_{2n}}, \quad B_{2n} = \psi _2c_{3n}p^{-m_{12}+ m_{11} - m_{2n}},\\ A_{in}&= \psi _{i-1}\overline{c_{(i-1)n}} p^{-m_{1(i-1)} + m_{1(i-2)} - m_{(i-1)n}}, \\&\quad B_{in} = \psi _ic_{(i+1)n} p^{-m_{1i} + m_{1(i-1)} -m_{in}}, \quad 3 \le i \le n-1,\\ A_{nn}&= \psi _{n-1}\overline{c_{(n-1)n}}p^{-m_{1(n-1)} + m_{1(n-2)} - m_{(n-1)n} },\\&\quad B_{nn} = \psi '_nc_{1n}p^{-m_{1n}}+ \psi _np^{m_{1(n-1)} - m_{1n} - m_{nn}} \end{aligned} \end{aligned}$$

with the same conventions as explained after (5.5).

Arguing as before based on (5.6), we obtain

$$\begin{aligned} \sum _{\begin{array}{c} c_{\nu \mu }\\ (\nu , \mu ) \not = (i, j) \end{array}} \Big | \sum _{c_{ij}} (...) \Big | \ll \big (\max _{1 \le j \le n} |\psi _j|^{-1/2}_p \big ) p^{r_1 + \cdots + r_n -m_{ij}/2 + O(\mu _{ij})} \end{aligned}$$

and conclude the proof as in Sect. 5.2 for some \(\delta > 0\).

We can make this quantitative as in Sect. 5.3. We have

$$\begin{aligned} \begin{aligned}&\sum _{c_{11}}(...) \ll |\psi _1|_p^{-1/2} C_{11} p^{-m_{11}/2}, \\&\quad \sum _{c_{1(j-1)}} \Big | \sum _{c_{1j}} (...)\Big | \ll |\psi _j|^{-1/2}_p C_{1(j-1)} C_{1j} p^{-m_{1j}/2}, \quad 2 \le j \le n,\\&\quad \sum _{c_{(i+1)n}} \Big | \sum _{c_{in}} (...)\Big | \ll |\psi _i|_p^{-1/2} C_{(i+1)n}C_{in} p^{-\delta _{m_{in} \not = 0} (m_{in} + m_{1i} - m_{1(i-1)})/2}, \quad 2 \!\le \! i \!\le \! n-1,\\ \end{aligned} \end{aligned}$$

and by a small variation of Lemma 7, we also have

$$\begin{aligned} \sum _{c_{1n}} \Big | \sum _{c_{nn}} (...) \Big | \ll |\psi _n|_p^{-1/2} C_{1n} C_{nn} p^{-\delta _{m_{nn} \not = 0} (m_{nn} + m_{1n} - m_{1(n-1)})/2}. \end{aligned}$$

We put \(b(1, j) = - m_{1j}\), \(b(i, n) = -m_{in} - m_{1i} + m_{1(i-1)}\) for \(i \geqslant 2\), and \(b_{*}(i, j)= \delta _{m_{ij} \not = 0} \min (0, b(i, j))\). We put the \(2n-1\) nodes \((i, j)\) with \(i = 1\) or \(j= n\) into two classes: \({\mathcal {C}}_1\) with indices of the form \((1, \text {odd})\), \((\text {even}, n)\), and \({\mathcal {C}}_2\) with indices of the form \((1, \text {even})\), \((\text {odd}, n)\). Then

$$\begin{aligned} \textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_{*})\ll & {} \Big (\prod _{i, j} C_{ij} \Big ) \min _{\nu = 1, 2} \Big (\prod _{(i, j) \in {\mathcal {C}}_\nu } p^{\frac{1}{2}b_{*}(i, j)}\Big )\\ {}\le & {} \Big (\prod _{i, j} C_{ij} \Big ) \Big (\prod _{i, j} p^{\frac{1}{4}b_{*}(i, j)}\Big ). \end{aligned}$$

We now observe that

$$\begin{aligned} \sum _{2 \le i \le n} (n+1 - i) b_{*}(i, n) + \sum _{1 \le j \le n} n b_{*}(1, j) \le - \sum _{i, j} (j-i+1)m_{ij}, \end{aligned}$$

and so

$$\begin{aligned} \textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_{*}) \ll \Big (\prod _{i, j} C_{ij} \Big )^{1- \frac{1}{4n}}. \end{aligned}$$
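The inequality preceding this bound can also be tested numerically. When all the relevant \(m_{ij}\) are positive, we have \(b_{*}(i, j) \le b(i, j)\), and the weights are positive, so it suffices to check the inequality with b in place of \(b_{*}\). The sketch takes \(b(1, j) = -m_{1j}\) and \(b(i, n) = -m_{in} - m_{1i} + m_{1(i-1)}\), the exponents in the bounds for the \(c_{1j}\)- and \(c_{in}\)-sums displayed above; the helper name and the random data are ours.

```python
import random

def check_wstar_inequality(n, m):
    """Inequality before the final bound of Sect. 5.4, with b in place of b_*.

    m maps (1, j) for 1 <= j <= n and (i, n) for 2 <= i <= n to positive integers.
    """
    def mm(i, j):
        return m.get((i, j), 0)             # missing keys are treated as 0
    b = {(1, j): -mm(1, j) for j in range(1, n + 1)}
    for i in range(2, n + 1):
        b[(i, n)] = -mm(i, n) - mm(1, i) + mm(1, i - 1)
    lhs = (sum((n + 1 - i) * b[(i, n)] for i in range(2, n + 1))
           + sum(n * b[(1, j)] for j in range(1, n + 1)))
    rhs = -sum((j - i + 1) * value for (i, j), value in m.items())
    return lhs <= rhs

random.seed(0)
n = 6
sample = {(1, j): random.randint(1, 9) for j in range(1, n + 1)}
sample.update({(i, n): random.randint(1, 9) for i in range(2, n + 1)})
print(check_wstar_inequality(n, sample))
```

Comparing coefficients, the difference of the two sides is \(\sum _{j=2}^{n-1}(j-n-1)m_{1j} - m_{1n} \le 0\) for \(n \ge 2\), which is what the random trials reflect.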

6 An exact evaluation

Here we prove Corollary 3. For the vector \(r = (1, \ldots , 1)\), the relevant \({\underline{m}}\) satisfying (5.1) are

  • \(m_{1n} = 1\), \(m_{ij} = 0\) otherwise;

  • \(m_{1k} = m_{k+1, n} = 1\), \(m_{ij} = 0\) otherwise, for some \(1 \le k \le n-1\).

In the first case we obtain

$$\begin{aligned}{} & {} \textrm{Kl}_p({\underline{m}}, \psi , \psi ', w_{*}) =\sum _{c_{11}, \ldots , c_{1(n-2)} \, (\text {mod } p)} \sum _{c_{1(n-1)} \text {(mod } p)}\underset{c_{1n} \text {(mod } p)}{\left. \sum \right. ^{*}}\\{} & {} \quad e\Big (\frac{\psi _n c_{1(n-1)} \overline{c_{1n}}}{p} + \frac{\psi '_n c_{1(n-1)}}{p}\Big ). \end{aligned}$$

The two innermost sums equal p, and so we obtain \(\textrm{Kl}_p({\underline{m}}, \psi , \psi ', w_{*}) = p^{n-1}.\)
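Assuming, as the evaluation implicitly does, that p does not divide \(\psi _n \psi '_n\), the innermost double sum can be confirmed by brute force for small primes: summing over \(c_{1(n-1)}\) first picks out the single unit \(c_{1n}\) with \(\psi _n \overline{c_{1n}} \equiv -\psi '_n\) (mod p). The helper below is a sketch with hypothetical names.

```python
import cmath

def e(x):
    """The additive character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)

def inner_double_sum(p, psi_n, psi_n_prime):
    """Sum over c_{1(n-1)} mod p and units c_{1n} mod p from the first case (p prime)."""
    total = 0j
    for c in range(p):
        for d in range(1, p):                  # d = c_{1n} runs over the units mod p
            dbar = pow(d, -1, p)
            total += e((psi_n * c * dbar + psi_n_prime * c) / p)
    return total

print(inner_double_sum(7, 1, 2))
```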

In the second case we consider first the case \(2 \le k \le n-1\). Then \(\textrm{Kl}_p({\underline{m}}, \psi , \psi ', w_{*})\) contains the sum

$$\begin{aligned} \sum _{\begin{array}{c} c_{1(k-1)}, c_{1k} \, (\text {mod }p)\\ (c_{1k}, p) = 1 \end{array}} e\Big (\frac{\psi _k c_{1(k-1)} \overline{c_{1k}}}{p}\Big ) = 0. \end{aligned}$$

Finally, if \(k = 1\), we obtain

$$\begin{aligned} \textrm{Kl}_p({\underline{m}}, \psi , \psi ', w_{*}) =\sum _{c_{3n}, \ldots , c_{nn} \, (\text {mod } p)} \underset{c_{11},c_{2n} \text {(mod } p)}{\left. \sum \right. ^{*}} e\Big (\frac{\psi _1 \overline{c_{11}} }{p} + \frac{\psi '_1 c_{2 n}}{p}\Big ). \end{aligned}$$

The two inner sums equal 1, and so \(\textrm{Kl}_p({\underline{m}}, \psi , \psi ', w_{*}) = p^{n-2}.\) This completes the proof.
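The sums in the second case can be checked in the same way. For a prime p not dividing \(\psi \), the sum of \(e(\psi \bar{c}/p)\) over the units c modulo p is a Ramanujan sum equal to -1 (inversion permutes the units), so a product of two such sums is 1, while the double sum displayed for \(2 \le k \le n-1\) vanishes because the sum over \(c_{1(k-1)}\) is already zero. The helper names below are ours, and p is assumed prime and coprime to the \(\psi \)'s.

```python
import cmath

def e(x):
    """The additive character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * cmath.pi * x)

def ramanujan_inverse_sum(p, psi):
    """Sum of e(psi * cbar / p) over the units c mod p."""
    return sum(e(psi * pow(c, -1, p) / p) for c in range(1, p))

def vanishing_sum(p, psi):
    """Double sum over c_{1(k-1)} mod p and units c_{1k} mod p, as displayed above."""
    return sum(e(psi * a * pow(c, -1, p) / p) for a in range(p) for c in range(1, p))

print(ramanujan_inverse_sum(5, 1) * ramanujan_inverse_sum(5, 2))
```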

7 Beyond Sarnak’s density conjecture

We finally prove Proposition 4. This requires some minor modifications in Sections 4 and 5 of [1] that we now describe. We use the notation from [1]. Since q is prime, the argument simplifies a bit, and we need [1, Lemma 4.2] for \((\alpha , \beta ) = (0, 0)\) and (1, 0). For \((\alpha , \beta ) = (0, 0)\) we use it as is, for \((\alpha , \beta ) = (1, 0)\) we make a small improvement. We recall that we need to count \(x'_{ij}, y'_{ij}\) satisfying the size conditions [1, (4.15)] and the congruences [1, (4.17), (4.18)], where in the case \(\beta = 0\) the congruence (4.18) can be written more simply as (4.19). The count for (4.15) is given in (4.16). In order to count the saving imposed by the congruences (4.17), (4.19), we proceed as described after (4.18), but obtain an extra saving for \(y_{1n}'\) from (4.19) of size \(p^{n-3}\). Thus we see that

$$\begin{aligned} \begin{aligned} C_{1, 0}&\le \frac{p^{\frac{1}{6}(n^3 + 3n^2 + 2n - 12) + n(n-1) }}{p^{2(n-2)} \cdot p^{\frac{1}{2}(n-2)(n-3)} \cdot p^{n-2 + \frac{1}{2}(n-1)(n-2) } \cdot p^{n-3}} = {\mathcal {N}}_q q^{(n-1)^2} q^{n+2} \\&\quad \le {\mathcal {N}}_q q^{(n-1)^2} q^{\frac{7}{4}(n-1)} \end{aligned} \end{aligned}$$

for \(n \geqslant 5\). (It is important here that the exponent \(\frac{7}{4}\) of \(q^{n-1}\) is strictly less than 2.) Together with the improved bound of Corollary 2, we now obtain the following variant of [1, Theorem 4.3] under the additional assumptions that \(n \geqslant 5\), q is prime, and \(c = (c_1, \ldots , c_{n-1}) = (q^n \gamma _1, \ldots , q^n\gamma _{n-1})\) satisfies \(\gamma _j < q^{2}\) (which implies that only the cases \((\alpha , \beta ) = (0, 0)\) and (1, 0) are relevant in the proof):

$$\begin{aligned} S_{q, w_{*}}^v(M, N, c) \ll q^{\varepsilon } \frac{{\mathcal {N}}_q}{q^{n-1}} \frac{c_1 \cdots c_{n-1}}{(\gamma _1 \cdots \gamma _{n-1})^{\delta }} \big (\gamma _1 \cdots \gamma _{n-1}, q^{\infty }\big )^{3/4 + \delta } \end{aligned}$$
(7.1)

with \(\delta \) as in our Corollary 2. We can and will assume without loss of generality that \(\delta < 1/10\).
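The exponent bookkeeping in the bound for \(C_{1,0}\) can be checked with exact rational arithmetic. In this Python sketch (helper names are ours), the four factors in the denominator combine to the exponent \(n^2 - 5\), and the final step \(q^{n+2} \le q^{\frac{7}{4}(n-1)}\) holds exactly when \(n \geqslant 5\). The middle equality involves the normalisation \({\mathcal {N}}_q\) from [1] and is not re-checked here.

```python
from fractions import Fraction


def numerator_exponent(n):
    # exponent of the numerator in the bound for C_{1,0}
    return Fraction(n**3 + 3 * n**2 + 2 * n - 12, 6) + n * (n - 1)


def denominator_exponent(n):
    # the four denominator factors, the last being the extra saving of size n-3
    return (2 * (n - 2) + Fraction((n - 2) * (n - 3), 2)
            + (n - 2) + Fraction((n - 1) * (n - 2), 2) + (n - 3))


def final_step_holds(n):
    # q^(n+2) <= q^(7(n-1)/4)  <=>  n + 2 <= 7(n-1)/4
    return n + 2 <= Fraction(7, 4) * (n - 1)


for n in range(2, 10):
    print(n, denominator_exponent(n), final_step_holds(n))
```

The net exponent, numerator minus denominator, works out to \((n^3 + 3n^2 - 4n + 18)/6\).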

With this in hand, we move to the discussion after [1, Lemma 5.1]. The key point is that we can now slightly relax [1, (5.3)] to

$$\begin{aligned} mZ \le K^{-1} (1 + 1/r + R)^{-K} q^{n+1 + \delta _0} \end{aligned}$$

for some sufficiently small \(\delta _0 > 0\), to be chosen in a moment (cf. also [1, (1.4)]). If \(n \geqslant 4\) and \(\delta _0 < 1/10\), then by Remark 2 after [1, Lemma 4.1] we can still conclude that only the trivial Weyl element and \(w_{*}\) give a non-zero contribution. The contribution of the trivial Weyl element is given in [1, (5.4)]; for the contribution of \(w_{*}\) we invoke (7.1) and obtain

$$\begin{aligned} \ll q^{\varepsilon } \frac{{\mathcal {N}}_q}{q^{n-1}} Z^{2\eta _1} \sum _{ \gamma _1, \ldots , \gamma _{n-1} \ll q^{1+\delta _0}} \frac{(\gamma _1 \cdots \gamma _{n-1}, q^{\infty })}{(\gamma _1\cdots \gamma _{n-1})^{\delta }}\ll {\mathcal {N}}_q Z^{2\eta _1} \end{aligned}$$

provided that \((1 + \delta _0)(1 - \delta ) < 1\). In this way we obtain an improved version of [1, Proposition 5.2] in which (under the current assumptions that q is prime and \(n \geqslant 5\)) only the relaxed condition \(T \le M^{-K} q^{n+1 + \delta _0}\) is needed. Inserting this directly into [1, Corollary 6.11] completes the proof of Proposition 4.
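One way to see the last estimate is the following sketch. Write \(X = q^{1+\delta _0}\) (up to a constant) and assume q is large, so that \(q \le X < q^2\) and hence \((\gamma , q^{\infty }) \in \{1, q\}\) for \(\gamma \le X\). Since \((\gamma _1 \cdots \gamma _{n-1}, q^{\infty }) = \prod _j (\gamma _j, q^{\infty })\), the \(\gamma \)-sum factors over j, and each factor satisfies

$$\begin{aligned} \sum _{\gamma \le X} \frac{(\gamma , q^{\infty })}{\gamma ^{\delta }} \le \sum _{\gamma \le X} \gamma ^{-\delta } + q^{1-\delta } \sum _{\gamma ' \le X/q} \gamma '^{-\delta } \ll X^{1-\delta } + q^{1-\delta }(X/q)^{1-\delta } \ll X^{1-\delta }. \end{aligned}$$

Hence the contribution of \(w_{*}\) is

$$\begin{aligned} \ll q^{\varepsilon } \frac{{\mathcal {N}}_q}{q^{n-1}} Z^{2\eta _1} X^{(n-1)(1-\delta )} = {\mathcal {N}}_q Z^{2\eta _1} q^{\varepsilon + (n-1)((1+\delta _0)(1-\delta ) - 1)}, \end{aligned}$$

which is \(\ll {\mathcal {N}}_q Z^{2\eta _1}\) for \(\varepsilon \) small enough once \((1+\delta _0)(1-\delta ) < 1\). For instance, \(\delta _0 = \delta /2\) is admissible: then \((1+\delta _0)(1-\delta ) = 1 - \delta /2 - \delta ^2/2 < 1\), and \(\delta _0 < 1/10\) because \(\delta < 1/10\).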

8 Appendix: the case \(G = \textrm{GL}(4)\)

For applications it may be useful to have explicit formulae and non-trivial bounds for all Weyl elements in the case \(G = \textrm{GL}(4)\); these can be obtained by the method of proof of Theorem 1 and Corollary 1. See [9, Appendix] for a list of the relevant consistency relations, and [7] for a version in terms of Plücker coordinates.

There are 8 relevant Weyl elements. We need not discuss the trivial Weyl element, nor the Voronoi element \(\left( {\begin{matrix} &{} 1\\ I_{3} &{} \end{matrix}}\right) \), which is treated in [8] (with non-trivial bounds following from Deligne’s estimates). The element \(\left( {\begin{matrix} &{} I_{3} \\ 1&{} \end{matrix}}\right) \) is analogous.
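The count of 8 can be reproduced by enumerating the elements \(w = w_l w_M\), where M runs over the standard block-diagonal Levi subgroups of GL(n) and \(w_M\) is the long element of M. That these are the Weyl elements carrying non-degenerate Kloosterman sums is the standard support criterion, which we assume here. The Python sketch below ignores signs and encodes a permutation matrix by the (0-indexed) column of the entry 1 in each row; all names are ours.

```python
from itertools import combinations


def compositions(n):
    """All compositions of n (ordered tuples of positive block sizes)."""
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def long_element_of_levi(blocks):
    """Long element w_M of the Levi with the given blocks (reverses each block)."""
    perm, start = [], 0
    for b in blocks:
        perm.extend(range(start + b - 1, start - 1, -1))
        start += b
    return perm


def multiply(p, q):
    """Permutation of the matrix product P Q: row i of PQ has its 1 in column q[p[i]]."""
    return [q[p[i]] for i in range(len(p))]


def length(perm):
    """Number of inversions, i.e. the length in simple reflections."""
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))


def relevant_weyl_elements(n):
    w_l = list(range(n - 1, -1, -1))  # anti-diagonal long element
    return {blocks: multiply(w_l, long_element_of_levi(blocks))
            for blocks in compositions(n)}


for blocks, perm in relevant_weyl_elements(4).items():
    print(blocks, [c + 1 for c in perm], length(perm))
```

For n = 4 this lists eight distinct elements with lengths 0, 3, 3, 4, 5, 5, 5, 6: the trivial element, the Voronoi element and its analogue, the element in case (c), \(w_{*}\), the two elements in case (d), and \(w_l\).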

(a) For the long Weyl element \(w_l = \left( {\begin{matrix}&{}&{}&{}-1\\ {} &{}&{}1\\ {} &{}-1\\ 1\end{matrix}}\right) \) we obtain by (1.6) that \(\textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_l)\) is given by

$$\begin{aligned} \begin{aligned}&\sum _{{\underline{c}} \in {\mathcal {C}}_{w_l}({\underline{m}})} e\Bigg (\frac{\psi _1 \overline{c_{11}}}{p^{m_{11}}} + \psi _2\Big (\frac{c_{11} \overline{c_{12}}}{p^{m_{12}}} + \frac{\overline{c_{22}}}{p^{-m_{11} + m_{12} + m_{22}}}\Big ) \\&\quad + \psi _3\Big (\frac{c_{12} \overline{c_{13}}}{p^{m_{13}}} + \frac{c_{22} \overline{c_{23}}}{p^{-m_{12} + m_{13}+ m_{23}}} + \frac{\overline{c_{33}}}{p^{-m_{12} - m_{22} + m_{13} + m_{23} + m_{33}}}\Big )\\&\quad + \frac{\psi '_{1}c_{33}}{p^{m_{33}}} + \psi '_2\Big (\frac{c_{23}\overline{c_{33}}}{p^{m_{23}}} + \frac{c_{22}}{p^{-m_{33} + m_{23} + m_{22}}}\Big ) \\&\quad + \psi '_3\Big (\frac{c_{13} \overline{c_{23}}}{p^{m_{13}}} + \frac{c_{12} \overline{c_{22}}}{p^{-m_{23} + m_{13} + m_{12}}} + \frac{c_{11}}{p^{-m_{23} - m_{22} + m_{13} + m_{12} + m_{11}}}\Big )\Bigg ) \end{aligned} \end{aligned}$$
(8.1)

where the sum runs over

$$\begin{aligned} \begin{aligned}&c_{11} \, (\text {mod } p^{m_{11} + m_{12} + m_{13}}), \quad c_{12} \, (\text {mod } p^{ m_{12} + m_{13}}), \quad c_{13} \, (\text {mod } p^{ m_{13}}),\\&\quad c_{22} \, (\text {mod } p^{m_{22} + m_{23 }}), \quad c_{23} \, (\text {mod } p^{ m_{23}}), \quad c_{33} \, (\text {mod } p^{ m_{33}}) \end{aligned} \end{aligned}$$

subject to \((c_{ij}, p^{m_{ij}}) = 1\).

(b) For the Weyl element \(w_{*} = \left( {\begin{matrix}&{}&{}&{}-1\\ {} &{}-1\\ {} &{}&{}-1\\ 1\end{matrix}}\right) \) we obtain by (1.8) that \(\textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w_{*})\) is given by

$$\begin{aligned} \begin{aligned}&\sum _{{\underline{c}} \in {\mathcal {C}}_{w_{*}}({\underline{m}})} e\Bigg (\frac{\psi _1 \overline{c_{11}}}{p^{m_{11}}} + \psi _2\Big (\frac{c_{11} \overline{c_{12}}}{p^{m_{12}}} + \frac{c_{33}\overline{c_{23}}}{p^{-m_{11} + m_{12} + m_{22}}}\Big ) \\&\qquad \qquad \quad + \psi _3\Big (\frac{c_{12} \overline{c_{13}}}{p^{m_{13}}} + \frac{ \overline{c_{33}}}{p^{-m_{12} + m_{13}+ m_{33}}} \Big )\\&\qquad \qquad \quad + \frac{\psi '_{1}c_{23}}{p^{m_{23}}} + \psi '_3\Big (\frac{c_{13} \overline{c_{33}}}{p^{m_{13}}} + \frac{c_{12} }{p^{-m_{33} + m_{13} + m_{12}}} \Big )\Bigg ) \end{aligned} \end{aligned}$$
(8.2)

where the sum runs over

$$\begin{aligned} \begin{aligned}&c_{11} \, (\text {mod } p^{m_{11} + m_{12} + m_{13}}), \quad c_{12} \, (\text {mod } p^{ m_{12} + m_{13}}), \quad c_{13} \, (\text {mod } p^{ m_{13}}),\\&\quad c_{23} \, (\text {mod } p^{ m_{33}}), \quad c_{33} \, (\text {mod } p^{ m_{23} + m_{33}}) \end{aligned} \end{aligned}$$

subject to \((c_{ij}, p^{m_{ij}}) = 1\). These two cases illustrate the general results in the body of the paper, and Corollaries 1 and 2 provide non-trivial bounds for them.

(c) Three Weyl elements remain. We first consider \(w = \left( {\begin{matrix}&{}&{}1\\ {} &{}&{}&{}1\\ 1&{}&{}\\ {} &{}1\end{matrix}}\right) \). Here we choose the representative \(w = s_{\alpha _2} s_{\alpha _1} s_{\alpha _3} s_{\alpha _2}\), so that

$$\begin{aligned} \begin{aligned} \underline{\gamma }= \left( {\alpha _{22},\alpha _{12},\alpha _{23},\alpha _{13}}\right) . \end{aligned} \end{aligned}$$

Similar computations give the following formula for \(\textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w )\):

$$\begin{aligned} \begin{aligned}&\underset{\begin{array}{c} c_{22} \, (\text {mod } p^{m_{12} + m_{22} + m_{23}})\\ c_{12} \, (\text {mod } p^{m_{12} + m_{13}})\\ c_{23} \, (\text {mod } p^{m_{13} + m_{23}})\\ c_{13} \, (\text {mod } p^{m_{13}}) \end{array}}{\left. \sum \right. ^{*}} e\Bigg (\psi _1 \Big (\frac{c_{22}\overline{c_{12}}}{p^{m_{12}}} + \frac{c_{23} \overline{c_{13}}}{p^{-m_{22} + m_{12} + m_{13}}}\Big ) + \frac{\psi _2\overline{c_{22}}}{p^{m_{22}}} \\&\qquad \qquad \qquad \qquad \qquad + \psi _3\Big (\frac{c_{22}\overline{c_{23}}}{p^{m_{23}}} + \frac{c_{12}\overline{c_{13}}}{p^{-m_{22} + m_{13} + m_{23}}}\Big ) + \frac{\psi _2'c_{13}}{p^{m_{13}}}\Bigg ) \end{aligned} \end{aligned}$$

where \(\sum ^{*}\) indicates the usual coprimality condition \((c_{ij}, p^{m_{ij}}) = 1\).

Here we obtain a saving relative to the trivial bound as follows. The \(c_{13}\)-sum and the \(c_{22}\)-sum save \(O ( |\psi '_2|^{-1/2}_p p^{-m_{13}/2})\) and \(O( |\psi _2|^{-1/2}_p p^{-m_{22}/2})\), respectively. If \(p \not \mid c_{22}\), an analogous argument works for the indices (12) and (23). If \(m_{22} = 0\) (the only situation in which \(p \mid c_{22}\) is possible), then the \(c_{22}\)-sum vanishes unless

$$\begin{aligned} \psi _1 \overline{c_{12}} p^{m_{23}} + \psi _3 \overline{c_{23}} p^{m_{12}} \equiv 0 \, (\text {mod } p^{m_{12} + m_{23}}). \end{aligned}$$
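When \(m_{22} = 0\), the \(\psi _2\)-term is an integer and drops out, and the \(c_{22}\)-sum runs over a complete residue system modulo \(p^{m_{12} + m_{23}}\); the criterion above is then orthogonality of additive characters. A small brute-force check in Python, with hypothetical helper names and units \(c_{12}, c_{23}\), compares the \(c_{22}\)-sum with \(p^{m_{12}+m_{23}}\) times the indicator of the congruence.

```python
import cmath


def e(x):
    """Additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def phase_coefficient(p, m12, m23, psi1, psi3, c12, c23):
    """Coefficient a such that the c22-dependent phase is e(c22 * a / p^(m12+m23))."""
    c12_inv = pow(c12, -1, p ** m12)  # inverse of c12 mod p^m12 (Python >= 3.8)
    c23_inv = pow(c23, -1, p ** m23)  # inverse of c23 mod p^m23
    return psi1 * c12_inv * p ** m23 + psi3 * c23_inv * p ** m12


def c22_sum(p, m12, m23, psi1, psi3, c12, c23):
    modulus = p ** (m12 + m23)  # c22 runs mod p^(m12 + m22 + m23) with m22 = 0
    a = phase_coefficient(p, m12, m23, psi1, psi3, c12, c23)
    return sum(e((c22 * a % modulus) / modulus) for c22 in range(modulus))


print(round(abs(c22_sum(3, 1, 1, 1, 2, 1, 1)), 6), round(abs(c22_sum(3, 1, 1, 1, 1, 1, 1)), 6))
```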

Thus in all cases we get a saving of

$$\begin{aligned} \max (|\psi _1|_p^{-1/2}, |\psi _2|_p^{-1/2}, |\psi _3|_p^{-1/2}, |\psi _1'|_p^{-1/2}) p^{-\frac{1}{2}\max (m_{12}, m_{13}, m_{22}, m_{23})}, \end{aligned}$$

so that one can choose \(\delta < 1/16\) in Corollary 5 below. This value is chosen only for concreteness; it is easy to improve.
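The numerical value comes from comparing the saving with the trivial bound. Assuming the trivial bound is the number of terms, \(p^{(m_{12}+m_{22}+m_{23}) + (m_{12}+m_{13}) + (m_{13}+m_{23}) + m_{13}}\), whose exponent is at most 8 times the maximum of the \(m_{ij}\), a saving of \(p^{-\frac{1}{2}\max (m_{12}, m_{13}, m_{22}, m_{23})}\) is at least a power \(1/16\) of it. This Python sketch (names ours) checks the exponent inequality over a range of \(m_{ij}\) and ignores the \(\psi \)-dependent factor.

```python
from fractions import Fraction
from itertools import product


def trivial_exponent_c(m12, m13, m22, m23):
    """log_p of the number of terms in case (c): moduli of c22, c12, c23, c13."""
    return (m12 + m22 + m23) + (m12 + m13) + (m13 + m23) + m13


def saving_beats_delta(delta, bound=8):
    """Check max(m)/2 >= delta * trivial exponent for all 0 <= m_ij < bound."""
    for m12, m13, m22, m23 in product(range(bound), repeat=4):
        saving = Fraction(max(m12, m13, m22, m23), 2)
        if saving < delta * trivial_exponent_c(m12, m13, m22, m23):
            return False
    return True


print(saving_beats_delta(Fraction(1, 16)), saving_beats_delta(Fraction(1, 15)))
```

The exponent count allows \(\delta = 1/16\) and fails for \(1/15\) (take all \(m_{ij} = 1\)), so \(1/16\) is the threshold for this particular argument.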

(d) Finally we treat the Weyl element \(w = \left( {\begin{matrix}&{}&{} &{} -1\\ {} &{}&{}1&{}\\ 1&{}&{}\\ {} &{}1\end{matrix}}\right) \), the Weyl element \( \left( {\begin{matrix}&{}&{}1 &{} \\ {} &{} &{} &{}1&{}\\ {} &{}-1&{}\\ 1&{}\end{matrix}}\right) \) being analogous. Here we choose a representative \(w = s_{\alpha _1} s_{\alpha _2} s_{\alpha _3} s_{\alpha _1} s_{\alpha _2}\), so that

$$\begin{aligned} \begin{aligned} \underline{\gamma }= \left( {\alpha _{11}, \alpha _{12}, \alpha _{13}, \alpha _{22}, \alpha _{23}}\right) . \end{aligned} \end{aligned}$$
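The root lists in cases (c) and (d) can be recomputed from the chosen representatives. We assume the convention \(\gamma _j = s_{i_1} \cdots s_{i_{j-1}}(\alpha _{i_j})\) for \(w = s_{i_1} \cdots s_{i_k}\), with \(\alpha _{ij} = \alpha _i + \cdots + \alpha _j\); this convention reproduces both lists above. The Python sketch below (names ours) realises roots as vectors \(e_a - e_b\) in \(\mathbb {Z}^4\), on which \(s_i\) acts by swapping coordinates i and \(i+1\), and also multiplies out the reduced word as a product of permutation matrices, ignoring signs.

```python
def simple_root(i, n):
    """alpha_i = e_i - e_{i+1} as a vector of length n (1-indexed i)."""
    v = [0] * n
    v[i - 1], v[i] = 1, -1
    return v


def reflect(i, v):
    """Action of the simple reflection s_i: swap coordinates i and i+1."""
    w = list(v)
    w[i - 1], w[i] = w[i], w[i - 1]
    return w


def gamma_labels(word, n=4):
    """Labels (i, j) of gamma_j = s_{i_1} ... s_{i_{j-1}} alpha_{i_j}, where
    (i, j) stands for alpha_i + ... + alpha_j = e_i - e_{j+1}."""
    labels = []
    for j, i in enumerate(word):
        v = simple_root(i, n)
        for k in reversed(word[:j]):  # apply s_{i_{j-1}} first, s_{i_1} last
            v = reflect(k, v)
        a, b = v.index(1) + 1, v.index(-1) + 1
        assert a < b, "each gamma_j should be a positive root"
        labels.append((a, b - 1))
    return labels


def word_to_matrix(word, n=4):
    """Product S_{i_1} ... S_{i_k} of permutation matrices (signs ignored)."""
    m = [[int(r == c) for c in range(n)] for r in range(n)]
    for i in word:
        for row in m:  # right multiplication by S_i swaps columns i and i+1
            row[i - 1], row[i] = row[i], row[i - 1]
    return m


print(gamma_labels([2, 1, 3, 2]))     # case (c)
print(gamma_labels([1, 2, 3, 1, 2]))  # case (d)
```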

We then obtain for \(\textrm{Kl}_{p}({\underline{m}}, \psi , \psi ', w )\) the formula

$$\begin{aligned} \begin{aligned} \underset{\begin{array}{c} c_{11} \, (\text {mod } p^{m_{11} + m_{12} + m_{13}})\\ c_{12} \, (\text {mod } p^{m_{12} + m_{13}})\\ c_{13} \, (\text {mod } p^{ m_{13}})\\ c_{22} \, (\text {mod } p^{m_{22} + m_{23}})\\ c_{23} \, (\text {mod } p^{m_{23}}) \end{array}}{\left. \sum \right. ^{*}}&e\Bigg ( \frac{\psi _1\overline{c_{11}}}{p^{m_{11}}} + \psi _2\Big (\frac{c_{11}\overline{c_{12}}}{p^{m_{12}}} + \frac{\overline{c_{22}}}{p^{-m_{11} + m_{12} + m_{22}}}\Big ) \\&\quad + \psi _3\Big (\frac{c_{12} \overline{c_{13}}}{p^{m_{13}}} + \frac{c_{22} \overline{c_{23}}}{p^{-m_{12} + m_{13} + m_{23}}}\Big ) \\&\quad + \frac{\psi _2'c_{23}}{p^{m_{23}}} + \psi _3' \Big (\frac{c_{11}}{p^{-m_{22}-m_{23} + m_{11} + m_{12} + m_{13}}}+ \frac{c_{12} \overline{c_{22}}}{p^{-m_{23} + m_{12} + m_{13}}} + \frac{c_{13} \overline{c_{23}}}{p^{m_{13}}}\Big )\Bigg ). \end{aligned} \end{aligned}$$

Arguing as in Sect. 5 with the ordering

$$\begin{aligned} (13)< (12)< (11)< (23) < (22) \end{aligned}$$

we obtain a saving of

$$\begin{aligned} \ll _{\psi , \psi '} p ^{-\frac{1}{2}\max ( m_{13}, m_{12}, m_{11}, m_{23}, m_{22} - m_{11})} \ll p ^{-\frac{1}{4}\max ( m_{13}, m_{12}, m_{11}, m_{23}, m_{22} )} \end{aligned}$$

so that we can choose \(\delta < 1/36\) in the following corollary. Again, this numerical value is easy to improve. We conclude the following.

Corollary 5

Let w be a non-trivial Weyl element for \(G = \textrm{GL}(4)\) and \(C = (p^{r_1}, p^{r_2}, p^{r_3})\). Then there exists an absolute constant \(\delta > 0\) such that

$$\begin{aligned} \textrm{Kl}_{p}( \psi , \psi ', C^{*}w) \ll \Big (\max _{1 \le j \le 3} \max \big (|\psi _j|^{-1/2}_p, |\psi '_j|^{-1/2}_p \big ) \Big ) \big ( p^{r_1+r_2+r_3}\big )^{1-\delta }. \end{aligned}$$
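The choice \(\delta < 1/36\) in case (d) rests on two exponent inequalities. First, \(\max (m_{11}, m_{22} - m_{11}) \ge m_{22}/2\) turns the saving from the ordering argument into \(p^{-\frac{1}{4}\max (m_{13}, m_{12}, m_{11}, m_{23}, m_{22})}\). Second, assuming the trivial bound is the number of terms, its exponent \((m_{11}+m_{12}+m_{13}) + (m_{12}+m_{13}) + m_{13} + (m_{22}+m_{23}) + m_{23}\) is at most 9 times that maximum. A minimal Python check over small exponents (names ours; the \(\psi \)-dependent factor is ignored):

```python
from fractions import Fraction
from itertools import product


def trivial_exponent_d(m11, m12, m13, m22, m23):
    """log_p of the number of terms in case (d): moduli of c11, c12, c13, c22, c23."""
    return (m11 + m12 + m13) + (m12 + m13) + m13 + (m22 + m23) + m23


def check_case_d(bound=6):
    for m11, m12, m13, m22, m23 in product(range(bound), repeat=5):
        raw = max(m13, m12, m11, m23, m22 - m11)  # exponent from the ordering argument
        full = max(m13, m12, m11, m23, m22)
        # step 1: raw/2 >= full/4, since max(m11, m22 - m11) >= m22/2
        if Fraction(raw, 2) < Fraction(full, 4):
            return False
        # step 2: full/4 >= trivial exponent / 36
        if Fraction(full, 4) < Fraction(trivial_exponent_d(m11, m12, m13, m22, m23), 36):
            return False
    return True


print(check_case_d())
```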