1 Introduction

The classical Kloosterman sum is given by

$$\begin{aligned} \begin{aligned} S\left( {m, n; q}\right) = \sum \limits _{\begin{array}{c} x, y \in \mathbb {Z}/q\mathbb {Z}\\ xy \equiv 1\pmod {q} \end{array}}{\text {e}}\left( {\frac{mx + ny}{q}}\right) , \end{aligned} \end{aligned}$$

where \({\text {e}}\left( {x}\right) = e^{2\pi i x}\). Kloosterman sums naturally appear in the Fourier expansion of \({\text {GL}}(2)\) Poincaré series

$$\begin{aligned} \begin{aligned} P_m\left( {z; \nu }\right) = \sum \limits _{\gamma \in \Gamma _\infty \backslash {\text {SL}}\left( {2,\mathbb {Z}}\right) } {\text {Im}}\left( {\gamma z}\right) ^\nu {\text {e}}\left( {m\left( {\gamma z}\right) }\right) , \end{aligned} \end{aligned}$$

which play an important role in number theory. In [2], Bump, Friedberg and Goldfeld introduced \({\text {GL}}(r)\) Poincaré series for \(r\ge 2\), and gave a generalisation of Kloosterman sums to \({\text {GL}}(3)\). The notion of Kloosterman sums was then generalised to \({\text {GL}}(r)\) for \(r\ge 2\) by Friedberg [9], and then to arbitrary simply connected Chevalley groups by Dąbrowski [5].

By methods of algebraic geometry, Weil [16] obtained a bound for \({\text {GL}}(2)\) Kloosterman sums

$$\begin{aligned} \begin{aligned} \left| {S\left( {m, n; q}\right) } \right| \ll \tau \left( {q}\right) \left( {m, n, q}\right) ^{1/2} q^{1/2}, \end{aligned} \end{aligned}$$

where \(\tau \) denotes the divisor function. However, it remains a major open problem to give non-trivial bounds for Kloosterman sums in general, and currently only a small set of examples can be treated. Bounds for \({\text {GL}}(3)\) Kloosterman sums were first obtained by Larsen [2, Appendix] and Stevens [14], and were improved by Dąbrowski and Fisher [7]. Bounds for \({\text {GL}}(4)\) Kloosterman sums were given by Huang [10, Appendix]. Friedberg [9] generalised the results to \({\text {GL}}(r)\) Kloosterman sums attached to certain Weyl elements. On reductive groups, Dąbrowski and Reeder [8] gave the size of Kloosterman sets, establishing a trivial bound for Kloosterman sums on reductive groups.
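To make the definition concrete, the following Python sketch (ours, not from the cited works) evaluates \(S(m,n;q)\) by brute force and compares it with Weil's bound taken with implied constant 1. The helper names are hypothetical. We only test a few odd moduli, for which the ratio below stays at most 1.

```python
import cmath
import math

# Hypothetical helper names; brute force, so only suitable for small q.


def kloosterman(m: int, n: int, q: int) -> complex:
    """S(m, n; q): sum of e((m*x + n*y)/q) over x, y mod q with x*y = 1 (mod q)."""
    total = 0j
    for x in range(q):
        if math.gcd(x, q) != 1:
            continue
        y = pow(x, -1, q)  # inverse of x modulo q (Python 3.8+)
        total += cmath.exp(2j * math.pi * (m * x + n * y) / q)
    return total


def divisor_count(q: int) -> int:
    """tau(q), the number of positive divisors of q."""
    return sum(1 for d in range(1, q + 1) if q % d == 0)


def weil_ratio(m: int, n: int, q: int) -> float:
    """|S(m, n; q)| / (tau(q) (m, n, q)^{1/2} q^{1/2})."""
    bound = divisor_count(q) * math.sqrt(math.gcd(math.gcd(m, n), q) * q)
    return abs(kloosterman(m, n, q)) / bound


if __name__ == "__main__":
    for q in (7, 9, 11, 15, 25):
        worst = max(weil_ratio(m, n, q) for m in range(1, q) for n in range(1, q))
        print(f"q = {q}: largest ratio {worst:.3f}")
```

Since \(x\mapsto -x\) permutes the terms into their complex conjugates, the computed sums are real up to rounding.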

Besides Poincaré series, Kloosterman sums also arise in the relative trace formula, which integrates an automorphic kernel over two subgroups against their respective characters. In particular, a prime application of bounds for Kloosterman sums is the analysis of the arithmetic side of the Petersson/Kuznetsov spectral summation formula. A more detailed description can be found in [4].

Now we introduce the main results. Let

$$\begin{aligned} \begin{aligned} G = {\text {Sp}}(2r)&= \left\{ { M \in {\text {GL}}(2r) }\;\big |\;{ M^T JM = J}\right\} ,&J&= \begin{pmatrix} &{} I_r\\ -I_r\end{pmatrix} \end{aligned} \end{aligned}$$

be the standard symplectic group. When k is a field, G(k) is the group of linear transformations of \(k^{2r}\) preserving the symplectic bilinear form \((x, y)\mapsto x^TJy\) on \(k^{2r}\). The standard torus and the standard unipotent subgroup of G are given by

$$\begin{aligned} \begin{aligned} T&= \left\{ {\begin{pmatrix} T_0\\ &{} T_0^{-1}\end{pmatrix} \in G}\;\big |\;{T_0 \text { diagonal}}\right\} , \\ U&= \left\{ {\begin{pmatrix} U_0 &{} S\\ &{} (U_0^{-1})^T \end{pmatrix}\in G}\;\big |\;{ U_0 \text { upper triangular, unipotent}}\right\} \end{aligned} \end{aligned}$$

respectively. We denote by \(N = N_G(T)\) the normaliser of T in G; the Weyl group is \(W := N_G(T)/T\). Let \(w: N \rightarrow W\) be the canonical projection. For \(n\in N\), we also define \(U_n := U \cap n^{-1} U^T n\) and \(\overline{U}_n := U \cap n^{-1} U n\). Note that \(U_n\) and \(\overline{U}_n\) depend only on the image w(n) of n under the canonical projection.

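Here we follow the notation of Stevens [14]. Let p be a rational prime. We have the Bruhat decomposition

$$\begin{aligned} \begin{aligned} G\left( {\mathbb {Q}_p}\right) = \bigsqcup _{n\in N\left( {\mathbb {Q}_p}\right) } U\left( {\mathbb {Q}_p}\right) n U_n\left( {\mathbb {Q}_p}\right) , \end{aligned} \end{aligned}$$

and every \(g\in G\left( {\mathbb {Q}_p}\right) \) can be written uniquely as \(g = unu'\) with \(u\in U\left( {\mathbb {Q}_p}\right) \), \(n\in N\left( {\mathbb {Q}_p}\right) \) and \(u'\in U_n\left( {\mathbb {Q}_p}\right) \).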

For \(n\in N\left( {\mathbb {Q}_p}\right) \), we define

$$\begin{aligned} \begin{aligned} C(n)&= U\left( {\mathbb {Q}_p}\right) n U\left( {\mathbb {Q}_p}\right) \cap G\left( {\mathbb {Z}_p}\right) ,\\ X(n)&= U\left( {\mathbb {Z}_p}\right) \backslash C(n) / U_n\left( {\mathbb {Z}_p}\right) , \end{aligned} \end{aligned}$$

and projection maps

$$\begin{aligned} \begin{aligned} u: X(n)&\rightarrow U\left( {\mathbb {Z}_p}\right) \backslash U\left( {\mathbb {Q}_p}\right) ,\\ u': X(n)&\rightarrow U_n\left( {\mathbb {Q}_p}\right) / U_n\left( {\mathbb {Z}_p}\right) \end{aligned} \end{aligned}$$

by the relation \(x = u(x) n u'(x)\) for \(x \in X(n)\).

Remark

In [14], the notions above are defined for \({\text {GL}}(r)\), but it is straightforward to check that the construction also works for \({\text {Sp}}(2r)\), with essentially the same proofs. In particular, X(n) is finite, and the projection maps \(u,u'\) are well-defined.

Let \(n\in N\left( {\mathbb {Q}_p}\right) \), \(\psi _p\) a character of \(U\left( {\mathbb {Q}_p}\right) \) which is trivial on \(U\left( {\mathbb {Z}_p}\right) \), and \(\psi '_p\) a character of \(U_n\left( {\mathbb {Q}_p}\right) \) trivial on \(U_n\left( {\mathbb {Z}_p}\right) \), such that \(\psi '_p\) is the restriction of some character of \(U\left( {\mathbb {Q}_p}\right) \) trivial on \(U\left( {\mathbb {Z}_p}\right) \). Then the local Kloosterman sum is given by

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n, \psi _p, \psi '_p}\right) = \sum \limits _{x\in X(n)} \psi _p\left( {u(x)}\right) \psi '_p\left( {u'(x)}\right) . \end{aligned}\end{aligned}$$

If \(\psi '_p\) is given as a character of \(U\left( {\mathbb {Q}_p}\right) \) which is trivial on \(U\left( {\mathbb {Z}_p}\right) \), we write \({\text {Kl}}_p\left( {n, \psi _p, \psi '_p}\right) \) to mean \({\text {Kl}}_p\left( {n, \psi _p, \psi '_p|_{U_n\left( {\mathbb {Q}_p}\right) }}\right) \).

To define a global Kloosterman sum, let \(n \in N(\mathbb {Q})\), \(\psi = \prod \limits _p \psi _p\) a character of \(U(\mathbb {A})\) which is trivial on \(\prod \limits _p U\left( {\mathbb {Z}_p}\right) \), and \(\psi '\) a character of \(U_n\left( {\mathbb {A}}\right) \) trivial on \(\prod \limits _p U_n\left( {\mathbb {Z}_p}\right) \), such that \(\psi '\) is the restriction of some character of \(U\left( {\mathbb {A}}\right) \) trivial on \(\prod \limits _p U\left( {\mathbb {Z}_p}\right) \). Then the global Kloosterman sum is given by

$$\begin{aligned}\begin{aligned} {\text {Kl}}\left( {n, \psi , \psi '}\right) = \prod \limits _p {\text {Kl}}_p\left( {n, \psi _p, \psi '_p}\right) . \end{aligned}\end{aligned}$$

Remark

This definition of Kloosterman sums differs from that of the symplectic Kloosterman sums introduced by Kitaoka [11]. The latter are more relevant for classical \({\text {Sp}}(4)\) Fourier expansions with respect to the upper right 2-by-2 block, which, however, is not a full parabolic subgroup. Tóth [15] proved some properties and estimates of such Kloosterman sums. The Kloosterman sums introduced here fit into the general framework of Kloosterman sums defined on reductive groups; see e.g. Dąbrowski [5].

For \(G = {\text {Sp}}\left( {4, \mathbb {Q}_p}\right) \), a set of simple roots of G with respect to the maximal torus T is given by \(\Delta = \left\{ {\alpha , \beta }\right\} \), where

$$\begin{aligned}\begin{aligned} \alpha \left( {{\text {diag}}\left( {y_1, y_2, y_1^{-1}, y_2^{-1}}\right) }\right)&= y_1y_2^{-1},&\beta \left( {{\text {diag}}\left( {y_1, y_2, y_1^{-1}, y_2^{-1}}\right) }\right)&= y_2^2. \end{aligned}\end{aligned}$$

Then \(\Psi ^+ = \left\{ {\alpha , \beta , \alpha +\beta , 2\alpha +\beta }\right\} \) is a set of positive roots. We denote by \(s_\alpha \) and \(s_\beta \) the simple reflections in the hyperplane orthogonal to \(\alpha \) and \(\beta \) respectively. Then the Weyl group of G with respect to T is given by

$$\begin{aligned}\begin{aligned} W = \left\{ {1, s_\alpha , s_\beta , s_\alpha s_\beta , s_\beta s_\alpha , s_\alpha s_\beta s_\alpha , s_\beta s_\alpha s_\beta , s_\alpha s_\beta s_\alpha s_\beta }\right\} . \end{aligned}\end{aligned}$$

We fix once and for all representatives for \(s_\alpha \) and \(s_\beta \):

$$\begin{aligned}\begin{aligned} s_\alpha&= \begin{pmatrix} &{}1\\ -1\\ &{}&{}&{}1\\ &{}&{} -1 \end{pmatrix},&s_\beta&= \begin{pmatrix} 1\\ &{}&{}&{}1 \\ &{}&{}1 \\ &{} -1 \end{pmatrix}. \end{aligned}\end{aligned}$$

We also denote the long Weyl element \(s_\alpha s_\beta s_\alpha s_\beta \) by \(w_0\). Characters of \(U\left( {\mathbb {Q}_p}\right) \) trivial on \(U\left( {\mathbb {Z}_p}\right) \) are given by \(\psi _{m_1, m_2}\) for \(m_1, m_2\in \mathbb {Z}\), where

$$\begin{aligned} \psi _{m_1, m_2} \begin{pmatrix} 1&{} x_1&{}*&{}*\\ &{}1 &{} * &{} x_2\\ &{}&{}1\\ &{}&{}-x_1&{}1 \end{pmatrix} ={\text {e}}\left( {m_1x_1+m_2x_2}\right) , \end{aligned} \tag{1.1}$$

where \({\text {e}}:\mathbb {Q}_p/\mathbb {Z}_p \rightarrow \mathbb {C}^\times \) is the standard additive character satisfying \({\text {e}}(p^{-r}) = e^{2\pi i/p^r}\). For \(w\in W\), \(r,s\in \mathbb {Z}\), we set

$$\begin{aligned} n_{w, r, s} {:}{=} {\text {diag}}\left( {p^{-r}, p^{r-s}, p^r, p^{s-r}}\right) w \in N\left( {\mathbb {Q}_p}\right) . \end{aligned} \tag{1.2}$$

The exact conditions that r, s have to satisfy for \(X(n_{w,r,s})\) to be nonempty are given in Sect. 3; in general, we require \(r,s\ge 0\). By counting the number of terms in the Kloosterman sum [8, Theorem 0.3], we obtain the trivial bound

$$\begin{aligned} \begin{aligned} \left| {{\text {Kl}}_p\left( {n_{w,r,s}, \psi , \psi '}\right) } \right| \le p^{r+s}. \end{aligned} \end{aligned}$$

Now we state the main results of the paper. The first result concerns non-trivial bounds for local \({\text {Sp}}(4)\) Kloosterman sums.

Theorem 1.1

Let \(\psi = \psi _{m_1,m_2}\), \(\psi ' = \psi _{n_1,n_2}\) be characters of \(U(\mathbb {Q}_p)/U(\mathbb {Z}_p)\), and let r, s be non-negative integers. Then we have

$$\begin{aligned} \begin{aligned} {\text {Kl}}_p(n_{{\text {id}},r,s}, \psi , \psi ')&= 1 &&\text {if } r=s=0,\\ \left| {{\text {Kl}}_p(n_{s_\alpha ,r,s}, \psi , \psi ')} \right|&\ll p^{r/2}(m_1,n_1,p^r)^{1/2} &&\text {if } s=0,\\ \left| {{\text {Kl}}_p(n_{s_\beta ,r,s}, \psi , \psi ')} \right|&\ll p^{s/2}(m_2,n_2,p^s)^{1/2} &&\text {if } r=0,\\ \left| {{\text {Kl}}_p\left( {n_{s_\alpha s_\beta , r, s}, \psi , \psi '}\right) } \right|&\ll \min \left\{ {p^{2s} \left( {m_1, p^{r-s}}\right) , p^r \left( {m_2, p^s}\right) ^{1/2} \left( {n_2, p^s}\right) ^{1/2}}\right\} &&\text {if } s\le r,\\ \left| {{\text {Kl}}_p\left( {n_{s_\beta s_\alpha , r, s}, \psi , \psi '}\right) } \right|&\ll \min \left\{ {p^{3r} \left( {m_2, p^{s-2r}}\right) , p^s \left( {m_1, n_1, p^r}\right) }\right\} &&\text {if } 2r\le s,\\ \left| {{\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r,s}, \psi , \psi '}\right) } \right|&\ll {\begin{cases} p^{\frac{r}{3} + \frac{2s}{3} + \frac{2}{3}\min \left\{ {{\text {ord}}_p(m_1)+s, {\text {ord}}_p(n_1)+r}\right\} + \frac{1}{3}{\text {ord}}_p(m_2)} &{} \text {if } s\le r,\\ p^{r+\min \left\{ {{\text {ord}}_p(m_2), r+{\text {ord}}_p(n_1)}\right\} } + p^{r+\min \left\{ {\frac{s}{2}+{\text {ord}}_p(m_1), r-\frac{s}{2}+{\text {ord}}_p(n_1)}\right\} } &{} \text {if } r<s<2r,\\ p^{r+\min \left\{ {{\text {ord}}_p(m_2), r+{\text {ord}}_p(n_1)}\right\} } &{} \text {if } s=2r, \end{cases}}\\ \left| {{\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta ,r,s}, \psi , \psi '}\right) } \right|&\ll {\begin{cases} p^{\frac{s}{2} + \frac{r}{2} + \frac{1}{2} {\text {ord}}_p(m_1) + \frac{1}{2}\min \left\{ {2r+{\text {ord}}_p(m_2), s+{\text {ord}}_p(n_2)}\right\} } &{} \text {if } r\le s/2,\\ p^{s-\frac{r}{2}+\frac{1}{2}{\text {ord}}_p(m_1)+\frac{1}{2}\min \left\{ {2r+{\text {ord}}_p(m_2), s+{\text {ord}}_p(n_2)}\right\} } &{} \text {if } s/2<r<s,\\ p^{s+\min \left\{ {{\text {ord}}_p(m_1), {\text {ord}}_p(n_2)}\right\} } &{} \text {if } r=s, \end{cases}}\\ \left| {{\text {Kl}}_p\left( {n_{w_0, r, s}, \psi , \psi '}\right) } \right|&\ll \min \left\{ {p^{\frac{1}{2}{\text {ord}}_p(m_1m_2)},p^{ \frac{1}{2}{\text {ord}}_p(n_1n_2)}}\right\} \left( {s+1}\right) p^{\frac{r}{2} + \frac{3s}{4} + \frac{1}{2}\min \left\{ {r,s}\right\} }. \end{aligned} \end{aligned}$$

Moreover, the Kloosterman sum \({\text {Kl}}_p(n_{w,r,s},\psi ,\psi ')\) vanishes if the conditions on the right are not satisfied.

Now we state the bounds for global \({\text {Sp}}(4)\) Kloosterman sums. Recall that characters of \(U(\mathbb {Q})/U(\mathbb {Z})\) are given by \(\psi _{m_1,m_2}\) for \(m_1,m_2\in \mathbb {Z}\), where

$$\begin{aligned} \begin{aligned} \psi _{m_1,m_2}\begin{pmatrix} 1&{} x_1 &{} *&{}*\\ &{}1&{}*&{}x_2\\ &{}&{}1\\ &{}&{}-x_1&{}1\end{pmatrix} = {\text {e}}\left( {m_1x_1+m_2x_2}\right) . \end{aligned} \end{aligned}$$

For \(w\in W\), \(c_1,c_2\in \mathbb {N}\), let

$$\begin{aligned} \begin{aligned} n_w(c_1,c_2) := {\text {diag}}\left( {c_1^{-1},c_1c_2^{-1},c_1,c_1^{-1}c_2}\right) w \in N(\mathbb {Q}). \end{aligned} \end{aligned}$$

Theorem 1.2

Let \(\psi = \psi _{m_1,m_2}\), \(\psi ' = \psi _{n_1,n_2}\) be characters of \(U(\mathbb {Q})/U(\mathbb {Z})\). For every \(\varepsilon >0\) and \(c_1,c_2\in \mathbb {N}\), the following bounds hold:

$$\begin{aligned} \begin{aligned} {\text {Kl}}(n_{{\text {id}}}(c_1,c_2),\psi ,\psi ')&= 1 &&\text {if } c_1=c_2=1,\\ \left| {{\text {Kl}}(n_{s_\alpha }(c_1,c_2),\psi ,\psi ')} \right|&\ll _\varepsilon (m_1,n_1,c_1)^{1/2} c_1^{1/2+\varepsilon } &&\text {if } c_2=1,\\ \left| {{\text {Kl}}(n_{s_\beta }(c_1,c_2),\psi ,\psi ')} \right|&\ll _\varepsilon (m_2,n_2,c_2)^{1/2} c_2^{1/2+\varepsilon } &&\text {if } c_1=1,\\ \left| {{\text {Kl}}(n_{s_\alpha s_\beta }(c_1,c_2),\psi ,\psi ')} \right|&\ll _\varepsilon \min \left\{ {c_2^2(m_1,c_1/c_2),c_1(m_2,c_2)^{1/2}(n_2,c_2)^{1/2}}\right\} (c_1c_2)^\varepsilon &&\text {if } c_2\mid c_1,\\ \left| {{\text {Kl}}(n_{s_\beta s_\alpha }(c_1,c_2),\psi ,\psi ')} \right|&\ll _\varepsilon \min \left\{ {c_1^3(m_2,c_2/c_1^2),c_2(m_1,n_1,c_1)}\right\} (c_1c_2)^\varepsilon &&\text {if } c_1^2\mid c_2,\\ \left| {{\text {Kl}}(n_{s_\alpha s_\beta s_\alpha }(c_1,c_2),\psi ,\psi ')} \right|&\ll _\varepsilon (m_1,n_1,c_1)(m_2,c_2)(c_1,c_2)(c_1c_2)^{1/3+\varepsilon } &&\text {if } c_2 \mid c_1^2,\\ \left| {{\text {Kl}}(n_{s_\beta s_\alpha s_\beta }(c_1,c_2),\psi ,\psi ')} \right|&\ll _\varepsilon (m_1,c_1)(m_2,n_2,c_2)(c_1^2,c_2) c_1^{-1/2} c_2^{1/2} (c_1c_2)^\varepsilon &&\text {if } c_1\mid c_2,\\ \left| {{\text {Kl}}(n_{w_0}(c_1,c_2),\psi ,\psi ')} \right|&\ll _\varepsilon (m_1m_2,n_1n_2,c_1c_2)^{1/2} (c_1,c_2)^{1/2} c_1^{1/2} c_2^{3/4} (c_1c_2)^\varepsilon . \end{aligned} \end{aligned}$$

Moreover, the Kloosterman sum \({\text {Kl}}(n_w(c_1,c_2),\psi ,\psi ')\) vanishes if the conditions on the right are not satisfied.
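The support conditions in Theorems 1.1 and 1.2 should match prime by prime: at p, \(n_w(c_1,c_2)\) differs from \(n_{w,r,s}\) with \(r = {\text {ord}}_p(c_1)\), \(s = {\text {ord}}_p(c_2)\) by a diagonal matrix in \(T(\mathbb {Z}_p)\), which is where Proposition 3.1 enters. The Python sketch below (hypothetical helper names, trial-division factorisation) checks only this divisibility bookkeeping for small \(c_1, c_2\); it says nothing about the bounds themselves.

```python
# Hypothetical helper names; conditions transcribed from Theorems 1.1 and 1.2.


def factorize(n: int) -> dict:
    """Trial-division factorisation {p: ord_p(n)} of a positive integer n."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


# Local support conditions of Theorem 1.1, with r = ord_p(c1) and s = ord_p(c2).
LOCAL = {
    "id": lambda r, s: r == 0 and s == 0,
    "s_a": lambda r, s: s == 0,
    "s_b": lambda r, s: r == 0,
    "s_a s_b": lambda r, s: s <= r,
    "s_b s_a": lambda r, s: 2 * r <= s,
    "s_a s_b s_a": lambda r, s: s <= 2 * r,  # union of s <= r, r < s < 2r and s = 2r
    "s_b s_a s_b": lambda r, s: r <= s,  # union of r <= s/2, s/2 < r < s and r = s
    "w_0": lambda r, s: True,
}

# Global support conditions of Theorem 1.2.
GLOBAL = {
    "id": lambda c1, c2: c1 == 1 and c2 == 1,
    "s_a": lambda c1, c2: c2 == 1,
    "s_b": lambda c1, c2: c1 == 1,
    "s_a s_b": lambda c1, c2: c1 % c2 == 0,
    "s_b s_a": lambda c1, c2: c2 % (c1 * c1) == 0,
    "s_a s_b s_a": lambda c1, c2: (c1 * c1) % c2 == 0,
    "s_b s_a s_b": lambda c1, c2: c2 % c1 == 0,
    "w_0": lambda c1, c2: True,
}


def local_support(w: str, c1: int, c2: int) -> bool:
    """True if the local condition holds at every prime dividing c1 * c2
    (at all other primes r = s = 0, which every local condition allows)."""
    f1, f2 = factorize(c1), factorize(c2)
    return all(LOCAL[w](f1.get(p, 0), f2.get(p, 0)) for p in set(f1) | set(f2))


if __name__ == "__main__":
    mismatches = [(w, c1, c2) for w in LOCAL for c1 in range(1, 61) for c2 in range(1, 61)
                  if GLOBAL[w](c1, c2) != local_support(w, c1, c2)]
    print(mismatches)  # []
```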

Theorems 1.1 and 1.2 are proved in Sect. 4. We develop a stratification of \({\text {Sp}}(2r)\) Kloosterman sums in Sect. 2, generalising the stratification of \({\text {GL}}(r)\) Kloosterman sums introduced by Stevens [14]. Let

$$\begin{aligned} \begin{aligned} \mathcal T := \left\{ {\begin{pmatrix} A\\ &{} cA^{-1} \end{pmatrix} \in {\text {GL}}\left( {2r, \mathbb {Z}_p}\right) }\;\big |\;{A = {\text {diag}}\left( {a_1, a_2, \ldots , a_r}\right) , a_1, \ldots , a_r, c \in \mathbb {Z}_p^\times }\right\} \end{aligned}\end{aligned}$$

be a subgroup of diagonal matrices. We will show that for \(n\in N\left( {\mathbb {Q}_p}\right) \), there is a group action \(\mathcal T\times X(n) \rightarrow X(n)\) sending (t, x) to \(txs^{-1}\), where \(s = n^{-1}tn\). The Kloosterman sum, as a sum over X(n), can then be partitioned into sums over \(\mathcal T\)-orbits; see Theorem 2.4. The summands are then evaluated using results of Adolphson and Sperber [1] on multi-dimensional exponential sums of Laurent polynomials, as well as the p-adic stationary phase method for higher prime powers.

We give some brief comments on the bounds obtained here. As we shall see in Sect. 3, the Kloosterman sums corresponding to \(w=s_\alpha , s_\beta \) are classical Kloosterman sums, and the bounds given here are the optimal bounds for classical Kloosterman sums. The Kloosterman sums corresponding to \(w=s_\alpha s_\beta , s_\beta s_\alpha \) can be expressed in terms of exponential sums of Laurent polynomials, whose optimal bounds are also well understood. Meanwhile, the bounds for \(w=s_\alpha s_\beta s_\alpha , s_\beta s_\alpha s_\beta , w_0\), obtained using the stratification technique, are not expected to be optimal, since only the cancellation within individual \(\mathcal T\)-orbits is exploited. In particular, we expect square-root cancellation for Kloosterman sums \({\text {Kl}}_p(n_{w_0,r,s},\psi _{m_1,m_2},\psi _{n_1,n_2})\) when \(m_1,m_2,n_1,n_2\) are coprime to p, but this is beyond the reach of current methods.

Let \(F: T\left( {\mathbb {R}^+}\right) \rightarrow \mathbb {C}\) be a smooth function with rapid decay. Let \(\psi , \psi '\) be characters of \(U(\mathbb {R})\) trivial on \(U(\mathbb {Z})\). For \(g = uy\in G/K\), where \(u \in U(\mathbb {R})\), \(y \in T\left( {\mathbb {R}^+}\right) \), define \(\mathcal F_\psi (g) := \psi \left( {u}\right) F\left( {y}\right) \). The symplectic Poincaré series associated to F is given by

$$\begin{aligned}\begin{aligned} P_\psi (g) = \sum \limits _{\gamma \in \left( {P_0 \cap \Gamma }\right) \backslash \Gamma } \mathcal F_\psi (\gamma g), \end{aligned}\end{aligned}$$

where \(\Gamma = {\text {Sp}}(2r, \mathbb {Z})\), and \(P_0\) is the standard minimal parabolic subgroup of G. The \(\psi '\)-th Fourier coefficient of \(P_\psi (g)\) is given by

$$\begin{aligned}\begin{aligned} P_{\psi , \psi '} (g) =&\int _{U(\mathbb {Z}) \backslash U(\mathbb {R})} P_\psi \left( {ug}\right) \overline{\psi '} (u) du. \end{aligned}\end{aligned}$$

We compute in Sect. 5 the Fourier coefficients \(P_{\psi , \psi '} (g)\) of the Poincaré series \(P_\psi (g)\), in terms of auxiliary Kloosterman sums, which are also defined in Sect. 5. The bounds given in Theorem 1.1 also apply to these auxiliary Kloosterman sums, via Proposition 5.2.

2 Stratification of symplectic Kloosterman sums

Consider the subgroup of diagonal matrices

$$\begin{aligned}\begin{aligned} \mathcal T := \left\{ {\begin{pmatrix} A\\ &{} cA^{-1} \end{pmatrix} \in {\text {GL}}\left( {2r, \mathbb {Z}_p}\right) }\;\big |\;{A = {\text {diag}}\left( {a_1, a_2, \ldots , a_r}\right) , a_1, \ldots , a_r, c \in \mathbb {Z}_p^\times }\right\} . \end{aligned}\end{aligned}$$

Note that in general elements of \(\mathcal T\) are not symplectic.

Lemma 2.1

Let \(u \in U\left( {\mathbb {Q}_p}\right) \), and \(t\in \mathcal T\). Then \(tut^{-1}\in U\left( {\mathbb {Q}_p}\right) \).

Proof

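Write \(t = {\text {diag}}\left( {A, cA^{-1}}\right) \) and \(u = \begin{pmatrix} U_0 &{} S\\ &{} (U_0^{-1})^T \end{pmatrix}\). Since \(t^TJt = cJ\), the matrix \(tut^{-1}\) is again symplectic, and

$$\begin{aligned}\begin{aligned} tut^{-1} = \begin{pmatrix} AU_0A^{-1} &{} c^{-1}ASA\\ &{} \left( {(AU_0A^{-1})^{-1}}\right) ^T \end{pmatrix}, \end{aligned}\end{aligned}$$

where \(AU_0A^{-1}\) is upper triangular and unipotent. Hence \(tut^{-1}\in U\left( {\mathbb {Q}_p}\right) \).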

Lemma 2.2

Let \(n\in N\left( {\mathbb {Q}_p}\right) \), and \(t \in \mathcal T\). Then \(n^{-1} t n \in \mathcal T\).

Proof

Suppose \(t = {\text {diag}}\left( {a_1, \ldots , a_r, ca_1^{-1}, \ldots , ca_r^{-1}}\right) \). Then in general \(n^{-1} t n\) has the form

$$\begin{aligned}\begin{aligned} n^{-1}tn = {\text {diag}}\left( {\tau _{\sigma (1)} \left( {a_{\sigma (1)}}\right) , \ldots , \tau _{\sigma (r)} \left( {a_{\sigma (r)}}\right) , \tau _{\sigma (1)} \left( {ca_{\sigma (1)}^{-1}}\right) , \ldots , \tau _{\sigma (r)} \left( {ca_{\sigma (r)}^{-1}}\right) }\right) , \end{aligned}\end{aligned}$$

where \(\sigma \) is a permutation of \(\left\{ {1,\ldots , r}\right\} \), and \(\tau _i: \left\{ {a_i, ca_i^{-1}}\right\} \rightarrow \left\{ {a_i, ca_i^{-1}}\right\} \) are permutations for \(i = 1,\ldots , r\). Since \(a_i = c\left( {ca_i^{-1}}\right) ^{-1}\), we see that

$$\begin{aligned} n^{-1}tn = {\text {diag}}\left( {\tau _{\sigma (1)} \left( {a_{\sigma (1)}}\right) , \ldots , \tau _{\sigma (r)} \left( {a_{\sigma (r)}}\right) , c\left( {\tau _{\sigma (1)} \left( {a_{\sigma (1)}}\right) }\right) ^{-1}, \ldots , c\left( {\tau _{\sigma (r)} \left( {a_{\sigma (r)}}\right) }\right) ^{-1}}\right) \in \mathcal T. \end{aligned}$$
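For \({\text {Sp}}(4)\) the lemma can be checked directly on the eight Weyl group representatives built from \(s_\alpha \) and \(s_\beta \); the torus part of n commutes with t, so it plays no role. The Python sketch below uses hypothetical helper names and exact rationals, and confirms that \(n^{-1}tn\) lies in \(\mathcal T\) with the same multiplier c.

```python
from fractions import Fraction
from functools import reduce

# Hypothetical helper names; s_alpha, s_beta are the representatives fixed in Sect. 1.
S_ALPHA = ((0, 1, 0, 0), (-1, 0, 0, 0), (0, 0, 0, 1), (0, 0, -1, 0))
S_BETA = ((1, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, -1, 0, 0))
IDENTITY = tuple(tuple(int(i == j) for j in range(4)) for i in range(4))
WORDS = ["", "a", "b", "ab", "ba", "aba", "bab", "abab"]  # the eight elements of W


def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4))
                 for i in range(4))


def transpose(a):
    return tuple(tuple(a[j][i] for j in range(4)) for i in range(4))


def word_to_matrix(word):
    """Product of s_alpha ('a') and s_beta ('b') in the order given by the word."""
    gens = {"a": S_ALPHA, "b": S_BETA}
    return reduce(matmul, (gens[ch] for ch in word), IDENTITY)


def torus_element(a1, a2, c):
    """t = diag(a1, a2, c/a1, c/a2), an element of the group of diagonal matrices above."""
    entries = (a1, a2, c / a1, c / a2)
    return tuple(tuple(entries[i] if i == j else 0 for j in range(4)) for i in range(4))


def conjugate(n, t):
    """n^{-1} t n; every n used here is a signed permutation matrix, so n^{-1} = n^T."""
    return matmul(matmul(transpose(n), t), n)


def lies_in_T_with_multiplier(m, c):
    """Is m = diag(a'_1, a'_2, c/a'_1, c/a'_2) for the given c?"""
    off_diagonal_zero = all(m[i][j] == 0 for i in range(4) for j in range(4) if i != j)
    return off_diagonal_zero and m[2][2] == c / m[0][0] and m[3][3] == c / m[1][1]


if __name__ == "__main__":
    t = torus_element(Fraction(2), Fraction(3), Fraction(5))
    for w in WORDS:
        print(w or "id", [conjugate(word_to_matrix(w), t)[i][i] for i in range(4)])
```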

Let \(x = unu' \in C(n)\), and \(t \in \mathcal T\). By Lemma 2.2, \(s := n^{-1}tn \in \mathcal T\). By Lemma 2.1, we see that

$$\begin{aligned}\begin{aligned} t*x := txs^{-1} = \left( {tut^{-1}}\right) n \left( {su's^{-1}}\right) \in U\left( {\mathbb {Q}_p}\right) n U\left( {\mathbb {Q}_p}\right) \cap G\left( {\mathbb {Z}_p}\right) = C(n). \end{aligned}\end{aligned}$$

As conjugation by t and s preserves \(U\left( {\mathbb {Z}_p}\right) \) and \(U_n\left( {\mathbb {Z}_p}\right) \), this induces an action on \(X(n)= U(\mathbb {Z}_p) \backslash C(n) / U_n(\mathbb {Z}_p)\):

$$\begin{aligned}\begin{aligned} \mathcal T \times X(n)&\rightarrow X(n),&(t, x)&\mapsto t * x. \end{aligned}\end{aligned}$$

For characters \(\psi : U\left( {\mathbb {Q}_p}\right) /U\left( {\mathbb {Z}_p}\right) \rightarrow \mathbb {C}^\times \), \(\psi ': U_n\left( {\mathbb {Q}_p}\right) /U_n\left( {\mathbb {Z}_p}\right) \rightarrow \mathbb {C}^\times \), decomposition of X(n) into \(\mathcal T\)-orbits gives a decomposition of Kloosterman sums:

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n, \psi , \psi '}\right) = \sum \limits _{x\in \mathcal T\backslash X(n)} \sum \limits _{y \in \mathcal T * x} \psi \left( {u(y)}\right) \psi ' \left( {u'(y)}\right) . \end{aligned}\end{aligned}$$

Let \(\alpha _i = e_i - e_{i+1}\), \(1 \le i \le r-1\), and \(\alpha _r = 2e_r\) be the simple roots of T in G. Denote \(\Delta = \left\{ {\alpha _1, \ldots , \alpha _r}\right\} \), and \(\Delta _w = \left\{ {\alpha \in \Delta }\;\big |\;{w(\alpha )<0}\right\} \). Let \(U_{\alpha _i}(\mathbb {Q}_p) \subseteq U(\mathbb {Q}_p)\) be the root subgroup corresponding to \(\alpha _i\). For \(u\in U(\mathbb {Q}_p)\), we also use \(\alpha _i\) to denote the canonical projection map

$$\begin{aligned}\begin{aligned} \alpha _i: U(\mathbb {Q}_p) \rightarrow U_{\alpha _i}(\mathbb {Q}_p) \simeq \mathbb {Q}_p. \end{aligned}\end{aligned}$$

Explicitly, for \(u = (u_{ij}) \in U(\mathbb {Q}_p)\), the projection maps are given by

$$\begin{aligned}\begin{aligned} \alpha _i(u)&= u_{i,i+1},&1\le i\le r-1,\\ \alpha _r(u)&= u_{r,2r}. \end{aligned}\end{aligned}$$

Characters of \(U\left( {\mathbb {Q}_p}\right) /U\left( {\mathbb {Z}_p}\right) \) have the form

$$\begin{aligned}\begin{aligned} \psi (u) = \psi _{n_1,\ldots , n_r}(u)&:= \prod \limits _{i=1}^r{\text {e}}\left( {n_i \alpha _i(u)}\right) ,&n_i\in \mathbb {Z}, \end{aligned}\end{aligned}$$

where \({\text {e}}: \mathbb {Q}_p/\mathbb {Z}_p\rightarrow \mathbb {C}^\times \) is the standard additive character. For \(x = u(x) n u'(x)\), define projections

$$\begin{aligned}\begin{aligned} \kappa _i (x)&:=\alpha _i(u(x)),&\kappa '_i (x)&:= \alpha _i(u'(x)),&1\le i\le r. \end{aligned}\end{aligned}$$

Note that \(\kappa '_i(x) = 0\) unless \(\alpha _i \in \Delta _{w(n)}\). For \(u\in U(\mathbb {Q}_p)\) and \(t = {\text {diag}}\left( {a_1, \ldots , a_r, ca_1^{-1}, \ldots , ca_r^{-1}}\right) \in \mathcal T\), we compute that

$$\begin{aligned}\begin{aligned} \alpha _i(tut^{-1})&= a_ia_{i+1}^{-1} \alpha _i(u),&1\le i\le r-1,\\ \alpha _r(tut^{-1})&= c^{-1}a_r^2 \alpha _r(u). \end{aligned}\end{aligned}$$
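For r = 2, these scaling rules and Lemma 2.1 can be checked with exact rational arithmetic. The Python sketch below uses hypothetical helper names and an assumed parametrisation of U, namely \(U_0 = \begin{pmatrix} 1&{}x_1\\ &{}1\end{pmatrix}\) and \(S = U_0Y\) with Y symmetric (symplecticity forces \(U_0^{-1}S\) to be symmetric). It verifies that t is a similitude with multiplier c, that \(tut^{-1}\) is symplectic, and that \(\alpha _1, \alpha _2\) scale by \(a_1a_2^{-1}\) and \(c^{-1}a_2^2\).

```python
from fractions import Fraction

# Hypothetical helper names; exact arithmetic with fractions.Fraction.
J = ((0, 0, 1, 0), (0, 0, 0, 1), (-1, 0, 0, 0), (0, -1, 0, 0))


def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4))
                 for i in range(4))


def transpose(a):
    return tuple(tuple(a[j][i] for j in range(4)) for i in range(4))


def is_symplectic(m):
    return matmul(matmul(transpose(m), J), m) == J


def diag(*entries):
    return tuple(tuple(entries[i] if i == j else 0 for j in range(4)) for i in range(4))


def unipotent(x1, y11, y12, y22):
    """Element of U with U_0 = [[1, x1], [0, 1]] and S = U_0 Y, Y = [[y11, y12], [y12, y22]]."""
    return ((1, x1, y11 + x1 * y12, y12 + x1 * y22),
            (0, 1, y12, y22),
            (0, 0, 1, 0),
            (0, 0, -x1, 1))


def conjugate_by_torus(a1, a2, c, u):
    """Return t = diag(a1, a2, c/a1, c/a2) together with t u t^{-1}."""
    t = diag(a1, a2, c / a1, c / a2)
    t_inv = diag(1 / a1, 1 / a2, a1 / c, a2 / c)
    return t, matmul(matmul(t, u), t_inv)


if __name__ == "__main__":
    a1, a2, c = Fraction(2), Fraction(3), Fraction(7)
    u = unipotent(Fraction(1, 5), Fraction(2), Fraction(-3, 4), Fraction(5, 9))
    t, v = conjugate_by_torus(a1, a2, c, u)
    print(is_symplectic(t), is_symplectic(v))  # False True: t is only a similitude
    print(v[0][1] / u[0][1], v[1][3] / u[1][3])  # a1/a2 and a2^2/c
```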

Suppose \(t = {\text {diag}}(a_1, \ldots , a_r, ca_1^{-1}, \ldots , ca_r^{-1}) \in \mathcal T\), and \(s = n^{-1}tn = {\text {diag}}(a'_1, \ldots , a'_r, c{a'_1}^{-1}, \ldots , c{a'_r}^{-1}) \in \mathcal T\). Note from the proof of Lemma 2.2 that s has the same multiplier c as t. Then we have

$$\begin{aligned} {\begin{matrix} \kappa _i \left( {t*x}\right) &{}= a_ia_{i+1}^{-1} \kappa _i(x), \quad 1\le i\le r-1,\\ \kappa _r \left( {t*x}\right) &{}= c^{-1}a_r^2 \kappa _r(x), \end{matrix}} \end{aligned} \tag{2.1}$$

and

$$\begin{aligned} {\begin{matrix} \kappa '_i \left( {t*x}\right) &{}= a'_i {a'_{i+1}}^{-1} \kappa '_i(x), \quad 1\le i\le r-1,\\ \kappa '_r \left( {t*x}\right) &{}= c^{-1}{a'_r}^2 \kappa '_r(x). \end{matrix}} \end{aligned} \tag{2.2}$$

For \(\ell \in \mathbb {N}\), we define

$$\begin{aligned}\begin{aligned} A_w(\ell )&:= (\mathbb {Z}/p^\ell \mathbb {Z})^\Delta \times (\mathbb {Z}/p^\ell \mathbb {Z})^{\Delta _w}, \end{aligned}\end{aligned}$$

and

$$\begin{aligned}\begin{aligned} V_w(\ell )&:= \left\{ {(\lambda , \lambda ') \in A_w(\ell )^\times }\;\big |\;{\begin{array}{l} \exists t \in \mathcal T \text { such that } \kappa _i(t*x) = \lambda _i \kappa _i(x), \kappa '_j(t*x) = \lambda '_j \kappa '_j(x)\\ \text {for } x\in X(n), \; 1\le i, j\le r, \; \alpha _j\in \Delta _w \end{array}}\right\} . \end{aligned}\end{aligned}$$

Lemma 2.3

We have \(\left| {V_w(\ell )} \right| = \left( {p^\ell \left( {1-p^{-1}}\right) }\right) ^r\).

Proof

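For every \(\lambda \in \left( {(\mathbb {Z}/p^\ell \mathbb {Z})^\times }\right) ^\Delta \), we can find \(t\in \mathcal T\) such that \(\kappa _i(t*x) = \lambda _i\kappa _i(x)\) for \(1\le i\le r\): by (2.1), it suffices to lift \(\lambda \) to \((\mathbb {Z}_p^\times )^\Delta \) and take \(a_r = 1\), \(c = \lambda _r^{-1}\) and \(a_i = \lambda _i a_{i+1}\) for \(i = r-1, \ldots , 1\). Using (2.2), we find a unique \(\lambda ' \in \left( {(\mathbb {Z}/p^\ell \mathbb {Z})^\times }\right) ^{\Delta _w}\) such that \(\kappa '_j(t*x) = \lambda '_j\kappa '_j(x)\) for \(1\le j\le r\) with \(\alpha _j\in \Delta _w\). Moreover, \(\lambda '\) is independent of the choice of \(t\in \mathcal T\): two such choices differ, modulo \(p^\ell \), by \((a_1, \ldots , a_r, c)\mapsto (\mu a_1, \ldots , \mu a_r, \mu ^2 c)\) for some \(\mu \in \mathbb {Z}_p^\times \), which leaves the factors \(a'_i{a'_{i+1}}^{-1}\) and \(c^{-1}{a'_r}^2\) in (2.2) unchanged. Therefore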

$$\begin{aligned}\begin{aligned} \left| {V_w(\ell )} \right| = \left| {\left( {(\mathbb {Z}/p^\ell \mathbb {Z})^\times }\right) ^\Delta } \right| = \left( {p^\ell \left( {1-p^{-1}}\right) }\right) ^r \end{aligned}\end{aligned}$$

as claimed.\(\square \)
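The count in Lemma 2.3 can be confirmed by brute force for \({\text {Sp}}(4)\) and small \(p^\ell \), where \(p^\ell (1-p^{-1}) = \varphi (p^\ell )\). The Python sketch below (hypothetical helper names) runs over all \((a_1, a_2, c)\) modulo \(p^\ell \), forms \(\lambda \) from (2.1) and \(\lambda '\) from (2.2) for the simple roots in \(\Delta _w\), and counts the distinct pairs. It assumes the convention \(w(\alpha )(t) = \alpha (n^{-1}tn)\) to decide which simple roots lie in \(\Delta _w\).

```python
from functools import reduce
from math import gcd

# Hypothetical helper names; Sp(4) case r = 2 with alpha_1 = alpha, alpha_2 = beta.
S_ALPHA = ((0, 1, 0, 0), (-1, 0, 0, 0), (0, 0, 0, 1), (0, 0, -1, 0))
S_BETA = ((1, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, -1, 0, 0))
IDENTITY = tuple(tuple(int(i == j) for j in range(4)) for i in range(4))
WORDS = ["", "a", "b", "ab", "ba", "aba", "bab", "abab"]
TORUS_EXPONENTS = ((1, 0), (0, 1), (-1, 0), (0, -1))  # diag(y1, y2, 1/y1, 1/y2)
SIMPLE_ROOTS = ((1, -1), (0, 2))
POSITIVE_ROOTS = {(1, -1), (0, 2), (1, 1), (2, 0)}


def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4))
                 for i in range(4))


def word_to_matrix(word):
    gens = {"a": S_ALPHA, "b": S_BETA}
    return reduce(matmul, (gens[ch] for ch in word), IDENTITY)


def source_rows(n):
    """j(i) with n[j(i)][i] != 0, so that (n^{-1} t n)_{ii} = t_{j(i)}."""
    return [next(j for j in range(4) if n[j][i] != 0) for i in range(4)]


def delta_w(n):
    """Indices j with w(alpha_j) < 0, assuming w(alpha)(t) = alpha(n^{-1} t n)."""
    rows = source_rows(n)
    e1, e2 = TORUS_EXPONENTS[rows[0]], TORUS_EXPONENTS[rows[1]]
    inverted = []
    for j, (k1, k2) in enumerate(SIMPLE_ROOTS):
        image = (k1 * e1[0] + k2 * e2[0], k1 * e1[1] + k2 * e2[1])
        if (-image[0], -image[1]) in POSITIVE_ROOTS:
            inverted.append(j)
    return inverted


def count_v_w(word, p, ell):
    """Brute-force |V_w(ell)|: distinct (lambda, lambda') over all (a1, a2, c) mod p^ell."""
    mod = p ** ell
    n = word_to_matrix(word)
    rows, inverted = source_rows(n), delta_w(n)

    def inv(x):
        return pow(x, -1, mod)  # modular inverse (Python 3.8+)

    units = [x for x in range(1, mod) if gcd(x, p) == 1]
    seen = set()
    for a1 in units:
        for a2 in units:
            for c in units:
                t = (a1, a2, c * inv(a1) % mod, c * inv(a2) % mod)
                lam = (a1 * inv(a2) % mod, a2 * a2 * inv(c) % mod)  # factors in (2.1)
                b1, b2 = t[rows[0]], t[rows[1]]  # a'_1, a'_2 of s = n^{-1} t n
                lam_all = (b1 * inv(b2) % mod, b2 * b2 * inv(c) % mod)  # factors in (2.2)
                seen.add((lam, tuple(lam_all[j] for j in inverted)))
    return len(seen)


if __name__ == "__main__":
    for w in WORDS:
        print(w or "id", delta_w(word_to_matrix(w)), count_v_w(w, 3, 2))
```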

For a character \(\theta : A_w(\ell ) \rightarrow \mathbb {C}^\times \), we define

$$\begin{aligned}\begin{aligned} S_w\left( {\theta ; \ell }\right) = \sum \limits _{v \in V_w(\ell )} \theta (v). \end{aligned}\end{aligned}$$

Theorem 2.4

Let \(n\in N\left( {\mathbb {Q}_p}\right) \), and suppose \(\ell \) is large enough that the matrix entries of \(u(x), u'(x)\) lie in \(p^{-\ell }\mathbb {Z}_p / \mathbb {Z}_p\) for every \(x\in X(n)\). Let \(\psi = \psi _{n_1, \ldots , n_r}: U\left( {\mathbb {Q}_p}\right) /U\left( {\mathbb {Z}_p}\right) \rightarrow \mathbb {C}^\times \) and \(\psi ' = \psi _{n'_1, \ldots , n'_r}|_{U_n\left( {\mathbb {Q}_p}\right) }: U_n\left( {\mathbb {Q}_p}\right) / U_n\left( {\mathbb {Z}_p}\right) \rightarrow \mathbb {C}^\times \) be characters. Define the character \(\theta _x: A_w(\ell ) \rightarrow \mathbb {C}^\times \) by

$$\begin{aligned}\begin{aligned} \theta _x (\lambda , \lambda ') = \prod \limits _{i=1}^r{\text {e}}\left( {\lambda _i n_i \kappa _i(x)}\right) \prod \limits _{\begin{array}{c} i=1\\ w(\alpha _i)<0 \end{array}}^r{\text {e}}\left( {\lambda '_i n'_i \kappa '_i(x)}\right) . \end{aligned}\end{aligned}$$

Then

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n, \psi , \psi '}\right) = \left( {p^\ell \left( {1-p^{-1}}\right) }\right) ^{-r} \sum \limits _{x\in \mathcal T\backslash X(n)} \mathfrak N(x) S_w\left( {\theta _x; \ell }\right) , \end{aligned}\end{aligned}$$

where \(\mathfrak N(x) = \left| {\mathcal T*x} \right| \) is the size of \(\mathcal T\)-orbit of \(x\in X(n)\).

Proof

We rewrite the Kloosterman sum

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n, \psi , \psi '}\right) =&\sum \limits _{x\in \mathcal T\backslash X(n)} \sum \limits _{y \in \mathcal T * x} \psi \left( {u(y)}\right) \psi ' \left( {u'(y)}\right) \\ =&\sum \limits _{x\in \mathcal T\backslash X(n)} \sum \limits _{y \in \mathcal T * x} \prod \limits _{i=1}^r{\text {e}}\left( {n_i \kappa _i(y)}\right) \prod \limits _{\begin{array}{c} i=1\\ w(\alpha _i)<0 \end{array}}^r{\text {e}}\left( {n'_i \kappa '_i(y)}\right) . \end{aligned}\end{aligned}$$

For \((\lambda ,\lambda ')\in V_w(\ell )\), we can find \(t\in \mathcal T\) such that \(\kappa _i(t*x) = \lambda _i\kappa _i(x)\), \(\kappa '_j(t*x) = \lambda '_j\kappa '_j(x)\) for \(x\in X(n)\), \(1\le i,j\le r\), \(w(\alpha _j)<0\). Hence, since \(y\mapsto t*y\) permutes the orbit \(\mathcal T*x\),

$$\begin{aligned}\begin{aligned}&\sum \limits _{y \in \mathcal T * x} \prod \limits _{i=1}^r{\text {e}}\left( {\lambda _i n_i \kappa _i(y)}\right) \prod \limits _{\begin{array}{c} i=1\\ w(\alpha _i)<0 \end{array}}^r{\text {e}}\left( {\lambda '_i n'_i \kappa '_i(y)}\right) \\&\qquad = \sum \limits _{y \in \mathcal T * x} \prod \limits _{i=1}^r{\text {e}}\left( {n_i \kappa _i(t*y)}\right) \prod \limits _{\begin{array}{c} i=1\\ w(\alpha _i)<0 \end{array}}^r{\text {e}}\left( {n'_i \kappa '_i(t*y)}\right) \\&\qquad = \sum \limits _{y \in \mathcal T * x} \prod \limits _{i=1}^r{\text {e}}\left( {n_i \kappa _i(y)}\right) \prod \limits _{\begin{array}{c} i=1\\ w(\alpha _i)<0 \end{array}}^r{\text {e}}\left( {n'_i \kappa '_i(y)}\right) . \end{aligned}\end{aligned}$$

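Summing over \(V_w(\ell )\), and noting that the sum over \(V_w(\ell )\) is the same for every \(y\in \mathcal T*x\) (replacing x by \(t*x\) multiplies the argument of \(\theta _x\) by a fixed element of the group \(V_w(\ell )\)), we have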

$$\begin{aligned}\begin{aligned}&\left| {V_w(\ell )} \right| {\text {Kl}}_p(n,\psi ,\psi ') \\&\qquad = \sum \limits _{x\in \mathcal T\backslash X(n)} \sum \limits _{y \in \mathcal T * x} \sum \limits _{(\lambda ,\lambda ') \in V_w(\ell )} \prod \limits _{i=1}^r{\text {e}}\left( {\lambda _i n_i \kappa _i(y)}\right) \prod \limits _{\begin{array}{c} i=1\\ w(\alpha _i)<0 \end{array}}^r{\text {e}}\left( {\lambda '_i n'_i \kappa '_i(y)}\right) \\&\qquad = \sum \limits _{x\in \mathcal T\backslash X(n)} \mathfrak N(x) \sum \limits _{(\lambda ,\lambda ') \in V_w(\ell )} \prod \limits _{i=1}^r{\text {e}}\left( {\lambda _i n_i \kappa _i(x)}\right) \prod \limits _{\begin{array}{c} i=1\\ w(\alpha _i)<0 \end{array}}^r{\text {e}}\left( {\lambda '_i n'_i \kappa '_i(x)}\right) \\&\qquad = \sum \limits _{x\in \mathcal T\backslash X(n)} \mathfrak N(x) S_w\left( {\theta _x; \ell }\right) . \end{aligned}\end{aligned}$$

Dividing both sides by \(\left| {V_w(\ell )} \right| \) yields the statement.
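The mechanism behind Theorem 2.4 is orbit averaging. As a toy analogue of our own choosing (not the symplectic setting), let \((\mathbb {Z}/q\mathbb {Z})^\times \) act on \(\mathbb {Z}/q\mathbb {Z}\) by multiplication and sum \({\text {e}}(mx/q)\) over all x. Grouping the terms by orbits and averaging over the acting group reproduces the full sum; the inner sum over the group, a Ramanujan sum here, plays the role of \(S_w(\theta _x;\ell )\). The helper names are hypothetical.

```python
import cmath
import math

# Toy analogue of Theorem 2.4 (not the symplectic setting); hypothetical helper names.


def e(x: float) -> complex:
    """Additive character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def units(q: int) -> list:
    """The acting group (Z/qZ)^x."""
    return [a for a in range(q) if math.gcd(a, q) == 1]


def orbit_decomposition(q: int) -> list:
    """Orbits of (Z/qZ)^x on Z/qZ under multiplication, as (representative, orbit) pairs."""
    group, remaining, orbits = units(q), set(range(q)), []
    while remaining:
        x0 = min(remaining)
        orbit = {(lam * x0) % q for lam in group}
        orbits.append((x0, orbit))
        remaining -= orbit
    return orbits


def direct_sum(m: int, q: int) -> complex:
    """Sum of e(m x / q) over all x in Z/qZ."""
    return sum(e(m * x / q) for x in range(q))


def orbit_averaged_sum(m: int, q: int) -> complex:
    """|G|^{-1} times the sum over orbits of |orbit| * sum over lam in G of e(m lam x0 / q)."""
    group = units(q)
    total = 0j
    for x0, orbit in orbit_decomposition(q):
        inner = sum(e(m * lam * x0 / q) for lam in group)  # role of S_w(theta_x; l)
        total += len(orbit) * inner
    return total / len(group)


if __name__ == "__main__":
    for q, m in [(12, 5), (12, 24), (30, 7)]:
        print(q, m, abs(direct_sum(m, q) - orbit_averaged_sum(m, q)) < 1e-9)
```

As in the proof above, the identity only uses that each orbit is permuted by the group, so the same bookkeeping applies with \(V_w(\ell )\) in place of \((\mathbb {Z}/q\mathbb {Z})^\times \).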

3 \({\text {Sp}}(4)\) Kloosterman sums

Now we give explicit formulations for Kloosterman sums for \(G = {\text {Sp}}\left( {4, \mathbb {Q}_p}\right) \), classified by the image w(n) of the projection onto W. Fix \(\psi = \psi _{m_1, m_2}\), \(\psi ' = \psi _{n_1, n_2}\), with \(\psi _{m_1, m_2}\), \(\psi _{n_1, n_2}\) as in (1.1).

Proposition 3.1

[14, Theorem 3.2] Let \(n\in N\left( {\mathbb {Q}_p}\right) \), and \(\psi : U\left( {\mathbb {Q}_p}\right) /U\left( {\mathbb {Z}_p}\right) \rightarrow \mathbb {C}^\times \), \(\psi ': U_n\left( {\mathbb {Q}_p}\right) / U_n\left( {\mathbb {Z}_p}\right) \rightarrow \mathbb {C}^\times \) be characters. If \(t\in T\left( {\mathbb {Z}_p^\times }\right) \), then

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {tn, \psi , \psi '}\right)&= {\text {Kl}}_p\left( {n, \psi _t, \psi '}\right) ,\\ {\text {Kl}}_p\left( {nt^{-1}, \psi , \psi '}\right)&= {\text {Kl}}_p\left( {n, \psi , \psi '_t}\right) , \end{aligned}\end{aligned}$$

where \(\psi _t (u) = \psi (tut^{-1})\).

By Proposition 3.1, it suffices to consider Kloosterman sums \({\text {Kl}}_p\left( {n, \psi , \psi '}\right) \) for n such that entries of n are powers of p, and X(n) is nonempty. To express the Kloosterman sums, we express the coset representatives for X(n) in terms of Plücker coordinates, which were introduced in [3, 12]. For \(g = (g_{ij}) \in G = {\text {Sp}}(4,\mathbb {Q}_p)\), we define Plücker coordinates

$$\begin{aligned}\begin{aligned} v_i&:= g_{3,i},&1\le i\le 4,\\ v_{ij}&:= g_{3,i}g_{4,j}-g_{3,j}g_{4,i},&1\le i<j\le 4. \end{aligned}\end{aligned}$$

The Plücker coordinates satisfy the following relations:

$$\begin{aligned} {\begin{matrix} v_iv_{jk}-v_jv_{ik}+v_kv_{ij} &{}= 0, \quad 1\le i<j<k\le 4,\\ v_{13}+v_{24} &{}= 0. \end{matrix}} \end{aligned}$$
(3.1)

Thus we associate to every \(g\in G(\mathbb {Q}_p)\) its Plücker coordinates

$$\begin{aligned}\begin{aligned} v = v_g = (v_1, v_2, v_3, v_4; v_{12}, v_{13}, v_{14}, v_{23}, v_{24}, v_{34}) \in \mathbb {Q}_p^{10}. \end{aligned}\end{aligned}$$

If \(g\in G(\mathbb {Z}_p)\), then the corresponding Plücker coordinates \(v_g\) are integral by definition, and, since g is invertible over \(\mathbb {Z}_p\), they satisfy the coprimality conditions

$$\begin{aligned} (v_1,v_2,v_3,v_4)&= 1,&(v_{12},v_{13},v_{14},v_{23},v_{24},v_{34})&= 1. \end{aligned}$$
(3.2)
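The definitions above can be illustrated on concrete integral matrices. The Python sketch below assumes the symplectic form \(J = \begin{pmatrix} & I_2\\ -I_2 & \end{pmatrix}\), which is the form for which the second relation in (3.1), \(v_{13}+v_{24}=0\), holds; the generator names `upper`, `lower`, `levi` and the helper `plucker` are our own, hypothetical names. It builds elements of \({\text {Sp}}(4,\mathbb {Z})\subset G(\mathbb {Z}_p)\) as products of standard generators, computes their Plücker coordinates from rows 3 and 4, and checks the relations (3.1) and the coprimality conditions (3.2).

```python
import random
from functools import reduce
from itertools import combinations
from math import gcd

# Assumed symplectic form J = [[0, I], [-I, 0]]; g is symplectic iff g J g^T = J.
J = [[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def upper(b11, b12, b22):
    """[[I, B], [0, I]] with B symmetric."""
    return [[1, 0, b11, b12], [0, 1, b12, b22], [0, 0, 1, 0], [0, 0, 0, 1]]


def lower(c11, c12, c22):
    """[[I, 0], [C, I]] with C symmetric."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [c11, c12, 1, 0], [c12, c22, 0, 1]]


def levi(a, b, c, d):
    """diag(A, A^{-T}) for A = [[a, b], [c, d]] with det A = 1."""
    assert a * d - b * c == 1
    return [[a, b, 0, 0], [c, d, 0, 0], [0, 0, d, -c], [0, 0, -b, a]]


def random_sp4(rng, steps=6):
    """A random element of Sp(4, Z), written as a product of generators."""
    g = levi(2, 1, 1, 1)
    for _ in range(steps):
        gen = rng.choice([upper, lower])
        g = matmul(g, gen(*(rng.randint(-3, 3) for _ in range(3))))
    return g


def plucker(g):
    """v_i = g_{3,i}, v_ij = g_{3,i} g_{4,j} - g_{3,j} g_{4,i}, with 1-based keys."""
    r3, r4 = g[2], g[3]
    v = {i + 1: r3[i] for i in range(4)}
    vv = {(i + 1, j + 1): r3[i] * r4[j] - r3[j] * r4[i]
          for i, j in combinations(range(4), 2)}
    return v, vv


def satisfies_relations(v, vv):
    """The relations (3.1)."""
    three_term = all(v[i] * vv[j, k] - v[j] * vv[i, k] + v[k] * vv[i, j] == 0
                     for i, j, k in combinations(range(1, 5), 3))
    return three_term and vv[1, 3] + vv[2, 4] == 0


def coprime(values):
    """The gcd condition of (3.2)."""
    return reduce(gcd, values) == 1
```

For instance, for `levi(2, 1, 1, 1)` the only nonzero coordinates are \(v_3 = 1\), \(v_4 = -1\) and \(v_{34} = 1\).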

It is proved in [12] that there is a bijection

In particular, this means the coset representatives for X(n) can be described using Plücker coordinates. For notational convenience, for \(n\in N(\mathbb {Q}_p)\) we write \(X^v(n)\) for a complete system of coset representatives of X(n), in terms of Plücker coordinates.

Now we give explicit formulas for the Kloosterman sums \({\text {Kl}}_p(n,\psi ,\psi ')\). By Proposition 3.1, it suffices to consider the case \(n = n_{w,r,s}\), where \(n_{w,r,s}\) is given as in (1.2). By looking at the Plücker coordinates, one deduces that \(X(n_{w,r,s})\) is nonempty only if \(r,s\ge 0\). Explicit descriptions of \(X^v(n_{w,r,s})\), obtained by unfolding the conditions (3.1) and (3.2), are given in [12]; we use these results directly.

  1. (i)

    \(n = n_{{\text {id}},r,s}\). Then \(X(n_{{\text {id}},r,s})\) is empty unless \(r=s=0\), in which case \(X(n_{{\text {id}},0,0}) = \left\{ {I_4}\right\} \) is a singleton. So the Kloosterman sum is trivial:

    $$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n_{{\text {id}},0,0}, \psi , \psi '}\right) = 1. \end{aligned}\end{aligned}$$
  2. (ii)

    \(n = n_{s_\alpha ,r,s}\). Then \(X(n_{s_\alpha ,r,s})\) is empty unless \(s=0\). In this case we have

    $$\begin{aligned}\begin{aligned} X^v(n_{s_\alpha ,r,0}) = \left\{ {(0,0,v_3,p^r;0,0,0,0,0,1)}\right\} , \end{aligned}\end{aligned}$$

    where \(v_3\) runs over residues modulo \(p^r\) with \((v_3,p^r) = 1\). The corresponding Kloosterman sum is a classical \({\text {GL}}(2)\) Kloosterman sum:

    $$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n_{s_\alpha ,r,0}, \psi , \psi '}\right) = S\left( {m_1, n_1; p^r}\right) . \end{aligned}\end{aligned}$$
  3. (iii)

    \(n = n_{s_\beta ,r,s}\). Then \(X(n_{s_\beta ,r,s})\) is empty unless \(r=0\). In this case we have

    $$\begin{aligned}\begin{aligned} X^v(n_{s_\beta ,0,s}) = \left\{ {(0,0,1,0;0,0,0,p^s,0,v_{34})}\right\} , \end{aligned}\end{aligned}$$

    where \(v_{34}\) runs over residues modulo \(p^s\) with \((v_{34},p^s) = 1\). The corresponding Kloosterman sum is a classical \({\text {GL}}(2)\) Kloosterman sum:

    $$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n_{s_\beta ,0,s}, \psi , \psi '}\right) = S\left( {m_2, n_2; p^s}\right) . \end{aligned}\end{aligned}$$
  4. (iv)

    \(n = n_{s_\alpha s_\beta , r,s}\). Then \(X(n_{s_\alpha s_\beta ,r,s})\) is empty unless \(r\ge s\). Unfolding the conditions (3.1) and (3.2), we compute

    $$\begin{aligned}\begin{aligned} X^v(n_{s_\alpha s_\beta ,r,s}) = \left\{ {(0,p^r,v_3,v_4;0,0,0,p^s,0,-v_4p^{s-r})}\right\} , \end{aligned}\end{aligned}$$

    where \(v_3,v_4\) run over residues modulo \(p^r\) such that \((v_4,p^r) = p^{r-s}\) and \((v_3,p^{r-s}) = 1\). We write \(v_4 = v'_4 p^{r-s}\), so that \((v'_4,p^s) = 1\). Bruhat decomposition gives

    $$\begin{aligned}\begin{aligned} x =&\begin{pmatrix} 1 &{} \beta _1 &{} \beta _2 &{} \beta _3\\ &{} 1 &{} \beta _4 &{} \beta _5\\ &{}&{}1\\ &{}&{}-\beta _1 &{} 1\end{pmatrix} \begin{pmatrix} &{}&{}&{}-p^{-r}\\ p^{r-s}\\ &{}p^r\\ &{}&{}p^{s-r}\end{pmatrix} \begin{pmatrix} 1 &{}&{}&{} v_3p^{-r}\\ &{}1&{}v_3p^{-r}&{}v'_4 p^{-s}\\ &{}&{}1\\ &{}&{}&{}1\end{pmatrix}\\ =&\begin{pmatrix} \beta _1 p^{r-s} &{} \beta _2p^r &{} \beta _2v_3 + \beta _3 p^{s-r} &{} \beta _2 v'_4 p^{r-s} + \beta _1 v_3 p^{-s} - p^{-r}\\ p^{r-s} &{} \beta _4 p^r &{} \beta _4 v_3 + \beta _5 p^{s-r} &{} \beta _4 v'_4 p^{r-s} + v_3 p^{-s}\\ 0 &{} p^r &{} v_3 &{} v'_4 p^{r-s}\\ 0 &{} -\beta _1 p^r &{} -\beta _1 v_3 + p^{s-r} &{} -\beta _1 v'_4 p^{r-s}\end{pmatrix}. \end{aligned}\end{aligned}$$

    The entry \(-\beta _1 v_3 + p^{s-r}\) being an integer says \(\beta _1 \equiv \overline{v_3} p^{s-r} \pmod {1}\). The entry \(\beta _4 v'_4 p^{r-s} + v_3 p^{-s}\) being an integer says \(\beta _4 \equiv -\overline{v'_4} v_3 p^{-r} \pmod {p^{s-r}}\). Write \(\beta _4 = -\overline{v'_4} v_3 p^{-r} + \gamma _4 p^{s-r}\) for some \(\gamma _4\in \mathbb {Z}\). The entry \(\beta _4 v_3 + \beta _5 p^{s-r}\) being an integer says \(\gamma _4 v_3 + \beta _5 \equiv \overline{v'_4} v_3^2 p^{-s} \pmod {p^{r-s}}\), hence \(\beta _5 \equiv \overline{v'_4} v_3^2 p^{-s} \pmod {1}\). After writing \(v_4\) for \(v'_4\), the Kloosterman sum is given by

    $$\begin{aligned}\begin{aligned} {\text {Kl}}_p \left( {n_{s_\alpha s_\beta , r, s}, \psi , \psi '}\right) = \sum \limits _{\begin{array}{c} v_4 \pmod {p^s}\\ (v_4, p^s) = 1 \end{array}} \sum \limits _{\begin{array}{c} v_3 \pmod {p^r}\\ (v_3, p^{r-s}) = 1 \end{array}}{\text {e}}\left( {\frac{m_1\overline{v_3}p^s}{p^r}}\right) {\text {e}}\left( {\frac{m_2 \overline{v_4} v_3^2 + n_2 v_4}{p^s}}\right) . \end{aligned}\end{aligned}$$
  5. (v)

    \(n = n_{s_\beta s_\alpha ,r,s}\). Then \(X(n_{s_\beta s_\alpha ,r,s})\) is empty unless \(s\ge 2r\). Unfolding the conditions (3.1) and (3.2), we compute

    $$\begin{aligned}\begin{aligned} X^v(n_{s_\beta s_\alpha ,r,s}) = \left\{ {(0,0,-v_{24}p^{r-s},p^r;0,-v_{24},p^s,-v_{24}^2p^{-s},v_{24},v_{34})}\right\} , \end{aligned}\end{aligned}$$

    where \(v_{24}, v_{34}\) run over residues modulo \(p^s\) such that \((v_{24},p^s) = p^{s-r}\) and \((v_{34}, p^{s-2r}) = 1\). We write \(v_{24} = v'_{24} p^{s-r}\), so that \((v'_{24},p^r) = 1\). Bruhat decomposition gives

    $$\begin{aligned}\begin{aligned} x =&\begin{pmatrix} 1 &{} \beta _1 &{} \beta _2 &{} \beta _3\\ &{} 1 &{} \beta _4 &{} \beta _5\\ &{}&{}1\\ &{}&{}-\beta _1 &{} 1\end{pmatrix} \begin{pmatrix} &{} p^{-r}\\ &{}&{} p^{r-s}\\ &{}&{}&{} p^r\\ -p^{s-r}\end{pmatrix} \begin{pmatrix} 1 &{} v'_{24} p^{-r} &{} v_{34} p^{-s}\\ &{}1\\ &{}&{}1\\ &{}&{}-v'_{24} p^{-r} &{} 1\end{pmatrix}\\ =&\begin{pmatrix} -\beta _3 p^{s-r} &{} -\beta _3 v'_{24} p^{s-2r} + p^{-r} &{} -\beta _2 v'_{24} - \beta _3 v_{34} p^{-r} + \beta _1 p^{r-s} &{} \beta _2 p^r\\ -\beta _5 p^{s-r} &{} -\beta _5 v'_{24} p^{s-2r} &{} -\beta _4 v'_{24} - \beta _5 v_{34} p^{-r} + p^{r-s} &{} \beta _4 p^r\\ 0 &{} 0 &{} -v'_{24} &{} p^r\\ -p^{s-r} &{} -v'_{24} p^{s-2r} &{} \beta _1 v'_{24} - v_{34} p^{-r} &{} -\beta _1 p^r\end{pmatrix}. \end{aligned}\end{aligned}$$

    The entry \(\beta _1 v'_{24} - v_{34} p^{-r}\) being an integer says \(\beta _1 \equiv \overline{v'_{24}} v_{34} p^{-r} \pmod {1}\). The entry \(\beta _4 p^r\) being an integer says \(\beta _4 = \beta '_4 p^{-r}\) for some \(\beta '_4\in \mathbb {Z}\). The entry \(-\beta _4 v'_{24} - \beta _5 v_{34} p^{-r} + p^{r-s}\) being an integer says \(\beta '_4 v'_{24} + \beta _5 v_{34} \equiv p^{2r-s} \pmod {p^r}\), hence \(\beta _5 \equiv \overline{v_{34}} p^{2r-s} \pmod {1}\). After writing \(v_{24}\) for \(v'_{24}\), the Kloosterman sum is given by

    $$\begin{aligned}\begin{aligned}&{\text {Kl}}_p \left( {n_{s_\beta s_\alpha , r, s}, \psi , \psi '}\right) \\&\qquad = \sum \limits _{\begin{array}{c} v_{24} \pmod {p^r}\\ (v_{24}, p^r) = 1 \end{array}} \sum \limits _{\begin{array}{c} v_{34}\pmod {p^s}\\ (v_{34}, p^{s-2r}) = 1 \end{array}}{\text {e}}\left( {\frac{m_1 \overline{v_{24}} v_{34} + n_1 v_{24}}{p^r}}\right) {\text {e}}\left( {\frac{m_2 \overline{v_{34}} p^{2r}}{p^s}}\right) . \end{aligned} \end{aligned}$$

    Remark. This Kloosterman sum is related to a \({\text {GL}}(3)\) Kloosterman sum. More precisely, in the notation of [2, (4.3)], we have

    $$\begin{aligned}\begin{aligned} {\text {Kl}}_p \left( {n_{s_\beta s_\alpha , r, s}, \psi , \psi '}\right) = p^r S\left( {n_1, m_1, m_2; p^r, p^{s-r}}\right) . \end{aligned}\end{aligned}$$

    A non-trivial bound for \({\text {Kl}}_p \left( {n_{s_\beta s_\alpha , r, s}, \psi , \psi '}\right) \) then follows from Larsen [2, Appendix].

  6. (vi)

    \(n=n_{s_\alpha s_\beta s_\alpha ,r,s}\). Then \(X(n_{s_\alpha s_\beta s_\alpha ,r,s})\) is empty unless \(2r\ge s\). Unfolding the conditions (3.1) and (3.2), we compute

    $$\begin{aligned}\begin{aligned}&X^v(n_{s_\alpha s_\beta s_\alpha , r,s}) \\&\qquad = \left\{ {\left( {p^r,v_2,v_3,v_4; 0, -v_2 p^{s-r}, p^s, -v_2^2 p^{s-2r}, v_2 p^{s-r}, (p^rv_3+v_2v_4)p^{s-2r}}\right) }\right\} , \end{aligned}\end{aligned}$$

    where \(v_2, v_3, v_4\) run over residues modulo \(p^r\) such that \((v_2,v_3,v_4,p^r) = 1\) and, writing \(d := (v_2, p^r)\), \((d^2, p^rv_3+v_2v_4) = p^{2r-s}\). Let \(d = p^{r-a}\). Since \(d \mid p^rv_3+v_2v_4\) and \(p^{2r-s}\mid d^2\), a satisfies \(s-r\le a \le s/2\). We write \(v_2 = v'_2 p^{r-a}\), so that \((v'_2, p^a) = 1\). Bruhat decomposition gives

    $$\begin{aligned}\begin{aligned} x =&\begin{pmatrix} 1 &{} \beta _1 &{} \beta _2 &{} \beta _3\\ &{} 1 &{} \beta _4 &{} \beta _5\\ &{}&{}1\\ &{}&{}-\beta _1 &{} 1\end{pmatrix} \begin{pmatrix} &{}&{}-p^{-r}\\ {} &{}p^{r-s}\\ p^r\\ &{}&{}&{}p^{s-r}\end{pmatrix} \begin{pmatrix} 1 &{} v'_2 p^{-a} &{} v_3 p^{-r} &{} v_4 p^{-r}\\ &{} 1 &{} v_4 p^{-r} \\ &{}&{}1\\ &{}&{}-v'_2 p^{-a} &{} 1\end{pmatrix}\\ =&\begin{pmatrix} \beta _2 p^r &{} \beta _2 v'_2 p^{r-a} + \beta _1 p^{r-s} &{} \beta _2 v_3 - \beta _3 v'_2 p^{s-a-r} + \beta _1 v_4 p^{-s} - p^{-r} &{} \beta _2 v_4 + \beta _3 p^{s-r}\\ \beta _4 p^r &{} \beta _4 v'_2 p^{r-a} + p^{r-s} &{} \beta _4 v_3 - \beta _5 v'_2 p^{s-a-r} + v_4 p^{-s} &{} \beta _4 v_4 + \beta _5 p^{s-r}\\ p^r &{} v'_2 p^{r-a} &{} v_3 &{} v_4\\ -\beta _1 p^r &{} -\beta _1 v'_2 p^{r-a} &{} -\beta _1 v_3 - v'_2 p^{s-a-r} &{} -\beta _1 v_4 + p^{s-r}\end{pmatrix}. \end{aligned}\end{aligned}$$

    The entry \(-\beta _1 v'_2 p^{r-a}\) being an integer says \(\beta _1 = \beta '_1 p^{a-r}\) for some \(\beta '_1 \in \mathbb {Z}\). The entries \(-\beta _1 v_3 - v'_2 p^{s-a-r}\) and \(-\beta _1 v_4 + p^{s-r}\) being integers say

    $$\begin{aligned} \beta '_1 v_3&\equiv -v'_2 p^{s-2a} \pmod {p^{r-a}},&\beta '_1 v_4&\equiv p^{s-a} \pmod {p^{r-a}}. \end{aligned}$$
    (3.3)

    As \(\left( {v_3, v_4, p^{r-a}}\right) = 1\), these equations determine \(\beta _1\) uniquely modulo 1. The entry \(\beta _4 v'_2 p^{r-a} + p^{r-s}\) being an integer says \(\beta _4 \equiv -\overline{v'_2} p^{a-s} \pmod {p^{a-r}}\). Write \(\beta _4 = -\overline{v'_2} p^{a-s} + \gamma _4 p^{a-r}\) for some \(\gamma _4 \in \mathbb {Z}\). Then \(\beta _4 v_3 - \beta _5 v'_2 p^{s-a-r} + v_4 p^{-s}\) being an integer says

    $$\begin{aligned} -\overline{v'_2} v_3 p^a + \gamma _4 v_3 p^{s+a-r} - \beta _5 v'_2 p^{2s-a-r} + v_4 \equiv 0 \pmod {p^s}. \end{aligned}$$
    (3.4)

    Write \(\beta _5 = \beta '_5 p^{a+r-2s}\) for some \(\beta '_5 \in \mathbb {Z}\). Then we solve

    $$\begin{aligned} \beta '_5 \equiv -\overline{v'_2}^2 v_3 p^a + \gamma _4 \overline{v'_2} v_3 p^{s+a-r} + \overline{v'_2} v_4 \pmod {p^s}. \end{aligned}$$
    (3.5)

    Then, substituting (3.5), the condition that \(\beta _4 v_4 + \beta _5 p^{s-r}\) is an integer becomes, after multiplying by \(p^s\) and by the unit \(v'_2\),

    $$\begin{aligned} \gamma _4 \left( {p^a v_3 + v'_2 v_4}\right) p^{s+a-r} \equiv \overline{v'_2} v_3 p^{2a} \pmod {p^s}. \end{aligned}$$
    (3.6)

    Recall that \(\left( {p^{r-a}, p^a v_3 + v'_2 v_4}\right) = p^{r+a-s}\). Hence, unless \(a = \frac{s}{2}\), we can write \(p^a v_3 + v'_2 v_4 = V' p^{r+a-s}\), with \((V',p) = 1\). Then we solve (3.6):

    $$\begin{aligned}\begin{aligned} \gamma _4 \equiv \overline{V' v'_2} v_3 \pmod {p^{s-2a}}. \end{aligned}\end{aligned}$$

    Substituting this into (3.5) gives

    $$\begin{aligned}\begin{aligned} \beta '_5 \equiv -\overline{v'_2}^2 v_3 p^a + \overline{V'}\, \overline{v'_2}^2 v_3^2 p^{s+a-r} + \overline{v'_2} v_4 \pmod {p^{2s-a-r}}, \end{aligned}\end{aligned}$$

    hence \(\beta _5\) is uniquely determined modulo 1. When \(a=\frac{s}{2}\), \(\gamma _4\) can be arbitrary, and we have

    $$\begin{aligned}\begin{aligned} \beta '_5 \equiv -\overline{v'_2}^2 v_3 p^a + \overline{v'_2} v_4 \pmod {p^{2s-a-r}}, \end{aligned}\end{aligned}$$

    hence \(\beta _5\) is also uniquely determined modulo 1 in this case. So, after writing u for \(\beta _5 p^s\), the Kloosterman sum is given by

    $$\begin{aligned}\begin{aligned}&{\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r, s}, \psi , \psi '}\right) \\&\qquad = \sum \limits _{s-r \le a \le s/2}\sum \limits _{\begin{array}{c} v_2, v_3, v_4 \pmod {p^r}\\ v_2 = v'_2 p^{r-a}, \; (v'_2, p^a) = 1\\ (v_3, v_4, p^{r-a}) = 1\\ \left( {p^{r-a}, p^a v_3 + v'_2 v_4}\right) = p^{r+a-s} \end{array}}{\text {e}}\left( {\frac{m_1\hat{v}_2 + n_1v_2}{p^r}}\right) {\text {e}}\left( {\frac{m_2 u}{p^s}}\right) , \end{aligned}\end{aligned}$$

    where \(\hat{v}_2\) is chosen modulo \(p^r\) such that

    $$\begin{aligned} \hat{v}_2 v_3&\equiv -v'_2 p^{s-a} \pmod {p^r},&\hat{v}_2 v_4 \equiv p^s \pmod {p^r}, \end{aligned}$$
    (3.7)

    and

    $$\begin{aligned} u \equiv {\left\{ \begin{array}{ll} -\overline{v'_2}^2 v_3 p^{2a+r-s} + \overline{V'}\, \overline{v'_2}^2 v_3^2 p^{2a} + \overline{v'_2} v_4 p^{a+r-s} \pmod {p^s} &{} \text {if } a<\frac{s}{2},\\ -\overline{v'_2}^2 v_3 p^{2a+r-s} + \overline{v'_2} v_4p^{a+r-s} \pmod {p^s} &{} \text {if } a=\frac{s}{2},\end{array}\right. } \end{aligned}$$
    (3.8)

    where \(V' = p^{s-r-a} \left( {p^a v_3 + v'_2 v_4}\right) \).

  7. (vii)

    \(n = n_{s_\beta s_\alpha s_\beta ,r,s}\). Then \(X(n_{s_\beta s_\alpha s_\beta ,r,s})\) is empty unless \(s\ge r\). Unfolding the conditions (3.1) and (3.2), we compute

    $$\begin{aligned}\begin{aligned}&X^v(n_{s_\beta s_\alpha s_\beta , r,s}) \\&\qquad = \left\{ {\left( {0,p^r, v_{13} p^{r-s}, v_{14} p^{r-s}; p^s, v_{13}, v_{14}, v_{23}, -v_{13}, -(v_{13}^2+v_{14}v_{23}) p^{-s}}\right) }\right\} , \end{aligned}\end{aligned}$$

    where \(v_{13}, v_{14}, v_{23}\) run over residues modulo \(p^s\) such that \((v_{13}, v_{14}, p^s) = p^{s-r}\), \((v_{14}, p^s) \mid v_{13}^2\), and \((p^{s-r}, v_{23}, v_{34}) = 1\). Recall that \(v_{34} = -(v_{13}^2+v_{14}v_{23})p^{-s}\). We write \(v_{13} = v'_{13} p^{s-r}\), \(v_{14} = v'_{14} p^{s-r}\), so that \((v'_{13}, v'_{14}, p^r) = 1\). Bruhat decomposition gives

    $$\begin{aligned} \begin{aligned} x =&\begin{pmatrix} 1 &{} \beta _1 &{} \beta _2 &{} \beta _3\\ &{} 1 &{} \beta _4 &{} \beta _5\\ &{}&{}1\\ &{}&{}-\beta _1 &{} 1\end{pmatrix} \begin{pmatrix} &{}&{}&{} -p^{-r}\\ &{}&{}p^{r-s}\\ &{}p^r\\ -p^{s-r}\end{pmatrix} \begin{pmatrix} 1 &{}&{} -v_{23} p^{-s} &{} v'_{13} p^{-r}\\ &{}1&{} v'_{13} p^{-r} &{} v'_{14} p^{-r}\\ &{}&{}1\\ &{}&{}&{}1\end{pmatrix}\\ =&\begin{pmatrix} -\beta _3 p^{s-r} &{} \beta _2 p^r &{} \beta _2 v'_{13} + \beta _1 p^{r-s} + \beta _3 v_{23} p^{-r} &{} \beta _2 v'_{14} - \beta _3 v'_{13} p^{s-2r} - p^{-r}\\ -\beta _5 p^{s-r} &{} \beta _4 p^r &{} \beta _4 v'_{13} + \beta _5 v_{23} p^{-r} + p^{r-s} &{} \beta _4 v'_{14} - \beta _5 v'_{13} p^{s-2r}\\ 0 &{} p^r &{} v'_{13} &{} v'_{14}\\ -p^{s-r} &{} -\beta _1 p^r &{} -\beta _1 v'_{13} + v_{23} p^{-r} &{} -\beta _1 v'_{14} - v'_{13} p^{s-2r}\end{pmatrix}. \end{aligned}\end{aligned}$$

    The entry \(-\beta _1 p^r\) being an integer says \(\beta _1 = \beta '_1 p^{-r}\) for some \(\beta '_1 \in \mathbb {Z}\). The entries \(-\beta _1 v'_{13} + v_{23} p^{-r}\) and \(-\beta _1 v'_{14} - v'_{13} p^{s-2r}\) being integers say

    $$\begin{aligned} \beta '_1 v'_{13}&\equiv v_{23} \pmod {p^r},&\beta '_1 v'_{14}&\equiv - v'_{13} p^{s-r} \pmod {p^r}. \end{aligned}$$
    (3.9)

    As \((v'_{13}, v'_{14},p^r)=1\), this determines \(\beta _1\) uniquely modulo 1. The entries \(\beta _4 p^r\) and \(-\beta _5 p^{s-r}\) being integers say that \(\beta _4 = \beta '_4 p^{-r}\) and \(\beta _5 = \beta '_5 p^{r-s}\) for some \(\beta '_4, \beta '_5 \in \mathbb {Z}\). The entry \(\beta _4 v'_{13} + \beta _5 v_{23} p^{-r} + p^{r-s}\) being an integer says

    $$\begin{aligned} \beta '_4 v'_{13} p^{s-r} + \beta '_5 v_{23} + p^r \equiv 0 \pmod {p^s}, \end{aligned}$$
    (3.10)

    which implies

    $$\begin{aligned} \beta '_5 v_{23} \equiv -p^r \pmod {p^{s-r}}. \end{aligned}$$
    (3.11)

    The entry \(\beta _4 v'_{14} - \beta _5 v'_{13} p^{s-2r}\) being an integer says

    $$\begin{aligned} \beta '_4 v'_{14} p^{s-r} - \beta '_5 v'_{13} p^{s-r} \equiv 0 \pmod {p^s}. \end{aligned}$$
    (3.12)

    Then, \(v'_{13}\) times (3.12) minus \(v'_{14}\) times (3.10) gives

    $$\begin{aligned} \beta '_5 (- {v'_{13}}^2 p^{s-r} - v'_{14} v_{23})&\equiv p^r v'_{14} \pmod {p^s},\nonumber \\ \beta '_5 p^r v_{34}&\equiv p^r v'_{14} \pmod {p^s},\nonumber \\ \beta '_5 v_{34}&\equiv v'_{14} \pmod {p^{s-r}}. \end{aligned}$$
    (3.13)

    As \(\left( {p^{s-r}, v_{23}, v_{34}}\right) = 1\), (3.11) and (3.13) determine \(\beta _5\) uniquely modulo 1. So, after writing u for \(\beta _1 p^r\) and \(\hat{v}_{14}\) for \(\beta _5 p^s\), the Kloosterman sum is given by

    $$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta , r, s}, \psi , \psi '}\right) = \sum \limits _{\begin{array}{c} v_{13}, v_{14}, v_{23} \pmod {p^s}\\ \left( {p^s, v_{13}, v_{14}}\right) = p^{s-r}\\ \left( {p^s, v_{14}}\right) \mid v_{13}^2\\ \left( {p^{s-r}, v_{23}, v_{34}}\right) = 1 \end{array}}{\text {e}}\left( {\frac{m_1 u}{p^r}}\right) {\text {e}}\left( {\frac{m_2 \hat{v}_{14} + n_2 v_{14}}{p^s}}\right) , \end{aligned}\end{aligned}$$

    where u is chosen modulo \(p^r\) such that

    $$\begin{aligned} u v_{13} p^{r-s}&\equiv v_{23} \pmod {p^r},&u v_{14} p^{r-s}&\equiv - v_{13} \pmod {p^r}, \end{aligned}$$
    (3.14)

    and \(\hat{v}_{14}\) is chosen modulo \(p^s\) such that

    $$\begin{aligned} \hat{v}_{14} v_{23}&\equiv -p^{2r} \pmod {p^s},&\hat{v}_{14} v_{34}&\equiv v_{14} p^{2r-s} \pmod {p^s}. \end{aligned}$$
    (3.15)
  8. (viii)

    \(n=n_{w_0,r,s}\). Then \(X(n_{w_0,r,s})\) is nonempty whenever \(r,s\ge 0\). Unfolding the conditions (3.1) and (3.2), we compute

    $$\begin{aligned}\begin{aligned} X^v(n_{w_0,r,s}) = \left\{ {\left( {p^r,v_2,v_3,v_4;p^s,v_{13},v_{14},(v_2v_{13}-v_3p^s)p^{-r},-v_{13},(v_3v_{14}-v_4v_{13})p^{-r}}\right) }\right\} , \end{aligned}\end{aligned}$$

    where \(v_2,v_3,v_4\) run over residues modulo \(p^r\) and \(v_{13},v_{14}\) over residues modulo \(p^s\), such that \(v_{13}p^r+v_2v_{14}-v_4p^s = 0\), \((v_2,v_3,v_4,p^r)=1\), and \((v_{13},v_{14},v_{23},v_{34},p^s)=1\). Recall that

    $$\begin{aligned}\begin{aligned} v_{23} = (v_2v_{13}-v_3p^s)p^{-r},&\quad v_{34} = (v_3v_{14}-v_4v_{13})p^{-r}. \end{aligned}\end{aligned}$$

    Bruhat decomposition gives

    $$\begin{aligned} \begin{aligned} x =&\begin{pmatrix} 1 &{} \beta _1 &{} \beta _2 &{} \beta _3\\ &{} 1 &{} \beta _4 &{} \beta _5\\ &{}&{}1\\ &{}&{}-\beta _1 &{} 1\end{pmatrix} \begin{pmatrix} &{}&{} -p^{-r}\\ &{}&{}&{}-p^{r-s}\\ p^r\\ &{}p^{s-r}\end{pmatrix} \begin{pmatrix} 1 &{} v_2 p^{-r} &{} v_3 p^{-r} &{} v_4 p^{-r}\\ &{} 1 &{} v_{13} p^{-s} &{} v_{14} p^{-s}\\ &{}&{}1\\ &{}&{}-v_2 p^{-r} &{} 1\end{pmatrix}\\ =&\displaystyle \begin{pmatrix} \beta _2 p^r &{} \beta _2 v_2 + \beta _3 p^{s-r} &{} \beta _2 v_3 + \beta _3 v_{13} p^{-r} + \beta _1 v_2 p^{-s} - p^{-r} &{} \beta _2 v_4 - \beta _1 p^{r-s} + \beta _3 v_{14} p^{-r}\\ \beta _4 p^r &{} \beta _4 v_2 + \beta _5 p^{s-r} &{} \beta _4 v_3 + \beta _5 v_{13} p^{-r} + v_2 p^{-s} &{} \beta _4 v_4 + \beta _5 v_{14} p^{-r} - p^{r-s}\\ p^r &{} v_2 &{} v_3 &{} v_4\\ -\beta _1 p^r &{} -\beta _1 v_2 + p^{s-r} &{} -\beta _1 v_3 + v_{13} p^{-r} &{} -\beta _1 v_4 + v_{14} p^{-r}\end{pmatrix}. \end{aligned}\end{aligned}$$

    The entry \(-\beta _1 p^r\) being an integer says \(\beta _1 = \beta '_1 p^{-r}\) for some \(\beta '_1\in \mathbb {Z}\). The last row of x being integral gives

    $$\begin{aligned} \beta '_1 v_2&\equiv p^s \pmod {p^r},&\beta '_1 v_3&\equiv v_{13} \pmod {p^r},&\beta '_1 v_4&\equiv v_{14} \pmod {p^r}. \end{aligned}$$
    (3.16)

    As \(\left( {p^r, v_2, v_3, v_4}\right) = 1\), these equations determine \(\beta _1\) uniquely modulo 1. The entry \(\beta _4 p^r\) being an integer says \(\beta _4 = \beta '_4 p^{-r}\) for some \(\beta '_4 \in \mathbb {Z}\). Then \(\beta _4 v_2 + \beta _5 p^{s-r}\) being an integer says

    $$\begin{aligned} \beta '_4 v_2 + \beta _5 p^s \equiv 0 \pmod {p^r}. \end{aligned}$$
    (3.17)

    In particular, this means \(\beta _5 = \beta '_5 p^{-s}\) for some \(\beta '_5 \in \mathbb {Z}\). The entries \(\beta _4 v_3 + \beta _5 v_{13} p^{-r} + v_2 p^{-s}\) and \(\beta _4 v_4 + \beta _5 v_{14} p^{-r} - p^{r-s}\) being integers say

    $$\begin{aligned} \beta '_4 v_3 p^s + \beta '_5 v_{13} + v_2 p^r&\equiv 0 \pmod {p^{r+s}}, \end{aligned}$$
    (3.18)
    $$\begin{aligned} \beta '_4 v_4 p^s + \beta '_5 v_{14} - p^{2r}&\equiv 0 \pmod {p^{r+s}} . \end{aligned}$$
    (3.19)

    In particular we deduce

    $$\begin{aligned} \beta '_5 v_{13} + v_2 p^r&\equiv 0 \pmod {p^s}, \end{aligned}$$
    (3.20)
    $$\begin{aligned} \beta '_5 v_{14} - p^{2r}&\equiv 0 \pmod {p^s}. \end{aligned}$$
    (3.21)

    Then, \(v_2\) times (3.18) minus \(v_3 p^s\) times (3.17) gives

    $$\begin{aligned} \beta '_5 \left( {v_2v_{13} - v_3p^s}\right) + v_2^2 p^r \equiv 0 \pmod {p^{r+s}}, \end{aligned}$$

    which implies

    $$\begin{aligned} \beta '_5 v_{23} + v_2^2 \equiv 0 \pmod {p^s}. \end{aligned}$$
    (3.22)

    Similarly, \(v_3\) times (3.19) minus \(v_4\) times (3.18) gives

    $$\begin{aligned}&\beta '_5 \left( {v_3v_{14}-v_4v_{13}}\right) - p^r\left( {v_3 p^r + v_2v_4}\right) \equiv 0 \pmod {p^{r+s}}, \end{aligned}$$

    which implies

    $$\begin{aligned} \beta '_5 v_{34} \equiv \left( {v_3 p^r + v_2 v_4}\right) \pmod {p^s}. \end{aligned}$$
    (3.23)

    In summary, \(\beta '_5\) satisfies the following equations:

    $$\begin{aligned}\begin{aligned} \beta '_5 v_{13}&\equiv - v_2 p^r \pmod {p^s},&\beta '_5 v_{14}&\equiv p^{2r} \pmod {p^s},\\ \beta '_5 v_{23}&\equiv -v_2^2 \pmod {p^s},&\beta '_5 v_{34}&\equiv v_3 p^r + v_2v_4 \pmod {p^s}. \end{aligned}\end{aligned}$$

    As \(\left( {p^s, v_{13}, v_{14}, v_{23}, v_{34}}\right) = 1\), these equations determine \(\beta _5\) uniquely modulo 1. So, after writing \(\hat{v}_2\) for \(\beta _1 p^r\) and \(\hat{v}_{14}\) for \(\beta _5 p^s\), the Kloosterman sum is given by

    $$\begin{aligned}\begin{aligned} {\text {Kl}}_p \left( {n_{w_0, r, s}, \psi , \psi '}\right) = \sum \limits _{\begin{array}{c} v_2, v_3, v_4\pmod {p^r}\\ v_{13}, v_{14}\pmod {p^s}\\ v_{13}p^r + v_2v_{14} - v_4p^s = 0\\ (p^r, v_2, v_3, v_4) = 1\\ (p^s, v_{13}, v_{14}, v_{23}, v_{34}) = 1 \end{array}}{\text {e}}\left( {\frac{m_1\hat{v}_2 + n_1 v_2}{p^r}}\right) {\text {e}}\left( {\frac{m_2\hat{v}_{14} + n_2 v_{14}}{p^s}}\right) , \end{aligned}\end{aligned}$$

    where \(\hat{v}_2\) is chosen modulo \(p^r\) such that

    $$\begin{aligned} \hat{v}_2 v_2 \equiv p^s \pmod {p^r}, \quad \hat{v}_2 v_3 \equiv v_{13} \pmod {p^r}, \quad \hat{v}_2 v_4 \equiv v_{14} \pmod {p^r}; \end{aligned}$$
    (3.24)

    and \(\hat{v}_{14}\) is chosen modulo \(p^s\) such that

    $$\begin{aligned} \begin{aligned} \hat{v}_{14}v_{13}&\equiv -v_2p^r \pmod {p^s},&\hat{v}_{14}v_{14}&\equiv p^{2r} \pmod {p^s},\\ \hat{v}_{14}v_{23}&\equiv -v_2^2 \pmod {p^s},&\hat{v}_{14} v_{34}&\equiv v_3p^r+v_2v_4\pmod {p^s}. \end{aligned} \end{aligned}$$
    (3.25)
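Since the relations (3.1) are polynomial identities in the free parameters of each case, the sets \(X^v(n_{w,r,s})\) displayed in cases (iv) to (viii) can be sanity-checked with exact rational arithmetic. The Python sketch below (function names are ours) substitutes random integers for the free parameters, with p, r, s arbitrary, and tests (3.1); the coprimality conditions (3.2) are not tested. In case (v) it uses \(v_{23} = -v_{24}^2p^{-s}\), which is the value forced by the relation \(v_2v_{34}-v_3v_{24}+v_4v_{23}=0\) and which agrees with \(v_{23} = g_{3,2}g_{4,3}-g_{3,3}g_{4,2}\) read off from the Bruhat decomposition in that case.

```python
import random
from fractions import Fraction
from itertools import combinations

# Coordinate order used in the text: (v1, v2, v3, v4; v12, v13, v14, v23, v24, v34).
PAIRS = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


def satisfies_31(t):
    """Check the relations (3.1) for a 10-tuple of rationals."""
    v = dict(zip(range(1, 5), t[:4]))
    vv = dict(zip(PAIRS, t[4:]))
    three_term = all(v[i] * vv[j, k] - v[j] * vv[i, k] + v[k] * vv[i, j] == 0
                     for i, j, k in combinations(range(1, 5), 3))
    return three_term and vv[1, 3] + vv[2, 4] == 0


# P is p as a Fraction, so that negative powers of p stay exact.
def case_iv(P, r, s, v3, v4):
    return (0, P**r, v3, v4, 0, 0, 0, P**s, 0, -v4 * P**(s - r))


def case_v(P, r, s, v24, v34):
    # v23 = -v24^2 p^{-s}
    return (0, 0, -v24 * P**(r - s), P**r, 0, -v24, P**s, -v24**2 * P**(-s), v24, v34)


def case_vi(P, r, s, v2, v3, v4):
    return (P**r, v2, v3, v4, 0, -v2 * P**(s - r), P**s, -v2**2 * P**(s - 2 * r),
            v2 * P**(s - r), (P**r * v3 + v2 * v4) * P**(s - 2 * r))


def case_vii(P, r, s, v13, v14, v23):
    return (0, P**r, v13 * P**(r - s), v14 * P**(r - s), P**s, v13, v14, v23, -v13,
            -(v13**2 + v14 * v23) * P**(-s))


def case_viii(P, r, s, v2, v3, v4, v14):
    v13 = (v4 * P**s - v2 * v14) / P**r  # solves v13 p^r + v2 v14 - v4 p^s = 0
    return (P**r, v2, v3, v4, P**s, v13, v14, (v2 * v13 - v3 * P**s) / P**r, -v13,
            (v3 * v14 - v4 * v13) / P**r)
```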

Now we give a few reduction formulae for Kloosterman sums, which are straightforward to prove.

Proposition 3.2

Let \(\psi = \psi _{m_1, m_2}\), \(\psi ' = \psi _{n_1, n_2}\). Then

$$\begin{aligned}\begin{aligned}&{\text {Kl}}_p\left( {n_{w_0, r, 0}, \psi , \psi '}\right)&= S\left( {m_1, n_1; p^r}\right) ,&{\text {Kl}}_p\left( {n_{w_0, 0, s}, \psi , \psi '}\right)&= S\left( {m_2, n_2; p^s}\right) ,\\&{\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r, 0}, \psi , \psi '}\right)&= S\left( {m_1,0;p^r}\right) ,&{\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta , 0, s}, \psi , \psi '}\right)&= S\left( {0,m_2;p^s}\right) ,\\&{\text {Kl}}_p\left( {n_{s_\alpha s_\beta , r, 0}, \psi , \psi '}\right)&= S\left( {m_1,0;p^r}\right) ,&{\text {Kl}}_p\left( {n_{s_\beta s_\alpha , 0, s}, \psi , \psi '}\right)&= S\left( {0,m_2;p^s}\right) . \end{aligned}\end{aligned}$$
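The rows of Proposition 3.2 for \(w = s_\alpha s_\beta \) and \(w = s_\beta s_\alpha \) can be checked numerically from the explicit formulas of cases (iv) and (v). The Python sketch below uses our own brute-force helpers; the inverse \(\overline{v_3}\) in case (iv) is taken modulo \(p^{r-s}\) and \(\overline{v_{34}}\) in case (v) modulo \(p^{s-2r}\), which are the moduli that actually matter in the corresponding exponentials. It compares the two sums with \(S(m_1,0;p^r)\) and \(S(0,m_2;p^s)\).

```python
import cmath
from math import gcd, pi


def e(x):
    return cmath.exp(2j * pi * x)


def inv(a, q):
    """a^{-1} mod q; modulo 1 every residue is 0."""
    return 0 if q == 1 else pow(a, -1, q)


def kloosterman(m, n, q):
    """Classical S(m, n; q)."""
    return sum(e(((m * x + n * inv(x, q)) % q) / q) for x in range(q) if gcd(x, q) == 1)


def kl_sa_sb(p, r, s, m1, m2, n2):
    """Case (iv): Kl_p(n_{s_alpha s_beta, r, s}, psi, psi') for r >= s >= 0."""
    qa, qs = p ** (r - s), p ** s
    total = 0
    for v4 in range(qs):
        if gcd(v4, qs) != 1:
            continue
        for v3 in range(p ** r):
            if gcd(v3, qa) == 1:
                total += e((m1 * inv(v3, qa) % qa) / qa) * e(
                    ((m2 * inv(v4, qs) * v3 * v3 + n2 * v4) % qs) / qs)
    return total


def kl_sb_sa(p, r, s, m1, m2, n1):
    """Case (v): Kl_p(n_{s_beta s_alpha, r, s}, psi, psi') for s >= 2r >= 0."""
    qr, qb = p ** r, p ** (s - 2 * r)
    total = 0
    for v24 in range(qr):
        if gcd(v24, qr) != 1:
            continue
        for v34 in range(p ** s):
            if gcd(v34, qb) == 1:
                total += e(((m1 * inv(v24, qr) * v34 + n1 * v24) % qr) / qr) * e(
                    (m2 * inv(v34, qb) % qb) / qb)
    return total
```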

We end the section by proving that the Kloosterman sum attached to the long Weyl element \(w_0\) is symmetric in the characters \(\psi , \psi '\). In fact, this holds for \(G = {\text {Sp}}(2r)\) in general.

Proposition 3.3

Let \(G = {\text {Sp}}(2r,\mathbb {Q}_p)\), and \(n\in N\left( {\mathbb {Q}_p}\right) \), such that \(w(n) = w_0\) is the long Weyl element. Let \(\psi , \psi ': U\left( {\mathbb {Q}_p}\right) / U\left( {\mathbb {Z}_p}\right) \rightarrow \mathbb {C}^\times \) be characters. Then

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n, \psi , \psi '}\right) = {\text {Kl}}_p\left( {n, \psi ', \psi }\right) . \end{aligned}\end{aligned}$$

Proof

The definition of Kloosterman sums reads

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n, \psi , \psi '}\right) = \sum \limits _{x\in X(n)} \psi \left( {u(x)}\right) \psi '\left( {u'(x)}\right) . \end{aligned}\end{aligned}$$

The key idea of the proof is to find a bijection \(X(n) \rightarrow X(n)\), \(x\mapsto \tilde{x}\) such that \(\psi (u(\tilde{x})) = \psi (u'(x))\) and \(\psi '(u'(\tilde{x})) = \psi '(u(x))\). Since \(w(n) = w_0\) is the long Weyl element, \(n\in N(\mathbb {Q}_p)\) is of the form

$$\begin{aligned}\begin{aligned} n&= \begin{pmatrix} &{} -D^{-1}\\ D \end{pmatrix},&D&={\text {diag}}(d_1,\ldots , d_r),&d_i\in \mathbb {Q}_p^\times . \end{aligned}\end{aligned}$$

Let \(x\in X(n)\), and suppose

$$\begin{aligned}\begin{aligned} u(x)&= \begin{pmatrix} U &{} S\\ &{} (U^{-1})^T\end{pmatrix} \in U(\mathbb {Q}_p),&u'(x)&= \begin{pmatrix} U' &{} S'\\ &{} ({U'}^{-1})^T\end{pmatrix} \in U(\mathbb {Q}_p). \end{aligned}\end{aligned}$$

Then we have

$$\begin{aligned}\begin{aligned} x = \begin{pmatrix} SDU' &{} SDS' - UD^{-1}({U'}^{-1})^T\\ (U^{-1})^T D U' &{} (U^{-1})^T D S'\end{pmatrix} \in G(\mathbb {Z}_p). \end{aligned}\end{aligned}$$

Now set

$$\begin{aligned}\begin{aligned} \tilde{u}&= \begin{pmatrix} ({\tilde{U}'}{}^{-1})^T &{} \tilde{S}'\\ &{} \tilde{U}'\end{pmatrix},&\tilde{u}'&= \begin{pmatrix} ({\tilde{U}}^{-1})^T &{} \tilde{S}\\ &{} \tilde{U}\end{pmatrix}, \end{aligned}\end{aligned}$$

where

$$\begin{aligned}\begin{aligned} \tilde{U}_{ij}&:= (-1)^{i-j} U_{ji},&\tilde{S}_{ij}&:= (-1)^{i-j} S_{ji},&\tilde{U}'_{ij}&:= (-1)^{i-j} U'_{ji},&\tilde{S}'_{ij}&:= (-1)^{i-j} S'_{ji}. \end{aligned}\end{aligned}$$

It is straightforward to verify that \(\tilde{u}, \tilde{u}' \in U(\mathbb {Q}_p)\). Now set

$$\begin{aligned}\begin{aligned} \tilde{x} = \tilde{u} n \tilde{u}' = \begin{pmatrix} \tilde{S}' D ({\tilde{U}}^{-1})^T &{} \tilde{S}' D \tilde{S} - ({\tilde{U}'}{}^{-1})^T D ^{-1} \tilde{U}\\ \tilde{U}' D ({\tilde{U}}^{-1})^T &{} \tilde{U}' D \tilde{S}\end{pmatrix} \in G(\mathbb {Q}_p). \end{aligned}\end{aligned}$$

Now observe

$$\begin{aligned}\begin{aligned} \left( {\tilde{S}' D ({\tilde{U}}^{-1})^T}\right) _{ij}&= \sum \limits _k \tilde{S}'_{ik} d_k ({\tilde{U}}^{-1})^T_{kj} \\&= \sum \limits _k (-1)^{i+j} (U^{-1})^T_{jk} d_k S'_{ki} = (-1)^{i+j} \left( {(U^{-1})^T D S'}\right) _{ji} \in \mathbb {Z}_p, \end{aligned}\end{aligned}$$

and similarly

$$\begin{aligned}\begin{aligned} \left( {\tilde{S}' D \tilde{S} - ({\tilde{U}'}{}^{-1})^T D ^{-1} \tilde{U}}\right) _{ij}&= (-1)^{i+j} \left( {SDS' - UD^{-1}({U'}^{-1})^T}\right) _{ji} \in \mathbb {Z}_p,\\ \left( {\tilde{U}' D ({\tilde{U}}^{-1})^T}\right) _{ij}&= (-1)^{i+j} \left( {(U^{-1})^T D U'}\right) _{ji} \in \mathbb {Z}_p,\\ \left( {\tilde{U}' D \tilde{S}}\right) _{ij}&= (-1)^{i+j} \left( {SDU'}\right) _{ji} \in \mathbb {Z}_p. \end{aligned}\end{aligned}$$

Hence \(\tilde{x} \in G(\mathbb {Z}_p)\). Moreover, we may directly verify that \(\alpha _i(\tilde{u}) = \alpha _i(u')\), \(\alpha _i(\tilde{u}') = \alpha _i(u)\) for \(1\le i\le r\). So \(\psi (u(\tilde{x})) = \psi (u'(x))\) and \(\psi '(u'(\tilde{x})) = \psi '(u(x))\). Finally, using the bijection \(X(n) \rightarrow X(n)\), \(x\mapsto \tilde{x}\), we deduce that

$$\begin{aligned} {\text {Kl}}_p\left( {n, \psi , \psi '}\right)&= \sum \limits _{x\in X(n)} \psi \left( {u(x)}\right) \psi '\left( {u'(x)}\right) \\&= \sum \limits _{x\in X(n)} \psi \left( {u(\tilde{x})}\right) \psi '\left( {u'(\tilde{x})}\right) \\&= \sum \limits _{x\in X(n)} \psi '\left( {u(x)}\right) \psi \left( {u'(x)}\right) = {\text {Kl}}_p\left( {n, \psi ', \psi }\right) . \end{aligned}$$

\(\square \)
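The entrywise identities in the proof are formal: with \(E = {\text {diag}}((-1)^1,\ldots ,(-1)^r)\) one has \(\tilde{M} = E M^T E\) for each of \(M = U, U', S, S'\), and \(EDE = D\). The Python sketch below (our own helpers) checks, with exact rational arithmetic and random matrices of small size, the identities for the blocks \(\tilde{S}'D(\tilde{U}^{-1})^T\), \(\tilde{S}'D\tilde{S} - (\tilde{U}'^{-1})^TD^{-1}\tilde{U}\) and \(\tilde{U}'D(\tilde{U}^{-1})^T\), together with \(\left( {\tilde{U}' D \tilde{S}}\right) _{ij} = (-1)^{i+j}\left( {SDU'}\right) _{ji}\) for the remaining block of \(\tilde{x}\); no symplecticity is needed for these identities.

```python
import random
from fractions import Fraction


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse over Q; a is assumed invertible."""
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        piv = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[piv] = m[piv], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for i in range(n):
            if i != c and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [row[n:] for row in m]


def inv_t(a):
    """(A^{-1})^T."""
    return transpose(inverse(a))


def tilde(a):
    """(i, j) -> (-1)^(i-j) a_{ji}; note (-1)^(i-j) = (-1)^(i+j)."""
    n = len(a)
    return [[(-1) ** ((i + j) % 2) * a[j][i] for j in range(n)] for i in range(n)]


def identities_hold(rng, n):
    def rand():
        return [[Fraction(rng.randint(-5, 5)) for _ in range(n)] for _ in range(n)]

    def unipotent():
        return [[Fraction(int(i == j)) if j <= i else Fraction(rng.randint(-5, 5))
                 for j in range(n)] for i in range(n)]

    U, U_pr, S, S_pr = unipotent(), unipotent(), rand(), rand()
    D = [[Fraction(rng.choice([-3, -2, -1, 1, 2, 3])) if i == j else Fraction(0)
          for j in range(n)] for i in range(n)]
    D_inv = inverse(D)
    Ut, Ut_pr, St, St_pr = tilde(U), tilde(U_pr), tilde(S), tilde(S_pr)
    checks = [
        # top-left block of x~
        (matmul(matmul(St_pr, D), inv_t(Ut)), tilde(matmul(matmul(inv_t(U), D), S_pr))),
        # top-right block
        (sub(matmul(matmul(St_pr, D), St), matmul(matmul(inv_t(Ut_pr), D_inv), Ut)),
         tilde(sub(matmul(matmul(S, D), S_pr), matmul(matmul(U, D_inv), inv_t(U_pr))))),
        # bottom-left block
        (matmul(matmul(Ut_pr, D), inv_t(Ut)), tilde(matmul(matmul(inv_t(U), D), U_pr))),
        # bottom-right block
        (matmul(matmul(Ut_pr, D), St), tilde(matmul(matmul(S, D), U_pr))),
    ]
    return all(lhs == rhs for lhs, rhs in checks)
```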

4 Bounds for \({\text {Sp}}(4)\) Kloosterman sums

Fix \(\psi = \psi _{m_1,m_2}\), \(\psi ' = \psi _{n_1,n_2}\) as in (1.1). We first establish non-trivial bounds for the local Kloosterman sums \({\text {Kl}}_p\left( {n_{w,r,s}, \psi , \psi '}\right) \), that is, we prove Theorem 1.1.

We start with the local bounds. For \({\text {Kl}}_p(n_{{\text {id}},0,0},\psi ,\psi ')\), there is nothing to prove. Meanwhile, \({\text {Kl}}_p\left( {n_{s_\alpha ,r,0}, \psi , \psi '}\right) \) and \({\text {Kl}}_p\left( {n_{s_\beta ,0,s}, \psi , \psi '}\right) \) are just \({\text {GL}}(2)\) Kloosterman sums. A well-known bound for \({\text {GL}}(2)\) Kloosterman sums is given by [13]

$$\begin{aligned} \left| {S(\mu , \nu ; p^k)} \right| \le 2 p^{k/2} (\left| {\mu } \right| _p^{-1}, \left| {\nu } \right| _p^{-1}, p^k)^{1/2}. \end{aligned}$$
(4.1)

So

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\alpha , r, 0}, \psi , \psi '}\right) } \right| = \left| {S(m_1,n_1;p^r)} \right|&\ll p^{r/2}(m_1,n_1,p^r)^{1/2}, \\ \left| {{\text {Kl}}_p\left( {n_{s_\beta , 0, s}, \psi , \psi '}\right) } \right| = \left| {S(m_2,n_2;p^s)} \right|&\ll p^{s/2} (m_2,n_2,p^s)^{1/2} \end{aligned}\end{aligned}$$

as claimed.
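The bound (4.1) is easy to test numerically. The Python sketch below (helper names are ours) evaluates \(S(m,n;p^k)\) directly from the definition in the introduction and checks (4.1), reading \((\left| {m} \right| _p^{-1}, \left| {n} \right| _p^{-1}, p^k)\) as \(\gcd (m,n,p^k)\) for integers m, n; the test is restricted to a few small odd primes.

```python
import cmath
from math import gcd, pi


def kloosterman(m, n, q):
    """S(m, n; q): sum over x in (Z/qZ)^* of e((m x + n x^{-1}) / q)."""
    total = 0
    for x in range(1, q + 1):
        if gcd(x, q) == 1:
            y = pow(x, -1, q) if q > 1 else 0
            total += cmath.exp(2j * pi * ((m * x + n * y) % q) / q)
    return total


def bound_41(m, n, p, k):
    """Right-hand side of (4.1): 2 p^{k/2} gcd(m, n, p^k)^{1/2}."""
    return 2 * p ** (k / 2) * gcd(gcd(m, n), p**k) ** 0.5
```

Note that \(S(m,n;q)\) is real (replace x by \(-x\)), so a brute-force value with a non-negligible imaginary part would indicate a bug.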

4.1 Bound for \({\text {Kl}}_p\left( {n_{s_\alpha s_\beta , r, s}, \psi , \psi '}\right) \)

We recall

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p \left( {n_{s_\alpha s_\beta , r, s}, \psi , \psi '}\right) = \sum \limits _{\begin{array}{c} v_4 \pmod {p^s}\\ (v_4, p^s) = 1 \end{array}} \sum \limits _{\begin{array}{c} v_3 \pmod {p^r}\\ (v_3, p^{r-s}) = 1 \end{array}}{\text {e}}\left( {\frac{m_1\overline{v_3}p^s}{p^r}}\right) {\text {e}}\left( {\frac{m_2 \overline{v_4} v_3^2 + n_2 v_4}{p^s}}\right) . \end{aligned}\end{aligned}$$

Without loss of generality, we assume \({\text {ord}}_p(m_1) \le r-s\), and \({\text {ord}}_p(m_2), {\text {ord}}_p(n_2) \le s\). Observe that

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n_{s_\alpha s_\beta , r,s}, \psi _{m_1, m_2}, \psi _{n_1, n_2}}\right) = p^{k+2l} {\text {Kl}}_p\left( {n_{s_\alpha s_\beta , r-k-l, s-l}, \psi _{m_1 p^{-k}, m_2 p^{-l}}, \psi _{n_1, n_2 p^{-l}}}\right) \end{aligned}\end{aligned}$$

whenever \(p^k \mid \left( {m_1, p^{r-s}}\right) \) and \(p^l \mid \left( {m_2, n_2, p^s}\right) \). So we may assume \(s=0\), \(r=s\), or \(p\not \mid m_1\left( {m_2, n_2}\right) \).

If \(s=0\), then

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}\left( {n_{s_\alpha s_\beta , r,0}, \psi , \psi '}\right) } \right| = \bigg |\sum \limits _{\begin{array}{c} v_3\pmod {p^r}\\ (v_3, p^r) = 1 \end{array}}{\text {e}}\left( {\frac{m_1\overline{v_3}}{p^r}}\right) \bigg | \le p^{{\text {ord}}_p(m_1)}. \end{aligned}\end{aligned}$$
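For \(s = 0\) the sum above is the Ramanujan sum \(c_{p^r}(m_1) = S(m_1,0;p^r)\), whose classical evaluation is \(p^r-p^{r-1}\) if \(p^r\mid m_1\), \(-p^{r-1}\) if \({\text {ord}}_p(m_1) = r-1\), and 0 otherwise; the displayed bound follows at once. The Python sketch below (our own helpers) compares this closed form with a brute-force sum and checks the bound in the range \({\text {ord}}_p(m_1)\le r\).

```python
import cmath
from math import gcd, pi


def ord_p(m, p):
    """p-adic valuation of a nonzero integer m."""
    if m == 0:
        raise ValueError("ord_p(0) is infinite")
    k = 0
    while m % p == 0:
        m //= p
        k += 1
    return k


def ramanujan_brute(m, p, r):
    """Sum over v3 in (Z/p^r)^* of e(m v3^{-1} / p^r), for r >= 1."""
    q = p**r
    return sum(cmath.exp(2j * pi * (m * pow(v3, -1, q) % q) / q)
               for v3 in range(1, q) if gcd(v3, q) == 1)


def ramanujan_closed(m, p, r):
    """Classical value of c_{p^r}(m) for r >= 1 and m != 0."""
    k = ord_p(m, p)
    if k >= r:
        return p**r - p**(r - 1)
    if k == r - 1:
        return -p**(r - 1)
    return 0
```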

If \(r=s\), then the factor \({\text {e}}\left( {m_1\overline{v_3}p^s/p^r}\right) \) is trivial, and

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}\left( {n_{s_\alpha s_\beta , r,r}, \psi , \psi '}\right) } \right|&= \bigg |\sum \limits _{\begin{array}{c} v_4\pmod {p^r}\\ (v_4, p) = 1 \end{array}} \sum \limits _{v_3\pmod {p^r}}{\text {e}}\left( {\frac{m_2\overline{v_4}v_3^2 + n_2v_4}{p^r}}\right) \bigg | \\&\le p^{r+\frac{{\text {ord}}_p(m_2)}{2} + \frac{{\text {ord}}_p(n_2)}{2}} \end{aligned}\end{aligned}$$

since, for each \(v_4\), the inner sum over \(v_3\) is a quadratic Gauss sum, and these are easily evaluated.
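For odd p the evaluation can be sketched as follows: if \(m_2 = p^\mu m'\) with \(p\not \mid m'\), the inner sum over \(v_3\) is \(p^\mu \) times a quadratic Gauss sum modulo \(p^{r-\mu }\), of absolute value \(p^{(r+\mu )/2}\), and one can check that the remaining sum over \(v_4\) is either a Ramanujan sum or a sum twisted by the Legendre symbol modulo p. The brute-force Python check below (our own helpers, restricted to odd primes) tests the displayed bound and the size \(p^{k/2}\) of the Gauss sums involved.

```python
import cmath
from math import gcd, pi


def ord_p(m, p):
    """p-adic valuation of a nonzero integer m."""
    k = 0
    while m % p == 0:
        m //= p
        k += 1
    return k


def diag_sum(p, r, m2, n2):
    """The double sum in the case r = s, evaluated by brute force."""
    q = p**r
    total = 0
    for v4 in range(1, q):
        if gcd(v4, q) != 1:
            continue
        w4 = pow(v4, -1, q)
        for v3 in range(q):
            total += cmath.exp(2j * pi * ((m2 * w4 * v3 * v3 + n2 * v4) % q) / q)
    return total


def gauss_sum(a, p, k):
    """Quadratic Gauss sum: sum over x mod p^k of e(a x^2 / p^k)."""
    q = p**k
    return sum(cmath.exp(2j * pi * (a * x * x % q) / q) for x in range(q))
```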

Now suppose \(p \not \mid m_1 \left( {m_2, n_2}\right) \). If \(p \mid m_2\) and \(s>1\), then

$$\begin{aligned}\begin{aligned} {\text {Kl}}\left( {n_{s_\alpha s_\beta , r,s}, \psi , \psi '}\right) =&\sum \limits _{\begin{array}{c} v_4\pmod {p^{s-1}}\\ (v_4, p) = 1 \end{array}} \sum \limits _{\begin{array}{c} v_3\pmod {p^r}\\ (v_3, p^{r-s}) = 1 \end{array}} \sum \limits _{k=0}^{p-1}{\text {e}}\left( {\frac{m_1\overline{v_3}}{p^{r-s}}}\right) {\text {e}}\left( {\frac{m_2 \overline{v_4} v_3^2 + n_2 \left( {v_4+ k p^{s-1}}\right) }{p^s}}\right) \\ =&p \sum \limits _{k=0}^{p-1}{\text {e}}\left( {\frac{n_2k}{p}}\right) {\text {Kl}}\left( {n_{s_\alpha s_\beta , r-1, s-1}, \psi _{m_1, m_2/p}, \psi '}\right) = 0. \end{aligned}\end{aligned}$$

Now suppose \(p \mid m_2\) and \(s=1\). We may also assume \(r\ge 2\). Then

$$\begin{aligned}\begin{aligned} {\text {Kl}}\left( {n_{s_\alpha s_\beta , r,1}, \psi , \psi '}\right) =&\sum \limits _{\begin{array}{c} v_4\pmod {p}\\ (v_4,p)=1 \end{array}} \sum \limits _{\begin{array}{c} v_3\pmod {p^r}\\ (v_3,p) = 1 \end{array}}{\text {e}}\left( {\frac{m_1\overline{v_3}}{p^{r-1}}}\right) {\text {e}}\left( {\frac{n_2v_4}{p}}\right) = {\left\{ \begin{array}{ll} p &{} \text {if } r=2,\\ 0 &{} \text {if } r>2.\end{array}\right. } \end{aligned}\end{aligned}$$

The same argument shows that the bound also holds when \(p\mid n_2\). Now assume \(p\not \mid m_1m_2n_2\). When p is odd, the same argument shows that the sum is zero unless \(r=2s\); when \(p=2\), the sum is zero unless \(r=2s\) or \(r=2s-1\).
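The two vanishing statements for \(p\mid m_2\) can be spot-checked numerically. The sketch below reads the summand off the first line of the \(s>1\) display, with \(v_4 + kp^{s-1}\) recombined into a single unit \(v_4\) modulo \(p^s\); treating this as the defining sum for \({\text {Kl}}\left( {n_{s_\alpha s_\beta , r,s}, \psi , \psi '}\right) \) when \(r>s\ge 1\) is our assumption. It confirms the value p for \((r,s) = (2,1)\) and the value 0 for \((r,s) = (3,1), (3,2)\) on a few small inputs with \(p\mid m_2\) and \(p\not \mid m_1n_2\).

```python
import cmath
import math


def e(num: int, q: int) -> complex:
    """e(num/q) = exp(2*pi*i*num/q), reducing num mod q first."""
    return cmath.exp(2j * math.pi * (num % q) / q)


def kl_alpha_beta(p: int, r: int, s: int, m1: int, m2: int, n2: int) -> complex:
    """Assumed defining sum (requires r > s >= 1): over units v4 mod p^s and
    units v3 mod p^r of e(m1*inv(v3)/p^(r-s)) * e((m2*inv(v4)*v3^2 + n2*v4)/p^s)."""
    q_r, q_s, q_rs = p ** r, p ** s, p ** (r - s)
    total = 0j
    for v4 in range(1, q_s):
        if v4 % p == 0:
            continue
        v4_inv = pow(v4, -1, q_s)
        for v3 in range(1, q_r):
            if v3 % p == 0:
                continue
            total += (e(m1 * pow(v3, -1, q_rs), q_rs)
                      * e(m2 * v4_inv * v3 * v3 + n2 * v4, q_s))
    return total
```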

We first consider the case \(r=2s\). When \(s=1\), we have

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p\left( {n_{s_\alpha s_\beta , 2,1}, \psi , \psi '}\right) = \sum \limits _{\begin{array}{c} v_4\pmod {p}\\ (v_4, p) = 1 \end{array}} \sum \limits _{\begin{array}{c} v_3\pmod {p^2}\\ (v_3, p) = 1 \end{array}}{\text {e}}\left( {\frac{m_1\overline{v_3} + m_2 \overline{v_4} v_3^2 + n_2 v_4}{p}}\right) . \end{aligned}\end{aligned}$$

We apply a theorem of Adolphson and Sperber on exponential sums of Laurent polynomials. Let \(k = \mathbb {F}_q\) be a finite field of characteristic p. Let

$$\begin{aligned}\begin{aligned} f = \sum \limits _{j\in J} a_j x^j \in k[x_1,\ldots , x_n, (x_1 \ldots x_n)^{-1}] \end{aligned}\end{aligned}$$

be a Laurent polynomial in n variables. We assume that \(a_j \ne 0\) for all \(j\in J\). Let \(\Psi \) be a nontrivial additive character of k, and set

$$\begin{aligned}\begin{aligned} S^*(f) = \sum \limits _{x\in (k^\times )^n} \Psi (f(x)). \end{aligned}\end{aligned}$$

The Newton polyhedron of f, denoted by \(\Delta (f)\), is the convex hull in \(\mathbb {R}^n\) of the set \(J \cup \left\{ {(0,\ldots , 0)}\right\} \). We denote by V(f) the volume of \(\Delta (f)\) with respect to the Lebesgue measure on \(\mathbb {R}^n\). For a face \(\sigma \) (of any dimension) of \(\Delta (f)\), we set

$$\begin{aligned}\begin{aligned} f_\sigma = \sum \limits _{j\in \sigma \cap J} a_j x^j. \end{aligned}\end{aligned}$$

We say that f is non-degenerate with respect to \(\Delta (f)\) if for every face \(\sigma \) of \(\Delta (f)\) that does not contain the origin, the polynomials

$$\begin{aligned}\begin{aligned} \frac{\partial {f_\sigma }}{\partial {x_1}}, \ldots , \frac{\partial {f_\sigma }}{\partial {x_n}} \end{aligned}\end{aligned}$$

have no common zeroes in \((\overline{k}^\times )^n\), where \(\overline{k}\) denotes an algebraic closure of k. Then we have the following estimate.

Theorem 4.1

[1, Corollary 4.3] Given an n-dimensional integral polyhedron \(\Delta \) in \(\mathbb {R}^n\), there is a set \(\mathscr {S}_\Delta \) consisting of all but finitely many prime numbers, such that if \({\text {char}}(k) \in \mathscr {S}_\Delta \), and

$$\begin{aligned}\begin{aligned} f \in k[x_1,\ldots , x_n, (x_1\ldots x_n)^{-1}] \end{aligned}\end{aligned}$$

is a non-degenerate Laurent polynomial with \(\Delta (f) = \Delta \), then \(\left| {S^*(f)} \right| \le n! V(f) q^{n/2}\). Moreover, when \(n=2\), the restriction on \({\text {char}}(k)\) can be removed.

Now set \(f(x,y) = \frac{m_1}{x}+\frac{m_2x^2}{y}+n_2y \in \mathbb {F}_p[x,y,(xy)^{-1}]\), and let \(\Psi :\mathbb {F}_p\rightarrow \mathbb {C}^\times \) be the standard additive character \(\Psi (t) = {\text {e}}\left( {t/p}\right) \). Since the summand above depends on \(v_3\) only modulo p, we have \(p S^*(f) = {\text {Kl}}_p(n_{s_\alpha s_\beta ,2,1},\psi ,\psi ')\). We claim that f is non-degenerate whenever \(p\ne 2\). The Newton polyhedron \(\Delta (f)\) is the triangle with vertices \((x,y) = (-1,0), (2,-1), (0,1)\) (the origin lies in its interior), and its area is \(V(f) = 2\). We list the faces \(\sigma \) that do not contain the origin, and compute the derivatives \(\frac{\partial {f_\sigma }}{\partial {x}}, \frac{\partial {f_\sigma }}{\partial {y}}\). We denote by \(\left\langle {a_0, \ldots , a_j}\right\rangle \) the j-dimensional face of \(\Delta (f)\) containing \(a_0,\ldots , a_j\). We compute:

$$\begin{aligned} \sigma _1= & {} \left\langle {(-1,0),(2,-1)}\right\rangle , f_{\sigma _1} = \frac{m_1}{x}+\frac{m_2x^2}{y}, \\&\left( {\frac{\partial {f_{\sigma _1}}}{\partial {x}}, \frac{\partial {f_{\sigma _1}}}{\partial {y}}}\right) = \left( {-\frac{m_1}{x^2}+\frac{2m_2x}{y}, -\frac{m_2x^2}{y^2}}\right) ;\\ \sigma _2= & {} \left\langle {(-1,0),(0,1)}\right\rangle , f_{\sigma _2} = \frac{m_1}{x}+n_2y, \\&\left( {\frac{\partial {f_{\sigma _2}}}{\partial {x}}, \frac{\partial {f_{\sigma _2}}}{\partial {y}}}\right) = \left( {-\frac{m_1}{x^2}, n_2}\right) ;\\ \sigma _3= & {} \left\langle {(2,-1),(0,1)}\right\rangle , f_{\sigma _3} = \frac{m_2x^2}{y}+n_2y, \\&\left( {\frac{\partial {f_{\sigma _3}}}{\partial {x}}, \frac{\partial {f_{\sigma _3}}}{\partial {y}}}\right) = \left( {\frac{2m_2x}{y}, -\frac{m_2x^2}{y^2}+n_2}\right) ;\\ \sigma _4= & {} \left\langle {(-1,0)}\right\rangle , f_{\sigma _4} = \frac{m_1}{x}, \\&\left( {\frac{\partial {f_{\sigma _4}}}{\partial {x}}, \frac{\partial {f_{\sigma _4}}}{\partial {y}}}\right) =\left( {-\frac{m_1}{x^2},0}\right) ;\\ \sigma _5= & {} \left\langle {(2,-1)}\right\rangle , f_{\sigma _5} = \frac{m_2x^2}{y}, \\&\left( {\frac{\partial {f_{\sigma _5}}}{\partial {x}}, \frac{\partial {f_{\sigma _5}}}{\partial {y}}}\right) =\left( {\frac{2m_2x}{y},-\frac{m_2x^2}{y^2}}\right) ;\\ \sigma _6= & {} \left\langle {(0,1)}\right\rangle , f_{\sigma _6} = n_2y, \\&\left( {\frac{\partial {f_{\sigma _6}}}{\partial {x}}, \frac{\partial {f_{\sigma _6}}}{\partial {y}}}\right) =\left( {0,n_2}\right) . \end{aligned}$$

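Observe that when \(p\ne 2\), the terms \(-\frac{m_1}{x^2}\), \(\frac{2m_2x}{y}\), \(-\frac{m_2x^2}{y^2}\), \(n_2\) have no zeroes in \((\overline{\mathbb {F}_p}^\times )^2\), since \(p\not \mid m_1m_2n_2\). For each face \(\sigma _i\) listed above, at least one of the two partial derivatives of \(f_{\sigma _i}\) is exactly one of these terms, so the partial derivatives have no common zero in \((\overline{\mathbb {F}_p}^\times )^2\). Hence f is non-degenerate when \(p\ne 2\). Now we apply Theorem 4.1 and conclude that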

$$\begin{aligned} \left| {S^*(f)} \right| \le 2V(f)p = 4p \end{aligned}$$
(4.2)

for \(p\ne 2\). The bound (4.2) also holds for \(p=2\): in that case \(S^*(f)\) consists of the single term \(x=y=1\), which equals \(\Psi (m_1+m_2+n_2) = -1\) since \(m_1+m_2+n_2\) is odd. Therefore, for all primes p, we have

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p(n_{s_\alpha s_\beta ,2,1},\psi ,\psi ')} \right| \le 4p^2. \end{aligned}\end{aligned}$$

So the bound holds in this case.
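The ingredients of this step can be recomputed for small primes. The sketch below (a numerical illustration only; the helper names are ours and the parameter ranges are arbitrary small choices with \(p\not \mid m_1m_2n_2\)) computes the area of \(\Delta (f)\) with the shoelace formula, checks that the origin is interior so that \(\Delta (f)\) is the triangle itself, checks the identity \(pS^*(f) = {\text {Kl}}_p(n_{s_\alpha s_\beta ,2,1},\psi ,\psi ')\) against the displayed double sum, and checks \(\left| {S^*(f)} \right| \le 4p\).

```python
import cmath
import math
from fractions import Fraction


def e(num: int, q: int) -> complex:
    """e(num/q) = exp(2*pi*i*num/q), reducing num mod q first."""
    return cmath.exp(2j * math.pi * (num % q) / q)


# Exponent vectors of the monomials m1/x, m2*x^2/y, n2*y of f.
EXPONENTS = [(-1, 0), (2, -1), (0, 1)]


def shoelace_area(pts):
    """Area of the polygon whose vertices are listed in cyclic order."""
    twice = sum(x0 * y1 - x1 * y0
                for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))
    return Fraction(abs(twice), 2)


def origin_in_interior(tri):
    """True if (0, 0) lies strictly inside the triangle tri."""
    def side(a, b):  # sign of the cross product (b - a) x (origin - a)
        return (b[0] - a[0]) * (-a[1]) - (b[1] - a[1]) * (-a[0])
    signs = [side(tri[i], tri[(i + 1) % 3]) for i in range(3)]
    return all(v > 0 for v in signs) or all(v < 0 for v in signs)


def s_star(p, m1, m2, n2):
    """S*(f) over (F_p^x)^2 with Psi(t) = e(t/p)."""
    return sum(e(m1 * pow(x, -1, p) + m2 * x * x * pow(y, -1, p) + n2 * y, p)
               for x in range(1, p) for y in range(1, p))


def kl_2_1(p, m1, m2, n2):
    """Displayed sum for Kl_p(n_{s_a s_b,2,1}): units v4 mod p, units v3 mod p^2."""
    total = 0j
    for v4 in range(1, p):
        v4_inv = pow(v4, -1, p)
        for v3 in range(1, p * p):
            if v3 % p:
                total += e(m1 * pow(v3, -1, p * p) + m2 * v4_inv * v3 * v3 + n2 * v4, p)
    return total
```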

If \(s>1\), we apply the stationary phase method, following [7]. Let V be a smooth scheme of dimension n, and \(f:V\rightarrow \mathbb {A}^1 = \mathbb {A}_{\mathbb {Z}_p}^1\) a \(\mathbb {Z}_p\)-morphism. We consider the exponential sum

$$\begin{aligned} S_m(f) := \sum \limits _{x\in V(\mathbb {Z}/p^m\mathbb {Z})}{\text {e}}\left( {\frac{f(x)}{p^m}}\right) . \end{aligned}$$
(4.3)

Let \(j\le m\) be a positive integer. We write

$$\begin{aligned} D(\mathbb {Z}/p^j\mathbb {Z}) := \left\{ {x \in V(\mathbb {Z}/p^j\mathbb {Z})}\;\big |\;{\nabla f(x) \equiv 0\pmod {p^j}}\right\} \end{aligned}$$
(4.4)

to denote the “approximate critical points” of f. For \(\overline{x} \in (\mathbb {Z}/p^j\mathbb {Z})^n\), we define

$$\begin{aligned}\begin{aligned} S_m(f)_{\overline{x}} = \sum \limits _{\begin{array}{c} x \in V(\mathbb {Z}/p^m\mathbb {Z})\\ x\equiv \overline{x}\pmod {p^j} \end{array}}{\text {e}}\left( {\frac{f(x)}{p^m}}\right) . \end{aligned}\end{aligned}$$

Clearly we have

$$\begin{aligned}\begin{aligned} S_m(f) = \sum \limits _{\overline{x} \in (\mathbb {Z}/p^j\mathbb {Z})^n} S_m(f)_{\overline{x}}. \end{aligned}\end{aligned}$$

Theorem 4.2

[7, Theorem 1.8(a)] If \(2j\le m\), then \(S_m(f)_{\overline{x}} = 0\) unless \(\overline{x} \in D(\mathbb {Z}/p^j\mathbb {Z})\). Now suppose \(m=2j\) or \(2j+1\), and let \(x\in (\mathbb {Z}/p^m\mathbb {Z})^n\) map to \(\overline{x} \in D(\mathbb {Z}/p^j\mathbb {Z})\). If \(m=2j\), then we have

$$\begin{aligned} \begin{aligned} S_m(f)_{\overline{x}} = p^{mn/2}{\text {e}}\left( {\frac{f(x)}{p^m}}\right) . \end{aligned} \end{aligned}$$

If \(m=2j+1\), then we have

$$\begin{aligned}\begin{aligned} S_m(f)_{\overline{x}} = p^{(m-1)n/2}{\text {e}}\left( {\frac{f(x)}{p^m}}\right) \sum \limits _{y \in (\mathbb {Z}/p\mathbb {Z})^n}{\text {e}}\left( {\frac{\frac{1}{2} y^T H_x y + p^{-j} \nabla f(x) \cdot y}{p}}\right) , \end{aligned}\end{aligned}$$

where \(H_x\) is the Hessian matrix of f at x. In particular, if we let t denote the maximum value of \(n - {\text {rank}}_{\mathbb {F}_p} H_{\overline{x}}\) for \(\overline{x}\in D(\mathbb {Z}/p^j\mathbb {Z})\), then \(\left| {S_m(f)} \right| \le \left| {D(\mathbb {Z}/p^j\mathbb {Z})} \right| p^{(mn+t)/2}\).
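To see Theorem 4.2 at work in the smallest cases, the sketch below takes \(n=2\), V the pairs of units, and the Laurent polynomial \(f(x,y) = \frac{m_1}{x}+\frac{m_2x^2}{y}+n_2y\) used in this section, and compares \(S_m(f)\) for \(m=2,3\) (so \(j=1\)) computed by brute force with the right-hand side of the theorem summed over \(D(\mathbb {Z}/p\mathbb {Z})\). It is an illustration under our own choices: odd primes only, lifts taken in \([1,p)\), and helper names that are ours. For this f the quadratic form \(\frac{1}{2}y^TH_xy\) has integral coefficients, so no division by 2 occurs.

```python
import cmath
import math


def e(num: int, q: int) -> complex:
    """e(num/q) = exp(2*pi*i*num/q), reducing num mod q first."""
    return cmath.exp(2j * math.pi * (num % q) / q)


def f_mod(x, y, q, m1, m2, n2):
    """f(x, y) = m1/x + m2*x^2/y + n2*y as a residue mod q (x, y units)."""
    xi, yi = pow(x, -1, q), pow(y, -1, q)
    return (m1 * xi + m2 * x * x * yi + n2 * y) % q


def grad_mod(x, y, q, m1, m2, n2):
    """(df/dx, df/dy) as residues mod q."""
    xi, yi = pow(x, -1, q), pow(y, -1, q)
    return ((-m1 * xi * xi + 2 * m2 * x * yi) % q,
            (-m2 * x * x * yi * yi + n2) % q)


def half_hessian_form(x, y, z1, z2, q, m1, m2):
    """(1/2) z^T H z at (x, y) mod q; its coefficients are integral."""
    xi, yi = pow(x, -1, q), pow(y, -1, q)
    return ((m1 * xi ** 3 + m2 * yi) * z1 * z1
            - 2 * m2 * x * yi * yi * z1 * z2
            + m2 * x * x * yi ** 3 * z2 * z2) % q


def brute_force(p, m, m1, m2, n2):
    """S_m(f) summed over all pairs of units mod p^m."""
    q = p ** m
    units = [u for u in range(1, q) if u % p]
    return sum(e(f_mod(x, y, q, m1, m2, n2), q) for x in units for y in units)


def stationary_phase(p, m, m1, m2, n2):
    """Right-hand side of Theorem 4.2 with n = 2, j = 1 and m in {2, 3}."""
    q = p ** m
    total = 0j
    for x in range(1, p):
        for y in range(1, p):
            if grad_mod(x, y, p, m1, m2, n2) != (0, 0):
                continue  # not in D(Z/pZ): contributes nothing
            # p^{mn/2} (m = 2) and p^{(m-1)n/2} (m = 3) both equal p^2 here.
            term = p ** 2 * e(f_mod(x, y, q, m1, m2, n2), q)
            if m == 3:
                gx, gy = grad_mod(x, y, p * p, m1, m2, n2)  # both divisible by p
                term *= sum(e(half_hessian_form(x, y, z1, z2, p, m1, m2)
                              + (gx // p) * z1 + (gy // p) * z2, p)
                            for z1 in range(p) for z2 in range(p))
            total += term
    return total
```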

Now we apply the stationary phase method. Let \(f(x,y) = \frac{m_1}{x} + \frac{m_2 x^2}{y} + n_2 y\). Consider the sum

$$\begin{aligned}\begin{aligned} S = \sum \limits _{x, y\in \left( {\mathbb {Z}/p^s\mathbb {Z}}\right) ^\times }{\text {e}}\left( {\frac{f(x,y)}{p^s}}\right) = p^{-s} {\text {Kl}}_p \left( {n_{s_\alpha s_\beta , 2s, s}, \psi , \psi '}\right) . \end{aligned}\end{aligned}$$

Let \(j = \lfloor s/2\rfloor \), so that \(j\ge 1\) and \(s\in \left\{ {2j, 2j+1}\right\} \). Define as in (4.4)

$$\begin{aligned} D\left( {\mathbb {Z}/p^j\mathbb {Z}}\right)&= \left\{ {(x,y) \in \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times \times \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times }\;\big |\;{\nabla f(x,y) \equiv 0\pmod {p^j}}\right\} \\&= \left\{ {\left( {x,y}\right) \in \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times \times \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times }\;\big |\;{\begin{array}{l} 2m_2 x^3 \equiv m_1y \pmod {p^j},\\ m_2 x^2 \equiv n_2 y^2 \pmod {p^j}\end{array}}\right\} . \end{aligned}$$

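Eliminating y by means of the first congruence, the second one becomes \(4m_2n_2x^4 \equiv m_1^2 \pmod {p^j}\). For odd p the group \(\left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times \) is cyclic, so this congruence has at most 4 solutions x, and each x determines y; hence \(\left| {D\left( {\mathbb {Z}/p^j\mathbb {Z}}\right) } \right| \le 4\). (For \(p=2\) the set D is empty, since \(2m_2x^3\) is even while \(m_1y\) is odd.) Moreover, a direct computation gives \(\det H_{x,y} = 4m_1m_2x^{-1}y^{-3}\), which is a unit for odd p, so \(H_{x,y}\) is invertible over \(\mathbb {F}_p\) for all \((x,y)\in D\left( {\mathbb {Z}/p^j\mathbb {Z}}\right) \), and \({\text {rank}}_{\mathbb {F}_p} H_{x,y} = 2\). So we deduce from Theorem 4.2 that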

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\alpha s_\beta , r,s}, \psi , \psi '}\right) } \right| \le 4p^{2s}. \end{aligned}\end{aligned}$$
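The two facts behind this estimate can be spot-checked. For \(f(x,y) = \frac{m_1}{x}+\frac{m_2x^2}{y}+n_2y\), the Hessian determinant works out to \(4m_1m_2x^{-1}y^{-3}\) (independent of \(n_2\)); the sketch below verifies this identity with exact rational arithmetic at a few points, and brute-forces the set \(D\left( {\mathbb {Z}/p^j\mathbb {Z}}\right) \) from the two congruences to see at most 4 points for small odd p and none for \(p=2\). The function names and test ranges are our own choices.

```python
from fractions import Fraction


def hessian_det(x, y, m1, m2):
    """det of the Hessian of f = m1/x + m2*x^2/y + n2*y (n2 does not enter)."""
    fxx = 2 * m1 / x ** 3 + 2 * m2 / y
    fxy = -2 * m2 * x / y ** 2
    fyy = 2 * m2 * x ** 2 / y ** 3
    return fxx * fyy - fxy ** 2


def critical_set(p, j, m1, m2, n2):
    """D(Z/p^j Z): unit pairs with 2 m2 x^3 = m1 y and m2 x^2 = n2 y^2 mod p^j."""
    q = p ** j
    units = [u for u in range(1, q) if u % p]
    return [(x, y) for x in units for y in units
            if (2 * m2 * x ** 3 - m1 * y) % q == 0
            and (m2 * x * x - n2 * y * y) % q == 0]
```

For example, `critical_set(3, 1, 1, 1, 1)` returns `[(1, 2), (2, 1)]`.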

Now it remains to tackle the case \(p=2\), \(r=2s-1\). As p is fixed, it suffices to prove the bound for sufficiently large s, so we can always use the stationary phase method. Let \(f(x,y) = \frac{2m_1}{x} + \frac{m_2x^2}{y} + n_2 y\). Consider the sum

$$\begin{aligned}\begin{aligned} S = \sum \limits _{x,y\in (\mathbb {Z}/p^s\mathbb {Z})^\times }{\text {e}}\left( {\frac{f(x,y)}{p^s}}\right) = p^{-s+1} {\text {Kl}}_p(n_{s_\alpha s_\beta , 2s-1,s},\psi ,\psi '). \end{aligned}\end{aligned}$$

Let \(j = \lfloor s/2\rfloor \), so that \(j\ge 1\) and \(s\in \left\{ {2j, 2j+1}\right\} \). Define as in (4.4)

$$\begin{aligned}&D\left( {\mathbb {Z}/p^j\mathbb {Z}}\right) \\&\qquad = \left\{ {(x,y) \in \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times \times \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times }\;\big |\;{\nabla f(x,y) \equiv 0\pmod {p^j}}\right\} \\&\qquad = \left\{ {\left( {x,y}\right) \in \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times \times \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times }\;\big |\;{\begin{array}{l} 2m_2 x^3 \equiv 2m_1y \pmod {p^j},\\ m_2 x^2 \equiv n_2 y^2 \pmod {p^j}\end{array}}\right\} . \end{aligned}$$

Then we have \(\left| {D\left( {\mathbb {Z}/p^j\mathbb {Z}}\right) } \right| \le 16\). The Hessian \(H_{x,y}\) is not invertible, but nevertheless we have from Theorem 4.2 that

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\alpha s_\beta , 2s-1,s}, \psi , \psi '}\right) } \right| \le 64p^{2s-1}. \end{aligned}\end{aligned}$$

This finishes the proof of the bound for \({\text {Kl}}_p\left( {n_{s_\alpha s_\beta , r,s}, \psi , \psi '}\right) \).

4.2 Bound for \({\text {Kl}}_p\left( {n_{s_\beta s_\alpha , r, s}, \psi , \psi '}\right) \)

As we have mentioned in Sect. 3, the Kloosterman sum \({\text {Kl}}_p\left( {n_{s_\beta s_\alpha , r, s}, \psi , \psi '}\right) \) differs from the \({\text {GL}}(3)\) Kloosterman sum \(S\left( {n_1, m_1, m_2; p^r, p^{s-r}}\right) \) just by a factor of \(p^r\). So our bound immediately follows from the estimate given by Larsen [2, Appendix], whose proof we omit here.

4.3 Bound for \({\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r, s}, \psi , \psi '}\right) \)

We make use of the decomposition for Kloosterman sums in Sect. 2 to obtain a non-trivial bound for \({\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r, s}, \psi , \psi '}\right) \).

Let \(w = s_\alpha s_\beta s_\alpha \), and \(n = n_{s_\alpha s_\beta s_\alpha , r, s}\). Note that we have \(s\le 2r\). Then \(\Delta _w = \left\{ {\alpha }\right\} \), and

$$\begin{aligned}\begin{aligned} A_w(\ell ) = \left( {\mathbb {Z}/p^\ell \mathbb {Z}}\right) ^2 \times \left( {\mathbb {Z}/p^\ell \mathbb {Z}}\right) . \end{aligned}\end{aligned}$$

Let \(t = {\text {diag}}\left( {a_1, a_2, ca_1^{-1}, ca_2^{-1}}\right) \in \mathcal T\). Then \(t' = n^{-1}tn = {\text {diag}}\left( {ca_1^{-1}, a_2, a_1, ca_2^{-1}}\right) \). We compute

$$\begin{aligned}\begin{aligned} \kappa '_1\left( {t*x}\right) = ca_1^{-1}a_2^{-1} \kappa _1'(x). \end{aligned}\end{aligned}$$

So

$$\begin{aligned}\begin{aligned} V_w(\ell ) = \left\{ {(\lambda ,\lambda ') \in A_w(\ell )^\times }\;\big |\;{\lambda _1\lambda _2\lambda '_1= 1}\right\} . \end{aligned}\end{aligned}$$

If \(\theta :A_w(\ell ) \rightarrow \mathbb {C}^\times \) is given by

$$\begin{aligned}\begin{aligned} \theta (\lambda , \lambda ') ={\text {e}}\left( {\frac{n_1\lambda _1+n_2\lambda _2}{p^\ell }}\right) {\text {e}}\left( {\frac{n'_1\lambda '_1}{p^\ell }}\right) , \quad n_1, n_2, n'_1\in \mathbb {Z}, \end{aligned}\end{aligned}$$

then

$$\begin{aligned} S_w\left( {\theta , \ell }\right) = \sum \limits _{\lambda _2\in \left( {\mathbb {Z}/p^\ell \mathbb {Z}}\right) ^\times } {\text {e}}\left( {\frac{n_2\lambda _2}{p^\ell }}\right) S\left( {n_1\lambda _2^{-1}, n'_1; p^\ell }\right) . \end{aligned}$$
(4.5)
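Identity (4.5) is a rearrangement: for fixed \(\lambda _2\), write \(\lambda _1 = \lambda _2^{-1}x\) and \(\lambda '_1 = x^{-1}\). The sketch below checks it numerically, taking \(S_w\left( {\theta , \ell }\right) \) to be the sum of \(\theta \) over \(V_w(\ell )\) (our reading of the definition in Sect. 2); the function names are ours.

```python
import cmath
import math


def e(num: int, q: int) -> complex:
    """e(num/q) = exp(2*pi*i*num/q), reducing num mod q first."""
    return cmath.exp(2j * math.pi * (num % q) / q)


def kloosterman(m: int, n: int, q: int) -> complex:
    """Classical S(m, n; q): sum over x y = 1 mod q of e((m x + n y)/q)."""
    return sum(e(m * x + n * pow(x, -1, q), q)
               for x in range(1, q) if math.gcd(x, q) == 1)


def s_w_direct(n1, n2, n1p, p, ell):
    """Sum of theta over V_w(ell) = {lambda1 * lambda2 * lambda1' = 1}."""
    q = p ** ell
    units = [u for u in range(1, q) if u % p]
    # lambda1' is forced to be inv(lambda1 * lambda2).
    return sum(e(n1 * l1 + n2 * l2 + n1p * pow(l1 * l2, -1, q), q)
               for l1 in units for l2 in units)


def s_w_via_kloosterman(n1, n2, n1p, p, ell):
    """Right-hand side of (4.5)."""
    q = p ** ell
    return sum(e(n2 * l2, q) * kloosterman(n1 * pow(l2, -1, q), n1p, q)
               for l2 in range(1, q) if l2 % p)
```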

Suppose \(x_{a,b}^{v_3} \in X(n)\) has Plücker coordinates

$$\begin{aligned}\begin{aligned} \left( {v_1, v_2, v_3, v_4; v_{14}}\right) = \left( {p^r, p^{r-a}, v_3, p^{r-b}; p^s}\right) . \end{aligned}\end{aligned}$$

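Let \(\delta = \left( {p^{r-a}, p^a v_3 + p^{r-b}}\right) \). Then \(v_{14} = \frac{p^{r+a}}{\delta }\), so the condition \(v_{14} = p^s\) forces \(\delta = p^{r+a-s}\). Since \(\delta \) divides \(p^{r-a}\), this says \(0\le r+a-s\le r-a\), that is, \(s-r \le a \le \frac{s}{2}\); we also have \(b\le r\). Then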

$$\begin{aligned}\begin{aligned} u'\left( {x_{a,b}^{v_3}}\right) = \begin{pmatrix} 1 &{} p^{-a} &{} v_3p^{-r} &{} p^{-b}\\ &{} 1 &{} p^{-b}\\ &{}&{}1\\ &{}&{}-p^{-a}&{}1\end{pmatrix} \pmod { U\left( {\mathbb {Z}_p}\right) }. \end{aligned}\end{aligned}$$

Let \(X_{a,b}^{v_3}(n) = \mathcal T * x_{a,b}^{v_3}\), and define

$$\begin{aligned}\begin{aligned} S_{a,b}^{v_3} \left( {n,\psi ,\psi '}\right) = \sum \limits _{x \in X_{a,b}^{v_3}(n)} \psi \left( {u(x)}\right) \psi '\left( {u'(x)}\right) . \end{aligned}\end{aligned}$$

We also let

$$\begin{aligned}\begin{aligned} X_{a,b}(n) = \coprod \limits _{\begin{array}{c} v_3\pmod {p^r}\\ \left( {p^{r-a}, p^a v_3 + p^{r-b}}\right) = p^{r+a-s} \end{array}} X_{a,b}^{v_3}(n), \end{aligned}\end{aligned}$$

and

$$\begin{aligned}\begin{aligned} S_{a,b}\left( {n,\psi ,\psi '}\right) = \sum \limits _{x\in X_{a,b}(n)} \psi \left( {u(x)}\right) \psi '\left( {u'(x)}\right) . \end{aligned}\end{aligned}$$

Let \(x\in X(n)\), with Plücker coordinates \(v_2 = v_{2,x}\), \(v_4 = v_{4,x}\). Then \({\text {ord}}_p(v_{2,x}) = r-a\), \({\text {ord}}_p(v_{4,x}) = r-b\) for some \(s-r \le a \le s/2\), \(0\le b\le r\). So x lies in the \(\mathcal T\)-orbit of \(x_{a,b}^{v_3}\) for some \(v_3\pmod {p^r}\), and hence \(x \in X_{a,b}(n)\). This gives a partition

$$\begin{aligned}\begin{aligned} X\left( {n}\right) = \coprod \limits _{\begin{array}{c} s-r\le a \le s/2\\ 0\le b\le r \end{array}} X_{a,b}(n). \end{aligned}\end{aligned}$$

As \(r\ge \frac{s}{2} \ge a\), \(r\ge b\), we see that \(u(x), u'(x)\) have entries in \(p^{-r}\mathbb {Z}_p/\mathbb {Z}_p\) for all \(x \in X(n)\). Let \(\mathcal S_{a,b}\) be a finite subset of \(\mathbb {Z}_p\) such that

$$\begin{aligned}\begin{aligned} X_{a,b}(n) = \coprod \limits _{v_3\in \mathcal S_{a,b}} X_{a,b}^{v_3} (n). \end{aligned}\end{aligned}$$

By Theorem 2.4, we have

$$\begin{aligned}\begin{aligned} S_{a,b}\left( {n,\psi ,\psi '}\right) = p^{-4r} \left( {1-p^{-1}}\right) ^{-2} \sum \limits _{v_3\in \mathcal S_{a,b}} \left| {X_{a,b}^{v_3}(n)} \right| S_w \left( {\theta _{a,b}^{v_3}; 2r}\right) , \end{aligned}\end{aligned}$$

where

$$\begin{aligned}\begin{aligned} \theta _{a,b}^{v_3} (\lambda ,\lambda ') ={\text {e}}\left( {\frac{m_2u\lambda _2}{p^s}}\right) {\text {e}}\left( {\frac{m_1\hat{v}_2 \lambda _1 + n_1 p^{r-a} \lambda '_1}{p^r}}\right) , \end{aligned}\end{aligned}$$

with \(\hat{v}_2\) and u given as in (3.7) and (3.8). By (4.5), we have

$$\begin{aligned} S_w \left( {\theta _{a,b}^{v_3}; 2r}\right) = \sum \limits _{x, y\in \left( {\mathbb {Z}/p^{2r}\mathbb {Z}}\right) ^\times } {\text {e}}\left( {\frac{m_2u x}{p^s}}\right) {\text {e}}\left( {\frac{m_1\hat{v}_2 \overline{x}y + n_1p^{r-a}\overline{y}}{p^r}}\right) . \end{aligned}$$
(4.6)

Since the size of the \(\mathcal T\)-orbit of \(x_{a,b}^{v_3}\) is bounded by \(p^{a+b}\), we have

$$\begin{aligned} \sum \limits _{v_3\in \mathcal S_{a,b}} \left| {X_{a,b}^{v_3}(n)} \right| \le \left| {\mathcal S_{a,b}} \right| p^{a+b} \le p^{r+a+b}. \end{aligned}$$
(4.7)

We estimate the size of \(S_w \left( {\theta _{a,b}^{v_3}; 2r}\right) \) below. We start by computing the order of \(\hat{v}_2\) and u in (4.6). From (3.7), it is clear that \({\text {ord}}_p\left( {\hat{v}_2}\right) = s-a\). Now we consider \({\text {ord}}_p(u)\). If \(a\ne \frac{s}{2}\), then we have (after putting \(v'_2 = \overline{v'_2} = 1\))

$$\begin{aligned}\begin{aligned} u&= p^{a+r-s} \left( {-p^av_3 + v_4}\right) + \overline{V'} v_3^2p^{2a}\\&= p^{a+r-s} \left( {p^av_3 + v_4}\right) - 2 v_3 p^{2a+r-s} + \overline{V'} v_3^2p^{2a}\\&= p^{2a+2r-2s} V' - 2 v_3 p^{2a+r-s} + \overline{V'} v_3^2p^{2a}\\&= p^{2a} \overline{V'} \left( {p^{2r-2s} V'^2 - 2p^{r-s}v_3 V' + v_3^2}\right) \\&= p^{2a} \overline{V'} \left( {p^{r-s}V'-v_3}\right) ^2\\&= p^{2a} \overline{V'} \left( {p^{-a} v_4}\right) ^2\\&= v_4^2 \overline{V'}. \end{aligned}\end{aligned}$$

So \({\text {ord}}_p(u) = 2\left( {r-b}\right) \). If \(a=\frac{s}{2}\), then (again we set \(v'_2 = \overline{v'_2} = 1\))

$$\begin{aligned} u =&-v_3 p^{2a+r-s} + v_4 p^{a+r-s} = p^{a+r-s} \left( {2v_4 - \left( {p^a v_3 + v_4}\right) }\right) . \end{aligned}$$
(4.8)

This form will be useful in computing \({\text {ord}}_p(u)\), when more conditions are given.
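The chain of equalities for u in the case \(a\ne \frac{s}{2}\) is pure algebra once one uses \(p^av_3 + v_4 = p^{a+r-s}V'\) (the relation implicitly used in passing to the third line) and \(V'\overline{V'} = 1\). The sketch below treats these two relations as assumptions, defines \(v_4\) from the first one, and checks with exact rational arithmetic that all seven lines of the computation agree; the function name is ours.

```python
from fractions import Fraction


def u_chain(p, a, r, s, v3, V):
    """Every line of the computation of u, assuming p^a v3 + v4 = p^(a+r-s) V'
    and V' * V'-bar = 1 (here V' = V and V'-bar = 1/V)."""
    P = Fraction(p)
    Vbar = 1 / V
    v4 = P ** (a + r - s) * V - P ** a * v3  # defined by the assumed relation
    return [
        P ** (a + r - s) * (-P ** a * v3 + v4) + Vbar * v3 ** 2 * P ** (2 * a),
        P ** (a + r - s) * (P ** a * v3 + v4) - 2 * v3 * P ** (2 * a + r - s)
        + Vbar * v3 ** 2 * P ** (2 * a),
        P ** (2 * a + 2 * r - 2 * s) * V - 2 * v3 * P ** (2 * a + r - s)
        + Vbar * v3 ** 2 * P ** (2 * a),
        P ** (2 * a) * Vbar * (P ** (2 * r - 2 * s) * V ** 2
                               - 2 * P ** (r - s) * v3 * V + v3 ** 2),
        P ** (2 * a) * Vbar * (P ** (r - s) * V - v3) ** 2,
        P ** (2 * a) * Vbar * (P ** (-a) * v4) ** 2,
        v4 ** 2 * Vbar,
    ]
```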

Case I: Suppose \(s<r\). We deduce from (3.7) that \({\text {ord}}_p(v_3) = 0, {\text {ord}}_p(v_4) = a\), so only terms with \(r=a+b\) contribute. When \(a \ne \frac{s}{2}\), we have \({\text {ord}}_p(u) = 2\left( {r-b}\right) = 2a\). When \(a = \frac{s}{2}\), we can still take \({\text {ord}}_p(u) = s = 2a\). So \({\text {ord}}_p(u) = 2a\) always holds.

  1. (i)

    Suppose \(a\le \frac{2s-r}{3}\). Write \(u = p^{2a} u'\). Let

    $$\begin{aligned}\begin{aligned} t = \min \left\{ {{\text {ord}}_p(m_2), {\text {ord}}_p(m_1)+2s-r-3a, {\text {ord}}_p(n_1) + s-3a}\right\} , \end{aligned}\end{aligned}$$

    and

    $$\begin{aligned}\begin{aligned} f(x,y) = p^{-t} \left( {m_2 u' y + \frac{m_1 \hat{v}_2 p^{s-r-2a} x}{y} + \frac{n_1 p^{s-3a}}{x}}\right) = m'_2 y + \frac{m'_1 x}{y} + \frac{n'_1}{x}, \end{aligned}\end{aligned}$$

    where \(m'_1 = m_1 \hat{v}_2 p^{s-r-2a-t}\), \(m'_2 = m_2 u' p^{-t}\), \(n'_1 = n_1p^{s-3a-t}\). Consider the sum

    $$\begin{aligned}\begin{aligned} S = \sum \limits _{x,y\in (\mathbb {Z}/p^{s-2a-t}\mathbb {Z})^\times } e\left( {\frac{f(x,y)}{p^{s-2a-t}}}\right) = p^{2s-4a-4r-2t} S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) . \end{aligned}\end{aligned}$$

    When \(s-2a-t>1\), let \(j = \lfloor (s-2a-t)/2\rfloor \ge 1\). Define as in (4.4)

    $$\begin{aligned} D\left( {\mathbb {Z}/p^j\mathbb {Z}}\right)&= \left\{ {(x,y) \in \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times \times \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times }\;\big |\;{\nabla f(x,y) \equiv 0\pmod {p^j}}\right\} \\&= \left\{ {\left( {x,y}\right) \in \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times \times \left( {\mathbb {Z}/p^j\mathbb {Z}}\right) ^\times }\;\big |\;{\begin{array}{l} m'_1 x^2 \equiv n'_1y \pmod {p^j},\\ m'_2 y^2 \equiv m'_1 x \pmod {p^j}\end{array}}\right\} . \end{aligned}$$

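    Note that at least one of \(m'_1\), \(m'_2\) and \(n'_1\) is not divisible by p. Since x and y are units, reducing the two congruences modulo p shows that if one of \(m'_1\), \(m'_2\), \(n'_1\) is a unit, then so are the other two. It then follows that \(D\left( {\mathbb {Z}/p^j\mathbb {Z}}\right) \) is empty unless \({\text {ord}}_p(m_2) = {\text {ord}}_p(m_1)+2s-r-3a = {\text {ord}}_p(n_1) + s-3a = t\). But then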

    $$\begin{aligned}\begin{aligned} S = p^{4a+2t-2s} {\text {Kl}}_p(n_{s_\beta s_\alpha , s-2a-t, 3s-6a-3t}, \psi _{m'_1,m'_2}, \psi _{n'_1,0}), \end{aligned}\end{aligned}$$

    with \(p\not \mid m'_1m'_2n'_1\). So it follows from the bound for \({\text {Kl}}_p(n_{s_\beta s_\alpha ,r,s},\psi ,\psi ')\) that

    $$\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) } \right| \ll p^{4r+2a-s+t}. \end{aligned}$$
    (4.9)

    Now suppose \(s-2a-t=1\). If \(p \not \mid m'_1m'_2n'_1\), then it follows from the theorem of Deligne [6, Sommes trig., 7.1.3] that \(S\ll p\). When p divides some (but not all) of \(m'_1\), \(m'_2\), \(n'_1\), the sum reduces to Ramanujan sums, and one checks directly that \(S\ll p\) as well. So the bound (4.9) also holds in this case. Remark. Theorem 4.1, itself a generalisation of Deligne’s theorem, also applies to give the same bound. The bounds for \(S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) \) in the other cases are obtained analogously, and we omit the repetitive computations hereafter.

  2. (ii)

    Suppose \(a>\frac{2s-r}{3}\). Write \(\hat{v}_2 = p^{s-a} \hat{v}'_2\). Let

    $$\begin{aligned}\begin{aligned} t = \min \left\{ {{\text {ord}}_p(m_2)+r+3a-2s, {\text {ord}}_p(m_1), {\text {ord}}_p(n_1)+r-s}\right\} , \end{aligned}\end{aligned}$$

    and

    $$\begin{aligned}\begin{aligned} f(x,y) = p^{-t} \left( {m_2 u p^{r+a-2s} y+\frac{m_1 \hat{v}'_2 x}{y} + \frac{n_1 p^{r-s}}{x}}\right) = m'_2 y + \frac{m'_1 x}{y} + \frac{n'_1}{x}, \end{aligned}\end{aligned}$$

    where \(m'_1 = m_1 \hat{v}'_2 p^{-t}\), \(m'_2 = m_2 u p^{r+a-2s-t}\), \(n'_1 = n_1 p^{r-s-t}\). Then we have

    $$\begin{aligned}\begin{aligned} S = \sum \limits _{x,y\in (\mathbb {Z}/p^{r+a-s-t}\mathbb {Z})^\times } e\left( {\frac{f(x,y)}{p^{r+a-s-t}}}\right) = p^{2a-2r-2s-2t} S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) . \end{aligned}\end{aligned}$$

    Then we obtain analogously

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) } \right| \ll p^{3r-a+s+t}. \end{aligned}\end{aligned}$$
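In the case \(s-2a-t=1\) of (i), the substitution \((a', b', d') = (m'_2y, m'_1x/y, n'_1/x)\) identifies S with the hyper-Kloosterman sum \({\text {Kl}}_3(m'_1m'_2n'_1; p) = \sum _{a'b'd' = m'_1m'_2n'_1}{\text {e}}\left( {(a'+b'+d')/p}\right) \), for which Deligne's bound is 3p. Consistently with the Remark, the Newton polygon of \(m'_2y + m'_1x/y + n'_1/x\) is the triangle with vertices \((0,1), (1,-1), (-1,0)\) and area \(\frac{3}{2}\), so the bound \(2!\,V(f)\,p\) of Theorem 4.1 is again 3p. The sketch below checks these facts numerically for small primes; the function names are ours.

```python
import cmath
import math
from fractions import Fraction


def e(num: int, q: int) -> complex:
    """e(num/q) = exp(2*pi*i*num/q), reducing num mod q first."""
    return cmath.exp(2j * math.pi * (num % q) / q)


def s_prime(p, m1p, m2p, n1p):
    """S = sum over x, y in F_p^* of e((m2' y + m1' x/y + n1'/x) / p)."""
    return sum(e(m2p * y + m1p * x * pow(y, -1, p) + n1p * pow(x, -1, p), p)
               for x in range(1, p) for y in range(1, p))


def kl3(c, p):
    """Hyper-Kloosterman sum: sum over a b d = c in F_p^* of e((a + b + d)/p)."""
    return sum(e(a + b + c * pow(a * b, -1, p), p)
               for a in range(1, p) for b in range(1, p))


def shoelace_area(pts):
    """Area of the polygon whose vertices are listed in cyclic order."""
    twice = sum(x0 * y1 - x1 * y0
                for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))
    return Fraction(abs(twice), 2)
```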

Note that we have \(\left( {p^{r-a}, p^a \left( {v_3+1}\right) }\right) = p^{r+a-s}\). A necessary condition for this to hold is that \(p^{r-s} \mid v_3+1\). So \(\left| {\mathcal S_{a,b}} \right| \le p^s\). So, from (4.7) we actually have

$$\begin{aligned}\begin{aligned} \sum \limits _{v_3\in \mathcal S_{a,b}} \left| {X_{a,b}^{v_3}(n)} \right| \le p^{s+a+b}. \end{aligned}\end{aligned}$$

Hence

$$\begin{aligned} \begin{aligned} \left| {{\text {Kl}}_p\left( {n,\psi ,\psi '}\right) } \right|&\le \sum \limits _{\begin{array}{c} 0\le a \le s/2\\ b = r-a \end{array}} \left| {S_{a,b}\left( {n,\psi ,\psi '}\right) } \right| \\&\ll \sum \limits _{\begin{array}{c} 0\le a \le s/2\\ b = r-a \end{array}} p^{-4r} p^{s+a+b} \max _{v_3\in \mathcal S_{a,b}} \left| {S_w\left( {\theta _{a,b}^{v_3};2r}\right) } \right| \\&\ll \sum \limits _{\begin{array}{c} 0\le a \le s/2 \end{array}} \min \left\{ {p^{r+2a+{\text {ord}}_p(m_2)}, p^{s-a+\min \left\{ {s+{\text {ord}}_p(m_1), r+{\text {ord}}_p(n_1)}\right\} }}\right\} \\&\ll p^{\frac{r}{3} + \frac{2s}{3} + \frac{2}{3} \min \left\{ {{\text {ord}}_p(m_1)+s, {\text {ord}}_p(n_1)+r}\right\} + \frac{1}{3} {\text {ord}}_p(m_2)}. \end{aligned}\end{aligned}$$
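For the reader's convenience, the last step balances the two terms in the minimum using the elementary inequality \(\min \left\{ {X,Y}\right\} \le X^{1/3}Y^{2/3}\) for positive reals X, Y. With \(M = \min \left\{ {s+{\text {ord}}_p(m_1), r+{\text {ord}}_p(n_1)}\right\} \), \(X = p^{r+2a+{\text {ord}}_p(m_2)}\) and \(Y = p^{s-a+M}\), the dependence on a cancels:

$$\begin{aligned}\begin{aligned} \min \left\{ {X, Y}\right\} \le X^{1/3}Y^{2/3} = p^{\frac{r+2a+{\text {ord}}_p(m_2)}{3} + \frac{2(s-a+M)}{3}} = p^{\frac{r}{3} + \frac{2s}{3} + \frac{2M}{3} + \frac{1}{3}{\text {ord}}_p(m_2)}. \end{aligned}\end{aligned}$$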

Case II: Suppose \(s=r\). We deduce from (3.7) that when \(a\ne 0\), we have \({\text {ord}}_p(v_3) = 0\) and \({\text {ord}}_p(v_4) \ge a\). So only terms with \(r\ge a+b\) contribute. When \(a\ne \frac{s}{2}\), we have \({\text {ord}}_p(u) = 2\left( {r-b}\right) \). When \(a=\frac{s}{2}\), we have \(b\le r-a = \frac{r}{2}\), so \({\text {ord}}_p(u) \le s \le 2\left( {r-b}\right) \). So \({\text {ord}}_p(u) \le 2\left( {r-b}\right) \) always holds. We compute

$$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) } \right| \ll p^{2r} \min \left\{ {p^{3r-2b+{\text {ord}}_p(m_2)}, p^{2r-a+\min \left\{ {{\text {ord}}_p(m_1), {\text {ord}}_p(n_1)}\right\} }}\right\} . \end{aligned}\end{aligned}$$

Hence

$$\begin{aligned}\begin{aligned}&\left| {{\text {Kl}}_p\left( {n,\psi ,\psi '}\right) } \right| \\&\quad \le \sum \limits _{\begin{array}{c} 0\le a \le r/2\\ b \le r-a \end{array}} \left| {S_{a,b}\left( {n,\psi ,\psi '}\right) } \right| \\&\quad \ll \sum \limits _{\begin{array}{c} 0\le a \le s/2\\ b \le r-a \end{array}} p^{-4r} p^{r+a+b} \left( {p^{2r} \min \left\{ {p^{3r-2b+{\text {ord}}_p(m_2)}, p^{2r-a+\min \left\{ {{\text {ord}}_p(m_1), {\text {ord}}_p(n_1)}\right\} }}\right\} }\right) \\&\quad \ll \sum \limits _{\begin{array}{c} 0\le a \le s/2\\ b \le r-a \end{array}} p^{-r+a+b} \min \left\{ {p^{3r-2b+{\text {ord}}_p(m_2)}, p^{2r-a+\min \left\{ {{\text {ord}}_p(m_1), {\text {ord}}_p(n_1)}\right\} }}\right\} \\&\quad \ll p^{\frac{5r}{3} + \frac{2}{3}\min \left\{ {{\text {ord}}_p(m_1), {\text {ord}}_p(n_1)}\right\} + \frac{1}{3} {\text {ord}}_p(m_2)}. \end{aligned}\end{aligned}$$

Case III: \(2r>s>r\). We consider the following subcases:

  1. (a)

    Suppose \(a=s-r\). Then the condition \(\left( {p^{r-a}, p^a v_3 + p^{r-b}}\right) = 1\) implies \(b=r\). So \({\text {ord}}_p(u) = 0\). We deduce from (3.7) that \(\hat{v}_2 = 0\). So

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) } \right| \ll p^{3r-s} \min \left\{ {p^{r+{\text {ord}}_p(m_2)}, p^{2r+{\text {ord}}_p(n_1)}}\right\} . \end{aligned}\end{aligned}$$
  2. (b)

    Suppose \(s-r< a < \frac{s}{2}\). Then we deduce from (3.7) that \({\text {ord}}_p(v_3) = 0\), \({\text {ord}}_p(v_4) \ge a\). So \(a+b\le r\). Meanwhile, as \(r+a-s < a\), the condition \(\left( {p^{r-a}, p^a v_3 + p^{r-b}}\right) = p^{r+a-s}\) says \(r-b = r+a-s\), which implies \(a+b = s >r\), a contradiction. So there is no contribution from this case.

  3. (c)

    Suppose \(a = \frac{s}{2}\). Again, we deduce from (3.7) that \({\text {ord}}_p(v_3) = 0\), \({\text {ord}}_p(v_4) \ge a\). So, only terms with \(r\ge a+b\) contribute. In this case, we don’t have a good bound for \({\text {ord}}_p(u)\). So

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) } \right| \ll p^{3r+\min \left\{ {\frac{s}{2}+{\text {ord}}_p(m_1), r-\frac{s}{2}+{\text {ord}}_p(n_1)}\right\} }. \end{aligned}\end{aligned}$$

Hence

$$\begin{aligned} \begin{aligned} \left| {{\text {Kl}}_p\left( {n,\psi ,\psi '}\right) } \right| \le&\sum \limits _{\begin{array}{c} s-r\le a \le s/2\\ 0\le b \le r \end{array}} \left| {S_{a,b}\left( {n,\psi ,\psi '}\right) } \right| \\ \ll&\sum \limits _{\begin{array}{c} a=s-r\\ b=r \end{array}} p^{-4r} p^{r+a+b} \left( {p^{3r-s} \min \left\{ {p^{r+{\text {ord}}_p(m_2)}, p^{2r+{\text {ord}}_p(n_1)}}\right\} }\right) \\&+\sum \limits _{\begin{array}{c} a=s/2\\ b\le r-s/2 \end{array}} p^{-4r} p^{r+a+b} \left( {p^{3r+\min \left\{ {\frac{s}{2}+{\text {ord}}_p(m_1), r-\frac{s}{2}+{\text {ord}}_p(n_1)}\right\} }}\right) \\ \ll&p^{r+\min \left\{ {{\text {ord}}_p(m_2), r+{\text {ord}}_p(n_1)}\right\} } + p^{r+\min \left\{ {\frac{s}{2}+{\text {ord}}_p(m_1), r-\frac{s}{2}+{\text {ord}}_p(n_1)}\right\} }. \end{aligned}\end{aligned}$$

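Case IV: \(s = 2r\). In this case we have \(a = r\), so \(\delta = p^{r+a-s} = 1\) and the coprimality condition is automatic; thus \(v_3\) and b (and hence \(v_4 = p^{r-b}\)) are unconstrained. We deduce from (3.7) that \(\hat{v}_2 = 0\). We consider the following subcases: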

  1. (a)

    Suppose \(b=0\). We may assume \(v_4=0\). Then \({\text {ord}}_p(u) = r+{\text {ord}}_p(v_3)\). We compute

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) } \right| \ll p^r \min \left\{ {p^{2r+{\text {ord}}_p(v_3)+{\text {ord}}_p(m_2)}, p^{2r+{\text {ord}}_p(n_1)}}\right\} . \end{aligned}\end{aligned}$$

    Fix \(c\le r\). Then

    $$\begin{aligned}\begin{aligned} \left| {\left\{ {v_3\in \mathcal S_{a,b}}\;\big |\;{{\text {ord}}_p(v_3) = c}\right\} } \right| \le p^{r-c}. \end{aligned}\end{aligned}$$
  2. (b)

    Suppose \(b>0\). Then \({\text {ord}}_p(u) = r-b\). We compute

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_3}; 2r}\right) } \right| \ll p^r \min \left\{ {p^{2r-b+{\text {ord}}_p(m_2)}, p^{2r+{\text {ord}}_p(n_1)}}\right\} . \end{aligned}\end{aligned}$$

Hence

$$\begin{aligned} \begin{aligned} \left| {{\text {Kl}}_p\left( {n,\psi ,\psi '}\right) } \right| \le&\sum \limits _{\begin{array}{c} a = r\\ b\le r \end{array}} \left| {S_{a,b}\left( {n,\psi ,\psi '}\right) } \right| \\ \ll&\sum \limits _{\begin{array}{c} a=r\\ b=0\\ c\le r \end{array}} p^{-4r} p^{r-c+a+b} \left( {p^r \min \left\{ {p^{2r+c+{\text {ord}}_p(m_2)}, p^{2r+{\text {ord}}_p(n_1)}}\right\} }\right) \\&+\sum \limits _{\begin{array}{c} a=r\\ b>0 \end{array}} p^{-4r} p^{r+a+b} \left( {p^r \min \left\{ {p^{2r-b+{\text {ord}}_p(m_2)}, p^{2r+{\text {ord}}_p(n_1)}}\right\} }\right) \\ \ll&p^{r+\min \left\{ {{\text {ord}}_p(m_2), r+{\text {ord}}_p(n_1)}\right\} }. \end{aligned} \end{aligned}$$

This finishes the proof of the bound for \({\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha ,r,s}, \psi , \psi '}\right) \).

4.4 Bound for \({\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta , r, s}, \psi , \psi '}\right) \)

We make use of the decomposition for Kloosterman sums in Sect. 2 to obtain a non-trivial bound for \({\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta , r, s}, \psi , \psi '}\right) \).

Let \(w = s_\beta s_\alpha s_\beta \), and \(n = n_{s_\beta s_\alpha s_\beta , r, s}\). Note that we have \(r\le s\). Then \(\Delta _w = \left\{ {\beta }\right\} \), and

$$\begin{aligned}\begin{aligned} A_w\left( {\ell }\right) = \left( {\mathbb {Z}/p^\ell \mathbb {Z}}\right) ^2 \times \left( {\mathbb {Z}/p^\ell \mathbb {Z}}\right) . \end{aligned}\end{aligned}$$

Let \(t = {\text {diag}}\left( {a_1, a_2, ca_1^{-1}, ca_2^{-1}}\right) \in \mathcal T\). Then \(t' = n^{-1}tn = {\text {diag}}\left( {ca_2^{-1}, ca_1^{-1}, a_2, a_1}\right) \). We compute

$$\begin{aligned}\begin{aligned} \kappa '_2 \left( {t * x}\right) = ca_1^{-2} \kappa '_2(x). \end{aligned}\end{aligned}$$

So

$$\begin{aligned}\begin{aligned} V_w(\ell ) = \left\{ {(\lambda ,\lambda ') \in A_w(\ell )^\times }\;\big |\;{\lambda _1^2\lambda _2\lambda '_2 = 1}\right\} . \end{aligned}\end{aligned}$$

If \(\theta : A_w(\ell ) \rightarrow \mathbb {C}^\times \) is given by

$$\begin{aligned}\begin{aligned} \theta (\lambda ,\lambda ') = {\text {e}}\left( {\frac{n_1\lambda _1+n_2\lambda _2}{p^\ell }}\right) {\text {e}}\left( {\frac{n'_2\lambda '_2}{p^\ell }}\right) , \quad n_1, n_2, n'_2\in \mathbb {Z}, \end{aligned}\end{aligned}$$

then

$$\begin{aligned} S_w\left( {\theta , \ell }\right) = \sum \limits _{\lambda _1\in \left( {\mathbb {Z}/p^\ell \mathbb {Z}}\right) ^\times } {\text {e}}\left( {\frac{n_1\lambda _1}{p^\ell }}\right) S \left( {n_2 \lambda _1^{-2}, n'_2; p^\ell }\right) . \end{aligned}$$
(4.10)
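As with (4.5), identity (4.10) is a rearrangement: for fixed \(\lambda _1\), write \(\lambda _2 = \lambda _1^{-2}x\) and \(\lambda '_2 = x^{-1}\). The sketch below checks it numerically, again taking \(S_w\left( {\theta , \ell }\right) \) to be the sum of \(\theta \) over \(V_w(\ell )\) (our reading of Sect. 2); the function names are ours.

```python
import cmath
import math


def e(num: int, q: int) -> complex:
    """e(num/q) = exp(2*pi*i*num/q), reducing num mod q first."""
    return cmath.exp(2j * math.pi * (num % q) / q)


def kloosterman(m: int, n: int, q: int) -> complex:
    """Classical S(m, n; q): sum over x y = 1 mod q of e((m x + n y)/q)."""
    return sum(e(m * x + n * pow(x, -1, q), q)
               for x in range(1, q) if math.gcd(x, q) == 1)


def s_w_direct(n1, n2, n2p, p, ell):
    """Sum of theta over V_w(ell) = {lambda1^2 * lambda2 * lambda2' = 1}."""
    q = p ** ell
    units = [u for u in range(1, q) if u % p]
    # lambda2' is forced to be inv(lambda1^2 * lambda2).
    return sum(e(n1 * l1 + n2 * l2 + n2p * pow(l1 * l1 * l2, -1, q), q)
               for l1 in units for l2 in units)


def s_w_via_kloosterman(n1, n2, n2p, p, ell):
    """Right-hand side of (4.10)."""
    q = p ** ell
    return sum(e(n1 * l1, q) * kloosterman(n2 * pow(l1 * l1, -1, q), n2p, q)
               for l1 in range(1, q) if l1 % p)
```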

Suppose \(x_{a,b}^{v_{23}} \in X(n)\) has Plücker coordinates

$$\begin{aligned}\begin{aligned} \left( {v_{12}, v_{13}, v_{14}, v_{23}}\right) = \left( {p^s, p^{s-a}, p^{s-b}, v_{23}}\right) . \end{aligned}\end{aligned}$$

The condition \((v_{12}, v_{14}) \mid v_{13}^2\) says \(s-b \le 2\left( {s-a}\right) \), that is, \(2a-b\le s\). We also have \(\max \left\{ {a,b}\right\} = r\). Then

$$\begin{aligned}\begin{aligned} u'\left( {x_{a,b}^{v_{23}}}\right) = \begin{pmatrix} 1 &{} &{} -v_{23}p^{-s} &{} p^{-a}\\ &{}1&{}p^{-a}&{}p^{-b}\\ &{}&{}1\\ &{}&{}&{}1\end{pmatrix} \pmod {U\left( {\mathbb {Z}_p}\right) }. \end{aligned}\end{aligned}$$

Let \(X_{a,b}^{v_{23}} (n) = \mathcal T * x_{a,b}^{v_{23}}\), and define

$$\begin{aligned}\begin{aligned} S_{a,b}^{v_{23}} \left( {n, \psi , \psi '}\right) = \sum \limits _{x\in X_{a,b}^{v_{23}} (n)} \psi \left( {u(x)}\right) \psi '\left( {u'(x)}\right) . \end{aligned}\end{aligned}$$

We also let

$$\begin{aligned}\begin{aligned} X_{a,b} (n) = \coprod \limits _{\begin{array}{c} v_{23} \pmod {p^s}\\ \left( {p^{s-r}, v_{23}, p^{-b}v_{23}-p^{s-2a}}\right) =1 \end{array}} X_{a,b}^{v_{23}} (n), \end{aligned}\end{aligned}$$

and

$$\begin{aligned}\begin{aligned} S_{a,b}\left( {n, \psi , \psi '}\right) = \sum \limits _{x\in X_{a,b}(n)} \psi \left( {u(x)}\right) \psi '\left( {u'(x)}\right) . \end{aligned}\end{aligned}$$

Again we have a partition

$$\begin{aligned}\begin{aligned} X(n) = \coprod \limits _{\begin{array}{c} 0\le a, b \le r\\ \max \left\{ {a,b}\right\} = r\\ 2a-b \le s \end{array}} X_{a,b} (n). \end{aligned}\end{aligned}$$

It is clear that \(u(x), u'(x)\) have entries in \(p^{-s}\mathbb {Z}_p/\mathbb {Z}_p\) for all \(x\in X(n)\). Let \(\mathcal S_{a,b}\) be a finite subset of \(\mathbb {Z}_p\) such that

$$\begin{aligned}\begin{aligned} X_{a,b}(n) = \coprod \limits _{v_{23} \in \mathcal S_{a,b}} X_{a,b}^{v_{23}} (n). \end{aligned}\end{aligned}$$

By Theorem 2.4, we have

$$\begin{aligned}\begin{aligned} S_{a,b} \left( {n, \psi , \psi '}\right) = p^{-2s} \left( {1-p^{-1}}\right) ^{-2} \sum \limits _{v_{23}\in \mathcal S_{a,b}} \left| {X_{a,b}^{v_{23}} (n)} \right| S_w \left( {\theta _{a,b}^{v_{23}}; s}\right) , \end{aligned}\end{aligned}$$

where

$$\begin{aligned}\begin{aligned} \theta _{a,b}^{v_{23}} (\lambda ,\lambda ') = {\text {e}}\left( {\frac{m_1 u \lambda _1}{p^r}}\right) {\text {e}}\left( {\frac{m_2 \hat{v}_{14} \lambda _2 + n_2 p^{s-b} \lambda '_2}{p^s}}\right) , \end{aligned}\end{aligned}$$

with \(\hat{v}_{14}\) and u given as in (3.14) and (3.15). By (4.10), we have

$$\begin{aligned} S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) = \sum \limits _{x,y \in \left( {\mathbb {Z}/p^s\mathbb {Z}}\right) ^\times } {\text {e}}\left( {\frac{m_1u\overline{x}}{p^r}}\right) {\text {e}}\left( {\frac{m_2 \hat{v}_{14} x^2 \overline{y} + n_2 p^{s-b} y}{p^s}}\right) . \end{aligned}$$
(4.11)

Since the size of the \(\mathcal T\)-orbit of \(x_{a,b}^{v_{23}}\) is bounded by \(p^{a+b}\), we have

$$\begin{aligned} \sum \limits _{v_{23}\in \mathcal S_{a,b}} \left| {X_{a,b}^{v_{23}} (n)} \right| \le \left| {\mathcal S_{a,b}} \right| p^{a+b} \le p^{s+a}. \end{aligned}$$
(4.12)

We now estimate \(S_w \left( {\theta _{a,b}^{v_{23}}; s}\right) \). We start by computing the p-adic orders of \(\hat{v}_{14}\) and u appearing in (4.11). From (3.14), we see that

$$\begin{aligned} u p^{r-a}&\equiv v_{23} \pmod {p^r},&u p^{r-b}&\equiv -p^{s-a} \pmod {p^r}. \end{aligned}$$
(4.13)

So, if \(a=r\), then \(u\equiv v_{23}\pmod {p^r}\), and if \(b=r\), then \(u\equiv -p^{s-a}\pmod {p^r}\). (Recall that \(\max \left\{ {a,b}\right\} =r\).) Also, we know that

$$\begin{aligned} v_{23} = -p^{s-2a+b} + \beta p^b \end{aligned}$$
(4.14)

for some \(\beta \in \mathbb {Z}\) such that \(\left( {\beta , p^{s-2r+b}}\right) = 1\) (see [12, Sect. 3.2]). Meanwhile, from (3.15), we see that unless \(r=s\), we have \({\text {ord}}_p\left( {\hat{v}_{14}}\right) = 2r-b\).

Case I: Suppose \(r<\frac{s}{2}\). We deduce from (4.14) that \({\text {ord}}_p(v_{23}) = b\). From (4.13), we deduce \(a\ge b\). So we actually have \(a=r\), and then \({\text {ord}}_p(u) = b\).

  1. (i)

    Suppose \(b\le \frac{3r-s}{2}\). Write \(u = p^b u'\). Let

    $$\begin{aligned}\begin{aligned} t = \min \left\{ {{\text {ord}}_p(m_1), {\text {ord}}_p(m_2)+3r-2b-s, {\text {ord}}_p(n_2)+r-2b}\right\} \end{aligned}\end{aligned}$$

    and

    $$\begin{aligned}\begin{aligned} f(x,y) = p^{-t} \left( {\frac{m_1u'}{x} + \frac{m_2 \hat{v}_{14} p^{r-b-s} x^2}{y} + n_2 p^{r-2b}y}\right) = \frac{m'_1}{x} + \frac{m'_2 x^2}{y} + n'_2 y, \end{aligned}\end{aligned}$$

    where \(m'_1 = m_1 u' p^{-t}\), \(m'_2 = m_2 \hat{v}_{14} p^{r-b-s-t}\), \(n'_2 = n_2 p^{r-2b-t}\). Consider the sum

    $$\begin{aligned}\begin{aligned} S = \sum \limits _{x,y\in (\mathbb {Z}/p^{r-b-t}\mathbb {Z})^\times } e\left( {\frac{f(x,y)}{p^{r-b-t}}}\right) = p^{2r-2s-2b-2t} S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) . \end{aligned}\end{aligned}$$

    When \(r-b-t>1\), let \(j\ge 1\) be such that \(2j\le r-b-t\). Define as in (4.4)

    $$\begin{aligned}\begin{aligned} D(\mathbb {Z}/p^j\mathbb {Z}) =&\left\{ {(x,y)\in (\mathbb {Z}/p^j\mathbb {Z})^\times \times (\mathbb {Z}/p^j\mathbb {Z})^\times }\;\big |\;{\nabla f(x,y) \equiv 0\pmod {p^j}}\right\} \\ =&\left\{ {(x,y)\in (\mathbb {Z}/p^j\mathbb {Z})^\times \times (\mathbb {Z}/p^j\mathbb {Z})^\times }\;\big |\;{\begin{array}{l} 2m'_2 x^3 \equiv m'_1 y \pmod {p^j}\\ m'_2 x^2 \equiv n'_2 y^2 \pmod {p^j}\end{array}}\right\} . \end{aligned}\end{aligned}$$

    By the choice of t, at least one of \(m'_1\), \(m'_2\) and \(n'_2\) is not divisible by p. It then follows that, when p is odd, \(D(\mathbb {Z}/p^j\mathbb {Z})\) is empty unless \({\text {ord}}_p(m_1) = {\text {ord}}_p(m_2)+3r-2b-s = {\text {ord}}_p(n_2)+r-2b = t\). But then

    $$\begin{aligned}\begin{aligned} S = p^{t+b-r} {\text {Kl}}_p(n_{s_\alpha s_\beta , 2r-2b-2t, r-b-t}, \psi _{m'_1,m'_2}, \psi _{0,n'_2}), \end{aligned}\end{aligned}$$

    with \(p\not \mid m'_1m'_2n'_2\). So it follows from the bound for \({\text {Kl}}_p(n_{s_\alpha s_\beta ,r,s},\psi ,\psi ')\) that

    $$\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_{23}};s}\right) } \right| \ll p^{2s-r+b+t}. \end{aligned}$$
    (4.15)

    When \(p=2\), \(D(\mathbb {Z}/p^j\mathbb {Z})\) is empty unless \({\text {ord}}_p(m_1)-1 = {\text {ord}}_p(m_2)+3r-2b-s = {\text {ord}}_p(n_2)+r-2b = t\). Then

    $$\begin{aligned}\begin{aligned} S = p^{t+b-r+1} {\text {Kl}}_p(n_{s_\alpha s_\beta , 2r-2b-2t-1, r-b-t}, \psi _{m'_1/2,m'_2}, \psi _{0,n'_2}), \end{aligned}\end{aligned}$$

    with \(p \not \mid (m'_1/2)m'_2n'_2\). Again, from the bound for \({\text {Kl}}_p(n_{s_\alpha s_\beta ,r,s},\psi ,\psi ')\), we see that (4.15) also holds in this case. Now suppose \(r-b-t=1\). If \(p\not \mid m'_1 m'_2 n'_2\), then it again follows from Theorem 4.1 that \(\left| {S} \right| \ll p\). When p divides some (but not all) of \(m'_1, m'_2, n'_2\), the sum reduces to Gauß sums or Ramanujan sums, and one checks easily that \(\left| {S} \right| \ll p\) as well. So the bound (4.15) also holds in this case. The bounds for \(S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) \) in the remaining cases are obtained analogously, and we omit the repetitive computations henceforth.

  2. (ii)

    Suppose \(b> \frac{3r-s}{2}\). Write \(\hat{v}_{14} = p^{2r-b} \hat{v}'_{14}\). Let

    $$\begin{aligned}\begin{aligned} t = \min \left\{ {{\text {ord}}_p(m_1)+s+2b-3r, {\text {ord}}_p(m_2), {\text {ord}}_p(n_2)+s-2r}\right\} , \end{aligned}\end{aligned}$$

    and

    $$\begin{aligned}\begin{aligned} f(x,y) = p^{-t} \left( {\frac{m_1up^{s+b-3r}}{x} + \frac{m_2 \hat{v}'_{14} x^2}{y} + n_2 p^{s-2r} y}\right) = \frac{m'_1}{x} + \frac{m'_2 x^2}{y} + n'_2 y, \end{aligned}\end{aligned}$$

    where \(m'_1 = m_1 u p^{s+b-3r-t}\), \(m'_2 = m_2 \hat{v}'_{14} p^{-t}\), \(n'_2 = n_2 p^{s-2r-t}\). Then we have

    $$\begin{aligned}\begin{aligned} S = \sum \limits _{x,y\in (\mathbb {Z}/p^{s+b-2r-t}\mathbb {Z})^\times } e\left( {\frac{f(x,y)}{p^{s+b-2r-t}}}\right) = p^{2b-4r-2t} S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) . \end{aligned}\end{aligned}$$

    Then we obtain analogously

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) } \right| \ll p^{s+2r-b+t}. \end{aligned}\end{aligned}$$
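The emptiness criterion for \(D(\mathbb {Z}/p^j\mathbb {Z})\) used in case (i) can be tested by brute force. The Python sketch below enumerates the solutions of the two congruences defining \(D(\mathbb {Z}/p^j\mathbb {Z})\) for small odd p. It checks, for small sample coefficients only, that when at least one of \(m'_1, m'_2, n'_2\) is a p-adic unit, a nonempty set forces all three to be units. The helper names are ours.

```python
from itertools import product
from math import gcd


def critical_set(m1p, m2p, n2p, p, j):
    """Pairs of units (x, y) mod p^j with grad f = 0 mod p^j, where
    f(x, y) = m1'/x + m2' x^2/y + n2' y, i.e.
        2 m2' x^3 = m1' y   and   m2' x^2 = n2' y^2   (mod p^j)."""
    q = p ** j
    units = [x for x in range(1, q) if gcd(x, q) == 1]
    return [(x, y) for x, y in product(units, repeat=2)
            if (2 * m2p * x ** 3 - m1p * y) % q == 0
            and (m2p * x ** 2 - n2p * y ** 2) % q == 0]


def vp(n, p):
    """p-adic valuation of a nonzero integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k
```

For instance, with \(p=5\), \(j=1\) and \((m'_1, m'_2, n'_2) = (2,1,1)\), the first congruence forces \(y\equiv x^3\), and the second then holds for every unit x because \(x^4\equiv 1\pmod 5\). So `critical_set(2, 1, 1, 5, 1)` has four elements.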

Hence

$$\begin{aligned} \begin{aligned}&\left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right| \\&\qquad \le \sum \limits _{\begin{array}{c} a=r\\ 0\le b\le r \end{array}} \left| {S_{a,b}\left( {n,\psi ,\psi '}\right) } \right| \\&\qquad \ll \sum \limits _{\begin{array}{c} a=r\\ 0\le b\le r \end{array}} p^{-2s} p^{s+a} \left| {S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) } \right| \\&\qquad \ll \sum \limits _{\begin{array}{c} a=r\\ 0\le b\le r \end{array}} p^{-2s} p^{s+a} \left( {p^{s-r} \min \left\{ {p^{s+b+{\text {ord}}_p(m_1)}, p^{r-b+\min \left\{ {2r+{\text {ord}}_p(m_2), s+{\text {ord}}_p(n_2)}\right\} }}\right\} }\right) \\&\qquad \ll p^{\frac{s}{2}+\frac{r}{2}+\frac{1}{2}\min \left\{ {2r+{\text {ord}}_p(m_2), s+{\text {ord}}_p(n_2)}\right\} +\frac{1}{2}{\text {ord}}_p(m_1)}. \end{aligned}\end{aligned}$$
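The last step in the chain above uses the fact that summing the minimum of an increasing and a decreasing geometric progression in b costs only a constant. For this remark only, write \(A = s+{\text {ord}}_p(m_1)\) and \(B = r+\min \left\{ {2r+{\text {ord}}_p(m_2), s+{\text {ord}}_p(n_2)}\right\} \), and note that \(p^{-2s}p^{s+a}p^{s-r} = 1\) when \(a=r\). Since \(\min \left\{ {A+b, B-b}\right\} = \frac{A+B}{2}-\left| {b-\frac{B-A}{2}}\right| \), and at most two integers b satisfy \(k\le \left| {b-\frac{B-A}{2}}\right| <k+1\) for each \(k\ge 0\), one has

$$\begin{aligned}\begin{aligned} \sum \limits _{0\le b\le r} \min \left\{ {p^{A+b}, p^{B-b}}\right\} \le p^{\frac{A+B}{2}} \sum \limits _{b\in \mathbb {Z}} p^{-\left| {b-\frac{B-A}{2}}\right| } \le \frac{2p}{p-1}\, p^{\frac{A+B}{2}} \le 4p^{\frac{A+B}{2}}, \end{aligned}\end{aligned}$$

and \(\frac{A+B}{2}\) is exactly the exponent in the final line.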

Case II: Suppose \(r=\frac{s}{2}\). We consider the following subcases:

  1. (a)

    Suppose \(b=r\). From (4.13), we may assume \(u=0\). We compute

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) } \right| \ll p^{\frac{3s}{2}+\min \left\{ {{\text {ord}}_p(m_2), {\text {ord}}_p(n_2)}\right\} }. \end{aligned}\end{aligned}$$
  2. (b)

    Suppose \(b<r\). Then \(a=r\). From (4.14), we see that \(v_{23} = \left( {\beta -1}\right) p^b\) for some \(\beta \in \mathbb {Z}\) such that \(\left( {\beta , p^b}\right) = 1\). So \({\text {ord}}_p(v_{23}) \ge b\). And from (4.13), we deduce that \({\text {ord}}_p(u) = {\text {ord}}_p(v_{23})\). We compute

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) } \right| \ll p^{s/2} \min \left\{ {p^{s+{\text {ord}}_p(v_{23}) + {\text {ord}}_p(m_1)}, p^{\frac{3s}{2}-b+\min \left\{ {{\text {ord}}_p(m_2), {\text {ord}}_p(n_2)}\right\} }}\right\} . \end{aligned}\end{aligned}$$

Fix \(c\ge b\). Then

$$\begin{aligned}\begin{aligned} \left| {\left\{ {v_{23} \in \mathcal S_{a,b}}\;\big |\;{{\text {ord}}_p(v_{23}) = c}\right\} } \right| \le p^{s-c}. \end{aligned}\end{aligned}$$

Hence

$$\begin{aligned}\begin{aligned}&\left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right| \le \sum \limits _{\begin{array}{c} a,b\le r\\ \max \left\{ {a,b}\right\} =r \end{array}} \left| {S_{a,b}\left( {n, \psi , \psi '}\right) } \right| \\&\qquad \ll \sum \limits _{\begin{array}{c} b=r\\ a\le r \end{array}} p^{-2s} p^{s+a} \left( {p^{\frac{3s}{2}+\min \left\{ {{\text {ord}}_p(m_2), {\text {ord}}_p(n_2)}\right\} }}\right) \\&\qquad \quad +\sum \limits _{\begin{array}{c} a=r\\ b<r\\ b\le c\le r \end{array}} p^{-2s} p^{s-c+a+b} \left( {p^{s/2} \min \left\{ {p^{s+{\text {ord}}_p(v_{23}) + {\text {ord}}_p(m_1)}, p^{\frac{3s}{2}-b+\min \left\{ {{\text {ord}}_p(m_2), {\text {ord}}_p(n_2)}\right\} }}\right\} }\right) \\&\qquad \ll p^{\frac{5s}{4} + \frac{1}{2}{\text {ord}}_p(m_1) + \frac{1}{2}\min \left\{ {{\text {ord}}_p(m_2), {\text {ord}}_p(n_2)}\right\} }. \end{aligned}\end{aligned}$$

Case III: Suppose \(s>r>\frac{s}{2}\). We consider the following subcases:

  1. (a)

    Suppose \(b=r\). Then \({\text {ord}}_p(u) = s-a\), and \({\text {ord}}_p(\hat{v}_{14}) = r\). We compute

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) } \right| \ll p^{s-r} \min \left\{ {p^{2s-a+{\text {ord}}_p(m_1)}, p^{r+\min \left\{ {r+{\text {ord}}_p(m_2), s-r+{\text {ord}}_p(n_2)}\right\} }}\right\} . \end{aligned}\end{aligned}$$
  2. (b)

    Suppose \(b<r\). Then \(a=r\), and from (4.14) we deduce that \({\text {ord}}_p(v_{23}) = s-2r+b\), and hence \({\text {ord}}_p(u) = s-2r+b\). We compute

    $$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) } \right| \ll p^{s-r} \min \left\{ {p^{2s-2r+b+{\text {ord}}_p(m_1)}, p^{r-b+\min \left\{ {2r+{\text {ord}}_p(m_2), s+{\text {ord}}_p(n_2)}\right\} }}\right\} . \end{aligned}\end{aligned}$$

Hence

$$\begin{aligned}\begin{aligned}&\left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right| \le \sum \limits _{\begin{array}{c} a,b\le r\\ \max \left\{ {a,b}\right\} =r\\ 2a-b\le s \end{array}} \left| {S_{a,b}\left( {n, \psi , \psi '}\right) } \right| \\&\quad \ll \sum \limits _{\begin{array}{c} b=r\\ a\le r \end{array}} p^{-2s} p^{s+a} \left( {p^{s-r} \min \left\{ {p^{2s-a+{\text {ord}}_p(m_1)}, p^{r+\min \left\{ {r+{\text {ord}}_p(m_2), s-r+{\text {ord}}_p(n_2)}\right\} }}\right\} }\right) \\&\quad +\sum \limits _{\begin{array}{c} a=r\\ 2r-s\le b < r \end{array}} p^{-2s} p^{s+a} \left( {p^{s-r} \min \left\{ {p^{2s-2r+b+{\text {ord}}_p(m_1)}, p^{r-b+\min \left\{ {2r+{\text {ord}}_p(m_2), s+{\text {ord}}_p(n_2)}\right\} }}\right\} }\right) \\&\quad \ll p^{s-\frac{r}{2}+\frac{1}{2}{\text {ord}}_p(m_1)+\frac{1}{2}\min \left\{ {2r+{\text {ord}}_p(m_2), s+{\text {ord}}_p(n_2)}\right\} }. \end{aligned}\end{aligned}$$

Case IV: \(r=s\). In this case we only have to consider terms with \(b=r\). Indeed, if \(b<r\), then \(a=r\), and (4.13) gives \(u p^{r-b} \equiv -1\pmod {p^r}\), which is impossible because p divides the left-hand side. When \(b=r\), we have \({\text {ord}}_p(u) = s-a\), and from (3.15) we may assume \(\hat{v}_{14} = 0\). We compute

$$\begin{aligned}\begin{aligned} \left| {S_w\left( {\theta _{a,b}^{v_{23}}; s}\right) } \right| \ll \min \left\{ {p^{2s-a+{\text {ord}}_p(m_1)}, p^{s+{\text {ord}}_p(n_2)}}\right\} . \end{aligned}\end{aligned}$$

Hence

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right| \le&\sum \limits _{\begin{array}{c} b=s\\ a\le s \end{array}} \left| {S_{a,b}\left( {n, \psi , \psi '}\right) } \right| \\ \ll&\sum \limits _{\begin{array}{c} b=s\\ a\le s \end{array}} p^{-2s} p^{s+a} \left( {\min \left\{ {p^{2s-a+{\text {ord}}_p(m_1)}, p^{s+{\text {ord}}_p(n_2)}}\right\} }\right) \\ \ll&p^{s+\min \left\{ {{\text {ord}}_p(m_1), {\text {ord}}_p(n_2)}\right\} }. \end{aligned}\end{aligned}$$

This finishes the proof of the bound for \({\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta ,r,s}, \psi , \psi '}\right) \).

4.5 Bounds for \({\text {Kl}}_p\left( {n_{w_0, r, s}, \psi , \psi '}\right) \)

We show that under the stratification introduced in Sect. 2, \({\text {Kl}}_p\left( {n_{w_0, r, s}, \psi , \psi '}\right) \) decomposes into a sum of products of \({\text {GL}}(2)\) Kloosterman sums. So the Kloosterman sum can be bounded using (4.1).

Let \(w = w_0\), and \(n = n_{w_0, r, s}\). Then \(\Delta _{w_0} = \Delta \), and

$$\begin{aligned}\begin{aligned} A_{w_0}(\ell ) = \left( {\mathbb {Z}/p^\ell \mathbb {Z}}\right) ^2 \times \left( {\mathbb {Z}/p^\ell \mathbb {Z}}\right) ^2. \end{aligned}\end{aligned}$$

Let \(t = {\text {diag}}\left( {a_1, a_2, ca_1^{-1}, ca_2^{-1}}\right) \in \mathcal T\). Then \(s = n^{-1}tn = {\text {diag}}\left( {ca_1^{-1}, ca_2^{-1}, a_1, a_2}\right) \). We compute

$$\begin{aligned}\begin{aligned} \kappa '_1(t*x)&= a_2a_1^{-1} \kappa '_1(x),&\kappa '_2(t*x)&= c a_2^{-2} \kappa '_2(x). \end{aligned}\end{aligned}$$

So

$$\begin{aligned}\begin{aligned} V_{w_0}(\ell ) = \left\{ {(\lambda ,\lambda ') \in A_{w_0}(\ell )^\times }\;\big |\;{ \lambda _1\lambda '_1 = 1, \lambda _2\lambda '_2 = 1}\right\} . \end{aligned}\end{aligned}$$

If \(\theta : A_{w_0}(\ell ) \rightarrow \mathbb {C}^\times \) is given by

$$\begin{aligned}\begin{aligned} \theta (\lambda ,\lambda ')&= \prod \limits _{i=1}^2 {\text {e}}\left( {\frac{n_i\lambda _i}{p^\ell }}\right) \prod \limits _{i=1}^2 {\text {e}}\left( {\frac{n'_i\lambda '_i}{p^\ell }}\right) ,&n_1, n_2, n'_1, n'_2\in \mathbb {Z}, \end{aligned}\end{aligned}$$

then

$$\begin{aligned} S_{w_0}\left( {\theta ; \ell }\right) = S \left( {n_1, n'_1; p^\ell }\right) S \left( {n_2, n'_2; p^\ell }\right) . \end{aligned}$$
(4.16)
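Since \(V_{w_0}(\ell )\) is cut out by the two independent relations \(\lambda _1\lambda '_1 = 1\) and \(\lambda _2\lambda '_2 = 1\), the sum of \(\theta \) over it splits into two independent sums, which gives (4.16). A quick numerical sketch of this factorisation follows. It again assumes that \(S_{w_0}(\theta ;\ell )\) denotes the sum of \(\theta \) over \(V_{w_0}(\ell )\), and the helper names are illustrative.

```python
import cmath
from math import gcd


def e(x):
    """Additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def units(q):
    """Representatives of (Z/qZ)^x."""
    return [x for x in range(1, q) if gcd(x, q) == 1]


def kloosterman(m, n, q):
    """Classical Kloosterman sum S(m, n; q) for q >= 2."""
    return sum(e(((m * x + n * pow(x, -1, q)) % q) / q) for x in units(q))


def S_w0_direct(n1, n2, n1p, n2p, q):
    """Sum of theta over V_{w0}: each lambda_i' is the inverse of lambda_i mod q."""
    return sum(e(((n1 * l1 + n1p * pow(l1, -1, q)
                   + n2 * l2 + n2p * pow(l2, -1, q)) % q) / q)
               for l1 in units(q) for l2 in units(q))
```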

Suppose \(x_{a,b}^{v_3, v_4, v_{13}}\in X(n)\) has Plücker coordinates

$$\begin{aligned}\begin{aligned} \left( {v_1, v_2, v_3, v_4; v_{12}, v_{13}, v_{14}}\right) = \left( {p^r, p^{r-a}, v_3, v_4; p^s, v_{13}, p^{s-b}}\right) . \end{aligned}\end{aligned}$$

Note that this also says \(r\ge a, s\ge b\). Then

$$\begin{aligned}\begin{aligned} u'\left( {x_{a,b}^{v_3, v_4, v_{13}}}\right) = \begin{pmatrix} 1&{}p^{-a} &{} v_3p^{-r} &{} v_4p^{-r}\\ &{}1&{}v_{13}p^{-s} &{} p^{-b}\\ &{}&{}1\\ &{}&{}-p^{-a}&{}1\end{pmatrix} \pmod {U\left( {\mathbb {Z}_p}\right) }. \end{aligned}\end{aligned}$$

Let \(X_{a,b}^{v_3, v_4, v_{13}}(n) = \mathcal T * x_{a,b}^{v_3, v_4, v_{13}}\), and define

$$\begin{aligned}\begin{aligned} S_{a,b}^{v_3, v_4, v_{13}} \left( {n, \psi , \psi '}\right) = \sum \limits _{x \in X_{a,b}^{v_3, v_4, v_{13}}(n)} \psi \left( {u(x)}\right) \psi '\left( {u'(x)}\right) . \end{aligned}\end{aligned}$$

We also let

$$\begin{aligned}\begin{aligned} X_{a,b} (n) = \coprod \limits _{\begin{array}{c} v_3, v_4 \pmod {p^r}\\ v_{13}\pmod {p^s}\\ \text {conditions} \end{array}} X_{a,b}^{v_3, v_4, v_{13}} (n), \end{aligned}\end{aligned}$$

and

$$\begin{aligned}\begin{aligned} S_{a,b} \left( {n, \psi , \psi '}\right) = \sum \limits _{x \in X_{a,b}(n)} \psi \left( {u(x)}\right) \psi '\left( {u'(x)}\right) . \end{aligned}\end{aligned}$$

We have a partition

$$\begin{aligned}\begin{aligned} X(n) = \coprod \limits _{\begin{array}{c} 0\le a\le r\\ 0\le b\le s \end{array}} X_{a,b}(n). \end{aligned}\end{aligned}$$

Now we consider the cases \(r> s\) and \(r\le s\) separately.

  1. (i)

    Suppose \(r> s\). As \(r\ge a, r\ge s\ge b\), we see that \(u(x), u'(x)\) have entries in \(p^{-r}\mathbb {Z}_p/\mathbb {Z}_p\) for all \(x\in X(n)\). Let \(\mathcal S_{a,b}\) be a finite subset of \(\mathbb {Z}_p^3\) such that

    $$\begin{aligned}\begin{aligned} X_{a,b}(n) = \coprod \limits _{(v_3, v_4, v_{13}) \in \mathcal S_{a,b}} X_{a,b}^{v_3, v_4, v_{13}} (n). \end{aligned}\end{aligned}$$

    By Theorem 2.4, we have

    $$\begin{aligned}\begin{aligned} S_{a,b} \left( {n, \psi , \psi '}\right) = p^{-2r} \left( {1-p^{-1}}\right) ^{-2} \sum \limits _{(v_3, v_4, v_{13}) \in \mathcal S_{a,b}} \left| {X_{a,b}^{v_3, v_4, v_{13}} (n)} \right| S_{w_0} \left( {\theta _{a,b}^{v_3, v_4, v_{13}}; r}\right) , \end{aligned}\end{aligned}$$

    where

    $$\begin{aligned}\begin{aligned} \theta _{a,b}^{v_3, v_4, v_{13}} (\lambda ,\lambda ') = {\text {e}}\left( {\frac{m_1\hat{v}_2\lambda _1 + n_1p^{r-a}\lambda '_1}{p^r}}\right) {\text {e}}\left( {\frac{m_2\hat{v}_{14}\lambda _2 + n_2 p^{s-b}\lambda '_2}{p^s}}\right) . \end{aligned}\end{aligned}$$

    By (4.16), we have

    $$\begin{aligned}\begin{aligned} S_{w_0} \left( {\theta _{a,b}^{v_3, v_4, v_{13}}; r}\right) = S\left( {m_1\hat{v}_2, n_1p^{r-a}; p^r}\right) S\left( {m_2\hat{v}_{14} p^{r-s}, n_2p^{r-b}; p^r}\right) . \end{aligned}\end{aligned}$$

    And we obtain a bound by applying (4.1):

    $$\begin{aligned}\begin{aligned}&\left| {S_{w_0}\left( {\theta _{a,b}^{v_3, v_4, v_{13}}; r}\right) } \right| \\&\qquad \le 4 p^r \left( {\gcd \left( {m_1\hat{v}_2, n_1p^{r-a}, p^r}\right) \gcd \left( {m_2\hat{v}_{14} p^{r-s}, n_2p^{r-b}, p^r}\right) }\right) ^{1/2}. \end{aligned}\end{aligned}$$
  2. (ii)

    Suppose \(s\ge r\). Then \(u(x), u'(x)\) have entries in \(p^{-s}\mathbb {Z}_p/\mathbb {Z}_p\) for all \(x\in X(n)\). Again, by Theorem 2.4 we have

    $$\begin{aligned}\begin{aligned} S_{a,b}\left( {n,\psi ,\psi '}\right)&= p^{-2s} \left( {1-p^{-1}}\right) ^{-2} \\&\quad \times \sum \limits _{(v_3, v_4, v_{13}) \in \mathcal S_{a,b}} \left| {X_{a,b}^{v_3, v_4, v_{13}} (n)} \right| S_{w_0} \left( {\theta _{a,b}^{v_3, v_4, v_{13}}; s}\right) , \end{aligned}\end{aligned}$$

    where

    $$\begin{aligned}\begin{aligned}&\theta _{a,b}^{v_3, v_4, v_{13}} (\lambda ,\lambda ') \\&\qquad = {\text {e}}\left( {\frac{\left( {m_1\hat{v}_2 p^{s-r}}\right) \lambda _1 + \left( {m_2\hat{v}_{14}}\right) \lambda _2 + \left( {n_1p^{s-a}}\right) \lambda '_1 + \left( {n_2p^{s-b}}\right) \lambda '_2}{p^s}}\right) . \end{aligned}\end{aligned}$$

    By (4.16), we have

    $$\begin{aligned}\begin{aligned} S_{w_0} \left( {\theta _{a,b}^{v_3, v_4, v_{13}}; s}\right) = S\left( {m_1\hat{v}_2 p^{s-r}, n_1p^{s-a}; p^s}\right) S\left( {m_2\hat{v}_{14}, n_2p^{s-b}; p^s}\right) . \end{aligned}\end{aligned}$$

    Applying (4.1) gives

    $$\begin{aligned}\begin{aligned}&\left| {S_{w_0} \left( {\theta _{a,b}^{v_3, v_4, v_{13}}; s}\right) } \right| \\&\qquad \le 4 p^s \left( {\gcd \left( {m_1\hat{v}_2p^{s-r}, n_1p^{s-a}, p^s}\right) \gcd \left( {m_2\hat{v}_{14}, n_2p^{s-b}, p^s}\right) }\right) ^{1/2}. \end{aligned}\end{aligned}$$
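Both estimates apply (4.1) to each Kloosterman factor separately. Because (4.1) is not restated in this section, the Python sketch below only checks the explicit form of Weil's bound quoted in the introduction, \(\left| {S(m,n;q)} \right| \le \tau (q)(m,n,q)^{1/2}q^{1/2}\), for all m, n modulo a few small prime powers q. It does not test the constant 4 used above, which comes from (4.1). The helper names are ours.

```python
import cmath
from math import gcd, sqrt


def e(x):
    """Additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def kloosterman(m, n, q):
    """Classical Kloosterman sum S(m, n; q) for q >= 2."""
    return sum(e(((m * x + n * pow(x, -1, q)) % q) / q)
               for x in range(1, q) if gcd(x, q) == 1)


def weil_bound(m, n, q):
    """tau(q) * (m, n, q)^(1/2) * q^(1/2), with tau the divisor function."""
    tau = sum(1 for d in range(1, q + 1) if q % d == 0)
    return tau * sqrt(gcd(gcd(m, n), q)) * sqrt(q)


def check_weil(q):
    """True if |S(m, n; q)| <= weil_bound(m, n, q) for all m, n mod q."""
    return all(abs(kloosterman(m, n, q)) <= weil_bound(m, n, q) + 1e-9
               for m in range(q) for n in range(q))
```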

Now we bound \({\text {Kl}}_p\left( {n, \psi , \psi '}\right) \). To simplify the computation, we relax the bounds above by dropping the arguments involving \(\hat{v}_2\) and \(\hat{v}_{14}\) from the gcds.

Suppose \(r>s\). Then the bound says

$$\begin{aligned}\begin{aligned}&\left| {S_{w_0}\left( {\theta _{a,b}^{v_3, v_4, v_{13}}; r}\right) } \right| \\&\quad \le 4 p^r \left( {\gcd \left( {m_1\hat{v}_2, n_1p^{r-a}, p^r}\right) \gcd \left( {m_2\hat{v}_{14} p^{r-s}, n_2p^{r-b}, p^r}\right) }\right) ^{1/2}\\&\quad \le 4 p^r\left( {\left| {n_1n_2} \right| _p^{-1} p^{2r-a-b}}\right) ^{1/2}\\&\quad = 4 p^{2r-\frac{a+b}{2}} \left| {n_1n_2} \right| _p^{-1/2}. \end{aligned}\end{aligned}$$

Note that

$$\begin{aligned}\begin{aligned} \sum \limits _{\left( {v_3, v_4, v_{13}}\right) \in \mathcal S_{a,b}} \left| {X_{a,b}^{v_3, v_4, v_{13}}(n)} \right| \le \left| {\mathcal S_{a,b}} \right| p^{a+b}. \end{aligned}\end{aligned}$$

Hence

$$\begin{aligned} \begin{aligned} \left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right|&\le \sum \limits _{\begin{array}{c} a\le r\\ b\le s \end{array}} \left| {S_{a,b} \left( {n,\psi , \psi '}\right) } \right| \\&\le \sum \limits _{\begin{array}{c} a\le r\\ b\le s \end{array}} p^{-2r}\left( {1-p^{-1}}\right) ^{-2} 4\left| {n_1n_2} \right| _p^{-1/2} \left| {\mathcal S_{a,b}} \right| p^{2r+\frac{a+b}{2}}\\&\ll \left| {n_1n_2} \right| _p^{-1/2} \sum \limits _{\begin{array}{c} a\le r\\ b\le s \end{array}} \left| {\mathcal S_{a,b}} \right| p^{\frac{a+b}{2}}. \end{aligned}\end{aligned}$$

So it suffices to bound \(\left| {\mathcal S_{a,b}} \right| \); such bounds were computed in [12, Sect. 5]. Note that \(\mathcal S_{a,b}\) is empty unless \(r\ge a+b\).

Case I: Suppose \(s-r+a\ge 0\).

  1. (a)

    If \(s-2r+2a+b \ge 0\), then \(\left| {\mathcal S_{a,b}} \right| \le p^{r+s-a-b}\).

  2. (b)

    If \(s-2r+2a+b < 0\), then \(\left| {\mathcal S_{a,b}} \right| \le p^{2s-b-\lceil \frac{s-b}{2}\rceil } \le p^{3s/2-b/2}\).

Case II: Suppose \(s-r+a<0\). Then \(\left| {\mathcal S_{a,b}} \right| \le p^{2s-b-\lceil \frac{s-b}{2}\rceil } \le p^{3s/2-b/2}\).

Combining the cases, we obtain

$$\begin{aligned}\begin{aligned} \sum \limits _{\begin{array}{c} a\le r\\ b\le s \end{array}} \left| {\mathcal S_{a,b}} \right| p^{\frac{a+b}{2}}&\le \sum \limits _{\begin{array}{c} r-s\le a\le r\\ 2r-2a-s\le b\le r-a \end{array}} p^{r+s-\frac{a}{2}-\frac{b}{2}} + \sum \limits _{\begin{array}{c} r-s\le a\le r\\ b<2r-2a-s \end{array}} p^{\frac{3s}{2}+\frac{a}{2}} + \sum \limits _{\begin{array}{c} a<r-s\\ b\le s \end{array}} p^{\frac{3s}{2}+\frac{a}{2}}\\&\ll \left( {s+1}\right) p^{\frac{r}{2}+\frac{5s}{4}}.\\ \end{aligned}\end{aligned}$$

Hence, we have for \(r>s\)

$$\begin{aligned} \left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right| \ll \left| {n_1n_2} \right| _p^{-1/2} \left( {s+1}\right) p^{\frac{r}{2}+\frac{5s}{4}}. \end{aligned}$$
(4.17)

For \(r\le s\), applying the same argument gives

$$\begin{aligned} \left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right| \ll \left| {n_1n_2} \right| _p^{-1/2} \left( {s-r+1}\right) p^{r+\frac{3s}{4}}. \end{aligned}$$
(4.18)

Combining (4.17) and (4.18), we get

$$\begin{aligned} \left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right| \ll \left| {n_1n_2} \right| _p^{-1/2} \left( {s+1}\right) p^{\frac{r}{2} + \frac{3s}{4} + \frac{1}{2}\min \left\{ {r,s}\right\} }. \end{aligned}$$
(4.19)
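The exponent bookkeeping behind (4.17) and (4.19) can be checked mechanically. For \(r>s\), the Python sketch below takes the exponent of the bound for \(\left| {\mathcal S_{a,b}} \right| \) from Cases I and II above and adds \(\frac{a+b}{2}\). It then compares the maximum over all admissible (a, b) with \(\frac{r}{2}+\frac{5s}{4}\), and finally checks that the exponent in (4.19) reduces to the exponents of (4.17) and (4.18). The sketch checks exponents only. It does not check the number of terms or the implied constants.

```python
from fractions import Fraction as F


def size_exponent(a, b, r, s):
    """Exponent e with |S_{a,b}| <= p^e, following Cases I and II (r > s)."""
    if s - r + a >= 0 and s - 2 * r + 2 * a + b >= 0:   # Case I(a)
        return F(r + s - a - b)
    return F(2 * s - b - (s - b + 1) // 2)              # Case I(b) and Case II; ceil((s-b)/2)


def max_exponent(r, s):
    """Largest exponent of |S_{a,b}| p^{(a+b)/2} over a <= r, b <= s, a + b <= r."""
    return max(size_exponent(a, b, r, s) + F(a + b, 2)
               for a in range(r + 1) for b in range(s + 1) if a + b <= r)


def exponent_419(r, s):
    """Exponent of p in (4.19)."""
    return F(r, 2) + F(3 * s, 4) + F(min(r, s), 2)
```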

By Proposition 3.3, we can swap the characters, so

$$\begin{aligned} \left| {{\text {Kl}}_p\left( {n, \psi , \psi '}\right) } \right| \ll \left| {m_1m_2} \right| _p^{-1/2} \left( {s+1}\right) p^{\frac{r}{2} + \frac{3s}{4} + \frac{1}{2}\min \left\{ {r,s}\right\} } \end{aligned}$$
(4.20)

as well. Combining (4.19) and (4.20) yields the bound for \({\text {Kl}}_p(n_{w_0,r,s},\psi ,\psi ')\).

4.6 Bounds for global Kloosterman sums

By combining the bounds for local Kloosterman sums \({\text {Kl}}_p(n_{w,r,s}, \psi , \psi ')\), we obtain bounds for global Kloosterman sums, and prove Theorem 1.2.

Proof of Theorem 1.2

The statement for \({\text {Kl}}(n_{{\text {id}}}(c_1,c_2),\psi ,\psi ')\) follows because

$$\begin{aligned}\begin{aligned} {\text {Kl}}_p(n_{{\text {id}},r,s},\psi ,\psi ') = {\left\{ \begin{array}{ll} 1 &{} \text {if } r=s=0,\\ 0 &{} \text {otherwise.}\end{array}\right. } \end{aligned}\end{aligned}$$

Meanwhile, \({\text {Kl}}(n_{s_\alpha }(c_1,1),\psi ,\psi ')\) and \({\text {Kl}}(n_{s_\beta }(1,c_2),\psi ,\psi ')\) are just classical Kloosterman sums. Combining local bounds for classical Kloosterman sums gives the global bounds, which read

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}\left( {n_{s_\alpha }(c_1,1), \psi , \psi '}\right) } \right|&\ll _\varepsilon (m_1,n_1,c_1)^{1/2} c_1^{1/2+\varepsilon },\\ \left| {{\text {Kl}}\left( {n_{s_\beta }(1,c_2), \psi , \psi '}\right) } \right|&\ll _\varepsilon (m_2, n_2, c_2)^{1/2} c_2^{1/2+\varepsilon }. \end{aligned}\end{aligned}$$
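For classical Kloosterman sums, the passage from local to global bounds rests on the standard twisted multiplicativity. For coprime \(q_1, q_2\), we have \(S(m,n;q_1q_2) = S(m\overline{q_2}, n\overline{q_2}; q_1)\, S(m\overline{q_1}, n\overline{q_1}; q_2)\), where \(\overline{q_2}\) is the inverse of \(q_2\) modulo \(q_1\) and vice versa. Since \((m\overline{q_2}, n\overline{q_2}, q_1) = (m,n,q_1)\), multiplying the local bounds over the prime powers dividing \(c_1\) gives the displayed estimate. A product of bounded constants over the primes dividing \(c_1\) is \(\ll _\varepsilon c_1^{\varepsilon }\). The Python sketch below checks the factorisation numerically for a few coprime pairs. The helper names are ours.

```python
import cmath
from math import gcd


def e(x):
    """Additive character e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)


def kloosterman(m, n, q):
    """Classical Kloosterman sum S(m, n; q) for q >= 2."""
    return sum(e(((m * x + n * pow(x, -1, q)) % q) / q)
               for x in range(1, q) if gcd(x, q) == 1)


def twisted_product(m, n, q1, q2):
    """S(m*inv(q2), n*inv(q2); q1) * S(m*inv(q1), n*inv(q1); q2), coprime q1, q2 >= 2."""
    i2 = pow(q2, -1, q1)  # inverse of q2 modulo q1
    i1 = pow(q1, -1, q2)  # inverse of q1 modulo q2
    return kloosterman(m * i2, n * i2, q1) * kloosterman(m * i1, n * i1, q2)
```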

For \({\text {Kl}}(n_{s_\alpha s_\beta }(c_1,c_2),\psi ,\psi ')\) and \({\text {Kl}}(n_{s_\beta s_\alpha }(c_1,c_2),\psi ,\psi ')\), combining the local bounds given in Theorem 1.1 again yields the global bounds.

For \({\text {Kl}}(n_{s_\alpha s_\beta s_\alpha }(c_1,c_2),\psi ,\psi ')\) and \({\text {Kl}}(n_{s_\beta s_\alpha s_\beta }(c_1,c_2),\psi ,\psi ')\), the situation is more complicated, since the shape of the local bound depends on the relative sizes of r and s. Therefore, to obtain a global bound, we need a single expression for the local bound that is valid for all values of r and s.

We start with \({\text {Kl}}(n_{s_\alpha s_\beta s_\alpha }(c_1,c_2),\psi ,\psi ')\). For \(s\le r\), we have

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r,s}, \psi , \psi '}\right) } \right|&\ll p^{\frac{r}{3}+\frac{4s}{3}+\frac{2}{3}\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_1)}\right\} +\frac{1}{3} {\text {ord}}_p(m_2)} \\&\le p^{\frac{4r}{3}+\frac{s}{3}+\frac{2}{3}\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_1)}\right\} +\frac{1}{3} {\text {ord}}_p(m_2)}. \end{aligned}\end{aligned}$$

For \(r<s<2r\), we have

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r,s}, \psi , \psi '}\right) } \right| \ll p^{r+{\text {ord}}_p(m_2)} + p^{r+\frac{s}{2}+\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_1)}\right\} }, \end{aligned}\end{aligned}$$

and we have inequalities

$$\begin{aligned}\begin{aligned}&p^{r+{\text {ord}}_p(m_2)} + p^{r+\frac{s}{2}+\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_1)}\right\} } \\&\quad \le p^{r+{\text {ord}}_p(m_2)} + p^{\frac{4r}{3}+\frac{s}{3}+\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_1)}\right\} },\\&p^{r+{\text {ord}}_p(m_2)} + p^{r+\frac{s}{2}+\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_1)}\right\} } \\&\quad \le p^{s+{\text {ord}}_p(m_2)} + p^{\frac{r}{6}+\frac{4s}{3}+\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_1)}\right\} }. \end{aligned}\end{aligned}$$

For \(s=2r\), we have

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r,s}, \psi , \psi '}\right) } \right| \ll p^{r+{\text {ord}}_p(m_2)} = p^{\frac{s}{2}+{\text {ord}}_p(m_2)}. \end{aligned}\end{aligned}$$

So we can conclude for \(0\le s \le 2r\) that

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\alpha s_\beta s_\alpha , r,s}, \psi , \psi '}\right) } \right| \ll p^{\min \left\{ {\frac{4r}{3}+\frac{s}{3}, \frac{r}{3}+\frac{4s}{3}}\right\} +{\text {ord}}_p(m_2)+\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_1)}\right\} }. \end{aligned}\end{aligned}$$

Since we may assume from (4.6) that \({\text {ord}}_p(m_1), {\text {ord}}_p(n_1)\le r\), and \({\text {ord}}_p(m_2) \le s\), we have

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}\left( {n_{s_\alpha s_\beta s_\alpha }(c_1,c_2), \psi ,\psi '}\right) } \right| \ll _\varepsilon (m_1,n_1,c_1) (m_2,c_2) (c_1,c_2) (c_1c_2)^{1/3+\varepsilon } \end{aligned}\end{aligned}$$

for every \(\varepsilon >0\).

Now we consider \({\text {Kl}}(n_{s_\beta s_\alpha s_\beta }(c_1,c_2),\psi ,\psi ')\). For \(r\le s/2\), we have

$$\begin{aligned}\begin{aligned} \displaystyle \left| {{\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta ,r,s}, \psi , \psi '}\right) } \right|&\ll p^{\frac{3r}{2}+\frac{s}{2}+\frac{1}{2}{\text {ord}}_p(m_1)+\frac{1}{2}\min \left\{ {{\text {ord}}_p(m_2),{\text {ord}}_p(n_2)}\right\} } \\&\le p^{-\frac{r}{2}+\frac{3s}{2}+\frac{1}{2}{\text {ord}}_p(m_1)+\frac{1}{2}\min \left\{ {{\text {ord}}_p(m_2),{\text {ord}}_p(n_2)}\right\} }. \end{aligned}\end{aligned}$$

For \(s/2<r<s\), we have

$$\begin{aligned}\begin{aligned} \displaystyle \left| {{\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta ,r,s}, \psi , \psi '}\right) } \right|&\ll p^{-\frac{r}{2}+\frac{3s}{2}+\frac{1}{2}{\text {ord}}_p(m_1)+\frac{1}{2}\min \left\{ {{\text {ord}}_p(m_2),{\text {ord}}_p(n_2)}\right\} } \\&\le p^{\frac{3r}{2}+\frac{s}{2}+\frac{1}{2}{\text {ord}}_p(m_1)+\frac{1}{2}\min \left\{ {{\text {ord}}_p(m_2),{\text {ord}}_p(n_2)}\right\} }. \end{aligned}\end{aligned}$$

For \(s=r\), we have

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta ,r,s}, \psi , \psi '}\right) } \right| \ll p^{s+\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_2)}\right\} } = p^{r+\min \left\{ {{\text {ord}}_p(m_1),{\text {ord}}_p(n_2)}\right\} }. \end{aligned}\end{aligned}$$

So we can conclude for \(0\le r\le s\) that

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}_p\left( {n_{s_\beta s_\alpha s_\beta ,r,s}, \psi , \psi '}\right) } \right| \ll p^{\min \left\{ {\frac{3r}{2}+\frac{s}{2}, -\frac{r}{2}+\frac{3s}{2}}\right\} +{\text {ord}}_p(m_1)+\frac{1}{2}\min \left\{ {{\text {ord}}_p(m_2),{\text {ord}}_p(n_2)}\right\} }. \end{aligned}\end{aligned}$$

Since we may assume from (4.11) that \({\text {ord}}_p(m_1)\le r\), and \({\text {ord}}_p(m_2), {\text {ord}}_p(n_2) \le s\), we have

$$\begin{aligned}\begin{aligned} \left| {{\text {Kl}}\left( {n_{s_\beta s_\alpha s_\beta }(c_1,c_2), \psi ,\psi '}\right) } \right| \ll _\varepsilon (m_1,c_1) (m_2,n_2,c_2) (c_1^2,c_2) c_1^{-1/2} c_2^{1/2} (c_1c_2)^\varepsilon \end{aligned}\end{aligned}$$

for every \(\varepsilon >0\).
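In the two global bounds above, the gcd factors come from rewriting the minima of the local exponents. For nonnegative integers r and s, \(\min \left\{ {\frac{4r}{3}+\frac{s}{3}, \frac{r}{3}+\frac{4s}{3}}\right\} = \frac{r+s}{3}+\min \left\{ {r,s}\right\} \) and \(\min \left\{ {\frac{3r}{2}+\frac{s}{2}, -\frac{r}{2}+\frac{3s}{2}}\right\} = \frac{s-r}{2}+\min \left\{ {2r,s}\right\} \). Assuming, as the global bounds indicate, that \(r = {\text {ord}}_p(c_1)\) and \(s = {\text {ord}}_p(c_2)\), the product over p gives the factors \((c_1,c_2)(c_1c_2)^{1/3}\) and \((c_1^2,c_2)c_1^{-1/2}c_2^{1/2}\). The Python sketch below checks integer forms of these identities, obtained by clearing the fractional exponents, for small \(c_1, c_2\). The helper names are ours.

```python
from math import gcd


def factorize(n):
    """Trial-division factorisation of n >= 1 as a dict {prime: exponent}."""
    fac, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            fac[d] = fac.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        fac[n] = fac.get(n, 0) + 1
    return fac


def local_product(c1, c2, exponent):
    """prod_p p^exponent(r, s) with r = ord_p(c1), s = ord_p(c2) (nonnegative exponents)."""
    f1, f2 = factorize(c1), factorize(c2)
    out = 1
    for p in set(f1) | set(f2):
        out *= p ** exponent(f1.get(p, 0), f2.get(p, 0))
    return out


def cube_first(r, s):
    """3 * min{4r/3 + s/3, r/3 + 4s/3}; the product should be (c1, c2)^3 * c1 * c2."""
    return min(4 * r + s, r + 4 * s)


def double_second_plus_r(r, s):
    """2 * min{3r/2 + s/2, -r/2 + 3s/2} + r; the product should be (c1^2, c2)^2 * c2."""
    return min(4 * r + s, 3 * s)
```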

For \({\text {Kl}}(n_{w_0}(c_1,c_2),\psi ,\psi ')\), the local bound again consists of a single expression, so the local bounds given in Theorem 1.1 can be combined directly to give the stated global bound.

5 Symplectic Poincaré series

In this section, we compute the Fourier coefficients of symplectic Poincaré series in terms of auxiliary Kloosterman sums.

Definition

  1. (a)

    Let \(n\in N\left( {\mathbb {Q}_p}\right) \), and \(\psi _p, \psi '_p\) be characters of \(U\left( {\mathbb {Q}_p}\right) \) which are trivial on \(U\left( {\mathbb {Z}_p}\right) \). Then the local auxiliary Kloosterman sum is defined to be

    $$\begin{aligned}\begin{aligned} \underline{{\text {Kl}}}_p\left( {n, \psi _p, \psi '_p}\right) = \sum \limits _{\begin{array}{c} x\in X(n)\\ x = b_1 n b_2 \end{array}} \psi _p\left( {b_1}\right) \psi '_p\left( {b_2}\right) \end{aligned}\end{aligned}$$

    if \(\psi _p\left( {nun^{-1}}\right) = \psi '_p\left( {u}\right) \) for \(u \in \overline{U}_n\left( {\mathbb {Q}_p}\right) \), and zero otherwise. We say \(\underline{{\text {Kl}}}_p\left( {n, \psi _p, \psi '_p}\right) \) is well-defined if \(\psi _p\left( {nun^{-1}}\right) = \psi '_p\left( {u}\right) \) for \(u \in \overline{U}_n\left( {\mathbb {Q}_p}\right) \).

  2. (b)

    Let \(n\in N\left( {\mathbb {Q}}\right) \), and \(\psi = \prod \limits _p \psi _p\), \(\psi ' = \prod \limits _p \psi '_p\) be characters of \(U\left( {\mathbb {A}}\right) \) which are trivial on \(\prod \limits _p U\left( {\mathbb {Z}_p}\right) \). Then the global auxiliary Kloosterman sum is defined to be

    $$\begin{aligned}\begin{aligned} \underline{{\text {Kl}}}\left( {n, \psi , \psi '}\right) = \prod \limits _p \underline{{\text {Kl}}}_p\left( {n, \psi _p, \psi '_p}\right) . \end{aligned}\end{aligned}$$

We first show that the auxiliary Kloosterman sums are well-defined.

Proposition 5.1

[9, Proposition 1.3] Let \(G = {\text {Sp}}\left( {2r, \mathbb {Q}_p}\right) \), \(n \in N\left( {\mathbb {Q}_p}\right) \), and \(x \in X(n)\), with Bruhat decomposition \(x = b_1 n b_2\), where \(b_1, b_2 \in U\left( {\mathbb {Q}_p}\right) \). Let \(\psi , \psi '\) be characters of \(U\left( {\mathbb {Q}_p}\right) \) which are trivial on \(U\left( {\mathbb {Z}_p}\right) \). Then the quantity \(\psi \left( {b_1}\right) \psi '\left( {b_2}\right) \) is well-defined as a function on \(X(n)\) if \(\psi \left( {nun^{-1}}\right) = \psi '\left( {u}\right) \) for all \(u \in \overline{U}_n\left( {\mathbb {Q}_p}\right) \).

Proof

Suppose \(\psi \left( {nun^{-1}}\right) = \psi '\left( {u}\right) \) for all \(u \in \overline{U}_n\left( {\mathbb {Q}_p}\right) \). Let \(x = b_1 n b_2 = b'_1 n b'_2\) be two Bruhat decompositions. This means that \(b'_1 = \gamma b_1\) for some \(\gamma \in U(\mathbb {Z}_p)\), and \(b'_2 = b_2 \delta \) for some \(\delta \in U_n\left( {\mathbb {Z}_p}\right) \). Then we have

$$\begin{aligned}\begin{aligned} U\left( {\mathbb {Z}_p}\right) b_1 n b_2 \delta ^{-1} = U\left( {\mathbb {Z}_p}\right) b_1 n b_2, \end{aligned}\end{aligned}$$

which implies \(b_2 {b'_2}^{-1} = b_2 \delta ^{-1} b_2^{-1} \in \overline{U}_n\left( {\mathbb {Q}_p}\right) \). Now, from the equivalence of Bruhat decompositions, we deduce that

$$\begin{aligned}\begin{aligned} U\left( {\mathbb {Z}_p}\right) n b_2 {b'_2}^{-1} n^{-1} U_n\left( {\mathbb {Z}_p}\right) = U\left( {\mathbb {Z}_p}\right) b_1^{-1} b'_1 U_n\left( {\mathbb {Z}_p}\right) , \end{aligned}\end{aligned}$$

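which implies \(\psi ' \left( {b_2{b'_2}^{-1}}\right) = \psi \left( {nb_2{b'_2}^{-1} n^{-1}}\right) = \psi \left( {b_1^{-1} b'_1}\right) \), where the first equality is the hypothesis. Since \(\psi \) and \(\psi '\) are characters, this rearranges to \(\psi \left( {b_1}\right) \psi '\left( {b_2}\right) = \psi \left( {b'_1}\right) \psi '\left( {b'_2}\right) \), so the value does not depend on the choice of Bruhat decomposition.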

Proposition 5.2

If \(\underline{{\text {Kl}}}_p\left( {n, \psi _p, \psi '_p}\right) \) is well-defined, then \(\underline{{\text {Kl}}}_p\left( {n, \psi _p, \psi '_p}\right) = {\text {Kl}}_p\left( {n, \psi _p, \psi '_p}\right) \).

Proof

Trivial.

The Fourier coefficients \(P_{\psi , \psi '} (g)\) can be evaluated using the following theorem of Friedberg:

Theorem 5.3

[9, Theorem A] The Fourier coefficient \(P_{\psi , \psi '} (g)\) of the \({\text {Sp}}(2r)\) Poincaré series is given by

$$\begin{aligned}\begin{aligned} P_{\psi , \psi '} (g) = \sum \limits _{w\in W} \sum \limits _{\begin{array}{c} n \in N\left( {\mathbb {Q}}\right) \\ w(n) = w \end{array}} \underline{{\text {Kl}}}\left( {n, \psi , \psi '}\right) \int _{U_w\left( {\mathbb {R}}\right) } \mathcal F_\psi \left( {n u_1 y}\right) \overline{\psi '} \left( {u_1}\right) du_1. \end{aligned}\end{aligned}$$

Remark

In [9], the statement concerns \({\text {GL}}(r)\) Poincaré series, but the proof also works for \({\text {Sp}}(2r)\) Poincaré series.

5.1 \({\text {Sp}}(4)\) Poincaré series

Let \(G = {\text {Sp}}\left( {4, \mathbb {Q}_p}\right) \), \(\psi = \psi _{m_1, m_2}\) and \(\psi ' = \psi _{n_1, n_2}\). The following table lists the conditions under which the auxiliary \({\text {Sp}}(4)\) Kloosterman sums \(\underline{{\text {Kl}}}_p\left( {n_{w, r, s}, \psi , \psi '}\right) \) are well-defined.

| \(w\) | Well-definedness conditions |
|---|---|
| \({\text {id}}\) | \(m_1=n_1, m_2=n_2\) |
| \(s_\alpha \) | \(m_2=n_2=0\) |
| \(s_\beta \) | \(m_1=n_1=0\) |
| \(s_\alpha s_\beta \) | \(m_2=n_1=0\) |
| \(s_\beta s_\alpha \) | \(m_1=n_2=0\) |
| \(s_\alpha s_\beta s_\alpha \) | \(n_2 = m_2 p^{2r-2s}\) |
| \(s_\beta s_\alpha s_\beta \) | \(n_1 = m_1 p^{s-2r}\) |
| \(w_0\) | no condition |
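The table can be encoded as a predicate, which is convenient when checking which \(n_{w,r,s}\) give well-defined auxiliary sums for given \(m_1, m_2, n_1, n_2\). The following Python sketch is a direct transcription of the table; the string labels for Weyl elements ("sa" for \(s_\alpha \), "sb" for \(s_\beta \), words read left to right) and the function name are hypothetical conventions chosen for illustration. Exact rational arithmetic is used because the exponent in \(p^{s-2r}\) may be negative.

```python
from fractions import Fraction


def is_well_defined(w: str, m1, m2, n1, n2, p: int, r: int, s: int) -> bool:
    """Transcription of the table: is Kl_p(n_{w,r,s}, psi_{m1,m2}, psi_{n1,n2}) well-defined?

    Hypothetical labels: "id", "sa", "sb", "sa sb", "sb sa", "sa sb sa", "sb sa sb", "w0",
    where "sa" stands for s_alpha and "sb" for s_beta.
    """
    m1, m2, n1, n2 = (Fraction(v) for v in (m1, m2, n1, n2))
    P = Fraction(p)  # Fraction ** negative int gives an exact rational power of p
    conditions = {
        "id": lambda: m1 == n1 and m2 == n2,
        "sa": lambda: m2 == 0 and n2 == 0,
        "sb": lambda: m1 == 0 and n1 == 0,
        "sa sb": lambda: m2 == 0 and n1 == 0,
        "sb sa": lambda: m1 == 0 and n2 == 0,
        "sa sb sa": lambda: n2 == m2 * P ** (2 * r - 2 * s),
        "sb sa sb": lambda: n1 == m1 * P ** (s - 2 * r),
        "w0": lambda: True,  # no condition for the long element
    }
    if w not in conditions:
        raise ValueError(f"unknown Weyl element label {w!r}")
    return conditions[w]()
```

For example, with \(p=3\), \(r=2\), \(s=1\) the condition for \(s_\alpha s_\beta s_\alpha \) reads \(n_2 = 9m_2\), so `is_well_defined("sa sb sa", 1, 2, 5, 18, p=3, r=2, s=1)` returns `True`.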

Remark

From this table, we see that not all Kloosterman sums \({\text {Kl}}_p\left( {n, \psi , \psi '}\right) \) correspond to a well-defined auxiliary Kloosterman sum \(\underline{{\text {Kl}}}_p\left( {n, \psi , \psi '}\right) \).

From the well-definedness conditions for \(\underline{{\text {Kl}}}_p(n_{w,r,s},\psi ,\psi ')\), we see that when \(\psi = \psi _{m_1,m_2}\), \(\psi ' = \psi _{n_1,n_2}\) are non-degenerate, i.e. \(m_1m_2,n_1n_2\ne 0\), then

$$\begin{aligned}\begin{aligned} w = {\text {id}}, s_\alpha s_\beta s_\alpha , s_\beta s_\alpha s_\beta , w_0 \in W \end{aligned}\end{aligned}$$

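are the only Weyl elements that contribute to the Fourier coefficient \(P_{\psi ,\psi '}(g)\), since the conditions for \(s_\alpha \), \(s_\beta \), \(s_\alpha s_\beta \) and \(s_\beta s_\alpha \) each force one of \(m_1, m_2, n_1, n_2\) to vanish.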
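This deduction can be checked mechanically. The Python sketch below records, for each Weyl element, which of \(m_1, m_2, n_1, n_2\) its well-definedness condition forces to vanish, and keeps the elements with no forced zeros. The labels ("sa" for \(s_\alpha \), "sb" for \(s_\beta \)) are a hypothetical convention, and the sketch assumes, as can be read off from the table, that every condition with no forced zeros (an equality, a relation \(n_i = m_i p^k\), or no condition) is satisfiable with all four values non-zero.

```python
# Variables forced to vanish by each well-definedness condition in the table.
# Hypothetical labels: "sa" = s_alpha, "sb" = s_beta, words read left to right.
FORCED_ZERO = {
    "id": set(),
    "sa": {"m2", "n2"},
    "sb": {"m1", "n1"},
    "sa sb": {"m2", "n1"},
    "sb sa": {"m1", "n2"},
    "sa sb sa": set(),
    "sb sa sb": set(),
    "w0": set(),
}


def contributing_weyl_elements() -> list:
    """Weyl elements whose condition can hold when m1 * m2 * n1 * n2 != 0.

    Non-degenerate characters have all four parameters non-zero, so any w whose
    condition forces one of them to vanish is excluded. Assumption: the remaining
    conditions are satisfiable with all four parameters non-zero.
    """
    return [w for w, zeros in FORCED_ZERO.items() if not zeros]
```

Since dictionaries preserve insertion order, the call returns the four labels in the order of the table, matching the list of Weyl elements displayed above.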