1 Introduction

The purpose of this paper is to give good upper bounds for the sums

$$\begin{aligned} K_n(A,B; p^k)=\sum _{X\in GL_n(\mathbb {Z}/p^k\mathbb {Z})} \psi ( (AX+X^{-1}B)/p^k), \end{aligned}$$
(1)

with given \(A,B\in \mathbb {Z}^{n\times n}\), where for an \(n\times n\) matrix \(X\) we let \(\psi (X)=e^{2 \pi i {{\,\textrm{Tr}\,}}X}\). Note that \(\psi (XY)=\psi (YX)\). A good upper bound may mean different things: it could be optimal, or somewhat crude but easily usable. We will provide both.

These sums themselves are of independent interest. They arise naturally in certain equi-distribution problems and are natural analogues of the classical Kloosterman sums \(\sum _{x\ (p)} e^{2\pi i (ax+bx^{-1})/p}\).
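As a point of comparison, the classical sum can be evaluated by brute force. The following is a minimal sketch, not taken from the paper: the function name `kloosterman` and the choice \(p=13\) are illustrative assumptions, and the comparison value \(2\sqrt{p}\) is the classical Weil bound for \(ab\not \equiv 0 \ (p)\).

```python
import cmath
import math

def kloosterman(a: int, b: int, p: int) -> complex:
    """Classical Kloosterman sum: sum over x in (Z/pZ)^* of e((a*x + b*x^{-1})/p)."""
    total = 0j
    for x in range(1, p):
        x_inv = pow(x, -1, p)  # modular inverse, needs Python 3.8+
        total += cmath.exp(2j * math.pi * ((a * x + b * x_inv) % p) / p)
    return total

p = 13  # illustrative prime
worst = max(abs(kloosterman(a, b, p)) for a in range(1, p) for b in range(1, p))
print(worst, 2 * math.sqrt(p))  # largest modulus found versus the Weil bound
```

The sum is real, since replacing \(x\) by \(-x\) conjugates each term.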

In our earlier paper [6] we dealt with the case \(k=1\). As usual, upper bounds modulo a prime require the heavy machinery of some type of Weil cohomology. For higher prime powers the methods are usually of a very different sort, based on Taylor expansions, and occasionally referred to as the stationary phase method [3]. One such example is provided by Salié’s explicit evaluation of the one-dimensional case in [14]. We will prove that such a result holds generically even for \(n\times n\) matrices, but generic here is much more restricted than being non-zero, or even invertible.

We summarize the main results. Clearly if \(p\) divides all matrix entries of \(A\) and \(B\) then one may clear appropriate powers and either arrive at a trivial sum, or one where one of \(A\) or \(B\) is different from the zero-matrix. From now on we will assume that this is the case and denote it as

$$\begin{aligned} \gcd (A,B,p)=1. \end{aligned}$$

First we have the following reductions to a counting argument.

Proposition 1.1

Assume \(\gcd (A,B,p)=1\) and that \(k>1\).

  1. 1.

    If \(k = 2\,l\) then

    $$\begin{aligned} K_n(A,B; p^k)= \sum _{X} \psi ( (AX+X^{-1}B)/p^k) \end{aligned}$$

    where the sum is over \(X\in GL_n(\mathbb {Z}/p^k\mathbb {Z})\) satisfying \(XAX\equiv B \mod p^l\).

  2. 2.

    If \(k=2\,l+1\) then

    $$\begin{aligned} K_n(A,B; p^k)=\frac{1}{p^{n^2}} \sum _{X} \psi ( (AX+X^{-1}B)/p^k)S_{A,B}(X;p) \end{aligned}$$

    where the sum is over \(X\in GL_n(\mathbb {Z}/p^k\mathbb {Z})\), \(XAX\equiv B \mod p^l\), and where

    $$\begin{aligned} S_{A,B}(X;p) = \sum _{U {{\,\textrm{mod}\,}}p} \psi ((SU+TU^2)/p), \end{aligned}$$

    with \(S=(AX-X^{-1}B)/p^l\) and \(T=AX\).
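The first claim of Proposition 1.1 can be checked numerically in the smallest matrix case. The sketch below is an illustration under stated assumptions, not the authors' code: it fixes \(n=2\), \(p=3\), \(k=2\) (so \(l=1\)), stores \(2\times 2\) matrices as row-major tuples, and compares the full sum (1) with the sum restricted to \(XAX\equiv B \ (p^l)\). All function names and the sample matrices are hypothetical.

```python
import cmath
import math
from itertools import product

P, K = 3, 2                        # assumed small case: p^k = 9, l = 1
MOD, MOD_L = P**K, P**(K // 2)

def mul(X, Y, m):
    """Product of 2x2 matrices stored row-major as (a, b, c, d), reduced mod m."""
    a, b, c, d = X
    e, f, g, h = Y
    return ((a*e + b*g) % m, (a*f + b*h) % m, (c*e + d*g) % m, (c*f + d*h) % m)

def inverse(X, m):
    """Inverse of X mod m, or None when det X is not a unit."""
    a, b, c, d = X
    det = (a*d - b*c) % m
    if math.gcd(det, m) != 1:
        return None
    t = pow(det, -1, m)
    return (d*t % m, -b*t % m, -c*t % m, a*t % m)

def psi(t, m):
    return cmath.exp(2j * math.pi * (t % m) / m)

def full_and_restricted(A, B):
    """Return (sum over all X in GL_2(Z/p^k), sum over X with XAX = B mod p^l)."""
    full = restricted = 0j
    for X in product(range(MOD), repeat=4):
        X_inv = inverse(X, MOD)
        if X_inv is None:
            continue
        AX, XinvB = mul(A, X, MOD), mul(X_inv, B, MOD)
        term = psi(AX[0] + AX[3] + XinvB[0] + XinvB[3], MOD)  # psi of the trace
        full += term
        XAX = mul(mul(X, A, MOD), X, MOD)
        if all((u - v) % MOD_L == 0 for u, v in zip(XAX, B)):
            restricted += term
    return full, restricted

full, restricted = full_and_restricted((1, 0, 0, 2), (1, 3, 0, 2))
print(abs(full - restricted))  # should be zero up to rounding
```

For a pair such as \(A=I\), \(B=\mathrm{diag}(2,1)\), where \(X^2\equiv B \ (3)\) has no solution because \(\det B\) is not a square, the same routine returns a vanishing sum.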

Remark 1.2

Note that if \(p \ne 2\), then \(S_{A,B}(X;p)\) is either 0 or equals \(G(AX;p)\), for the generalized Gauss sum

$$\begin{aligned} G(T;p)=\sum _{U \, {{\,\textrm{mod}\,}}\,p} \psi (TU^2/p) \end{aligned}$$

See below in Sect. 3.

The case when \(A\) is invertible modulo \(p\) is special and can be made more explicit, in complete analogy with Salié’s evaluation of the classical one-dimensional Kloosterman sum [14].

Corollary 1.3

  1. 1.

    \(K_n(A,B;p^k)=0\) unless the invariant factors (the Smith normal forms) of \(A\) and \(B\) agree \({{\,\textrm{mod}\,}}p^l\), where \(l = [k/2]\). In particular if \(k>1\) and \(\gcd (\det A,p)=1\) (i.e. \(A\) is invertible mod \(p\)) then \(K_n(A,B;p^k)=0\) unless \(\gcd (\det B,p)=1\) as well.

  2. 2.

    Assume that \(\gcd (\det (AB),p)=1\) and that \(AB\) is regular semisimple \({{\,\textrm{mod}\,}}p\), (i.e. all eigenvalues are different). Then

    1. (a)

      If \(k = 2\,l\) then

      $$\begin{aligned} K_n(A,B; p^k)= p^{k n^2/2}\sum _{Y} \psi ( 2Y/p^k) \end{aligned}$$

      where the sum is over \(Y\in GL_n(\mathbb {Z}/p^k\mathbb {Z})\), \(Y^2\equiv AB \mod p^k\).

    2. (b)

      If \(k = 2\,l+1\) then

      $$\begin{aligned} K_n(A,B; p^k)= \zeta p^{k n^2/2}\sum _{Y} \psi (2Y/p^k) \end{aligned}$$

      where the sum is over \(Y\in GL_n(\mathbb {Z}/p^k\mathbb {Z})\), \(Y^2\equiv AB \mod p^k\), and where \(\zeta \) is a \(p\)-th root of unity.

    3. (c)

      In particular we have \(|K_n(A,B; p^k)| \le 2^n p^{k n^2/2} \).

Note that in the regular semisimple case we have square root cancellation. Also note that we do not assume that the eigenvalues of \(AB\) are defined over \(\mathbb {F}_p\). Finally, this condition is generic: its complement is a Zariski closed set.
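In the one-dimensional case \(n=1\) the bound in part (c) reads \(|K_1(a,b;p^k)|\le 2p^{k/2}\) for \(p\nmid ab\), \(k>1\). A minimal brute-force sketch of this special case follows; the function name `kloosterman_pk` and the choice \(p=5\) are assumptions made only for illustration.

```python
import cmath
import math

def kloosterman_pk(a: int, b: int, p: int, k: int) -> complex:
    """n = 1 instance of (1): sum over units x mod p^k of e((a*x + b*x^{-1})/p^k)."""
    m = p**k
    return sum(cmath.exp(2j * math.pi * ((a * x + b * pow(x, -1, m)) % m) / m)
               for x in range(1, m) if x % p)

p = 5  # illustrative odd prime
for k in (2, 3):
    worst = max(abs(kloosterman_pk(a, b, p, k))
                for a in range(1, p) for b in range(1, p))
    print(k, worst, 2 * p ** (k / 2))  # largest modulus versus 2 p^{k/2}
```

When \(b/a\) is not a square modulo \(p\), the congruence \(ax^2\equiv b \ (p)\) has no solution and the sum vanishes, in line with Proposition 1.1.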

We now return to the non-generic cases. By Proposition 1.1, in order to bound the sums in (1) we need to bound

$$\begin{aligned} N^*(A,B;p^l) = \# \{X \in GL_n(\mathbb {Z}/p^l\mathbb {Z})|AX\equiv X^{-1}B {{\,\textrm{mod}\,}}p^l \}\ . \end{aligned}$$
(2)

Note that

$$\begin{aligned} N^*(A,B;p^l) = N^*(XA,BX^{-1};p^l) \end{aligned}$$

for any \(X\in GL_n(\mathbb {Z}/p^l\mathbb {Z})\) and so if \(N^*(A,B;p^l)\) is not zero, then it equals \(N^*(C,C;p^l)\) for \(C=XA\), for any \(X\) for which \(XA\equiv BX^{-1}\pmod {p^l}\).
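The invariance above is easy to test by enumeration. The sketch below assumes \(n=2\), \(p=3\), \(l=1\) and uses the equivalent form \(XAX\equiv B\) of the condition \(AX\equiv X^{-1}B\); the helper names and the sample matrices are hypothetical.

```python
from itertools import product

P = 3  # assumed small prime, l = 1

def mul(X, Y):
    """2x2 product mod P; matrices are row-major tuples (a, b, c, d)."""
    a, b, c, d = X
    e, f, g, h = Y
    return ((a*e + b*g) % P, (a*f + b*h) % P, (c*e + d*g) % P, (c*f + d*h) % P)

def inverse(X):
    a, b, c, d = X
    t = pow((a*d - b*c) % P, -1, P)
    return (d*t % P, -b*t % P, -c*t % P, a*t % P)

GL2 = [X for X in product(range(P), repeat=4) if (X[0]*X[3] - X[1]*X[2]) % P]

def n_star(A, B):
    """N^*(A, B; p): number of invertible X with A X = X^{-1} B, i.e. X A X = B."""
    return sum(1 for X in GL2 if mul(mul(X, A), X) == B)

A = (1, 1, 0, 2)
X0 = (0, 1, 1, 0)
B = mul(mul(X0, A), X0)          # chosen so that the equation is solvable
print(n_star(A, B))
```

For every solution \(X\) of \(XA\equiv BX^{-1}\) the matrix \(C=XA\) should give the same count \(N^*(C,C;p)\).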

For \(l=1\) it is possible (see Thm. 1.5 below) to describe \(N^*(C,C;p)\) explicitly. This allows one to show that for a well defined exponent \(e=e_C\) we have \(m_n<N^*(C,C;p)/p^e<M_n\) for some absolute constants \(m_n,M_n\) that depend on \(n\) only. The exponent \(e=e_C\) itself depends on the combinatorial type of the Jordan decomposition of \(C\) over an algebraic closure of \(\mathbb {F}_p\).

As a first step towards this goal we have the following reduction.

Proposition 1.4

For any \(C\in \mathbb {F}_p^{n\times n}\) let \(m_{C^2}(x)=\prod _{j=1}^r f_j(x)^{k_j} \in \mathbb {F}_p[x]\) be the minimal polynomial of \(C^2\), where the \(f_j(x)\in \mathbb {F}_p[x]\) are irreducible. Let \(V_j=\ker f_j(C^2)^{k_j}\) and \(C_j=C_{|V_j}\) be the restriction of \(C\) to \(V_j\). Then we have

$$\begin{aligned} N^*(C,C;p)=\prod _{j=1}^r N^*(C_j,C_j;p)\ . \end{aligned}$$
(3)

In case of a primary minimal polynomial the explicit counting formulas depend on the value of \(\left( \frac{x}{f(x)}\right) \) where as usual [13] \(\left( \frac{x}{f(x)}\right) \) is the quadratic residue symbol, defined for an irreducible polynomial \(f\),

$$\begin{aligned} \left( \frac{g(x)}{f(x)} \right) = {\left\{ \begin{array}{ll} 1 &{} \text { if } g \text { is a non-zero square in } \mathbb {F}_{p} [x]/(f),\\ -1 &{} \text { if } g \text { is not a square in } \mathbb {F}_{p} [x]/(f),\\ 0 &{} \text { if } f\mid g. \end{array}\right. } \end{aligned}$$

To state our main result we need to introduce further notation, the details of which are presented in Sect. 3. For any partition \(\lambda =[n_1,...,n_k]\) we let \(N_\lambda \) be a nilpotent matrix with Jordan blocks of size \(n_1,...,n_k\). For \(q=p^d\) let \(\mathbb {F}_q\) be the field with \(q\) elements, and \(Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )\) be the centralizer of \(N_\lambda \) in the group \(GL_{|\lambda |}(\mathbb {F}_q)\), where \(|\lambda |:=n_1+\cdots +n_k\) (see Proposition 3.1).

If we have partitions \(\mu , \nu \) we let \(\lambda =\mu +\nu \) be their join. Also the dual partition \(\lambda '\) of \(\lambda \) may be defined via the matrix \(N_\lambda \) as \(\lambda '=[d_1,...,d_k]\), \(d_1\ge d_2\ge \dots \ge d_k\) where

$$\begin{aligned} \dim \ker N_\lambda ^j=d_1+\cdots +d_j. \end{aligned}$$

Theorem 1.5

Assume that \(p\ne 2\). Let \(C\) be an \(m \times m\) matrix and assume that the minimal polynomial of \(C^2\) is of the form \(f(x)^k\), where \(f(x)\in \mathbb {F}_p[x]\) is irreducible. We let \(q=p^d\), \(d={\deg f}\), and \(\lambda \) be the partition of \(m/d\) with dual \(\lambda '=[d_1,...,d_k]\), \(d_1\ge d_2\ge \dots \ge d_k\) where

$$\begin{aligned} \frac{1}{d} \dim _{\mathbb {F}_p} \ker f^j(C^2)=d_1+\cdots +d_j. \end{aligned}$$
  1. 1.

    If \(\left( \frac{x}{f(x)}\right) =1\) we have \(N^*(C,C;p)= N_+^*(\lambda ,q)\), where \(q=p^{\deg f}\) and

    $$\begin{aligned} N^*_+(\lambda ,q)=\sum _{\lambda =\mu +\nu } \frac{\# Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )}{\# Z_{GL_{|\mu |}(\mathbb {F}_q)}(N_\mu ) \# Z_{GL_{|\nu |}(\mathbb {F}_q)}(N_\nu )}\ . \end{aligned}$$
    (4)
  2. 2.

    If \(\left( \frac{x}{f(x)}\right) =-1\) then all \(d_j\) will be even, and \(\lambda =\mu +\mu \) for some partition \(\mu \). Then we have \(N^*(C,C;p)= N^*_-(\lambda ,q)\) where

    $$\begin{aligned} N^*_-(\lambda ,q)=\frac{\# Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )}{\# Z_{GL_{|\mu |}(\mathbb {F}_{q^2})}(N_\mu )}\ . \end{aligned}$$
    (5)
  3. 3.

    If \(f(x)=x\) we have

    $$\begin{aligned} N^*(C,C;p)= p^{\sum _{j=1}^k d_j^2}\prod _{j=1}^k\prod _{t_j=1}^{r_j}\left( 1-\frac{1}{p^{t_j}}\right) \end{aligned}$$
    (6)

    where we put \(r_j:=d_j-d_{j+1}\) (\(j=1,\dots ,k\) where \(d_{k+1}:=0\)), i.e. \(r_j\) is the number of blocks of size \(j\times j\) (\(j=1,\ldots ,k\)) in the Jordan normal form of \(C\).
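Formula (6) can be compared with a direct count for small nilpotent \(C\). The following sketch is an illustration only: it reads \(r_j\) as the number of \(j\times j\) Jordan blocks of \(C\), as in the closing sentence of the theorem, and the function names `n_star` and `formula_6` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def matmul(X, Y, p):
    n = len(X)
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(n)) % p
                       for j in range(n)) for i in range(n))

def det(X, p):
    """Determinant mod p by Laplace expansion (fine for tiny matrices)."""
    if len(X) == 1:
        return X[0][0] % p
    return sum((-1) ** j * X[0][j] * det(tuple(r[:j] + r[j+1:] for r in X[1:]), p)
               for j in range(len(X))) % p

def n_star(C, p):
    """Brute-force N^*(C, C; p): invertible X over F_p with X C X = C."""
    n = len(C)
    count = 0
    for entries in product(range(p), repeat=n * n):
        X = tuple(entries[i*n:(i+1)*n] for i in range(n))
        if matmul(matmul(X, C, p), X, p) == C and det(X, p) != 0:
            count += 1
    return count

def formula_6(block_sizes, p):
    """Right-hand side of (6) from the Jordan block sizes of a nilpotent C."""
    top = max(block_sizes)
    r = [0] + [block_sizes.count(j) for j in range(1, top + 1)]  # r[j], 1-indexed
    d = [0] + [sum(r[j:]) for j in range(1, top + 1)]            # d[j] = r[j] + r[j+1] + ...
    value = Fraction(p) ** sum(x * x for x in d)
    for j in range(1, top + 1):
        for t in range(1, r[j] + 1):
            value *= 1 - Fraction(1, p ** t)
    return value

C2 = ((0, 1), (0, 0))            # one Jordan block of size 2
print(n_star(C2, 3), formula_6([2], 3))
```

The same comparison can be run for a \(3\times 3\) matrix with blocks of sizes 2 and 1, at the cost of enumerating \(3^9\) candidates.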

We are now ready to state our main bounds. For a refined statement we need the stable rank of a matrix \(A\) defined as \({{\,\textrm{rk}\,}}_\infty A = \lim _{m\rightarrow \infty } {{\,\textrm{rk}\,}}A^m\).
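Since the ranks of the powers \(A^m\) stabilize once \(m\ge n\), the stable rank of an \(n\times n\) matrix can be computed as \({{\,\textrm{rk}\,}}A^n\). A minimal sketch over \(\mathbb {F}_p\) follows; the helper names and the sample matrix are illustrative assumptions.

```python
def matmul(X, Y, p):
    n = len(X)
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(n)) % p
                       for j in range(n)) for i in range(n))

def rank_mod_p(X, p):
    """Rank over F_p by Gaussian elimination."""
    rows = [list(r) for r in X]
    rank = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def stable_rank(A, p):
    """rk_infinity(A) computed as rk(A^n) for an n x n matrix A over F_p."""
    power = A
    for _ in range(len(A) - 1):
        power = matmul(power, A, p)
    return rank_mod_p(power, p)

A = ((0, 1, 0), (0, 0, 0), (0, 0, 1))   # nilpotent 2x2 block plus an invertible 1x1 block
print(rank_mod_p(A, 5), stable_rank(A, 5))
```

For this sample matrix the rank counts the nilpotent block once, while the stable rank only sees the invertible part.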

Theorem 1.6

Assume that \(p\ne 2\) and \(l\ge 1\).

  1. 1.

    Let \(r={{\,\textrm{rk}\,}}(C {{\,\textrm{mod}\,}}p)\) and \(r_\infty ={{\,\textrm{rk}\,}}_\infty (C {{\,\textrm{mod}\,}}p)\). Then

    $$\begin{aligned} N^*(C,C;p^l)\le 2^{ r_\infty } p^{ e(l,n,r,r_\infty )} \end{aligned}$$

    where

    $$\begin{aligned} e(l,n,r,r_\infty )= (n-r)^2 + (r-r_\infty )^2 + r_\infty ^2/2 +(l-1)\left( (n-r)(n-r_\infty ) + r_\infty ^2/2\right) . \end{aligned}$$
  2. 2.

    Assume that \(r={{\,\textrm{rk}\,}}(A {{\,\textrm{mod}\,}}p)={{\,\textrm{rk}\,}}(B {{\,\textrm{mod}\,}}p)>0\). We have

    $$\begin{aligned} N^*( A,B; p^l) \le 2^{r} {\left\{ \begin{array}{ll} p^{ln(n-r)} &{} \text { if } \ r\le n/2\\ p^{ln^2/2 } &{} \text { if } \ n/2<r\le n \end{array}\right. }. \end{aligned}$$
    (7)
  3. 3.

    In particular if \(n>1\) and \( \gcd (A,B,p)=1\) then \(N^*(A,B;p^l) \le 2^n p^{l(n^2-n)}\). If \( \gcd (\det A,\det B,p)=1\) then \(N^*(A,B;p^l) \le 2^n p^{ln^2/2}\).
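The first bound in part 3 can be tested exhaustively in the smallest case \(n=2\), \(l=1\), where it reads \(N^*(A,B;p)\le 4p^2\). The sketch below is an assumption-laden illustration: it tallies \(B=XAX\) over all \(X\in GL_2(\mathbb {F}_p)\) for each non-zero \(A\), so that the tally of \(B\) equals \(N^*(A,B;p)\). The name `max_n_star` is hypothetical.

```python
from collections import Counter
from itertools import product

def mul(X, Y, p):
    """2x2 product mod p; matrices are row-major tuples (a, b, c, d)."""
    a, b, c, d = X
    e, f, g, h = Y
    return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

def max_n_star(p):
    """Largest N^*(A, B; p) over 2x2 pairs (A, B) with A non-zero mod p."""
    mats = list(product(range(p), repeat=4))
    gl2 = [X for X in mats if (X[0]*X[3] - X[1]*X[2]) % p]
    best = 0
    for A in mats:
        if not any(A):
            continue  # A = 0 forces B = XAX = 0, which gcd(A, B, p) = 1 excludes
        counts = Counter(mul(mul(X, A, p), X, p) for X in gl2)
        best = max(best, max(counts.values()))
    return best

print(max_n_star(3), 4 * 3**2)  # observed maximum versus the bound 2^n p^{l(n^2-n)}
```

For \(n=2\) the two exponents \(l(n^2-n)\) and \(ln^2/2\) coincide, so this small case does not distinguish the two bounds of part 3.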

Remark 1.7

The case \(p=2\) is special, in view of (14). Assume \( AX=X^{-1}B=C {{\,\textrm{mod}\,}}p \). If \(C \ne I_n\), in particular, if \( AB\ne I_n {{\,\textrm{mod}\,}}2 \), the bound \(2^{l(n^2-n)}\) still holds.

We also need the following general bound for the sum \(S_{A,B}(X;p)\).

Proposition 1.8

Assume that \(A,B\) are such that there exists \(X\) so that \(AX\equiv X^{-1}B {{\,\textrm{mod}\,}}p^l\). Let

$$\begin{aligned} S_{A,B}(X;p) = \sum _{U {{\,\textrm{mod}\,}}p} \psi ((SU+TU^2)/p) \, \end{aligned}$$

with \(S=(AX-X^{-1}B)/p^l\) and \(T=AX {{\,\textrm{mod}\,}}p\). Assume that \(p \ne 2\). We have that

$$\begin{aligned} |S_{A,B}(X;p)| \le p^{(n-r)(n-r_\infty )+r_\infty ^2/2}\, \end{aligned}$$

where \(r ={{\,\textrm{rk}\,}}T\), \(r_\infty = {{\,\textrm{rk}\,}}_\infty T\). In particular, we always have \(|S_{A,B}(X;p)| \le p^{n^2-n}\), and under the additional assumption that \(A\) is invertible modulo \(p\), we have \(|S_{A,B}(X;p)| \le p^{n^2/2}\).

In view of Proposition 1.1 as a corollary of the above we have the following

Theorem 1.9

Assume that \(n>1\) and the matrices \(A,B\) are not both \(0 {{\,\textrm{mod}\,}}p\). We then have the following bounds.

  1. 1.

    If \(k=1\), by [6]

    $$\begin{aligned} |K_n(A,B;p)|\le {\left\{ \begin{array}{ll} 2p^{n^2-n+1} &{} \text { for all } A,B\\ 4p^{3n^2/4} &{} \text { if } \gcd (\det A,\det B,p)=1\\ 4p^{n^2/2} &{} \text { if } \gcd (\det B,p)=1, \text { and } AB^{-1} \text { regular semisimple}. \end{array}\right. } \end{aligned}$$
  2. 2.

    If \(k>1\) we have

    $$\begin{aligned} |K_n(A,B;p^k)|\le 2^n {\left\{ \begin{array}{ll} p^{kn^2-\lceil \frac{k}{2}\rceil n} &{} \text { for all } A,B\\ p^{kn^2-\lceil \frac{k}{2}\rceil n^2} &{} \text { if } \gcd (\det A,\det B,p)=1. \end{array}\right. } \end{aligned}$$

The paper is organized as follows. First, in Sect. 2 we prove Proposition 1.1, which then gives the optimal bounds for the generic situation. In the next section we list some facts concerning partitions, the Sylvester equation and multivariable Gauss sums. These will be used in the following sections. In Sect. 4 we give upper bounds for the number of solutions of various quadratic equations in matrices modulo a prime. In the last section, Sect. 5, we then prove the estimates in the last three statements above.

While this work was in progress El-Baz, Lee and Strömbergsson [5] independently arrived at quantitatively similar bounds in their work on the equidistribution of rational points on horocycles. While there are some overlaps, the main results are different in nature.

2 Reduction to counting

In this section we prove Proposition 1.1 and its corollaries. As in the statement we need to deal with the case of even and odd exponents separately.

2.1 The case \(p^{2\,l}\)

Let \(k=2\,l\). For any \(U \in GL_n(\mathbb {Z}/p^k\mathbb {Z})\) we have \(K_n(A,B; p^k)= \sum _{X\in GL_n(\mathbb {Z}/p^k\mathbb {Z})} \psi ((AXU+U^{-1}X^{-1}B)/p^k).\)

Let \(H < GL_n(\mathbb {Z}/p^k\mathbb {Z})\) be the subgroup of matrices \(U\) such that \(U\equiv I \ (p^l)\). Explicitly we have

$$\begin{aligned} H=\{ I+U_1p^l \,|\, U_1 \ (p^l)\}. \end{aligned}$$

Since \((I+U_1p^l)^{-1} =I-U_1p^l\), averaging over \(U=I+U_1p^l\in H\) gives

$$\begin{aligned} K_n(A,B; p^k)= \frac{1}{p^{l n^2}} \sum _{X \in GL_n(\mathbb {Z}/p^k \mathbb {Z})} \psi ((AX+X^{-1}B)/p^k)\sum _{U_1 {{\,\textrm{mod}\,}}p^l} \psi ((AXU_1-BU_1X^{-1})/p^l). \end{aligned}$$

As noted above, \(\psi (BU_1X^{-1})= \psi (X^{-1}BU_1)\), and so the inner sum

$$\begin{aligned} \sum _{U_1 {{\,\textrm{mod}\,}}p^l} \psi ((AX-X^{-1}B)U_1/p^l) \end{aligned}$$

vanishes, unless \(AX\equiv X^{-1}B {{\,\textrm{mod}\,}}p^l\). This proves the first claim in Proposition 1.1.

2.2 The case \(p^{2\,l+1}\)

Let \(k=2\,l+1\). We again use the subgroup \(H\) defined above, which in this case consists of matrices \(U=I+U_1p^l+U_2p^{2\,l}\) where \(U_1\) (resp. \(U_2\)) ranges over \((\mathbb {Z}/p^l\mathbb {Z})^{n\times n}\) (resp. over \((\mathbb {Z}/p\mathbb {Z})^{n\times n}\)), with inverse

$$\begin{aligned} U^{-1}=I-U_1p^l+(U_1^2-U_2)p^{2l}. \end{aligned}$$

Therefore

$$\begin{aligned} K_n(A,B; p^k)=\frac{1}{p^{n^2(l+1)}}\sum _{X\in GL_n(\mathbb {Z}/p^k\mathbb {Z})}\sum _{U} \psi ((AXU+X^{-1}BU^{-1})/p^k) \end{aligned}$$

where \(U=I+U_1p^l+U_2p^{2\,l}\) is such that \(U_1\) will run \({{\,\textrm{mod}\,}}p^l\) and \(U_2\) will run \({{\,\textrm{mod}\,}}p\).
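The two inverse formulas used in this section can be verified numerically. The sketch below assumes \(2\times 2\) matrices stored as row-major tuples and random test data with a fixed seed; the function names are hypothetical. It checks \((I+U_1p^l)(I-U_1p^l)\equiv I \ (p^{2l})\) and \(U\,(I-U_1p^l+(U_1^2-U_2)p^{2l})\equiv I \ (p^{2l+1})\).

```python
import random

I2 = (1, 0, 0, 1)

def mul(X, Y, m):
    a, b, c, d = X
    e, f, g, h = Y
    return ((a*e + b*g) % m, (a*f + b*h) % m, (c*e + d*g) % m, (c*f + d*h) % m)

def add(X, Y, m):
    return tuple((x + y) % m for x, y in zip(X, Y))

def scale(c, X, m):
    return tuple(c * x % m for x in X)

def check_even(p, l, U1):
    """(I + U1 p^l)(I - U1 p^l) = I modulo p^(2l)."""
    m = p ** (2 * l)
    U = add(I2, scale(p**l, U1, m), m)
    V = add(I2, scale(-p**l, U1, m), m)
    return mul(U, V, m) == I2

def check_odd(p, l, U1, U2):
    """U = I + U1 p^l + U2 p^(2l) has inverse I - U1 p^l + (U1^2 - U2) p^(2l) mod p^(2l+1)."""
    m = p ** (2 * l + 1)
    U = add(add(I2, scale(p**l, U1, m), m), scale(p**(2*l), U2, m), m)
    W = add(mul(U1, U1, m), scale(-1, U2, m), m)          # U1^2 - U2
    V = add(add(I2, scale(-p**l, U1, m), m), scale(p**(2*l), W, m), m)
    return mul(U, V, m) == I2

random.seed(0)
samples = [tuple(random.randrange(27) for _ in range(4)) for _ in range(50)]
print(all(check_odd(3, 2, U1, U2) for U1 in samples[:10] for U2 in samples[10:20]))
```

The odd case relies on \(3l\ge 2l+1\), which holds for every \(l\ge 1\), so the cubic and higher terms drop out.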

Now fix \(X \in GL_n(\mathbb {Z}/p^k\mathbb {Z})\), and consider

$$\begin{aligned}&\sum _{U} \psi ((AX(I+U_1p^l+U_2p^{2l}) + X^{-1}B(I-U_1p^l+(U_1^2-U_2)p^{2l}))/p^k)\\&\quad =\psi ( (AX+BX^{-1})/p^k) S_1(X)S_2(X) \end{aligned}$$

where

$$\begin{aligned} S_1(X)=\sum _{U_1 \ (p^l)} \psi (((AX-X^{-1}B)U_1 +X^{-1}BU_1^2p^l)/p^{l+1})\, \end{aligned}$$

and

$$\begin{aligned} S_2(X)=\sum _{U_2 \ (p)} \psi ((AX-X^{-1}B)U_2/p). \end{aligned}$$

Note that \(S_2=0\) unless \(AX\equiv X^{-1}B {{\,\textrm{mod}\,}}p\), in which case \(S_2=p^{n^2}\). \(S_1\) is a Gauss sum in matrices, albeit a very special one. By the condition from \(S_2\) we have that \(AX-X^{-1}B=pM\) for some integral matrix \(M\), and then

$$\begin{aligned} S_1(X)=\sum _{U_1 \ (p^l)} \psi ((MU_1 +TU_1^2)/p^l)\, \end{aligned}$$

where \(T \equiv X^{-1}Bp^{l-1}\equiv AXp^{l-1} {{\,\textrm{mod}\,}}p^l\). This gives the claim when \(l=1\). For \(l>1\) note that in view of \(pT \equiv 0 {{\,\textrm{mod}\,}}p^l\) we have for any \(V\) that

$$\begin{aligned} \sum _{U_1\ (p^l)} \psi ((MU_1 +TU_1^2)/p^l)&= \sum _{U_1 \ (p^l)} \psi ((M(U_1+pV) +T(U_1+pV)^2)/p^l)\\&=\psi (MV/p^{l-1})\sum _{U_1 \ (p^l)} \psi ((MU_1 +TU_1^2)/p^l), \end{aligned}$$

and so \(S_1(X)=\psi (MV/p^{l-1})S_1(X)\). A suitable choice of \(V\) shows that \(S_1(X)=0\) unless \(M\equiv 0 \mod p^{l-1}\). In terms of the original matrices \(A,B\) this is equivalent to \(AX \equiv X^{-1}B {{\,\textrm{mod}\,}}p^l\), in which case \(S_1=p^{(l-1)n^2}S_{A,B}(X;p)\). This gives the second claim of Proposition 1.1.

2.3 The regular semisimple case

The proof of Corollary 1.3

Note that if \(X\in GL_n(\mathbb {Z}/p^k\mathbb {Z})\), \(AX\equiv X^{-1} B {{\,\textrm{mod}\,}}p^l\), and \(A\) is invertible then so is \(B\). Moreover if \(Y=AX\) then \(Y^2\equiv AB {{\,\textrm{mod}\,}}p^l\). Assume now that \(Y^2\equiv AB {{\,\textrm{mod}\,}}p^l\) and let \(X=A^{-1}Y\). Then \(AX\equiv Y {{\,\textrm{mod}\,}}p^l\) and \(X^{-1}B\equiv Y^{-1}AB\equiv Y {{\,\textrm{mod}\,}}p^l\). If \(AB=Y^2\) is regular semisimple then all the eigenvalues of \(Y\) are different and no two of them sum to 0. This is exactly the condition (see Sect. 3.3) that allows one to modify \(Y\) by adding a suitable \(p^l Z\) in such a way that \(Y^2 \equiv AB \) holds \( {{\,\textrm{mod}\,}}p^k \) as well. In this case

$$\begin{aligned} K_n(A,B;p^k)=K_n(Y,Y;p^k) \end{aligned}$$

and the claim is an easy corollary of the calculations done in the previous two subsections and the regularity of \(Y^2\).

3 Technical background

3.1 Partitions

A partition of an integer \(n\) is an ordered set \(\lambda =[n_{1}, n_{2}, \ldots , n_{r}]\), \(n_1\ge n_2 \ge \dots \ge n_r>0\), of integers satisfying \(\sum _i n_i =n\). We will write \( n=|\lambda |\). If \(\lambda \) and \(\mu \) are two partitions, \(\lambda +\mu \) is the partition obtained by taking the parts of \(\lambda \) and \(\mu \) together (and ordering them). We denote by \([n]\) the partition with one part \(n\). In general if a number \(j\) appears \(r_j\) times in \(\lambda \), the sequence \([...,j,...,j,...]\) will be replaced by \([...,j^{r_j},...]\), so for example the partition with \(n\) parts all equal to 1 is written as \([1^n]\).

Given a partition \(\lambda \) its associated Young (Ferrers) diagram has \(r\) rows with \(n_1,n_2,\ldots ,n_r\) boxes in each row. For example for \(\lambda =[4,3,1]\) we have the diagram

[Figure a: Young diagram of \(\lambda =[4,3,1]\), with rows of 4, 3 and 1 boxes]

The transpose of the diagram of \(\lambda \) is also a Young diagram of a partition \(\lambda '\) called the conjugate or dual partition to \(\lambda \) which may be described as follows. Let \(r_{i}=r_i(\lambda )\) be the number of parts of \(\lambda \) which are equal to \(i \ge 1\) and \(d_{i}=\sum _{j \ge i} r_{j}.\) Then

$$\begin{aligned} \lambda ^{\prime }=\left[ d_{1}, d_{2}, \ldots \right] \end{aligned}$$
(8)

which has the diagram

[Figure b: Young diagram of \(\lambda '=[3,2,2,1]\), the transpose of the diagram of \(\lambda \)]

in our example \(\lambda =[4,3,1]\).
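The description \(d_i=\sum _{j\ge i} r_j\) says that \(d_i\) is the number of parts of \(\lambda \) that are at least \(i\). A minimal sketch of this computation follows; the function name `conjugate` is an illustrative choice.

```python
def conjugate(partition):
    """Dual partition: d_i = number of parts of the partition that are >= i."""
    parts = sorted(partition, reverse=True)
    if not parts:
        return []
    return [sum(1 for n in parts if n >= i) for i in range(1, parts[0] + 1)]

print(conjugate([4, 3, 1]))  # dual of the running example
```

Transposing a diagram twice returns the original diagram, so applying `conjugate` twice should return the partition one started with.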

3.2 Centralizers in \(GL_n(\mathbb {F}_q)\)

At first assume \(p\ne 2\) and let \(N\) be the nilpotent transformation of an \(\mathbb {F}_q\) vector space \(V\) of dimension \(n\). Then \(V\) becomes an \(\mathbb {F}_q[T]\)-module where \(T\) acts via \(N\), \(Tv=Nv\). Such modules are isomorphic to the module \(V_\lambda =\oplus _{j} \mathbb {F}_q[T]/(T^{n_j})\), for some partition \(\lambda = [n_1,\dots ,n_k]\), \(n_1+\dots +n_k=n\) which is unique by the structure theorem of finitely generated modules over principal ideal domains. To show the partition \(\lambda \) associated to \(N\) we will use the notation \(N=N_\lambda \).

Note that the dual partition \(\lambda '\) arises from considering \(d_i = \dim (\ker (N^i))-\dim (\ker (N^{i-1}))\). To see this we switch to the matrix point of view and assume that \(N\in M_n(\mathbb {F}_q)\) is a nilpotent matrix over \(\mathbb {F}_q\) with \(r_j\) blocks of size \(j\times j\) (\(j=1,\dots , n_1\)) in the Jordan normal form of \(N\). Let \(d_i = \sum _{j \ge i} r_{j}\) as above. Then it is easy to see that \(d_1\) is the number of blocks, which also equals \(\dim \ker N\). The claim then follows from an easy inductive argument. One can alternatively define

$$\begin{aligned} d_i=r_i+r_{i+1}+\cdots +r_k=\dim (\ker (N)\cap \textrm{Im}(N^{i-1})). \end{aligned}$$

Finally we will need the order of the centralizer of unipotent elements in \(GL_n(\mathbb {F}_q)\). Note that the centralizer of the unipotent \(I+N\) is the same as the centralizer of the nilpotent transformation N.

Proposition 3.1

Let \(N=N_\lambda \) with \(\lambda =[n_1,...,n_k]\) and dual partition \(\lambda '=[d_1,...,d_{n_1}]\). Then the centralizer of \(N\) has cardinality

$$\begin{aligned} \# Z_{GL_n(\mathbb {F}_q)}(N)&= \left( \prod _{j=1}^{n_1} (q^{r_j}-1)(q^{r_j}-q)\dots (q^{r_j}-q^{r_j-1})\right) \cdot q^{\sum _{j=1}^{n_1} (d_j^2-r_j^2)}\\&= q^{\sum _{j=1}^{n_1} d_j^2} \prod _{j=1}^{n_1}\prod _{t_j=1}^{r_j}\left( 1-\frac{1}{q^{t_j}}\right) . \end{aligned}$$

Proof

This is Corollary IV.I.8 in [16]. \(\square \)

Remark 3.2

Note that if we define \(\phi _r(T)=\prod _{j=1}^r (1-T^j)\) and let \( \phi _\lambda (T)=\prod \phi _{r_i(\lambda )}(T)\) then the statement can be rewritten as

$$\begin{aligned} \# Z_{GL_n(\mathbb {F}_q)}(N)=q^{\sum _{j=1}^{n_1} d_j^2} \, \phi _\lambda \left( 1/q\right) . \end{aligned}$$
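The centralizer order in Proposition 3.1 is straightforward to evaluate with integer arithmetic and to compare with a direct count for \(2\times 2\) matrices over a prime field. The following sketch uses the first (product) form of the formula; the function names are hypothetical and the brute-force check only covers \(q=p\) prime.

```python
from itertools import product

def gl_order(r, q):
    """Order of GL_r(F_q): (q^r - 1)(q^r - q)...(q^r - q^(r-1)); equals 1 for r = 0."""
    order = 1
    for i in range(r):
        order *= q**r - q**i
    return order

def centralizer_order(parts, q):
    """Order of the centralizer of N_lambda in GL_n(F_q), lambda given by its parts."""
    top = max(parts)
    r = [0] + [parts.count(j) for j in range(1, top + 1)]  # r[j] = number of parts equal to j
    d = [0] + [sum(r[j:]) for j in range(1, top + 1)]      # d[j] = number of parts >= j
    order = q ** sum(d[j]**2 - r[j]**2 for j in range(1, top + 1))
    for j in range(1, top + 1):
        order *= gl_order(r[j], q)
    return order

def brute_centralizer_2x2(N, p):
    """Count X in GL_2(F_p) with XN = NX; matrices are row-major tuples."""
    def mul(X, Y):
        a, b, c, d = X
        e, f, g, h = Y
        return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)
    return sum(1 for X in product(range(p), repeat=4)
               if (X[0]*X[3] - X[1]*X[2]) % p and mul(X, N) == mul(N, X))

print(centralizer_order([2], 3), brute_centralizer_2x2((0, 1, 0, 0), 3))
```

For \(\lambda =[1^n]\) the matrix \(N_\lambda \) is zero and the formula reduces to the order of \(GL_n(\mathbb {F}_q)\).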

3.3 Sylvester’s equation

Assume that \(A\) is \(m\times m\), \(B\) is \(n\times n\) and \(X\) and \(C\) are \(m\times n\) matrices. The matrix equation

$$\begin{aligned} AX-XB=C, \end{aligned}$$

called Sylvester’s equation [17], has a rich literature over the real or complex fields in view of the important role it plays in various applications (see e.g. [4]). There are two important questions here: existence of solutions, and a description of all solutions \(\left\{ X \mid AX-XB=C\right\} \).

For our task of estimating \(K_n(A,B;p^k)\) we will concentrate on estimating the number of solutions. If the field of coefficients is \(\mathbb {F}_q\) for some power \(q\) of \(p\), then the number of solutions is clearly either \(0\) or \( q^d \), where

$$\begin{aligned} d = d_{A,B}=\dim \{ X\,|\,AX-XB=0 \}\le mn. \end{aligned}$$
(9)

While the bound by \(mn\) is trivial, in the case when \( A=\lambda I_m,B=\lambda I_n \) for the same scalar \(\lambda \), one has \(d_{A,B}= mn\).

Note that we may interpret the equation via linear transformations. To do so let \( W=\mathbb {F}_q^m, V=\mathbb {F}_q^n\) viewed as column vectors. Both \(W\) and \(V\) become \(\mathbb {F}_q[T]\)-modules via mapping \(T\) to \(A\) and \(B\) respectively.

If \(AX=XB\) then \(g(A)X=Xg(B)\) for any polynomial \(g \in \mathbb {F}_q[T]\) and so \(X\) gives rise to a module homomorphism from \(W\) to \(V\) which we denote \( \hom _{\mathbb {F}_q[T]}(W,V)\).

For an irreducible polynomial \(f \in \mathbb {F}_q[T]\) let

$$\begin{aligned} V_{f^e}=\{ v\in V\mid f^e(A)v=0\}, \end{aligned}$$

and similarly for \(W\). The \(f\)-primary component of \(V\) is \(\cup _{e=1}^\infty V_{f^e}\), which we denote by \( V_{f^\infty } \).

Clearly if \(X\in \hom _{\mathbb {F}_q[T]}(W,V)\) then

$$\begin{aligned} X(W_{f^e})\subset V_{f^e} \end{aligned}$$
(10)

which implies

$$\begin{aligned} \hom _{\mathbb {F}_q[T]}(W,V) = \bigoplus _f \hom _{\mathbb {F}_q[T]}(W_{f^\infty },V_{f^\infty }) \end{aligned}$$

the sum over \(f \in \mathbb {F}_q[T]\) irreducible. Since the problem is linear, we may go to a finite field extension if needed and then assume that the eigenvalues of \( A,B\) are in \(\mathbb {F}_q\). For \(f=T-\lambda \) and \(e\ge 1\) we let

$$\begin{aligned} d_e(A-\lambda ) = \dim V_{(T-\lambda )^e}=\dim \ker (A-\lambda I)^e,\ d_\infty (A-\lambda )=d_n(A-\lambda ). \end{aligned}$$
(11)

As an immediate corollary of the trivial bound (9) we get the following

$$\begin{aligned} d_{A,B}\le \sum _{\lambda \in \mathbb {F}_q} d_\infty (A-\lambda ) d_\infty (B-\lambda ). \end{aligned}$$
(12)

Note that for \(A\) semisimple the inequality above becomes an equality, showing that the bound is sharp.

The application for us involves the special case when \(B=-A\). If \(A\) is invertible the above bound is sufficient, but the nilpotent case needs a more refined version given in the following lemma.

Lemma 3.3

Assume that \(A\in M_m(\mathbb {F}_q),B\in M_n(\mathbb {F}_q)\) are nilpotent. Let

$$\begin{aligned} k=\dim \ker (A), \quad l=\dim \ker (B), \end{aligned}$$

Then

$$\begin{aligned} \dim \{ X \in M_{m,n}(\mathbb {F}_q) \mid AX=XB \} \le \frac{kn+ml}{2}. \end{aligned}$$

Proof

As above let \( W=\mathbb {F}_q^m, V=\mathbb {F}_q^n\) viewed as \( \mathbb {F}_q[T]\) modules via mapping \(T\) to \(A\) and \(B\). Then

$$\begin{aligned} W\simeq \bigoplus _{i=1}^k \mathbb {F}_q[T]/(T^{m_i}), \quad V\simeq \bigoplus _{j=1}^l \mathbb {F}_q[T]/(T^{n_j}). \end{aligned}$$

for some partitions \((m_1,...,m_k), (n_1,...,n_l)\) of \(m\) and \(n\) respectively (with \(k,l\) as defined above).

Note that any element \(X\) of \( \hom _{\mathbb {F}_q[T]}(\mathbb {F}_q[T]/(T^b), \mathbb {F}_q[T]/(T^a))\) is determined by the value \(X\) on \(1 {{\,\textrm{mod}\,}}T^b\) and so by (10)

$$\begin{aligned} \dim \hom _{\mathbb {F}_q[T]}(\mathbb {F}_q[T]/(T^b), \mathbb {F}_q[T]/(T^a)) = \min (a,b) \end{aligned}$$
(13)

This gives

$$\begin{aligned} \dim \{ X \in M_{m,n}(\mathbb {F}_q) \mid AX=XB \} =\sum _{i,j=1}^{k,l} \min (m_i,n_j) \le \sum _{i,j=1}^{k,l} \frac{m_i+n_j}{2} \le \frac{kn+ml}{2}. \end{aligned}$$

\(\square \)
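The dimension count \(\sum _{i,j}\min (m_i,n_j)\) in the proof can be confirmed by enumerating solutions of \(AX=XB\) for small nilpotent Jordan matrices. The sketch below works over \(\mathbb {F}_3\) and is only an illustration; the helper names and the chosen partitions are assumptions.

```python
from itertools import product

def nilpotent(parts):
    """Nilpotent Jordan matrix with blocks of the given sizes, as a tuple of tuples."""
    n = sum(parts)
    N = [[0] * n for _ in range(n)]
    start = 0
    for size in parts:
        for i in range(start, start + size - 1):
            N[i][i + 1] = 1
        start += size
    return tuple(tuple(row) for row in N)

def matmul(X, Y, p):
    """Product of (possibly rectangular) matrices mod p."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(len(Y))) % p
                       for j in range(len(Y[0]))) for i in range(len(X)))

def count_solutions(parts_a, parts_b, p):
    """Number of m x n matrices X over F_p with A X = X B."""
    A, B = nilpotent(parts_a), nilpotent(parts_b)
    m, n = len(A), len(B)
    count = 0
    for entries in product(range(p), repeat=m * n):
        X = tuple(entries[i*n:(i+1)*n] for i in range(m))
        if matmul(A, X, p) == matmul(X, B, p):
            count += 1
    return count

def predicted_dim(parts_a, parts_b):
    return sum(min(a, b) for a in parts_a for b in parts_b)

print(count_solutions([2, 1], [2], 3), 3 ** predicted_dim([2, 1], [2]))
```

In this example \(k=2\), \(l=1\), \(m=3\), \(n=2\), so the bound of the lemma is \((kn+ml)/2=7/2\), while the actual dimension is 3.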

Corollary 3.4

The trivial bound (12) can be strengthened to

$$\begin{aligned} \dim \{ X {{\,\textrm{mod}\,}}p\mid AX+XA =0\} \le \sum _{\lambda \in \mathbb {F}_q } d_1(A-\lambda ) d_\infty (A+\lambda ) \ . \end{aligned}$$
(14)

3.4 Generalities on multivariable Gauss sums

We start with a general setup on \(V=\mathbb {F}_p^m\). Let \(F(x)=Q(x)+L(x)\) with a quadratic form \(Q\), and a linear form \(L\) on \(V\) and define

$$\begin{aligned} G(F;p)=\sum _{x\in V} e( F(x)/p) \end{aligned}$$

where \(e(z)=e^{2\pi i z}\). If \(L=0\) and \(p\ne 2\), the sum \(G(Q;p)\) is easy to evaluate after diagonalizing \(Q\); it is a product of trivial factors and Gauss sums. The case when \(p=2\) is slightly more involved [8], but still explicit.

For our use in what follows, some of the details are relevant, and so we sketch these. Assume that \(p\ne 2\), so that \(Q\) comes from a bilinear form

$$\begin{aligned} B(x,y)=Q(x+y)-Q(x)-Q(y) \end{aligned}$$

so that \(Q(x)=\frac{1}{2}B(x,x)\). \(B\) gives rise to a Riesz map \(R: V \rightarrow V^*\), \(R: y \mapsto (R_y: x \mapsto B(x,y))\), which may not be surjective if \(B\) is degenerate. Still we have the following dichotomy.

Proposition 3.5

Assume that \(p\ne 2\). Let \(F(x)=Q(x)+L(x)\), where \(Q\) is a quadratic and \(L\) is a linear form on V.

  1. 1.

    If \(L\) has a Riesz-representative, i.e. \(L=R_y\) for some \(y\), then \(G(F;p)= e(-Q(y)/p) G(Q;p) \).

  2. 2.

    If \(L\) does not have a Riesz-representative then \(G(F;p)=0\).

Proof

The first statement is trivial. To see the second, note that \(L\) does not have a Riesz representative if and only if \(\ker L\) does not contain

$$\begin{aligned} V^\perp =\{ v \in V | B(v,y)=0 \text { for all } y \in V\}. \end{aligned}$$

Therefore there is \(y\), such that \(L(y)\ne 0\) but \(B(x,y)=0\) for all \(x\), and so that \(Q(x+y)=Q(x)+Q(y)\), and in particular \(Q(y)=0\). But then

$$\begin{aligned} G(F;p)=\sum _x e(F(x+y)/p)=\sum _{x} e((Q(x+y)+L(x+y))/p)= e(L(y)/p)G(F;p) \end{aligned}$$

showing \(G(F;p)=0\) since \(e(L(y)/p)\ne 1\). \(\square \)

The following is an easy corollary of the evaluation of the standard Gauss sum [1].

Corollary 3.6

Assume that \(F=Q+L\) and that \(L\) has a Riesz representative, \(L=R_y\). Write \(Q=Q_0\perp Q_1\) where \(Q_0\) is totally isotropic, and \(Q_1\) non-degenerate. Let \(\dim Q_1=r\). Then

$$\begin{aligned} G(F)=\left( \det Q_1/p\right) e(-Q(y)/p)p^{n-r/2}, \end{aligned}$$

where \( \left( {\cdot }/{p}\right) \) is Legendre’s symbol.
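The dichotomy of Proposition 3.5 and the size of \(G(Q;p)\) can be observed on a degenerate form in two variables. The sketch below assumes \(p=7\), \(V=\mathbb {F}_p^2\) and \(Q(x)=x_1^2\), so that \(B(x,y)=2x_1y_1\) and \(r=1\); the linear forms and all names are chosen for illustration only.

```python
import cmath
import math
from itertools import product

def e(t, p):
    return cmath.exp(2j * math.pi * (t % p) / p)

def gauss_sum(Q, L, p, m):
    """G(F; p) for F = Q + L on F_p^m, by direct summation."""
    return sum(e(Q(x) + L(x), p) for x in product(range(p), repeat=m))

p, m = 7, 2                                  # assumed small example
Q = lambda x: x[0] ** 2                      # degenerate quadratic form of rank 1
zero = lambda x: 0
L_rep = lambda x: 2 * x[0]                   # equals B(x, y) for y = (1, 0)
L_norep = lambda x: x[1]                     # non-zero on the radical, so no representative

G_Q = gauss_sum(Q, zero, p, m)
G_rep = gauss_sum(Q, L_rep, p, m)
G_norep = gauss_sum(Q, L_norep, p, m)
print(abs(G_Q), abs(G_rep), abs(G_norep))
```

Here `G_rep` should agree with \(e(-Q(y)/p)\,G(Q;p)\) for \(y=(1,0)\), `G_norep` should vanish, and the modulus of `G_Q` should be \(p^{\dim V-r/2}=p^{3/2}\).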

4 The equation \(AX\equiv X^{-1}B\) to prime modulus

4.1 Preliminary observations

We will concentrate on the case when the equation \(AX=X^{-1}B\) is solvable. As noted above we may then simply assume that \(C=AX_0=X_0^{-1}B\) for some fixed solution \(X_0\) and consider the equation \(CX=X^{-1}C\). For this equation we start with the proof of Proposition 1.4.

Proof of Proposition 1.4

For any solution \(X\) of the equation \(CX=X^{-1}C\) we also have \(XC=CX^{-1}\), by multiplying from the left by \(X^{-1}\) and from the right by \(X\). So we compute \(C^2X=C(CX)=CX^{-1}C=XC^2\), i.e. \(X\) and \(C^2\) commute. Let us write \(m_{C^2}(x)\in \mathbb {F}_p[x]\) for the minimal polynomial of \(C^2\) and write it as \(m_{C^2}(x)=\prod _{j=1}^r f_j(x)^{k_j}\) where \(f_j(x)\in \mathbb {F}_p[x]\) is irreducible.

Then we may decompose \(\mathbb {F}_p^n=V_1\oplus \dots \oplus V_r\) as the direct sum of generalized eigenspaces where \(V_j:=\ker (f_j(C^2)^{k_j})\). Since X and \(C^2\) commute, \(V_j\) is an X-invariant subspace in \(\mathbb {F}_p^n\) for \(1\le j\le r\): indeed, for any \(v\in V_j\) we have \(f_j(C^2)^{k_j}v=0\) whence \(f_j(C^2)^{k_j}(Xv)=Xf_j(C^2)^{k_j}v=0\) showing \(Xv\in \ker (f_j(C^2)^{k_j})=V_j\). By a similar argument we also deduce that \(V_j\) is also C-invariant (as C also commutes with \(C^2\)). Therefore restricting the identity \(CX=X^{-1}C\) to the subspace \(V_j\) we deduce \(C_jX_j=X_j^{-1}C_j\) where \(C_j\) (resp. \(X_j\)) is the restriction of C (resp. of X) to \(V_j\). On the other hand, whenever we have matrices \(X_j\in GL(V_j)\) with \(C_jX_j=X_j^{-1}C_j\) (\(1\le j\le r\)) then we may form the block matrix X from the matrices \(X_j\) to obtain a solution of the equation \(CX=X^{-1}C\). So the number of solutions of the equation \(CX=X^{-1}C\) is the product of the number of solutions on each \(V_j\).

This proves the proposition.
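The product formula (3) can be illustrated on a \(3\times 3\) example over \(\mathbb {F}_3\) where \(C^2\) has the two eigenvalues 0 and 1. In the sketch below, which is an assumption-based illustration with hypothetical names, \(C\) is block diagonal with a nilpotent \(2\times 2\) block \(C_1\) and the \(1\times 1\) block \(C_2=(1)\), and all three counts are obtained by enumeration.

```python
from itertools import product

def matmul(X, Y, p):
    n = len(X)
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(n)) % p
                       for j in range(n)) for i in range(n))

def det(X, p):
    """Determinant mod p by Laplace expansion (fine for tiny matrices)."""
    if len(X) == 1:
        return X[0][0] % p
    return sum((-1) ** j * X[0][j] * det(tuple(r[:j] + r[j+1:] for r in X[1:]), p)
               for j in range(len(X))) % p

def n_star(C, p):
    """Brute-force N^*(C, C; p): invertible X over F_p with X C X = C."""
    n = len(C)
    count = 0
    for entries in product(range(p), repeat=n * n):
        X = tuple(entries[i*n:(i+1)*n] for i in range(n))
        if matmul(matmul(X, C, p), X, p) == C and det(X, p) != 0:
            count += 1
    return count

p = 3
C = ((0, 1, 0), (0, 0, 0), (0, 0, 1))   # C^2 = diag(0, 0, 1), minimal polynomial x(x - 1)
C1 = ((0, 1), (0, 0))                   # restriction of C to ker C^2
C2 = ((1,),)                            # restriction of C to ker (C^2 - I)
print(n_star(C, p), n_star(C1, p) * n_star(C2, p))
```

Any solution \(X\) commutes with \(C^2\) and is therefore block diagonal, which is why the two printed numbers should agree.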

Now assume \(C\) is invertible. Since \(X\) is also invertible \(XCX=C\) if and only if \(Y=CX\) satisfies \(Y^2=C^2\) and we will count the number of solutions to this equation under the assumption that \(C^2\) has minimal polynomial \(f^k(x)\), with \(f(x)\ne x\) irreducible. For this recall that any linear transformation \(T\) has a unique multiplicative Jordan–Chevalley decomposition as

$$\begin{aligned} T=T_sT_u =T_uT_s \end{aligned}$$
(15)

where \(T_s\) is semisimple, \(T_u\) is unipotent. If \(Y=Y_sY_u\) and \(Y^2=C^2\) then

$$\begin{aligned} Y_s^2=C_s^2 \text { and } Y_u^2=C_u^2. \end{aligned}$$

In case \(p\ne 2\), we can immediately infer that \(Y_u=C_u\) from the following

Lemma 4.1

Assume that \(Z_1,Z_2\in GL_n(\mathbb {F}_p)\) are unipotent such that \(Z_1^2=Z_2^2\). If \(p \ne 2\) then \(Z_1=Z_2\).

Proof

We use the simple property that for any unipotent element \(Z\) in \( GL_n(\mathbb {F}_p)\) we have \(Z^{p^r}=I\) for some \(r\). Since \(p\) is odd

$$\begin{aligned} Z_1=(Z_1^2)^{(p^r+1)/2}=(Z_2^2)^{(p^r+1)/2}=Z_2. \end{aligned}$$

\(\square \)
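The identity in the proof can be tested on the upper unitriangular \(3\times 3\) matrices over \(\mathbb {F}_p\), for which \(Z^p=I\) when \(p\ge 3\), so that \(r=1\) works. The sketch below is an illustration restricted to this family (every unipotent element is conjugate to such a matrix); the helper names are hypothetical.

```python
from itertools import product

def matmul(X, Y, p):
    n = len(X)
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(n)) % p
                       for j in range(n)) for i in range(n))

def matpow(X, exponent, p):
    n = len(X)
    result = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))
    for _ in range(exponent):
        result = matmul(result, X, p)
    return result

def unitriangular(p):
    """All upper unitriangular 3x3 matrices over F_p."""
    for a, b, c in product(range(p), repeat=3):
        yield ((1, a, b), (0, 1, c), (0, 0, 1))

def recovers_from_square(p):
    """Check Z = (Z^2)^((p + 1)/2) for every upper unitriangular Z."""
    return all(matpow(matmul(Z, Z, p), (p + 1) // 2, p) == Z for Z in unitriangular(p))

print(recovers_from_square(3), recovers_from_square(5))
```

In particular squaring is injective on this family, which is the content of Lemma 4.1 there.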

Therefore estimating \( N^*(A,B;p) \) is reduced to estimating

$$\begin{aligned} n(C,C;p)=\# \{ Y \in M_n(\mathbb {F}_p) \mid Y^2=C^2, YC_u=C_uY \}. \end{aligned}$$
(16)

Our last observation is now the following

Lemma 4.2

Put \(V:=\mathbb {F}_p^n\) and assume that the minimal polynomial of \(C^2:V\rightarrow V\) is \(m_{C^2}(x)=f^k(x)\) where \(f\in \mathbb {F}_p[x]\) is irreducible and let \(q=p^{\deg f}\). Then \(V\) has the structure of an \(\mathbb {F}_q\)-vector space such that all \(C,X,Y,Y_s,Y_u:V\rightarrow V\) as above are \(\mathbb {F}_q\)-linear for any invertible solution \(X\) of the equation \(XCX=C\). Further, \(Y_s^2\) is an \(\mathbb {F}_q\)-scalar multiple of the identity.

Proof

We assume that \(f(x)\ne x\) otherwise the claim is trivial. Note that \( \mathbb {F}_p[x]/(f)\) is isomorphic to the field \(\mathbb {F}_q\) with \(q=p^{\deg f}\) elements. We choose such an isomorphism and let \(\alpha \) denote the image of \(x\) in \(\mathbb {F}_q\), so that \(\mathbb {F}_q=\mathbb {F}_p[\alpha ]\). The ring \(\mathbb {F}_p[x]/(f^k)\) is then isomorphic to \(\mathbb {F}_q[t]/((t-\alpha )^k)\) and since \(C_s^2\in \mathbb {F}_p[C^2]\) is a semisimple element, it may be identified with \(\alpha \in \mathbb {F}_q^*\).

Therefore the action of \(\mathbb {F}_p[C_s^2]\) on \(V\) gives an \( \mathbb {F}_q\)-linear structure and since \(X,Y,Y_s,Y_u\) all commute with \(C_s^2\) they may be viewed as \(\mathbb {F}_q\)-linear transformations. Finally, we have \(Y_s^2=C_s^2=\alpha I\). \(\square \)

4.2 The proof of Theorem 1.5 in the invertible cases

Let \(V:=\mathbb {F}_p^m\) and \(C:V\rightarrow V\) be an invertible \(\mathbb {F}_p\)-linear map such that we have \(m_{C^2}(x)=f^k(x)\) for the minimal polynomial of \(C^2\) with some irreducible polynomial \(f(x)\in \mathbb {F}_p[x]\). By Lemma 4.2 we even have an \(\mathbb {F}_q\)-linear structure on V (with \(q:=p^{\deg f}\)) such that both C and any solution X to the equation \(CX=X^{-1}C\) are \(\mathbb {F}_q\)-linear. Further, C has the Jordan–Chevalley decomposition \(C=C_uC_s=C_sC_u\) with \(C_u\) unipotent and \(C_s\) semisimple. At first assume \(p\ne 2\) and let \(N\) be the nilpotent transformation \(C_u-I\). Then \(V\) becomes an \(\mathbb {F}_q[T]\)-module where \(T\) acts via \(N\), \(Tv=Nv\). Such modules are isomorphic to the module

$$\begin{aligned} V_\lambda =\oplus _{j} \mathbb {F}_q[T]/(T^{n_j}), \end{aligned}$$
(17)

for some partition \(\lambda = [n_1,\dots ,n_k]\), \(n_1+\cdots +n_k=n\) which is unique by the structure theorem of finitely generated modules over PIDs. To indicate the partition \(\lambda \) associated to \(N\) we will use the notation \(N=N_\lambda \).

By Lemmas 4.1 and 4.2 it is enough to count elements in the set

$$\begin{aligned} \mathcal {R}_\alpha (\lambda )=\{ Y\in GL_n(\mathbb {F}_q) \,|\, Y^2=\alpha I, YN_\lambda =N_\lambda Y\}, \end{aligned}$$
(18)

ie. we have \(N^*(A,B;p)=n(C,C;p)=\#\mathcal {R}_\alpha (\lambda )\) where \(\alpha \in \mathbb {F}_q^*\) denotes the unique eigenvalue of \(C^2\) viewed as an \(\mathbb {F}_q\)-linear map.

We start with the case when \( \left( \frac{x}{f(x)} \right) = 1\) when we have \(\alpha = \beta ^2\) for some \(\beta \in \mathbb {F}_q^*\). We have the following

Lemma 4.3

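Let \(\mathcal {S}(\lambda )\) denote the set of pairs \((U_+,U_-)\) of \(\mathbb {F}_q[T]\)-submodules \(U_+,U_-\le V\) such that \(V=U_+\oplus U_-\). The maps

$$\begin{aligned} \mathcal {R}_\alpha (\lambda )\rightarrow \mathcal {S}(\lambda ),\quad Y\mapsto (U_{Y,+},U_{Y,-})\qquad \text {and}\qquad \mathcal {S}(\lambda )\rightarrow \mathcal {R}_\alpha (\lambda ),\quad (U_+,U_-)\mapsto Y_U,\quad Y_U(u_++u_-):=\beta u_+-\beta u_- \end{aligned}$$

are inverse bijections between \(\mathcal {R}_\alpha (\lambda )\) and \(\mathcal {S}(\lambda )\). Here \(u_+\in U_+\), \(u_-\in U_-\) and \(U_{Y,+}\) (resp. \(U_{Y,-}\)) denotes the \(\beta \)-eigenspace (resp. \(-\beta \)-eigenspace) of Y.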

Proof

Since \(p\ne 2\), \(Y^2=\alpha I\) implies Y is semisimple whence V is the direct sum of the two eigenspaces of Y. Moreover, these eigenspaces are N-invariant, ie. they are \(\mathbb {F}_q[T]\)-submodules. Conversely, given such a decomposition \(V=U_+\oplus U_-\), we have \(Y_U^2=\alpha I\) and \(Y_U\) commutes with N. \(\square \)

By the theorem of elementary divisors, for any decomposition \(V=U_+\oplus U_-\) of the \(\mathbb {F}_q[T]\)-module V, \(\lambda \) is the sum of the multisets \(\mu \) and \(\nu \) where \(\mu \) (resp. \(\nu \)) is the partition of \(\dim U_+\) (resp. of \(\dim U_-\)) corresponding to the restriction of N to \(U_+\) (resp. to \(U_-\)). Therefore we may write \(\mathcal {S}(\lambda )\) as the union of

$$\begin{aligned} \mathcal {S}(\lambda ,\mu ,\nu )=\left\{ (U_+,U_-)\mid U_+,U_-\le _{\mathbb {F}_q[T]}V, U_+\oplus U_-=V,U_+\cong \bigoplus _{j\in \mu }\mathbb {F}_q[T]/(T^j),U_-\cong \bigoplus _{j\in \nu }\mathbb {F}_q[T]/(T^j) \right\} \end{aligned}$$

where \(\mu \) runs over the multisets included in \(\lambda \) and \(\nu =\lambda -\mu \) is the difference.

Lemma 4.4

For any decomposition \(\lambda =\mu +\nu \) the centralizer \(Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N)\) of N acts transitively on the set \(\mathcal {S}(\lambda ,\mu ,\nu )\).

Proof

Assume we are given two decompositions \(U_+\oplus U_-=V=U'_+\oplus U'_-\) in \(\mathcal {S}(\lambda ,\mu ,\nu )\). Then we have the isomorphisms \(U_+\cong \bigoplus _{j\in \mu }\mathbb {F}_q[T]/(T^j)\cong U'_+\) and \(U_-\cong \bigoplus _{j\in \nu }\mathbb {F}_q[T]/(T^j)\cong U'_-\) of \(\mathbb {F}_q[T]\)-modules. Taking the direct sum of these two isomorphisms we obtain an automorphism \(g:V=U_+\oplus U_-\rightarrow U'_+\oplus U'_-=V\). Being \(\mathbb {F}_q[T]\)-linear it means that g lies in \(Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N)\) when viewed as an \(\mathbb {F}_q\)-linear transformation. \(\square \)

Proposition 4.5

If \(\alpha \in (\mathbb {F}_q^*)^2\) then

$$\begin{aligned} n(C,C;p)=\#\mathcal {R}_{\alpha }(\lambda )=\sum _{\lambda =\mu +\nu } \frac{\# Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )}{\# Z_{GL_{|\mu |}(\mathbb {F}_q)}(N_\mu ) \# Z_{GL_{|\nu |}(\mathbb {F}_q)}(N_\nu )}. \end{aligned}$$

Proof

By Lemma 4.3 we obtain \(\# \mathcal {R}_{\alpha }(\lambda )=\# \mathcal {S}(\lambda )=\sum _{\lambda =\mu +\nu }\# \mathcal {S}(\lambda ,\mu ,\nu )\). The statement follows from Lemma 4.4 noting that the stabilizer of a given decomposition \(V=U_+\oplus U_-\) in \(\mathcal {S}(\lambda ,\mu ,\nu )\) equals

$$\begin{aligned} Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )\cap (GL_{|\mu |}(\mathbb {F}_q)\times GL_{|\nu |}(\mathbb {F}_q))=Z_{GL_{|\mu |}(\mathbb {F}_q)}(N_\mu )\times Z_{GL_{|\nu |}(\mathbb {F}_q)}(N_\nu ). \end{aligned}$$

\(\square \)

Now assume \( \left( \frac{x}{f(x)} \right) = -1\), so we have \(\alpha = \beta ^2\) for some \(\beta \in \mathbb {F}_{q^2}^*{\setminus } \mathbb {F}_{q}^*\). Put \(\sigma \) for the nontrivial element in \({\text {Gal}}(\mathbb {F}_{q^{2}}/\mathbb {F}_{q})\). Then \(\varphi =\sigma \otimes I:\mathbb {F}_{q^{2}}\otimes _{\mathbb {F}_q}V\rightarrow \mathbb {F}_{q^{2}}\otimes _{\mathbb {F}_q}V\) is a \(\sigma \)-semilinear map (ie. \(\mathbb {F}_q[T]\)-linear with \(\varphi (\beta v)=\sigma (\beta )\varphi (v)=-\beta \varphi (v)\)). Further, for an \(\mathbb {F}_{q^2}[T]\)-module homomorphism \(f:\mathbb {F}_{q^{2}}\otimes _{\mathbb {F}_q}V\rightarrow \mathbb {F}_{q^{2}}\otimes _{\mathbb {F}_q}V\) there exists an \(\mathbb {F}_q[T]\)-module homomorphism \(\widetilde{f}:V\rightarrow V\) with \(f=1\otimes \widetilde{f}\) if and only if f commutes with \(\varphi \), ie. \(f\circ \varphi =\varphi \circ f\).

Lemma 4.6

Assume \(\alpha \notin (\mathbb {F}^*_q)^2\). Then the centralizer \(Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )\) acts transitively on the set \(\mathcal {R}_\alpha (\lambda )\) (by conjugation).

Proof

Let \(Y,Y'\in \mathcal {R}_\alpha (\lambda )\), ie. \(Y,Y':V\rightarrow V\) are both \(\mathbb {F}_q[T]\)-linear isomorphisms with \(Y^2=\alpha I=Y'^2\). Put \(U\le \mathbb {F}_{q^2}\otimes _{\mathbb {F}_q}V\) (resp. \(U'\le \mathbb {F}_{q^2}\otimes _{\mathbb {F}_q}V\)) for the \(\beta \)-eigenspace of \(1\otimes Y\) (resp. of \(1\otimes Y'\)). Then \(\varphi (U)\) (resp. \(\varphi (U')\)) is the \(\sigma (\beta )=-\beta \)-eigenspace of \(1\otimes Y\) (resp. of \(1\otimes Y'\)). In particular, we have \(U\oplus \varphi (U)=\mathbb {F}_{q^2}\otimes _{\mathbb {F}_q}V=U'\oplus \varphi (U')\). Now if \(U\cong \bigoplus _j\mathbb {F}_{q^2}[T]/(T^{m_j})\) for some partition \(\mu =[m_1,\dots ,m_s]\) then we have the isomorphism

$$\begin{aligned} \varphi (U)\cong \bigoplus _j\mathbb {F}_{q^2}[T]/(\sigma (T)^{m_j})\cong \bigoplus _j\mathbb {F}_{q^2}[T]/(T^{m_j})\cong U \end{aligned}$$

of \(\mathbb {F}_{q^2}[T]\)-modules whence \(\lambda =\mu +\mu \). Similarly, \(U'\cong \varphi (U')\). By the structure theorem for finitely generated modules over the PID \(\mathbb {F}_{q^2}[T]\), we must have \(U\cong U'\) as \(\mathbb {F}_{q^2}[T]\)-modules, as well. Taking such an isomorphism \(S:U\rightarrow U'\) we also define \(S(\varphi (u)):=\varphi (S(u))\) on \(\varphi (U)\) giving rise to an \(\mathbb {F}_{q^2}[T]\)-linear automorphism \(S:\mathbb {F}_{q^2}\otimes _{\mathbb {F}_q}V=U\oplus \varphi (U)\rightarrow U'\oplus \varphi (U')=\mathbb {F}_{q^2}\otimes _{\mathbb {F}_q}V\) that satisfies \(S(1\otimes Y)S^{-1}=1\otimes Y'\). Moreover, S descends to a map \(\widetilde{S}:V\rightarrow V\) (such that \(S=1\otimes \widetilde{S}\)) since it commutes with \(\varphi \). Finally, \(\widetilde{S}\) satisfies \(\widetilde{S}Y\widetilde{S}^{-1}=Y'\) and lies in the centralizer of \(N_\lambda \) as it is \(\mathbb {F}_q[T]\)-linear. \(\square \)

Proposition 4.7

Assume \(\alpha \notin (\mathbb {F}^*_q)^2\). Then we have \(\lambda =\mu +\mu \) for some partition \(\mu \) and

$$\begin{aligned} n(C,C;p)=\#\mathcal {R}_\alpha (\lambda )=\frac{\# Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )}{\# Z_{GL_{|\mu |}(\mathbb {F}_{q^2})}(N_\mu )}. \end{aligned}$$

Proof

By Lemma 4.6, \(\mathcal {R}_\alpha (\lambda )\) is the conjugacy class of \(C_s\) in \(Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )\). A moment's thought shows that we may define an \(\mathbb {F}_{q^2}\)-linear structure on \(V\) where \(C_s\) acts via multiplication by \(\beta \) and where the \(\mathbb {F}_{q^2}\)-linear maps are exactly those which are \(\mathbb {F}_q\)-linear and commute with \(C_s\). In particular, the centralizer of \(C_s\) in \(Z_{GL_{|\lambda |}(\mathbb {F}_q)}(N_\lambda )\) equals \(Z_{GL_{|\mu |}(\mathbb {F}_{q^2})}(N_\mu )\). \(\square \)

This leads to formula (5).
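The counts in Propositions 4.5 and 4.7 can be sanity-checked by brute force in very small cases. The following is a minimal sketch, assuming \(q=p\) (ie. \(\deg f=1\)) and \(n=2\); for \(N=0\) every centralizer is a full general linear group, so the sum in Proposition 4.5 runs over \(j=|\mu |\). The helper names `count_roots`, `gl_order` and `split_formula` are ours and purely illustrative.

```python
from itertools import product

def matmul(A, B, p):
    """Product of two square matrices (tuples of tuples) modulo p."""
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )

def all_matrices(n, p):
    """Iterate over all n x n matrices with entries in F_p."""
    for entries in product(range(p), repeat=n * n):
        yield tuple(tuple(entries[i * n:(i + 1) * n]) for i in range(n))

def count_roots(p, n, alpha, N):
    """Brute-force size of R_alpha: all Y with Y^2 = alpha*I and YN = NY.

    For alpha != 0 any such Y is automatically invertible.
    """
    target = tuple(
        tuple(alpha % p if i == j else 0 for j in range(n)) for i in range(n)
    )
    return sum(
        1
        for Y in all_matrices(n, p)
        if matmul(Y, Y, p) == target and matmul(Y, N, p) == matmul(N, Y, p)
    )

def gl_order(m, q):
    """Order of GL_m(F_q); equals 1 for m = 0."""
    order = 1
    for i in range(m):
        order *= q**m - q**i
    return order

def split_formula(n, q):
    """Right-hand side of Proposition 4.5 in the special case N = 0."""
    return sum(
        gl_order(n, q) // (gl_order(j, q) * gl_order(n - j, q))
        for j in range(n + 1)
    )

ZERO = ((0, 0), (0, 0))   # N = 0, partition [1, 1]
J2 = ((0, 1), (0, 0))     # single nilpotent Jordan block, partition [2]

# Split case: alpha = 1 is a square mod 3.
print(count_roots(3, 2, 1, ZERO), split_formula(2, 3))
# Non-split case: alpha = 2 is a non-square mod 3, lambda = [1] + [1].
print(count_roots(3, 2, 2, ZERO), gl_order(2, 3) // gl_order(1, 9))
```

For \(N=N_{[2]}\) the same routine returns 2 in the split case (only \(\pm \beta I\)) and 0 in the non-split case, in line with the requirement \(\lambda =\mu +\mu \) of Proposition 4.7.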

Corollary 4.8

Assume that C has minimal polynomial \(f(x)^k\) where \(f(x)\ne x\) is irreducible and \(p\ne 2\). Then we have \(n(C,C;p)<\frac{q^2+1}{q^2-1}q^{\lfloor \frac{n^2}{2\deg ^2 f}\rfloor }=\frac{q^2+1}{q^2-1}p^{\deg f\lfloor \frac{n^2}{2\deg ^2 f}\rfloor }\le \frac{p^2+1}{p^2-1}p^{\frac{n^2}{2}}\) and if \(k=1\) then there exists a constant \(0<c(q)<1\) (with \(\lim _{q\rightarrow \infty }c(q)=1\)) such that

$$\begin{aligned} c(q)q^{\lfloor \frac{n^2}{2\deg ^2 f}\rfloor }<n(C,C;p)<\frac{q^2+1}{q^2-1}q^{\lfloor \frac{n^2}{2\deg ^2 f}\rfloor }. \end{aligned}$$

In particular

$$\begin{aligned} n(C,C;p)\le 2 p^{n^2/2\deg f}. \end{aligned}$$

Proof

Since \(n(C,C;p)\) is the number of square roots of \(C_s^2\) commuting with \(C_u\), the case \(C_u=I\) gives an upper bound for the number of solutions in general. So we may assume \(N=0\). Put \(n_1:=\dim _{\mathbb {F}_q}V=\frac{n}{\deg f}\). Then in the split case we compute

$$\begin{aligned} n(C,C;p)&=\sum _{j=0}^{n_1}\frac{\#GL_{n_1}(\mathbb {F}_q)}{\#GL_{j} (\mathbb {F}_q)\#GL_{n_1-j}(\mathbb {F}_q)}\\&=\sum _{j=0}^{n_1}\frac{(q^{n_1}-1)\dots (q^{n_1}-q^{n_1-1})}{(q^j-1)\dots (q^j-q^{j-1})(q^{n_1-j}-1)\dots (q^{n_1-j}-q^{n_1-j-1})}\\&=\sum _{j=0}^{n_1}q^{n_1^2-j^2-(n_1-j)^2}\frac{(1-1/q^{n_1})\dots (1-1/q)}{(1-1/q^j) \dots (1-1/q)(1-1/q^{n_1-j})\dots (1-1/q)}\\&<\sum _{j=0}^{n_1}q^{n_1^2-j^2-(n_1-j)^2}<q^{\lfloor \frac{n_1^2}{2}\rfloor }(1+\sum _{j=1}^ \infty \frac{2}{q^{2j}})=\frac{q^2+1}{q^2-1}q^{\lfloor \frac{n_1^2}{2}\rfloor }. \end{aligned}$$

On the other hand, we have

$$\begin{aligned} n(C,C;p)&=\sum _{j=0}^{n_1}q^{n_1^2-j^2-(n_1-j)^2}\frac{(1-1/q^{n_1})\dots (1-1/q)}{(1-1/q^j)\dots (1-1/q)(1-1/q^{n_1-j})\dots (1-1/q)}\\&>q^{n_1^2-\lfloor \frac{n_1}{2}\rfloor ^2-\lceil \frac{n_1}{2}\rceil ^2}(1-1/q^{n_1})\dots (1-1/q)>c(q)q^{\lfloor \frac{n_1^2}{2}\rfloor } \end{aligned}$$

with constant \(c(q)=\prod _{j=1}^\infty (1-1/q^j)\) that clearly satisfies \(\lim _{q\rightarrow \infty }c(q)=1\). \(\square \)

Finally, assume \(p=2\). Since the 2-Frobenius is bijective on finite fields of characteristic 2, \(C_s^2\) has a unique square root \(Y_s=C_s\). So we need to count the square roots of the unipotent matrix \(C_u^2\) or equivalently the square roots of the nilpotent matrix \(C_u^2+I=(C_u+I)^2\).

Lemma 4.9

Assume that q is a power of 2. For any integer \(n>0\) we have the identification

$$\begin{aligned} \mathbb {F}_q[T]/(T^n)\cong \mathbb {F}_q[T^2]/((T^2)^{\lfloor \frac{n}{2}\rfloor })\oplus \mathbb {F}_q[T^2]/((T^2)^{\lceil \frac{n}{2}\rceil }) \end{aligned}$$

as \(\mathbb {F}_q[T^2]\)-modules.

Proof

This amounts to the fact that the square of a nilpotent Jordan block of size n splits into two blocks of size \(\lfloor \frac{n}{2}\rfloor \) and \(\lceil \frac{n}{2}\rceil \). \(\square \)
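The block splitting used in the proof of Lemma 4.9 can be checked numerically. The sketch below assumes the standard description of the Jordan type of a nilpotent matrix \(M\) through the ranks of its powers (the number of blocks of size at least \(i\) is \({{\,\textrm{rk}\,}}M^{i-1}-{{\,\textrm{rk}\,}}M^{i}\)); the function names are ours, and the computation is carried out over \(\mathbb {F}_2\).

```python
def matmul(A, B, p):
    """Product of two square matrices modulo p."""
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )

def rank_mod_p(M, p):
    """Rank of M over F_p by Gaussian elimination."""
    rows = [list(r) for r in M]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                factor = rows[i][col]
                rows[i] = [(a - factor * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def jordan_type(M, p):
    """Block sizes (in decreasing order) of a nilpotent matrix M over F_p."""
    n = len(M)
    power = tuple(tuple(int(i == j) for j in range(n)) for i in range(n))
    ranks = [n]
    for _ in range(n + 1):
        power = matmul(power, M, p)
        ranks.append(rank_mod_p(power, p))
    # at_least[i - 1] = number of blocks of size >= i
    at_least = [ranks[i - 1] - ranks[i] for i in range(1, n + 2)]
    blocks = []
    for size in range(1, n + 1):
        blocks += [size] * (at_least[size - 1] - at_least[size])
    return sorted(blocks, reverse=True)

def jordan_block(n):
    """Nilpotent Jordan block of size n (ones on the superdiagonal)."""
    return tuple(tuple(int(j == i + 1) for j in range(n)) for i in range(n))

for n in range(1, 8):
    J = jordan_block(n)
    print(n, jordan_type(matmul(J, J, 2), 2))
```

For each \(n\) the printed type consists of the nonzero entries among \(\lceil \frac{n}{2}\rceil \) and \(\lfloor \frac{n}{2}\rfloor \).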

Proposition 4.10

Assume q is a power of 2. Then the number of solutions of the matrix equation \(Y^2=C^2\) equals

$$\begin{aligned} \sum _\mu \frac{\# Z_{GL_n(\mathbb {F}_q)}(N_\lambda ^2)}{\# Z_{GL_n(\mathbb {F}_q)}(N_\mu )} \end{aligned}$$

where \(\mu =[m_1,\dots ,m_k]\) runs on the set of partitions such that

$$\begin{aligned} \left[ \lfloor \frac{m_1}{2}\rfloor ,\lceil \frac{m_1}{2}\rceil ,\dots ,\lfloor \frac{m_k}{2}\rfloor ,\lceil \frac{m_k}{2}\rceil \right] =\left[ \lfloor \frac{n_1}{2}\rfloor ,\lceil \frac{n_1}{2}\rceil ,\dots ,\lfloor \frac{n_k}{2}\rfloor ,\lceil \frac{n_k}{2}\rceil \right] . \end{aligned}$$

Proof

By Lemma 4.9, \(N_\mu ^2\) is similar to \(N_\lambda ^2\) if and only if

$$\begin{aligned} \left[ \lfloor \frac{m_1}{2}\rfloor ,\lceil \frac{m_1}{2}\rceil ,\dots ,\lfloor \frac{m_k}{2} \rfloor ,\lceil \frac{m_k}{2}\rceil \right] =\left[ \lfloor \frac{n_1}{2}\rfloor ,\lceil \frac{n_1}{2} \rceil ,\dots ,\lfloor \frac{n_k}{2}\rfloor ,\lceil \frac{n_k}{2}\rceil \right] . \end{aligned}$$

So for each such \(\mu \) we are reduced to determining the cardinality of the fiber at \(N_\lambda ^2\) of the map

$$\begin{aligned} \{\text {conjugacy class of }N_\mu \}\rightarrow & {} \{\text {conjugacy class of }N_\lambda ^2\}\\ M\mapsto & {} M^2. \end{aligned}$$

However, all the fibers of the above map have the same cardinality by conjugation, so the number of solutions to the above equation is

$$\begin{aligned} \frac{\# \{\text {conjugacy class of }N_\mu \}}{\# \{\text {conjugacy class of }N_\lambda ^2\}}=\frac{\# Z_{GL_n(\mathbb {F}_q)}(N_\lambda ^2)}{\# Z_{GL_n(\mathbb {F}_q)}(N_\mu )}. \end{aligned}$$

\(\square \)
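As an illustration of Proposition 4.10, one may compare both sides over \(\mathbb {F}_2\) for \(n\le 3\). This is a sketch under two assumptions that we state explicitly: we take \(q=2\) and \(C_s=I\), so that solutions \(Y\) of \(Y^2=C^2\) correspond to matrices \(M=Y+I\) with \(M^2=N_\lambda ^2\); and we compare the multisets of halved parts after discarding zeros, so that \(\mu \) may have fewer parts than \(\lambda \). Centralizer orders are obtained by enumeration, and all function names are ours.

```python
from itertools import product

P = 2  # the whole computation takes place over F_2

def matmul(A, B):
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) % P for j in range(n))
        for i in range(n)
    )

def all_matrices(n):
    for entries in product(range(P), repeat=n * n):
        yield tuple(tuple(entries[i * n:(i + 1) * n]) for i in range(n))

def det(A):
    """Determinant modulo P by Laplace expansion (fine for n <= 3)."""
    if len(A) == 1:
        return A[0][0] % P
    total = 0
    for j in range(len(A)):
        minor = tuple(row[:j] + row[j + 1:] for row in A[1:])
        total += (-1) ** j * A[0][j] * det(minor)
    return total % P

def nilpotent(partition):
    """Block-diagonal nilpotent Jordan matrix N_lambda."""
    n = sum(partition)
    M = [[0] * n for _ in range(n)]
    start = 0
    for size in partition:
        for i in range(start, start + size - 1):
            M[i][i + 1] = 1
        start += size
    return tuple(tuple(row) for row in M)

def centralizer_order(M):
    """Number of invertible X with XM = MX."""
    return sum(
        1 for X in all_matrices(len(M)) if det(X) and matmul(X, M) == matmul(M, X)
    )

def partitions(n, largest=None):
    largest = n if largest is None else largest
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def halves(partition):
    """Multiset of floor(m/2), ceil(m/2) over all parts m, zeros dropped."""
    return sorted(h for m in partition for h in (m // 2, (m + 1) // 2) if h)

def brute_force(lam):
    """Number of M with M^2 = N_lambda^2."""
    N = nilpotent(lam)
    target = matmul(N, N)
    return sum(1 for M in all_matrices(len(N)) if matmul(M, M) == target)

def formula(lam):
    """Sum over admissible mu of #Z(N_lambda^2) / #Z(N_mu)."""
    N = nilpotent(lam)
    top = centralizer_order(matmul(N, N))
    return sum(
        top // centralizer_order(nilpotent(mu))
        for mu in partitions(len(N))
        if halves(mu) == halves(lam)
    )

for lam in ([1, 1], [1, 1, 1], [2, 1], [3]):
    print(lam, brute_force(lam), formula(lam))
```

In each of these cases the two counts agree; for instance \(\lambda =[1,1,1]\) corresponds to counting the solutions of \(Y^2=I\) in \(GL_3(\mathbb {F}_2)\).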

4.3 The proof of Theorem 1.5 in the nilpotent case

Lemma 4.11

Assume \(XCX=C\) for some \(C\in M_n(\mathbb {F}_p) \) and \(X\in \textrm{GL}_n(\mathbb {F}_p)\). Then for any integer \(j\ge 1\) the subspaces \(\ker (C^j)\) and \(\textrm{Im}(C^j)\) are X-invariant.

Proof

Since X is invertible we may write \(CX=X^{-1}C\) and \(XC=CX^{-1}\), so by induction on j we deduce \(C^jX=X^{(-1)^j}C^j\) and \(XC^j=C^jX^{(-1)^j}\). Therefore if \(v\in \ker (C^j)\) then we have \(C^jXv=X^{(-1)^j}C^jv=0\), ie. \(Xv\in \ker (C^j)\). On the other hand, we compute \(XC^jw=C^jX^{(-1)^j}w\in \textrm{Im}(C^j)\) for any \(w\in \mathbb {F}_p^n\). \(\square \)

Proposition 4.12

Let \(C\in M_n(\mathbb {F}_p) \) be a nilpotent matrix such that there are \(r_j\) blocks of size \(j\times j\) (\(j=1,\dots ,k\)) in the Jordan normal form of C. Then the number of solutions of the equation \(XCX=C\) in \(X\in \textrm{GL}_n(\mathbb {F}_p)\) equals

$$\begin{aligned} \left( \prod _{j=1}^k(p^{r_j}-1)(p^{r_j}-p)\dots (p^{r_j}-p^{r_j-1})\right) \cdot p^{\sum _{j=1}^k(d_j^2-r_j^2)}=p^{\sum _{j=1}^k d_j^2}\prod _{j=1}^k\prod _{t_j=1}^{r_j}\left( 1-\frac{1}{p^{t_j}}\right) \, \end{aligned}$$

where we put \(d_i=r_i+r_{i+1}+\dots +r_k=\dim (\ker (C)\cap \textrm{Im}(C^{i-1}))=\dim (\ker (C^i))-\dim (\ker (C^{i-1}))\).

Proof

First of all, we have \(C^k=0\) and \(\dim _{\mathbb {F}_p}(\ker (C)\cap \textrm{Im}(C^i))=\sum _{j=i+1}^k r_j\). By Lemma 4.11 the flag \(0\le \ker (C)\cap \textrm{Im}(C^{k-1})\le \dots \le \ker (C)\cap \textrm{Im}(C^i)\le \dots \le \ker (C)\cap \textrm{Im}(C)\le \ker (C)\) in \(\ker (C)\) must be X-invariant for any solution X of the equation \(XCX=C\). The set of such maps \(X_1:=X_{\mid \ker (C)}:\ker (C)\rightarrow \ker (C)\) is the parabolic subgroup \(P_{(r_1,\dots ,r_k)}\) of \(\textrm{GL}_{r_1+\dots +r_k}(\mathbb {F}_p)\) of type \((r_1,\dots ,r_k)\) which has cardinality

$$\begin{aligned} \# P_{(r_1,\dots ,r_k)}=\left( \prod _{j=1}^k(p^{r_j}-1)(p^{r_j}-p)\dots (p^{r_j}-p^{r_j-1})\right) \cdot p^{\sum _{1\le i<j\le k}r_ir_j}. \end{aligned}$$

Lemma 4.13

For any \(2\le t\le k\) and \(X_1\in P_{(r_1,\dots ,r_k)}\) the number of extensions of \(X_1\) to a one-to-one linear map \(X_t:\ker (C^t)\rightarrow \ker (C^t)\) satisfying

  (i) \(XCX=C\) and

  (ii)\(_t\) \(\ker (C^i)\cap \textrm{Im}(C^j)\) is \(X_t\)-invariant for all \(1\le i\le t\) and \(0\le j\le k\)

equals

$$\begin{aligned} \prod _{2\le j\le t}p^{r_j(r_1+\cdots +r_k)+r_{j+1}(r_2+\cdots +r_k)+\cdots +r_k(r_{k-j+1}+\cdots +r_k)}. \end{aligned}$$

Proof

We proceed by induction on t. Assume we have a map \(X_{t-1}:\ker (C^{t-1})\rightarrow \ker (C^{t-1})\) satisfying (i) and \((ii)_{t-1}\) and pick an element \(v\in \ker (C^t)\cap \textrm{Im}(C^j)\) not lying in \(\ker (C^{t-1})+\textrm{Im}(C^{j+1})\). We need to choose \(X_t v\in \ker (C^t)\cap \textrm{Im}(C^j)\) so that \(CX_tv=X_{t-1}^{-1}Cv\) is satisfied since \(Cv\in \ker (C^{t-1})\cap \textrm{Im}(C^{j+1})\) on which subspace the map \(X_t^{-1}\) is already defined by \(X_{t-1}^{-1}\) (as \(X_{t-1}\) is one-to-one). Moreover, \(X_{t-1}^{-1}Cv\) lies in \(\ker (C^{t-1})\cap \textrm{Im}(C^{j+1})\) by assumption \((ii)_{t-1}\). In particular, there exists a vector \(w\in \textrm{Im}(C^{j})\) such that \(Cw=X_{t-1}^{-1}Cv\) and \(w\in \ker (C^t)\) (as we have \(C^tw=C^{t-1}X_{t-1}^{-1}Cv=0\)). Further, w is unique up to \(\ker (C)\cap \textrm{Im}(C^j)\), so the set of possible values of \(X_tv\) is exactly \(w+(\ker (C)\cap \textrm{Im}(C^j))\) which has cardinality \(\# (\ker (C)\cap \textrm{Im}(C^j))=p^{r_{j+1}+\dots +r_k}\). Finally, letting v run over the lift of a basis of the quotient space \((\ker (C^t)\cap \textrm{Im}(C^j))/(\ker (C^{t-1})+\textrm{Im}(C^{j+1}))\) for each \(j=k-t,k-t-1,\dots ,1,0\) (noting \(\textrm{Im}(C^{k-t+1})\subseteq \ker (C^{t-1})\)), we deduce that the number of extensions of \(X_{t-1}\) to a map \(X_t:\ker (C^t)\rightarrow \ker (C^t)\) satisfying (i) and \((ii)_t\) is

$$\begin{aligned}&\prod _{j=0}^{k-t}\# (\ker (C)\cap \textrm{Im}(C^j))^{\dim _{\mathbb {F}_p}(\ker (C^t)\cap \textrm{Im}(C^j))/(\ker (C^{t-1})+\textrm{Im}(C^{j+1}))}\\&\quad =p^{r_t(r_1+\cdots +r_k)+r_{t+1}(r_2+\cdots +r_k)+\cdots +r_k(r_{k-t+1}+\cdots +r_k)} \end{aligned}$$

as we have \(\dim _{\mathbb {F}_p}(\ker (C^t)\cap \textrm{Im}(C^j))/(\ker (C^{t-1})+\textrm{Im}(C^{j+1}))=r_{j+t}\). \(\square \)

The statement follows from the above lemma by taking \(t=k\): the number of solutions of \(XCX=C\) in invertible X equals

$$\begin{aligned}&\# P_{(r_1,\dots ,r_k)}\prod _{2\le j\le k}p^{r_j(r_1+\cdots +r_k)+r_{j+1}(r_2+\cdots +r_k)+\cdots +r_k(r_{k-j+1}+\dots +r_k)}\\&\quad =\left( \prod _{j=1}^k(p^{r_j}-1)(p^{r_j}-p)\dots (p^{r_j}-p^{r_j-1})\right) \cdot p^{\sum _{j=2}^k(j-1)r_j^2+\sum _{1\le i<j\le k}2ir_ir_j}\\&\quad =\left( \prod _{j=1}^k(p^{r_j}-1)(p^{r_j}-p)\dots (p^{r_j}-p^{r_j-1})\right) \cdot p^{\sum _{j=1}^k(d_j^2-r_j^2)} \end{aligned}$$

as claimed.\(\square \)
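The count in Proposition 4.12 is easy to test by exhaustive search for \(n\le 3\) and \(p\le 3\). The following sketch encodes the Jordan type by the list `r` with `r[j-1]` \(=r_j\); the names `formula`, `brute_force` and `exponent_identity_holds` are ours. The last function checks the rewriting of the exponent used in the final display of the proof.

```python
from itertools import product

def matmul(A, B, p):
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )

def all_matrices(n, p):
    for entries in product(range(p), repeat=n * n):
        yield tuple(tuple(entries[i * n:(i + 1) * n]) for i in range(n))

def det(A, p):
    """Determinant modulo p by Laplace expansion (fine for n <= 3)."""
    if len(A) == 1:
        return A[0][0] % p
    total = 0
    for j in range(len(A)):
        minor = tuple(row[:j] + row[j + 1:] for row in A[1:])
        total += (-1) ** j * A[0][j] * det(minor, p)
    return total % p

def jordan_nilpotent(r):
    """Nilpotent matrix with r[j-1] Jordan blocks of size j."""
    sizes = [j for j, count in enumerate(r, start=1) for _ in range(count)]
    n = sum(sizes)
    M = [[0] * n for _ in range(n)]
    start = 0
    for size in sizes:
        for i in range(start, start + size - 1):
            M[i][i + 1] = 1
        start += size
    return tuple(tuple(row) for row in M)

def formula(r, p):
    """prod_j #GL_{r_j}(F_p) times p^{sum_j (d_j^2 - r_j^2)}."""
    d = [sum(r[i:]) for i in range(len(r))]
    gl = 1
    for rj in r:
        for i in range(rj):
            gl *= p**rj - p**i
    return gl * p ** sum(dj * dj - rj * rj for dj, rj in zip(d, r))

def brute_force(r, p):
    """Number of invertible X over F_p with XCX = C."""
    C = jordan_nilpotent(r)
    return sum(
        1
        for X in all_matrices(len(C), p)
        if det(X, p) and matmul(matmul(X, C, p), X, p) == C
    )

def exponent_identity_holds(r):
    """sum (d_j^2 - r_j^2) == sum_{j>=2} (j-1) r_j^2 + sum_{i<j} 2 i r_i r_j."""
    k = len(r)
    d = [sum(r[i:]) for i in range(k)]
    lhs = sum(dj * dj - rj * rj for dj, rj in zip(d, r))
    rhs = sum((j - 1) * r[j - 1] ** 2 for j in range(2, k + 1))
    rhs += sum(
        2 * i * r[i - 1] * r[j - 1]
        for i in range(1, k + 1)
        for j in range(i + 1, k + 1)
    )
    return lhs == rhs

for r in ([2], [0, 1], [1, 1], [0, 0, 1]):
    print(r, brute_force(r, 2), formula(r, 2))
```

For example, for a single block of size 2 (`r = [0, 1]`) the solutions are the upper triangular matrices with diagonal \((a,a^{-1})\), giving \((p-1)p\) in total.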

Corollary 4.14

Assume C is nilpotent. Then we have \(n(C,C;p)\le p^{\sum _{j=1}^kd_j^2}\le p^{\textrm{rk}(C)^2+(n-\textrm{rk}(C))^2}\).

Proof

Using Proposition 4.12 we compute

$$\begin{aligned}&n(C,C;p)=\left( \prod _{j=1}^k(p^{r_j}-1)(p^{r_j}-p)\dots (p^{r_j}-p^{r_j-1})\right) \cdot p^{\sum _{j=1}^k(d_j^2-r_j^2)}\\&\quad \le \left( \prod _{j=1}^k(p^{r_j})^{r_j})\right) \cdot p^{\sum _{j=1}^k(d_j^2-r_j^2)} =p^{\sum _{j=1}^kd_j^2}\le p^{d_1^2+(n-d_1)^2}=p^{\textrm{rk}(C)^2+(n-\textrm{rk}(C))^2} \end{aligned}$$

by noting \(d_1=\dim \ker (C)=n-\textrm{rk}(C)\) and \(\sum _{j=1}^kd_j=n\). \(\square \)

Remark 4.15

For fixed n and \(p\rightarrow \infty \) the above upper estimate \(p^{\sum _{j=1}^k d_j^2}\) is in fact the order of magnitude of \(n(C,C;p)\):

$$\begin{aligned} \frac{n(C,C;p)}{p^{\sum _{j=1}^k d_j^2}}=\prod _{j=1}^k\prod _{t_j=1}^{r_j}\left( 1-\frac{1}{p^{t_j}}\right) >\left( 1-\frac{1}{p}\right) ^n\ . \end{aligned}$$
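The two inequalities of Corollary 4.14 and Remark 4.15 depend only on the Jordan type of \(C\), so they can be checked exhaustively for small \(n\). The sketch below runs over all partitions of \(n\) for \(2\le n\le 7\) using exact rational arithmetic; the function names and the chosen range of \(n\) and \(p\) are ours.

```python
from fractions import Fraction

def partitions(n, largest=None):
    """All partitions of n as non-increasing lists."""
    largest = n if largest is None else largest
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def invariants(lam):
    """Return (r, d): block counts r_j and d_j = r_j + ... + r_k."""
    k = max(lam)
    r = [lam.count(j) for j in range(1, k + 1)]
    d = [sum(r[i:]) for i in range(k)]
    return r, d

def ratio(lam, p):
    """n(C,C;p) / p^{sum d_j^2} = prod_j prod_{t=1}^{r_j} (1 - p^{-t})."""
    r, _ = invariants(lam)
    value = Fraction(1)
    for rj in r:
        for t in range(1, rj + 1):
            value *= 1 - Fraction(1, p**t)
    return value

def checks_hold(n, p):
    """Verify both estimates for every nilpotent Jordan type of size n."""
    for lam in partitions(n):
        _, d = invariants(lam)
        rank = n - d[0]                      # d_1 = dim ker C
        if sum(d) != n:
            return False
        if sum(x * x for x in d) > rank**2 + (n - rank) ** 2:
            return False
        if not ratio(lam, p) > (1 - Fraction(1, p)) ** n:
            return False
    return True

print(all(checks_hold(n, p) for n in range(2, 8) for p in (2, 3, 5, 7)))
```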

5 The proofs of the bounds

5.1 The equation \(AX\equiv X^{-1}B\) modulo prime powers

Assume that \(A,B\in M_n(\mathbb {Z})\). We are interested in estimating the size of the affine variety \(V_{A,B}(p^l)\) where

$$\begin{aligned} V_{A,B}(p^l)=\{ X\in GL_n(\mathbb {Z}/p^l\mathbb {Z})\mid AX = X^{-1}B\}. \end{aligned}$$
(19)

We will collect elements of \(V_{A,B}(p^{l+1})\) according to their image in \(V_{A,B}(p^l)\). The final push down to \(l=1\) will play a special role and we let

$$\begin{aligned} V_{A,B}^{(C)}(p^l)=\{ X\in V_{A,B}(p^l) \mid AX\equiv X^{-1}B \equiv C {{\,\textrm{mod}\,}}p\}. \end{aligned}$$
(20)

Let now \(X_0 \in V_{A,B}^{(C)}(p^l)\) be given. Then all \(X \in V_{A,B}(p^{l+1})\) such that \(X\equiv X_0 {{\,\textrm{mod}\,}}p^l\) may be written as \(X=X_0(I+p^l Y)\), for some \(Y {{\,\textrm{mod}\,}}p\). The goal is to bound the number of \(Y\) for which (19) also holds \({{\,\textrm{mod}\,}}p^{l+1}\). This leads to

$$\begin{aligned} AX_0 Y+Y X_0^{-1}B \equiv (X_0^{-1}B-AX_0)/p^l {{\,\textrm{mod}\,}}p. \end{aligned}$$

Since \(X_0 \in V_{A,B}^{(C)}(p^l)\) we have that \(Y\) is a solution to the Sylvester equation

$$\begin{aligned} CY+Y C \equiv (X_0^{-1}B-AX_0)/p^l {{\,\textrm{mod}\,}}p. \end{aligned}$$
(21)

Note that the equation above might have no solution, or exactly as many solutions as

$$\begin{aligned} CY=-YC {{\,\textrm{mod}\,}}p \end{aligned}$$

for which we may apply Proposition 3.3 and its corollary. This gives

Lemma 5.1

Let \( {{\,\textrm{rk}\,}}C = r\), \({{\,\textrm{rk}\,}}_\infty C=r_\infty \). If \(l\ge 1\) and \(p\ne 2\) then

$$\begin{aligned} \#V_{A,B}^{(C)}(p^{l+1}) \le p^{(n-r)(n-r_\infty )+r_\infty ^2/2}\#V_{A,B}^{(C)}(p^{l}). \end{aligned}$$

Proof

This is merely a restatement of Lemma 3.3 and (12). First note that

$$\begin{aligned} \dim \ker C =n-r \text { and } \dim \ker C^n=n-r_\infty . \end{aligned}$$

To simplify the contribution of the non-zero eigenvalues in (12) use that

$$\begin{aligned} 2 d_\infty (A-\lambda )d_\infty (A+\lambda ) \le \frac{ \bigl ( d_\infty (A-\lambda )+d_\infty (A+\lambda ) \bigr )^2}{2} \end{aligned}$$

and that for \(a_1,...,a_k\) positive integers, \( a_1^2+\cdots +a_k^2\le \left( a_1+\cdots +a_k \right) ^2 \). \(\square \)

This estimate is wasteful since the solution set could be empty. However, this will suffice for us.
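The dichotomy behind this estimate (the inhomogeneous Sylvester equation (21) has either no solution or as many as the homogeneous one) is just linearity of \(Y\mapsto CY+YC\), and can be observed directly for \(2\times 2\) matrices. A minimal sketch, with function names of our own choosing:

```python
from collections import Counter
from itertools import product

def matmul(A, B, p):
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )

def all_matrices(n, p):
    for entries in product(range(p), repeat=n * n):
        yield tuple(tuple(entries[i * n:(i + 1) * n]) for i in range(n))

def sylvester_fibres(C, p):
    """Map each attained right-hand side R to #{Y mod p : CY + YC = R}."""
    n = len(C)
    fibres = Counter()
    for Y in all_matrices(n, p):
        CY, YC = matmul(C, Y, p), matmul(Y, C, p)
        image = tuple(
            tuple((CY[i][j] + YC[i][j]) % p for j in range(n)) for i in range(n)
        )
        fibres[image] += 1
    return fibres

def kernel_size(C, p):
    """Number of solutions of the homogeneous equation CY = -YC mod p."""
    zero = tuple(tuple(0 for _ in C) for _ in C)
    return sylvester_fibres(C, p)[zero]

for C in (((0, 1), (0, 0)), ((1, 0), (0, 2)), ((1, 0), (0, 1))):
    fibres = sylvester_fibres(C, 3)
    # every non-empty fibre has the size of the kernel
    print(C, kernel_size(C, 3), set(fibres.values()))
```

Right-hand sides that do not occur as keys of `fibres` are exactly those for which (21) has no solution.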

The proof of Theorem 1.6

When \(l=1\) the bound for \(N^*(C,C;p)\) follows from Theorem 1.5 together with exact formulae in Proposition 3.1 which gave Corollaries 4.8 and 4.14.

To see this, note that we may decompose \(C\) (over the ground field \(\mathbb {F}_p\)) as a block matrix with one block invertible of size \(r_\infty \) and one block nilpotent of size \( n-r_\infty \).

For the invertible part we have the upper bound \( 2^{r_\infty } p^{r_\infty ^2/2} \) using Corollary 4.8 for each irreducible factor \(\ne x\) of the minimal polynomial of \(C^2\), noting that there are at most \(r_\infty \) such factors, and applying \( a_1^2+\cdots +a_k^2\le \left( a_1+\cdots +a_k \right) ^2 \) for positive integers \(a_1,...,a_k\). The nilpotent block has rank \(r-r_\infty \) and so by Corollary 4.14 we have the upper bound \( p^{(n-r)^2 + (r-r_\infty )^2} \) from which

$$\begin{aligned} N^*(C,C;p) \le 2^{r_\infty } p ^{(n-r)^2 + (r-r_\infty )^2 + r_\infty ^2/2} . \end{aligned}$$
(22)

This proves the first statement in case \(l=1\). Whenever \( l>1\) we use Lemma 5.1 inductively to get that

$$\begin{aligned} N^*(C,C;p^l)\le 2^{r_\infty } p^{ e(l,n,r,r_\infty )} \end{aligned}$$

where

$$\begin{aligned} e(l,n,r,r_\infty )= (n-r)^2 + (r-r_\infty )^2 + r_\infty ^2/2 +(l-1)\left( (n-r)(n-r_\infty ) + r_\infty ^2/2\right) . \end{aligned}$$

In order to prove the second statement, note that unless \(N^*(A,B;p^l)=0\), we find a common value \(C:=AX_0=X_0^{-1}B\) such that \(N^*(A,B;p^l)=N^*(C,C;p^l)\) and put \(r_\infty :=r_\infty (C)\). Moreover, from Lemma 4.1 the value of \(r_\infty \) is the same for any of the \( C \)-s that arise. So it remains to estimate \(e(l,n,r,r_\infty )\).

To simplify the exponent assume first that \( n/2\le r \le n \). A calculation shows that the maximum of the function

$$\begin{aligned} (1-x)^2 + (x-y)^2 + y^2/2 +(l-1)\left( (1-x)(1-y) + y^2/2 \right) \end{aligned}$$

on the domain

$$\begin{aligned} D=\{(x,y) \mid 1/2\le x \le 1,\, 0 \le y \le x\} . \end{aligned}$$
(23)

is \(l/2\), proving the claim in this case.
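The value of the maximum can be probed on a rational grid. The sketch below evaluates the normalized exponent with \(x=r/n\), \(y=r_\infty /n\) on the domain (23) using exact fractions, for \(2\le l\le 8\); the grid size and the function names are our own choices, and a grid search is of course only a plausibility check, not a proof. We restrict to \(l\ge 2\) because for \(l=1\) the corner \((1,0)\) of \(D\) gives the value 1; that corner corresponds to \(r=n\), \(r_\infty =0\), which does not occur since \(r=n\) forces \(r_\infty =n\).

```python
from fractions import Fraction

def exponent(x, y, l):
    """Normalized exponent e(l, n, r, r_inf) / n^2 with x = r/n, y = r_inf/n."""
    return (
        (1 - x) ** 2
        + (x - y) ** 2
        + y**2 / 2
        + (l - 1) * ((1 - x) * (1 - y) + y**2 / 2)
    )

def grid_max(l, steps=40):
    """Maximum over the grid points of D = {1/2 <= x <= 1, 0 <= y <= x}.

    `steps` must be even so that x = 1/2 is a grid point.
    """
    best_value, best_point = None, None
    for i in range(steps // 2, steps + 1):
        x = Fraction(i, steps)
        for j in range(0, i + 1):
            y = Fraction(j, steps)
            value = exponent(x, y, l)
            if best_value is None or value > best_value:
                best_value, best_point = value, (x, y)
    return best_value, best_point

for l in range(2, 9):
    value, point = grid_max(l)
    print(l, value, point)
```

On this grid the maximum equals \(l/2\) for every tested \(l\), attained for instance at \((x,y)=(1,1)\), ie. for invertible \(C\).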

For \(r< n/2\) we use that \( (n-r)(n-r_\infty ) + r_\infty ^2/2 \le n(n-r)\), and so

$$\begin{aligned} e(l,n,r,r_\infty ) \le ln(n-r) \end{aligned}$$

in view of \((n-r)^2+r^2-n(n-r)=r(2r-n)\le 0. \)

This establishes both bounds in (7).

Finally to prove the universal bound \( N^*(A,B;p^l)\le 2^np^{l(n^2-n)}\) note that it holds trivially for \(l\ge 2\), since \(n^2-n\ge n^2/2\).

For \(N^*(A,B;p)\) start with the bound in (22). Note that if \(r=n\) then also \(r_\infty =n\) and so it is enough to prove that, for \( 0 \le r_\infty \le r\), \(1 \le r \le n-1\), we have

$$\begin{aligned} (n-r)^2 + (r-r_\infty )^2 + r_\infty ^2/2 \le n^2-n. \end{aligned}$$

However since \(0\le r_\infty \le r\) we have

$$\begin{aligned} (r-r_\infty )^2 + r_\infty ^2/2 \le r^2 \, \end{aligned}$$

and for \(1 \le r \le n-1\)

$$\begin{aligned} (n-r)^2+r^2 \le n^2-2n+2. \end{aligned}$$

Finally \(n^2-2n+2 \le n^2-n \) holds since \(n\ge 2\). \(\square \)

5.2 Gauss sums of matrices

There are various ways in which exponential sums with quadratic functions of the entries of an \(n \times n\) matrix arise. For example, in the theory of Siegel modular forms \(Q(X)={{\,\textrm{Tr}\,}}X^tAX\), and the associated Gauss sums play an important role, see e.g. [18]. These have a very different flavor than ours, as the tensor properties allow one to diagonalize \(A\), which immediately yields a diagonalization of the quadratic form \(Q(x_{11},x_{12},\dots ,x_{nn})\). This approach is not directly applicable to our situation since we have \(Q(X)= {{\,\textrm{Tr}\,}}TX^2\) for some matrix \(T\). While this case has appeared in the literature, see e.g. [7], our treatment is based directly on Proposition 3.5 and its Corollary 3.6.

The proof of Proposition 1.8

We have to estimate the sum

$$\begin{aligned} S_{A,B}(X;p) = \sum _{U {{\,\textrm{mod}\,}}p} \psi ((SU+TU^2)/p), \end{aligned}$$

where \(X\) is such that \(AX\equiv X^{-1}B {{\,\textrm{mod}\,}}p^l\), and where \(S=(AX-X^{-1}B)/p^l\) and \(T=AX {{\,\textrm{mod}\,}}p\). This is clearly a general Gauss sum. To apply Corollary 3.6 let \(B(U,Y)=Q(U+Y)-Q(U)-Q(Y)={{\,\textrm{Tr}\,}}((TY+YT)U )\) be the associated bilinear form.

We have that either \(S_{A,B}(X;p)=0\) or there exists \(Y\) such that \({{\,\textrm{Tr}\,}}SU= B(U,Y) \) for all \(U\), in which case

$$\begin{aligned} S_{A,B}(X;p)=\left( \frac{\det (Q_1)}{p}\right) e^{-2\pi iQ(Y)/p} p^{n^2-R}\, \end{aligned}$$

where \(R\) is the rank of the quadratic form \(Q(X)={{\,\textrm{Tr}\,}}(TX^2)\).

Whether there exists \(Y\) such that \({{\,\textrm{Tr}\,}}SU=B(U,Y)\) for all U is again determined by the solubility of a Sylvester equation

$$\begin{aligned} TY+YT=S. \end{aligned}$$

Moreover it implies that the rank of \(Q\) is \(R=n^2-K\), where \(K=\dim _{\mathbb {F}_p} \{ Y: TY+YT\equiv 0 \mod p\}\). This is estimated as in Lemma 5.1 using Lemma 3.3 which gives the claim.
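The vanishing criterion used above can be observed numerically: for odd \(p\) the sum \(\sum _U\psi ((SU+TU^2)/p)\) is non-zero exactly when \(S\) lies in the image of \(Y\mapsto TY+YT\). The sketch below checks this for all \(S\) with \(n=2\), \(p=3\) and a few sample matrices \(T\); it uses floating point arithmetic with an ad hoc tolerance, and the function names are ours.

```python
import cmath
from itertools import product

def matmul(A, B, p):
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )

def all_matrices(n, p):
    for entries in product(range(p), repeat=n * n):
        yield tuple(tuple(entries[i * n:(i + 1) * n]) for i in range(n))

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def gauss_sum(S, T, p):
    """Sum over U mod p of exp(2*pi*i*Tr(SU + TU^2)/p)."""
    n = len(T)
    total = 0j
    for U in all_matrices(n, p):
        phase = (trace(matmul(S, U, p)) + trace(matmul(T, matmul(U, U, p), p))) % p
        total += cmath.exp(2j * cmath.pi * phase / p)
    return total

def sylvester_image(T, p):
    """Set of all matrices of the form TY + YT mod p."""
    n = len(T)
    image = set()
    for Y in all_matrices(n, p):
        TY, YT = matmul(T, Y, p), matmul(Y, T, p)
        image.add(
            tuple(tuple((TY[i][j] + YT[i][j]) % p for j in range(n)) for i in range(n))
        )
    return image

def dichotomy_holds(T, p, tol=1e-6):
    """True if the Gauss sum is non-zero exactly for S in the Sylvester image."""
    image = sylvester_image(T, p)
    return all(
        (abs(gauss_sum(S, T, p)) > tol) == (S in image)
        for S in all_matrices(len(T), p)
    )

for T in (((1, 0), (0, 2)), ((0, 1), (0, 0)), ((1, 1), (0, 1))):
    print(T, dichotomy_holds(T, 3))
```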

5.3 Bounding \(K_n(A,B;p^k)\)

The proof of Theorem 1.9

The case of \(k=1\) was handled in [6].

When \(k=2\,l\) we have \(|K_n(A,B;p^k)|\le p^{ln^2}N^*(A,B;p^l)\) by Proposition 1.1 noting that any mod \(p^l\) solution of the matrix equation \(AX\equiv X^{-1}B\pmod {p^l}\) has exactly \(p^{ln^2}\) lifts to \(GL_n(\mathbb {Z}/p^k\mathbb {Z})\).

Similarly, in case \(k=2\,l+1\) (\(l\ge 1\)) we deduce \(|K_n(A,B;p^k)|\le p^{ln^2}N^*(A,B;p^l)\max _X|S_{A,B}(X;p)|\) from Proposition 1.1. The estimate for the Gauss sum \(S_{A,B}(X;p)\) is given in Proposition 1.8 while the estimate for \(N^*(A,B;p^l)\) is in Theorem 1.6 both in the general case and under the stronger assumption \(\gcd (\det A, \det B,p)=1\).