1 Introduction

Let G be a group and S a symmetric (\(S = S^{-1}\)) subset of G. Write \({\text {Cay}}(G, S)\) for the associated Cayley graph: the graph whose vertices are the elements \(g \in G\) and whose edges are pairs \(\{g, sg\}\) with \(g\in G, s\in S\). The graph \({\text {Cay}}(G, S)\) is connected if and only if S generates G, and its diameter is equal to the smallest d such that \((S \cup \{1\})^d = G\). A well-known conjecture of Babai [9] states that

$$\begin{aligned} {\text {diam}}{\text {Cay}}(G, S) = (\log |G|)^{O(1)}, \end{aligned}$$

uniformly over all nonabelian finite simple groups G and symmetric generating sets S. In other words, every connected Cayley graph of a nonabelian finite simple group has diameter within a power of the trivial lower bound.

By the classification of finite simple groups, Babai’s conjecture splits into essentially three broad cases:

  1. Groups of Lie type of bounded rank over \({\mathbf {F}}_q\) with \(q \rightarrow \infty \);

  2. Classical groups of unbounded rank over \({\mathbf {F}}_q\) with q arbitrary;

  3. Alternating groups \(A_n\) with \(n \rightarrow \infty \).

For groups of Lie type and bounded rank, Babai’s conjecture is now completely resolved, following breakthrough work of Helfgott [23], Pyber–Szabó [39], and Breuillard–Green–Tao [5]. In the other two cases the conjecture remains open. For the alternating groups, Helfgott and Seress [25] proved that

$$\begin{aligned} {\text {diam}}{\text {Cay}}(A_n, S) = \exp O((\log n)^4 \log \log n). \end{aligned}$$

For comparison, Babai’s conjecture (folkloric in this case) asserts that

$$\begin{aligned} {\text {diam}}{\text {Cay}}(A_n, S) = n^{O(1)}; \end{aligned}$$

thus we have a quasipolynomial bound instead of the expected polynomial bound. The case of classical groups of unbounded rank on the other hand is still wide open. The best bounds currently known are due to Biswas–Yang and Halasi–Maróti–Pyber–Qiao:

$$\begin{aligned} {\text {diam}}{\text {Cay}}(G, S)&\le q^{O(n (\log n + \log q)^3)}&\text {([BY17])};\nonumber \\ {\text {diam}}{\text {Cay}}(G, S)&\le q^{O(n (\log n)^2)}&\text {([HMPQ19])}. \end{aligned}$$
(1)

By contrast, Babai’s conjecture in this case asserts that

$$\begin{aligned} {\text {diam}}{\text {Cay}}(G, S) \le (n \log q)^{O(1)}, \end{aligned}$$

so the known bounds are still exponentially far from the conjectured one. A key open case is the family of groups \({\text {SL}}_n(2)\) with n tending to infinity.

In all cases, an important subproblem is the case of random generators (see, e.g., [38, Problem 10.8.6]). Let \(k \ge 2\) be a small constant and let \(S = \{x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\), where \(x_1, \ldots , x_k \in G\) are uniform and independent. For groups of Lie type of bounded rank, it was proved by Breuillard, Green, Guralnick, and Tao [4] that \({\text {Cay}}(G, S)\) is almost surely an expander, and in particular

$$\begin{aligned} {\text {diam}}{\text {Cay}}(G, S) = O(\log |G|). \end{aligned}$$

There is no consensus about whether such a strong bound is likely to hold for groups of unbounded rank. Babai’s conjecture for \(A_n\) and random generators was an open problem for some time. The first polynomial bound was proved by Babai and Hayes, and the exponent has been lowered by Schlage-Puchta and Helfgott–Seress–Zuk:

$$\begin{aligned} {\text {diam}}{\text {Cay}}(A_n, S)&\le n^{7+o(1)}&\text {([BH05])}; \nonumber \\ {\text {diam}}{\text {Cay}}(A_n, S)&\le O(n^3 \log n)&\text {([SP12])}; \nonumber \\ {\text {diam}}{\text {Cay}}(A_n, S)&\le n^2 (\log n)^{O(1)}&\text {([HSZ15])}. \end{aligned}$$
(2)

In this paper we consider the case of high-rank classical groups over a small field. Recall that these are obtained from the groups

$$\begin{aligned} \begin{array}{llll} {\text {GL}}_n(q),&{\text {Sp}}_n(q),&{\text {GO}}_{n}^{(\pm )}(q),&{\text {GU}}_n(q), \end{array} \end{aligned}$$

of automorphisms of a finite vector space \(V = {\mathbf {F}}_q^n\), in the latter three cases equipped with a nondegenerate alternating, quadratic, or hermitian form, respectively. Throughout we write \({\text {GCl}}_n(q)\) for any of these groups, and \({\text {SCl}}_n(q)\) for the corresponding derived subgroup

$$\begin{aligned} \begin{array}{llll} {\text {SL}}_n(q),&{\text {Sp}}_n(q),&\Omega _{n}^{(\pm )}(q),&{\text {SU}}_n(q). \end{array} \end{aligned}$$

We will write \({\text {Cl}}_n(q)\) for any intermediate group:

$$\begin{aligned} {\text {SCl}}_n(q) \le {\text {Cl}}_n(q) \le {\text {GCl}}_n(q). \end{aligned}$$

Omitting a few small exceptional cases, \({\text {SCl}}_n(q)\) is a quasisimple group, so Babai’s conjecture applies. For \({\text {SCl}}_n(q)\) with n large and random generators, the best known bound is just the uniform bound (1).

There is a promising programme of Pyber, which aims to prove Babai’s conjecture in three steps. The programme is motivated by the positive solution in the case of random generators in alternating groups, especially the result of Babai–Beals–Seress [3] that \({\text {diam}}{\text {Cay}}(A_n, S) \le n^{O(1)}\) provided only that S contains an element of degree at most \(n/(3 + \epsilon )\). Here the degree of a permutation is the number of non-fixed points. Analogously, the degree of an element \(g \in {\text {GL}}_n(q)\) is defined to be the rank of \(g - 1\), and Pyber’s programme is the following.

  1. Given some generators, find an element whose degree is at most \((1-\epsilon )n\).

  2. Given an element of degree \((1-\epsilon )n\), find an element of minimal degree.

  3. Given an element whose degree is minimal, finish the proof.

In the case of alternating groups, step 3 is essentially trivial, since there are only \(O(n^3)\) 3-cycles in \(A_n\), but for \({\text {SCl}}_n(q)\) it is highly nontrivial. In the case of \({\text {SL}}_n(p)\), p prime, step 3 was accomplished recently by Halasi [22].

We have two things to contribute in the case of large n, small q. First, assuming we have at least 3 random generators, we will do steps 1 and 2 of Pyber’s programme.

Theorem 1.1

Let \(G = {\text {Cl}}_n(q)\), and assume \(\log q < c n / \log ^2 n\) for a sufficiently small constant \(c>0\). Let \(x, y, z \in G\) be random. Then with probability \(1 - e^{-cn}\) there is a word \(w \in F_3\) of length \(n^{O(\log q)}\) such that \(w(x, y, z)\) has minimal degree in \(G' = {\text {SCl}}_n(q)\).

Combined with Halasi’s result, this settles Babai’s conjecture for \({\text {SL}}_n(p)\), p prime and bounded, with at least 3 random generators.

Theorem 1.2

Let \({\text {SL}}_n(p) \le G \le {\text {GL}}_n(p)\), where p is prime and \(\log p < cn / \log ^2 n\). Let \(x, y, z\) be elements of G chosen uniformly at random, and let \(S = \{ x^{\pm 1}, y^{\pm 1}, z^{\pm 1} \}\). Then with probability \(1 - e^{-cn}\) we have

$$\begin{aligned}&\langle S \rangle \ge {\text {SL}}_n(p),~\text {and} \\&{\text {diam}}{\text {Cay}}(\langle S \rangle , S) \le n^{O(\log p)}. \end{aligned}$$

Second, assuming we have sufficiently many random generators depending on q, we will do step 3 in a particularly satisfactory way. In fact, we will prove that the Schreier graph of the action of G on O(1)-tuples of vectors is almost surely a union of expander graphs. (The analogous result for the symmetric group is a result of Friedman, Joux, Roichman, Stern, and Tillich [15], and was essential in [26].)

Theorem 1.3

Let \(G = {\text {Cl}}_n(q)\), and let \(x_1, \ldots , x_k \in G\) be random. Let W be the set of r-tuples of vectors in the natural module \(V = {\mathbf {F}}_q^n\). Assume that \(r < cn^{1/3}\), and that \(k \ge q^{C r^3}\). Then almost surely the Schreier graph of G generated by \(x_1, \ldots , x_k\) on any of its orbits in W has a uniform spectral gap.

As we will explain, this implies that if we have an element of minimal degree then by conjugation we can rapidly obtain a full conjugacy class of elements of minimal degree, and it follows in short order that the diameter of G is not too large. This completes the proof of Babai’s conjecture for \({\text {SCl}}_n(q)\) for k random generators, as long as k is sufficiently large compared to q.

Theorem 1.4

There are constants \(c, C>0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(n > C\). Let \(x_1, \ldots , x_k\) be elements of G chosen uniformly at random, where \(k > q^C\), and let \(S = \{ x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\). Then with probability \(1 - q^{-cn}\) we have

$$\begin{aligned}&\langle S \rangle \ge {\text {SCl}}_n(q),~\text {and} \\&{\text {diam}}{\text {Cay}}(\langle S \rangle , S) \le q^2 n^C. \end{aligned}$$

Corollary 1.5

Babai’s conjecture holds in the following two cases:

  (1) \({\text {SL}}_n(p)\), p prime and bounded, and at least 3 random generators;

  (2) \({\text {SCl}}_n(q)\) and at least \(q^C\) random generators, where C is an absolute constant.

Our method does not depend on the classification of finite simple groups (CFSG) in any way. Having a CFSG-free method is valuable for transparency, but moreover we think it is essential for attacking Babai’s conjecture. It is well-known that two random elements of \({\text {SCl}}_n(q)\) almost surely generate the group: this is a result of Kantor and Lubotzky [30]. Kantor and Lubotzky rely on CFSG through Aschbacher’s theorem, so unfortunately their method does not adapt well to proving diameter bounds. By contrast, in [14] the first author and Virchow found a CFSG-free proof in the case of \({\text {SL}}_n(q)\) and expressed the hope that the method would be generalizable. We recycle several ideas from that paper in the present one.

Perhaps the most important idea in our method is that if \(x, y, z \in G\) are random and independent, then the elements \(x w(y, z)\) for all short words \(w \in F_2\) behave roughly independently, which allows us to imitate having many more than just 3 generators. This is a more powerful version of the “\(xy^i\) trick”, which comes originally from [3, Sect. 4] and has been essential in all subsequent work on the random generator subproblem in high rank.

Let us mention one further result, of independent interest. In the appendix we give analogous arguments for \(A_n\), based on the standard fanciful idea that \(A_n = {\text {PSL}}_n(1)\). The value of doing so is mostly motivational, but we also obtain a new result. Provided \(k \ge 3\), we sharpen (2) to

$$\begin{aligned} {\text {diam}}{\text {Cay}}(A_n, S) \le O(n^2 \log n). \end{aligned}$$

This is a modest improvement, but it is interesting for being conjecturally sharp for any proof which uses elements of small support as a stepping stone. Decreasing the exponent 2 appears to require a radically new idea.

1.1 Reader’s guide

We first record some preliminaries (Sect. 2) regarding asymptotic notation, Cayley and Schreier graphs, classical groups and their associated formed spaces and the notions of degree and support, and adjacency operators.

Next we turn to a more specialized preparatory section (Sect. 3) dealing with word maps, where we introduce the vocabulary of queries, coincidences, and trajectories. Briefly, the idea is that if \(w \in F_k\) is a given word, \(v \in V\) a given vector, and \(x_1, x_2, \ldots , x_k \in G\) random, then evaluating \(w(x_1, \ldots , x_k) v\) can be thought of as a kind of random walk. As much as possible we recycle the key language used by [15] in the case of the symmetric group. The tools of this section will be used in two essentially different ways in the rest of the paper.

We proceed (Sect. 4) by showing that a given short word w evaluated at random elements \(x_1, \ldots , x_k \in G\) almost surely has large support (Theorem 4.2). This is a kind of antithesis to step 1 of Pyber’s programme: all sufficiently short words in random generators will in fact fail to have degree \((1-\epsilon )n\). However, this is interesting when combined with recent character bounds of Guralnick–Larsen–Tiep [18, 19], as it implies that the character ratio \(\chi (w(x_1, \ldots , x_k)) / \chi (1)\) is almost surely small for each nonlinear character \(\chi \) (Corollary 5.3).

This bound on the expectation of \(\chi (w(x_1, \ldots , x_k)) / \chi (1)\) is one of the two main ingredients in the “\(x w(y, z)\) trick”, which is the subject of Sect. 6. This trick shows that, given random generators \(x_0, x_1, \ldots , x_k\), one can almost surely find a short word \(x_0 w(x_1, \ldots , x_k)\) lying in a given normal subset \({\mathfrak {C}}\subseteq G\), provided that the density of \({\mathfrak {C}}\) is large compared to the expected values of character ratios. The trick is a simple consequence of the second moment method, following the observation that the elements \(x_0 w(x_1, \ldots , x_k)\) for various w are approximately pairwise independent.

The other main ingredient is the construction of an appropriate normal set \({\mathfrak {C}}\). This is the subject of Sect. 7. For each classical group we find a large normal set \({\mathfrak {C}}\), all of whose fibres over \(G^\text {ab}\) are large (allowing us to ignore linear characters), and a small integer m such that for every \(g \in {\mathfrak {C}}\) the power \(g^m\) has minimal degree in \({\text {SCl}}_n(q)\). This completes the proof of Theorem 1.1.

Once we have an element of minimal degree, we can act on that element by conjugation. Since the minimal degree in all cases is at most 2, this action is a constituent of the usual permutation action on 4-tuples of vectors. We analyze this action by again using the language of trajectories and coincidences, and the trace method: we bound a high moment of the second eigenvalue by bounding the trace of the corresponding power of the adjacency matrix, interpreting the latter in terms of closed trajectories. This is analogous to a result for the symmetric group due to Friedman, Joux, Roichman, Stern, and Tillich [15], building on earlier work of Broder–Shamir [8]. However, in the case of classical groups there are some extra combinatorial complications that do not arise for symmetric groups.

We first focus (Sect. 8) on describing the structure of a closed trajectory with only one coincidence. We deal with the motivational case of G acting on V first, and then generalize to the action on tuples of vectors.

These results are then (Sect. 9) used to show that, in an orbit of G of size N, the probability that a trajectory closes is close to 1/N, with a small relative error. Again we first deal with the motivational case of G acting on V. Provided that we have sufficiently many generators in terms of q, these bounds are good enough for the trace method to work. This completes the proof of Theorem 1.3.

Finally, in Sect. 10 we collect results and deduce Theorems 1.2 and 1.4.

Many (but not all) of our arguments have natural analogues for the symmetric group. For independent interest and for motivation, these are presented in Appendix A.

2 Preliminaries

This section fixes some notation and definitions that will be relevant throughout the paper. The reader needing an introduction to expansion, particularly in Cayley and Schreier graphs, could consult Kowalski [31]. For an introduction to classical groups, see Aschbacher [2, Chapter 7] or Grove [20].

2.1 Asymptotic notation

Many of the arguments we will use are asymptotic in nature and we adopt standard asymptotic notation to state these. Given functions \(f, g\), we write \(f \ll g\) or equivalently \(f = O(g)\) to denote that there are absolute constants \(N, C > 0\) so that \(|f(n)| \le C \cdot g(n)\) for all \(n \ge N\). Let \(f \asymp g\) mean that \(f \ll g\) and \(g \ll f\). We write \(f = o(g)\) to denote that for every \(\epsilon > 0\) there is a constant N so that \(|f(n)| \le \epsilon \cdot g(n)\) for all \(n \ge N\). Let \(f = \omega (g)\) mean that \(g = o(f)\).

We will generally write statements that involve anonymous (usually absolute) constants by using c for small constants and C for big constants.

2.2 Cayley and Schreier graphs

Let G be a group with generating set S satisfying \(S = S^{-1}\). The (undirected, left) Cayley graph \({\text {Cay}}(G,S)\) is the graph whose vertices are elements of G and whose edges are pairs \(\{ g, s g \}\) for \(g \in G, s \in S\).

More generally, the (undirected) Schreier graph \({\text {Sch}}(G,S,\Omega )\) associated to a transitive action of G on a set \(\Omega \) is the graph whose vertices are elements of \(\Omega \) and whose edges are pairs \(\{ \omega , s \omega \}\) for \(\omega \in \Omega , s \in S\). Cayley graphs are Schreier graphs for the left regular representation of G on itself.

Let \(\Gamma \) be a connected graph. One can view \(\Gamma \) as a metric space in the following way. Define the length of a path in \(\Gamma \) to be the number of edges on the path, and let the distance \(d_\Gamma (v_1, v_2)\) between any two vertices \(v_1, v_2 \in V(\Gamma )\) be the length of the shortest path between \(v_1, v_2\). The diameter of a graph \(\Gamma \) is

$$\begin{aligned} {\text {diam}}\Gamma = \max _{v_1, v_2 \in V(\Gamma )} d_{\Gamma }(v_1, v_2). \end{aligned}$$

The diameter of \({\text {Cay}}(G, S)\) is just the smallest \(d \ge 0\) such that \((S \cup \{1\})^d = G\).
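As a small illustration of these definitions, the following sketch computes the diameter of a left Cayley graph by breadth-first search from the identity (which suffices because Cayley graphs are vertex-transitive). It is only a toy for tiny permutation groups, not a method used in the paper; the function names `compose`, `inverse`, and `cayley_diameter` are hypothetical, and permutations are assumed to be given as tuples of images.

```python
from collections import deque

def compose(s, g):
    """Product sg of permutations stored as tuples of images: (sg)(i) = s(g(i))."""
    return tuple(s[g[i]] for i in range(len(g)))

def inverse(g):
    """Inverse permutation."""
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def cayley_diameter(gens):
    """Return (|<S>|, diam Cay(<S>, S)) where S is gens together with inverses.

    BFS from the identity: dist[g] is the least d with g in (S u {1})^d.
    """
    n = len(gens[0])
    S = set(gens) | {inverse(g) for g in gens}
    identity = tuple(range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in S:
            h = compose(s, g)  # the edge {g, sg}
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return len(dist), max(dist.values())

# S_3 generated by two adjacent transpositions: the Cayley graph is a 6-cycle.
print(cayley_diameter([(1, 0, 2), (0, 2, 1)]))  # (6, 3)
```

The search visits every group element, so this is only feasible when \(|G|\) is small; the diameter bounds in the paper are of interest precisely because such exhaustive search is hopeless for large G.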

2.3 Classical groups

Throughout the paper we write \({\text {SCl}}_n(q) \le {\text {GCl}}_n(q) \le {\text {GL}}_n(q)\) for any of the following groups:

$$\begin{aligned} \begin{array}{rlllll} {\text {GCl}}_n(q):&{} &{}{\text {GL}}_n(q), &{}{\text {Sp}}_n(q), &{}{\text {GO}}_{n}^{(\pm )}(q), &{}{\text {GU}}_n(q), \\ {\text {SCl}}_n(q):&{} &{}{\text {SL}}_n(q), &{}{\text {Sp}}_n(q), &{}\Omega _{n}^{(\pm )}(q), &{}{\text {SU}}_n(q). \end{array} \end{aligned}$$

In all cases the defining module is \(V = {\mathbf {F}}_q^n\). We sometimes refer to the first case as the linear case. We make the following conventions in the other cases (notation in other literature sometimes differs, particularly in the \({\text {GU}}\) case):

\({\text {Sp}}_n\): n must be even.

\({\text {GO}}_n^{(\pm )}\): \(\Omega _n(q) = {\text {SO}}_n(q)'\). If n is even there are two possibilities, denoted \({\text {GO}}_n^+(q)\) and \({\text {GO}}_n^-(q)\), depending on the choice of quadratic form. If n is odd there is only \({\text {GO}}_n(q)\), and q must be odd.

\({\text {GU}}_n\): q must be a square \(q_0^2\). The field automorphism of \({\mathbf {F}}_q\) of order 2 is denoted \(\theta \).

We write \({\text {Cl}}_n(q)\) for any intermediate group:

$$\begin{aligned} {\text {SCl}}_n(q) \le {\text {Cl}}_n(q) \le {\text {GCl}}_n(q). \end{aligned}$$

Note that any such group corresponds to a subgroup of the abelianization \({\text {GCl}}_n(q)^\text {ab}\), which is given as follows:

$$\begin{aligned} {\text {GL}}_n(q)^\text {ab}&\cong {\mathbf {F}}_q^\times ,\\ {\text {Sp}}_n(q)^\text {ab}&\cong 1,\\ {\text {GO}}_n^{(\pm )}(q)^\text {ab}&\cong C_2 \times C_2&(q~\text {odd}, n\ge 2),\\ {\text {GO}}_n^{\pm }(q)^\text {ab}&\cong C_2&(q~\text {even}, n~\text {even}),\\ {\text {GU}}_n(q)^\text {ab}&\cong \{u \in {\mathbf {F}}_q : u u^\theta = 1\}. \end{aligned}$$

2.4 Binary and quadratic forms

In all cases we write f for the defining invariant binary form; thus f is zero in the linear case, alternating in the symplectic case, symmetric in the orthogonal case, and hermitian in the unitary case. Except in the linear case, f is nondegenerate.

In the orthogonal case, we write Q for the relevant quadratic form. Recall that Q is related to f by

$$\begin{aligned} Q(u + v) = Q(u) + Q(v) + f(u, v); \end{aligned}$$
(3)

in particular, in odd characteristic,

$$\begin{aligned} Q(v) = f(v, v) / 2. \end{aligned}$$

In even characteristic, Q is not determined by f, but is part of the defining data (and f is determined by Q via (3)). In the unitary case we write Q for the function

$$\begin{aligned} Q(v) = f(v, v), \end{aligned}$$

which we may regard as a quadratic form over \({\mathbf {F}}_{q_0}\). In the other cases define \(Q \equiv 0\). Define also \(q_0 = q\) in the orthogonal case and \(q_0=1\) in the linear and symplectic cases, so that Q always takes values in a \(q_0\)-element space.

It is important that we are able to count solutions to \(Q(v) = x\) in any affine subspace.

Lemma 2.1

Let \(v_0 + W\) be an affine subspace of V of codimension s. The number of \(v \in v_0 + W\) with a specified value of Q(v) is \(q^{n - s}/q_0 \pm q^{n/2}\); that is, it differs from \(q^{n-s}/q_0\) by at most \(q^{n/2}\).

Proof

(Cf. Dickson [11, Chapter IV].) This is trivial in the linear and symplectic cases: \(Q \equiv 0\), so the number is exactly \(q^{n-s}\). The unitary case reduces to the orthogonal case by restriction of scalars, so it suffices to consider the orthogonal case.

For \(x \in {\mathbf {F}}_q\), let

$$\begin{aligned} \Phi (x) = |\{v \in v_0 + W : Q(v) = x\}|. \end{aligned}$$

The Fourier transform of \(\Phi \) is

$$\begin{aligned} \widehat{\Phi }(\chi )&= \sum _{x \in {\mathbf {F}}_q} \Phi (x) \overline{\chi (x)}&(\chi \in \widehat{{\mathbf {F}}_q}) \\&= \sum _{w \in W} \chi (-Q(v_0 + w)). \end{aligned}$$

For nontrivial \(\chi \) we have

$$\begin{aligned} |\widehat{\Phi }(\chi )|^2&= \sum _{w, h \in W} \chi (-Q(v_0+w) + Q(v_0+w+h)) \\&= \sum _{w, h \in W} \chi (Q(h) + f(v_0+w,h)). \end{aligned}$$

The sum over w is zero unless \(h \in W^\perp \). Note that \(\dim W^\perp = s\). Hence

$$\begin{aligned} |\widehat{\Phi }(\chi )|^2 \le |W|\,|W^\perp | = q^n. \end{aligned}$$

By Fourier inversion we have

$$\begin{aligned} \Phi (x) = q^{n-s-1} + \frac{1}{q} \sum _{1\ne \chi \in \widehat{{\mathbf {F}}_q}} \widehat{\Phi }(\chi ) \chi (x), \end{aligned}$$

so

$$\begin{aligned} |\Phi (x) - q^{n-s-1}| \le \frac{1}{q} \sum _{1 \ne \chi \in \widehat{{\mathbf {F}}_q}} |\widehat{\Phi }(\chi )| \le q^{n/2}. \end{aligned}$$
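The count in Lemma 2.1 can be checked by brute force in a tiny orthogonal example. The sketch below is illustrative only: it assumes q = p is prime, takes the hyperbolic form \(Q(v) = v_1 v_2 + v_3 v_4\) on \({\mathbf {F}}_3^4\) as a sample choice, and uses a hypothetical helper `count_values` whose `basis` argument is assumed to be linearly independent.

```python
from itertools import product

def count_values(Q, v0, basis, p):
    """Tally Q(v) mod p over the affine subspace v0 + span(basis) of F_p^n.

    Returns a list counts with counts[x] = #{v in v0 + W : Q(v) = x}.
    The basis vectors are assumed linearly independent over F_p.
    """
    counts = [0] * p
    for coeffs in product(range(p), repeat=len(basis)):
        v = list(v0)
        for c, b in zip(coeffs, basis):
            v = [(vi + c * bi) % p for vi, bi in zip(v, b)]
        counts[Q(v) % p] += 1
    return counts

p, n = 3, 4
Q = lambda v: v[0] * v[1] + v[2] * v[3]  # sample hyperbolic quadratic form
std = [tuple(int(i == j) for j in range(n)) for i in range(n)]

# Whole space (s = 0): main term q^(n-s)/q0 = 27, allowed error q^(n/2) = 9.
print(count_values(Q, (0, 0, 0, 0), std, p))       # [33, 24, 24]
# Affine hyperplane v_1 = 1 (s = 1): main term 9, allowed error 9.
print(count_values(Q, (1, 0, 0, 0), std[1:], p))   # [9, 9, 9]
```

In both cases every count lies within \(q^{n/2}\) of \(q^{n-s}/q_0\), as the lemma asserts with \(q_0 = q\) in the orthogonal case.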

Relatedly, we have Witt’s lemma, which characterizes the orbits of \({\text {GCl}}_n(q)\) in terms of f and Q.

Lemma 2.2

(Witt’s lemma) Let \(u_1, \ldots , u_k, v_1, \ldots , v_k \in V\) be vectors such that

$$\begin{aligned} \dim \langle u_1, \ldots , u_k \rangle&= \dim \langle v_1, \ldots , v_k\rangle \\ f(u_i, u_j)&= f(v_i, v_j)&(1 \le i,j \le k) \\ Q(u_i)&= Q(v_i)&(1 \le i \le k). \end{aligned}$$

Then there is an element \(g \in {\text {GCl}}_n(q)\) such that \(g u_i = v_i\) for each \(1 \le i \le k\). If \(k \le n-2\) there is such an element in \({\text {SCl}}_n(q)\).

Proof

See, e.g., [2, Sect. 20]. \(\square \)

2.5 Degree and support

The concepts of degree and support are essential in the rest of the paper. Both concepts are analogous to the size of the support of a permutation, where the support is the set of non-fixed points. The degree of an element \(g \in {\text {GL}}_n(q)\) is

$$\begin{aligned} \deg g = {\text {rank}}(g - 1); \end{aligned}$$

the support of \(g \in {\text {GL}}_n(q)\) is

$$\begin{aligned} {\text {supp}}g = \min _{\lambda \in \overline{{\mathbf {F}}_q}} {\text {rank}}(g - \lambda ) \end{aligned}$$

(the former definition follows [10] and [24]; the latter definition follows Larsen–Shalev–Tiep [37]). Equivalently, if \(V_\lambda = \ker (g - \lambda )\) denotes the \(\lambda \)-eigenspace of g (for \(\lambda \in \overline{{\mathbf {F}}_q}\)), then

$$\begin{aligned} \deg g&= {\text {codim}}V_1, \\ {\text {supp}}g&= \min _{\lambda \in \overline{{\mathbf {F}}_q}} {\text {codim}}V_\lambda . \end{aligned}$$
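To make the two definitions concrete, here is a minimal sketch that computes \(\deg g\) for a matrix over a prime field \({\mathbf {F}}_p\) by Gaussian elimination. The helper names are hypothetical. Note that `support_over_prime_field` minimizes only over \(\lambda \in {\mathbf {F}}_p\) rather than over the algebraic closure, so in general it gives only an upper bound for \({\text {supp}}\, g\); this restriction is an assumption made to keep the example short.

```python
def rank_mod_p(rows, p):
    """Rank over F_p (p prime) of an integer matrix, by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                c = m[r][col]
                m[r] = [(a - c * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def shifted(g, lam):
    """The matrix g - lam * identity."""
    n = len(g)
    return [[g[i][j] - lam * (i == j) for j in range(n)] for i in range(n)]

def degree(g, p):
    """deg g = rank(g - 1) over F_p."""
    return rank_mod_p(shifted(g, 1), p)

def support_over_prime_field(g, p):
    """min over lam in F_p of rank(g - lam); an upper bound for supp g."""
    return min(rank_mod_p(shifted(g, lam), p) for lam in range(p))

transvection = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
scalar = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
print(degree(transvection, 3), support_over_prime_field(transvection, 3))  # 1 1
print(degree(scalar, 3), support_over_prime_field(scalar, 3))              # 3 0
```

The scalar matrix shows how the two notions differ: it moves every nonzero vector, so its degree is n, but its support is 0 because it has a full eigenspace.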

Support is closely related to the size of the centralizer, as in the following lemma.

Lemma 2.3

For \(g \in G \le {\text {GL}}_n(q)\),

$$\begin{aligned} |C_G(g)| \le q^{n (n - {\text {supp}}g)}. \end{aligned}$$

Proof

(Cf. [35, Lemma 3.1].) Clearly

$$\begin{aligned} |C_G(g)| \le |C_{{\text {M}}_n({\mathbf {F}}_q)}(g)|. \end{aligned}$$

Note that \(C_{{\text {M}}_n({\mathbf {F}}_q)}(g)\) is a vector space over \({\mathbf {F}}_q\), so it will suffice to bound its dimension. Consider g as an element of \({\text {GL}}_n(\overline{{\mathbf {F}}_q})\) and decompose it into Jordan blocks. For each eigenvalue \(\lambda \) of g, let \(\pi _\lambda \) be the partition whose parts are the sizes of Jordan blocks associated to \(\lambda \). Denote by \(S^i(\pi )\) the sum of ith powers of the parts of a partition \(\pi \) and let \(\pi '\) be the transposed partition of \(\pi \). By [27, Sect. 1.3],

$$\begin{aligned} \dim C_{{\text {M}}_n({\mathbf {F}}_q)}(g) = \sum _{\lambda } S^2(\pi _\lambda '). \end{aligned}$$

The largest part of \(\pi _\lambda '\) is the dimension of \(V_\lambda \), so

$$\begin{aligned} S^2(\pi _\lambda ') \le S^1(\pi _\lambda ') \dim V_\lambda . \end{aligned}$$

Combined with \(\sum _{\lambda } S^1(\pi _\lambda ') = n\), this implies

$$\begin{aligned} \dim C_{{\text {M}}_n({\mathbf {F}}_q)}(g) \le n \max _\lambda \dim V_\lambda = n (n - {\text {supp}}g). \end{aligned}$$

2.6 Adjacency operator

Given any group G and \(x_1, \ldots , x_k \in G\), let

$$\begin{aligned} {\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k} = \frac{1}{2k} \sum _{i=1}^{k} (x_i + x_i^{-1}). \end{aligned}$$

This is an element of the group algebra \({\mathbf {C}}[G]\). Given any \({\mathbf {C}}[G]\)-module W, we may consider the action of \({\mathcal {A}}\) on W. Since \({\mathcal {A}}\) is self-adjoint its spectrum is real. Write \(\rho ({\mathcal {A}}, W)\) for the spectral radius of \({\mathcal {A}}\).

We are most interested in permutation modules. If G acts transitively on a set \(\Omega \) then there is a corresponding permutation module \({\mathbf {C}}[\Omega ]\) containing a single copy of the trivial representation, denoted \({\mathbf {C}}[\Omega ]^G\). Let \(W = {\mathbf {C}}[\Omega ]_0\) denote the orthogonal complement of \({\mathbf {C}}[\Omega ]^G\). The spectral gap is \(1 - \rho ({\mathcal {A}}, W)\). Equivalently, if \({\mathcal {A}}\) acting on \({\mathbf {C}}[\Omega ]\) has spectrum

$$\begin{aligned} 1 = \lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _N \ge -1, \end{aligned}$$

where \(N = |\Omega |\), then

$$\begin{aligned} \rho ({\mathcal {A}}, W) = \max (\lambda _2, -\lambda _N), \end{aligned}$$

so the spectral gap is

$$\begin{aligned} \min (1 - \lambda _2, 1 - |\lambda _N|). \end{aligned}$$

We say the action of \(x_1, \ldots , x_k\) on \(\Omega \) is expanding if the spectral gap is bounded away from zero. This is equivalent to rapid mixing of the random walk on \(\Omega \).
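As a simple worked example of these definitions (an illustration only, not used elsewhere in the paper), take the cyclic group \(G = {\mathbf {Z}}/N\) with \(N \ge 3\) acting on \(\Omega = G\) by translation, with \(k = 1\) and \(x_1\) the generator 1. The characters \(\chi _j\) of G are eigenvectors of \({\mathcal {A}}\):

$$\begin{aligned} {\mathcal {A}}\chi _j = \cos (2\pi j/N)\, \chi _j, \qquad \chi _j(a) = e^{2\pi i j a/N} \qquad (0 \le j < N). \end{aligned}$$

If N is odd then \(\lambda _2 = \cos (2\pi /N)\) and \(\lambda _N = -\cos (\pi /N)\), so

$$\begin{aligned} \rho ({\mathcal {A}}, W) = \cos (\pi /N), \qquad 1 - \rho ({\mathcal {A}}, W) = 1 - \cos (\pi /N) \le \frac{\pi ^2}{2N^2}. \end{aligned}$$

If N is even then \(\lambda _N = -1\) and the spectral gap is zero. In either case the spectral gap tends to zero as \(N \rightarrow \infty \), so this family of actions is not expanding.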

3 Word maps, queries, and trajectories

3.1 Word maps

Write \(F_k = F\{\xi _1, \ldots , \xi _k\}\) for the free group with generators \(\{\xi _1, \ldots , \xi _k\}\). Let \(w \in F_k\) have length \(\ell \), and let

$$\begin{aligned} w = w_\ell \cdots w_1 \qquad (w_i \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}) \end{aligned}$$

be the reduced expression of w. Let G be a finite group and \(x_1, \ldots , x_k \in G\). Write

$$\begin{aligned} {{\overline{w}}} = w(x_1, \ldots , x_k) \end{aligned}$$

for the image of w under the homomorphism \(F_k \rightarrow G\) defined by \(\xi _i \mapsto x_i\).
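The evaluation map \(w \mapsto {{\overline{w}}}\) can be sketched in a few lines for permutation groups. The encoding below is an assumption made for illustration: a word is a list of letters \([w_1, \ldots , w_\ell ]\) in the order they are applied (rightmost letter first), each letter being a pair `(i, e)` standing for \(\xi _i^{e}\) with \(e = \pm 1\) and a 0-based index i; the function names are hypothetical.

```python
def perm_mul(a, b):
    """Product ab of permutations stored as tuples of images: (ab)(i) = a(b(i))."""
    return tuple(a[b[i]] for i in range(len(b)))

def perm_inv(a):
    """Inverse permutation."""
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def evaluate_word(word, xs):
    """Evaluate w = w_l ... w_1 at xs, where word = [w_1, ..., w_l].

    Each letter is (i, e): the generator xs[i] if e == 1, its inverse if e == -1.
    """
    result = tuple(range(len(xs[0])))  # identity
    for i, e in word:
        letter = xs[i] if e == 1 else perm_inv(xs[i])
        result = perm_mul(letter, result)  # multiply on the left by the next letter
    return result

xs = [(1, 2, 0), (1, 0, 2)]  # a 3-cycle and a transposition in S_3
print(evaluate_word([(0, 1), (1, 1)], xs))  # xs[1] * xs[0] = (0, 2, 1)
```

Multiplying on the left at each step matches the convention \(w = w_\ell \cdots w_1\), in which \(w_1\) acts first.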

Usually, but not always, \(x_1, \ldots , x_k\) will be chosen randomly. The following lemma is often useful for reducing to the cyclically reduced case.

Lemma 3.1

If \(x_1, \ldots , x_k \in G\) are uniform and independent then \({{\overline{w}}}\) is just the image of w under a uniformly random homomorphism \(F_k \rightarrow G\). In particular, the distribution of \({{\overline{w}}}\) depends only on the automorphism class of w.

3.2 Queries and coincidences

Let \(G = {\text {Cl}}_n(q)\) be a classical group and \(V = {\mathbf {F}}_q^n\) the defining module. Let \(x_1, \ldots , x_k \in G\). Define a query to be a pair \((\xi , v)\), where \(\xi \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}\) and \(v \in V\); the result of the query is \({\overline{\xi }} v\). After any finite sequence of queries

$$\begin{aligned} (w_1, v_1), (w_2, v_2), \ldots , (w_{t-1}, v_{t-1}) \end{aligned}$$

the known domain of a letter \(\xi \) at time t is

$$\begin{aligned} D_\xi ^t = {\text {span}}\{ v_i : w_i = \xi , i< t\} + {\text {span}}\{\overline{w_i} v_i : w_i = \xi ^{-1}, i < t\}. \end{aligned}$$

Suppose we make a further query \((w_t, v_t)\). If \(v_t \in D_{w_t}^t\), then the result \(\overline{w_t} v_t\) is determined already by the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\); we call this a forced choice. Otherwise, we say the query is a free choice.

Let R be some subset of V fixed in advance. If a query \((w_t, v_t)\) is a free choice and yet

$$\begin{aligned} \overline{w_t} v_t \in {\text {span}}R + {\text {span}}\{v_1, \overline{w_1} v_1, \ldots , v_{t-1}, \overline{w_{t-1}} v_{t-1}, v_t\} \end{aligned}$$

then we say the result of the query is a coincidence.

The language is most interesting when \(x_1, \ldots , x_k \in G\) are chosen randomly. Then, by Witt’s lemma, whenever \((\xi , v)\) is a free choice, \({\overline{\xi }} v\) is, conditionally on the result of previous queries, uniformly distributed among vectors satisfying the relevant independence and form conditions. In particular, coincidences are unlikely. We formalize these key points in the following lemmas.

Lemma 3.2

Let \(x \in G\) be uniformly random, and let \(u_1, \ldots , u_t\) be linearly independent, where \(t \le n-2\). Then, conditionally on the values of \(v_1 = x u_1, \ldots , v_{t-1} = x u_{t-1}\), the value of \(x u_t\) is uniformly distributed among vectors \(v_t\) such that \(u_i \mapsto v_i\) defines an isometric isomorphism \(\langle u_1, \ldots , u_t\rangle \rightarrow \langle v_1, \ldots , v_t \rangle \), or in other words such that \(v_t \notin {\text {span}}\{v_1, \ldots , v_{t-1}\}\) and \(f(u_i, u_t) = f(v_i, v_t)\) for each \(i\le t\) and \(Q(u_t) = Q(v_t)\).

Proof

For each such \(v_t\), Witt’s lemma asserts that there is at least one suitable \(x \in G\). The distribution is uniform by the orbit–stabilizer theorem. \(\square \)

Lemma 3.3

Let \(x_1, \ldots , x_k \in G\) be uniformly random and independent, and let

$$\begin{aligned} (w_1, v_1), (w_2, v_2), \ldots , (w_{t-1}, v_{t-1}) \end{aligned}$$

be a sequence of queries. Assume that \((w_t, v_t)\) is a free choice. Assume

$$\begin{aligned} \dim \langle v_1, \ldots , v_t\rangle \le n-2. \end{aligned}$$

Then, conditionally on the values of \(\overline{w_1} v_1, \ldots , \overline{w_{t-1}} v_{t-1}\), the result \(\overline{w_t} v_t\) of the query \((w_t, v_t)\) is uniformly distributed outside \(D_{w_t^{-1}}^t\) subject to

$$\begin{aligned} f(\overline{w_i} v_i, \overline{w_t} v_t)&= f(v_i, v_t)&(i< t, w_i = w_t),\\ f(v_i, \overline{w_t} v_t)&= f(\overline{w_i} v_i, v_t)&(i < t, w_i = w_t^{-1}),\\ Q(\overline{w_t} v_t)&= Q(v_t). \end{aligned}$$

In particular, the conditional probability that \(\overline{w_t} v_t\) is a coincidence is bounded by

$$\begin{aligned} \frac{q^d}{q^{n-s}/q_0 - q^s - q^{n/2}} \end{aligned}$$

(provided the denominator is positive), where

$$\begin{aligned} d = \dim ({\text {span}}R + {\text {span}}\{v_1, \overline{w_1} v_1, \ldots , v_{t-1}, \overline{w_{t-1}} v_{t-1}, v_t\}) \end{aligned}$$

and s is the number of \(i < t\) with \(w_i \in \{w_t, w_t^{-1}\}\).

Proof

The first part of the lemma is immediate from the previous lemma. For the second part, note that \(\overline{w_t} v_t\) is drawn from an affine subspace of codimension at most s, less a subspace of dimension at most s, subject only to the quadratic condition; by Lemma 2.1 there are at least \(q^{n-s}/q_0 - q^{n/2} - q^s\) possibilities, so we get at least the denominator claimed. \(\square \)

Remark 3.4

In the linear case there are no form conditions, so we get the simpler bound \(q^d / (q^n - q^s)\) for the probability of a coincidence.
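For a sense of scale, the two bounds can be evaluated numerically. The sketch below simply transcribes the formulas of Lemma 3.3 and Remark 3.4; the function names and the sample parameter values are hypothetical.

```python
from fractions import Fraction

def coincidence_bound_linear(q, n, d, s):
    """The bound q^d / (q^n - q^s) of Remark 3.4, as an exact fraction."""
    return Fraction(q ** d, q ** n - q ** s)

def coincidence_bound(q, n, d, s, q0):
    """The bound q^d / (q^(n-s)/q0 - q^s - q^(n/2)) of Lemma 3.3, as a float.

    Returns None when the denominator is not positive (the bound is then void).
    """
    denom = q ** (n - s) / q0 - q ** s - q ** (n / 2)
    return q ** d / denom if denom > 0 else None

# Linear case over F_2 with n = 10, d = 3, s = 2: 8 / (1024 - 4).
print(coincidence_bound_linear(2, 10, 3, 2))  # 2/255
# Orthogonal case over F_3 (q0 = q = 3) with n = 20, d = 5, s = 2.
print(coincidence_bound(3, 20, 5, 2, 3))
```

As long as d and s are small compared to n, both bounds are roughly \(q_0 q^{d + s - n}\), which is the sense in which coincidences are unlikely.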

3.3 Trajectories

Let \(w \in F_k\), and let

$$\begin{aligned} w = w_\ell \cdots w_1 \qquad (w_i \in \{\xi _1^{\pm 1}, \ldots , \xi _k^{\pm 1}\}) \end{aligned}$$

be the reduced expression. For each \(v \in V\), the trajectory of v is the sequence of queries \((w_t, v^{t-1})\), where \(v^0 = v\) and for each \(t \ge 1\) the vector \(v^t\) is the result of the query \((w_t, v^{t-1})\); in other words, the sequence \(v^0, v^1, \ldots , v^\ell \) is defined by

$$\begin{aligned} v^0&= v, \\ v^t&= \overline{w_t} v^{t-1}&(1\le t\le \ell ). \end{aligned}$$
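As a concrete illustration of this definition, the following minimal sketch computes a trajectory for fixed (rather than random) matrices. It assumes that q = p is prime, that matrices are stored as lists of rows, and that the inverse letters are supplied explicitly; the names `trajectory`, `GENS`, and the letters `"x"`, `"X"`, `"y"`, `"Y"` are hypothetical and not part of the text.

```python
# Sketch: trajectory v^0, v^1, ..., v^l of a vector under a word w = w_l ... w_1.
# Assumptions (not from the text): q = p is prime, matrices are lists of rows,
# and the inverse letters "X", "Y" are supplied explicitly.

def mat_vec(m, v, p):
    """Multiply the matrix m by the column vector v over F_p."""
    return tuple(sum(a * b for a, b in zip(row, v)) % p for row in m)

def trajectory(letters, gens, v, p):
    """Return [v^0, ..., v^l]; letters = [w_1, ..., w_l], applied first to last."""
    traj = [tuple(v)]
    for letter in letters:
        traj.append(mat_vec(gens[letter], traj[-1], p))
    return traj

P = 5
GENS = {
    "x": [[1, 1], [0, 1]], "X": [[1, 4], [0, 1]],  # X is the inverse of x mod 5
    "y": [[1, 0], [1, 1]], "Y": [[1, 0], [4, 1]],  # Y is the inverse of y mod 5
}
# The word w = y x, so w_1 = x is applied first and w_2 = y second.
TRAJ = trajectory(["x", "y"], GENS, (0, 1), P)
```

Here `TRAJ` lists \(v^0, v^1, v^2\) for \(v = (0, 1)\); in the argument of this section the matrices are of course random and are exposed only through queries.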

The following lemma is trivial but essential.

Lemma 3.5

Suppose \(v \ne 0\) and \(v^\ell \in {\text {span}}R\). Then there is at least one coincidence in the trajectory of v.

Proof

Since \(D_{w_1}^1 = 0\), the first query \((w_1, v^0)\) is free. For each \(t\ge 1\), if \((w_t, v^{t-1})\) is free and not a coincidence then

$$\begin{aligned} v^t = \overline{w_t} v^{t-1} \notin {\text {span}}R + {\text {span}}\{v^0, \ldots , v^{t-1}\}, \end{aligned}$$

while

$$\begin{aligned} D_{w_{t+1}}^{t+1} \le {\text {span}}\{v^0, \ldots , v^{t-1}\}; \end{aligned}$$

hence the query \((w_{t+1}, v^t)\) is also free. Finally if \((w_\ell , v^{\ell -1})\) is free and not a coincidence then \(v^\ell \notin {\text {span}}R\). \(\square \)

More generally for any \(r\ge 1\) we consider the joint trajectory of an r-tuple

$$\begin{aligned} (v_1, \ldots , v_r) \in V^r, \end{aligned}$$

which is simply the r-tuple of individual trajectories, with the queries \((w_t, v_i^{t-1})\) ordered lexicographically by \((t, i)\); i.e., we answer the queries

$$\begin{aligned} (w_1, v_1^0)&(w_1, v_2^0)&\cdots&(w_1, v_r^0) \\ (w_2, v_1^1)&(w_2, v_2^1)&\cdots&(w_2, v_r^1) \\&\vdots&\end{aligned}$$

in reading order. Write \(\prec \) for this order, i.e., \((t',i') \prec (t,i)\) if \(t' < t\) or \(t'=t\) and \(i' < i\). The following lemma generalizes the previous one.

Lemma 3.6

Suppose \(v_i \notin {\text {span}}\{v_1, \ldots , v_{i-1}\}\) and \(v_i^\ell \in {\text {span}}R\). Then there is at least one coincidence in the trajectory of \(v_i\) (during the joint trajectory of \(v_1, \ldots , v_r\)).

Proof

At time (1, i), we have

$$\begin{aligned} D_{w_1}^{(1, i)} \le {\text {span}}\{v_1, \ldots , v_{i-1}\}, \end{aligned}$$

so the first query \((w_1, v_i^0)\) is free. For each \(t\ge 1\), if \((w_t, v_i^{t-1})\) is free and not a coincidence then

$$\begin{aligned} v_i^t = \overline{w_t} v_i^{t-1} \notin {\text {span}}R + {\text {span}}\{v_{i'}^{t'} : (t', i') \prec (t, i)\} \end{aligned}$$

(the vectors \(v_{i'}^{t'}\) with \(t' = t\) and \(i' < i\) get included because they are results of previous queries), while

$$\begin{aligned} D_{w_{t+1}}^{(t+1, i)} \le {\text {span}}\{v_{i'}^{t'} : (t', i') \prec (t, i)\}; \end{aligned}$$

hence the query \((w_{t+1}, v_i^t)\) is also free. Finally if \((w_\ell , v_i^{\ell -1})\) is free and not a coincidence then \(v_i^\ell \notin {\text {span}}R\). \(\square \)
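The reading order \(\prec \) used in the joint trajectory is just the lexicographic order on index pairs \((t, i)\). The following small sketch (with hypothetical helper names `query_order` and `precedes`) enumerates the query indices in the order in which they are answered.

```python
# Sketch: the order in which the queries of a joint trajectory are answered.

def query_order(length, r):
    """Indices (t, i), 1 <= t <= length, 1 <= i <= r, in reading order."""
    return [(t, i) for t in range(1, length + 1) for i in range(1, r + 1)]

def precedes(a, b):
    """(t', i') precedes (t, i); Python compares tuples lexicographically."""
    return a < b

ORDER = query_order(2, 3)
```

In particular every query with index \((t, i')\), \(i' < i\), is answered before the query with index \((t, i)\), which is why those results appear in the span in the proof of Lemma 3.6.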

4 The probability of small support

Let G be a finite group, let \(w \in F_k\), let \(x_1, \ldots , x_k \in G\) be random, and consider \({{\overline{w}}} = w(x_1, \ldots , x_k)\). The probability that \({{\overline{w}}} = 1\) quantifies the extent to which w is “almost a law” in G. This probability is a well-studied quantity, particularly when G is simple. For example, it is known that for any \(w \ne 1\) there is some \(c = c(w) > 0\) such that \({\mathbf {P}}({{\overline{w}}} = 1) \le |G|^{-c}\) for all sufficiently large finite simple groups G (Larsen–Shalev [35, Theorem 1.1]).
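For very small groups the probability \({\mathbf {P}}({{\overline{w}}} = 1)\) can be computed exactly by enumeration. The sketch below does this for \(G = S_n\) with n tiny; it is only a toy illustration of the quantity, not of the asymptotic results quoted above. The encoding of words as lists of pairs (index, sign) and the helper names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import permutations, product

def compose(a, b):
    """Composition of permutations stored as tuples: (a*b)(i) = a[b[i]]."""
    return tuple(a[i] for i in b)

def inverse(a):
    inv = [0] * len(a)
    for i, j in enumerate(a):
        inv[j] = i
    return tuple(inv)

def evaluate(word, xs):
    """Evaluate word = [(index, +1 or -1), ...] (leftmost letter first) at xs."""
    result = tuple(range(len(xs[0])))
    for idx, sign in word:
        g = xs[idx] if sign == 1 else inverse(xs[idx])
        result = compose(result, g)
    return result

def prob_trivial(word, k, n):
    """Exact P(w(x_1, ..., x_k) = 1) for uniform x_i in S_n, by enumeration."""
    group = list(permutations(range(n)))
    identity = tuple(range(n))
    hits = sum(evaluate(word, xs) == identity for xs in product(group, repeat=k))
    return Fraction(hits, len(group) ** k)

COMMUTATOR = [(0, -1), (1, -1), (0, 1), (1, 1)]  # the word x^-1 y^-1 x y
```

For the commutator word this returns the proportion of commuting pairs, which for \(S_3\) is \(18/36 = 1/2\).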

For groups of large rank (our particular interest), the following bounds have been proved recently. Let \(\ell > 0\) be the reduced length of w.

  1.

    For \(G = A_n\) or \(G = S_n\), if \(\ell < cn^{1/2}\) then

    $${\mathbf {P}}({{\overline{w}}}=1) \le e^{-c n / \ell ^2}$$

    (Eberhard [12, Lemma 2.2]).

  2.

    For any classical group \(G = {\text {Cl}}_n(q)\), if \(\ell < cn\) then

    $${\mathbf {P}}({{\overline{w}}}=1) \le |G|^{-c/\ell }$$

    (Liebeck–Shalev [36, Theorem 4]).

The proofs of these estimates can be adapted to show more, namely that with high probability \({{\overline{w}}}\) has large support. In this section we explain this observation in detail in the case of \(G = {\text {Cl}}_n(q)\). For the case of \(G = A_n\) or \(G = S_n\), see the appendix (Subsection A.2).

The following lemma generalizes a key step from the argument of [36, Theorem 4].

Lemma 4.1

Let \(G = {\text {Cl}}_n(q)\) be a classical group of dimension n. Let \(V = {\mathbf {F}}_q^n\) be the natural module, and let \(U \le V\) be a subspace of dimension \(r \le n-2\). Let \(w \in F_k\) be a nontrivial word of length \(\ell \le (\frac{n}{2} - 2)/r\). Then

$$\begin{aligned} {\mathbf {P}}\left( {{\overline{w}}} U = U\right) \le \left( C_{q^r} \frac{q^{\ell r}}{q^{n - \ell r - 1} - q^{\ell r} - q^{n/2}}\right) ^r, \end{aligned}$$

where \(C_{q^r} = 1 + (1 - q^{-r})^{-1} \le 3\).

Proof

Let \(v_1, \ldots , v_r\) be a basis for U. Consider the joint trajectory of \(v_1, \ldots , v_r\). By Lemma 3.6 with \(R = \{v_1, \ldots , v_r\}\), we can have \({{\overline{w}}} U = U\) only if there is at least one coincidence in each individual trajectory. We take a union bound over all possibilities for when the coincidences could occur. If \(t < \ell \), then by Lemma 3.3, the probability that step \((t, i)\) is a coincidence is bounded by

$$\begin{aligned} \frac{q^{(t+1) r}}{q^{n-\ell r-1} - q^{\ell r} - q^{n/2}}; \end{aligned}$$

indeed there are at most \(t r + i \le (t+1) r \le \ell r\) previous vectors. If \(t = \ell \), assuming \(v_j^\ell \in U\) for \(j < i\), we actually get a slightly stronger bound:

$$\begin{aligned} \frac{q^{\ell r}}{q^{n - \ell r - 1} - q^{\ell r} - q^{n/2}}. \end{aligned}$$

Summing over t, the probability that there is a coincidence in the trajectory of \(v_i\) is bounded by

$$\begin{aligned} (1 + 1 + q^{-r} + q^{-2r} + \cdots ) \frac{q^{\ell r}}{q^{n - \ell r - 1} - q^{\ell r} - q^{n/2}}. \end{aligned}$$

Taking the product over i gives the claimed bound. \(\square \)
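To get a feeling for the size of the bound in Lemma 4.1, one can evaluate its right-hand side exactly for small parameters. The following is a minimal sketch, assuming n is even (so that \(q^{n/2}\) is an integer); the function name `lemma_4_1_bound` is hypothetical.

```python
from fractions import Fraction

def lemma_4_1_bound(n, q, r, length):
    """Right-hand side of Lemma 4.1 as an exact fraction.

    Assumes n is even and the hypothesis length <= (n/2 - 2)/r of the lemma,
    which makes the denominator positive.
    """
    assert n % 2 == 0
    assert length * r <= n // 2 - 2
    c = 1 + 1 / (1 - Fraction(1, q ** r))   # C_{q^r} = 1 + (1 - q^-r)^-1 <= 3
    denom = q ** (n - length * r - 1) - q ** (length * r) - q ** (n // 2)
    return (c * Fraction(q ** (length * r), denom)) ** r
```

For example, with \(n = 40\), \(q = 2\), \(r = 1\), \(\ell = 5\) the constant is exactly 3 and the bound is \(96 / (2^{34} - 2^{20} - 2^5)\).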

In the following proof we will refer to the “q-binomial coefficient”, defined by

$$\begin{aligned} \left( {\begin{array}{c}x\\ r\end{array}}\right) _q = \frac{(q^x - 1) (q^x - q) \cdots (q^x - q^{r-1})}{(q^r - 1) (q^r - q) \cdots (q^r - q^{r-1})}. \end{aligned}$$

When x is a nonnegative integer this is the number of r-dimensional subspaces of \({\mathbf {F}}_q^x\). For \(x \ge r\) note that \(x\mapsto \left( {\begin{array}{c}x\\ r\end{array}}\right) _q\) is increasing and nonnegative, and

$$\begin{aligned} \left( {\begin{array}{c}x\\ r\end{array}}\right) _q = q^{xr - r^2} \frac{(1-q^{-x+r-1}) \cdots (1 - q^{-x})}{(1-q^{-r}) \cdots (1-q^{-1})} \asymp q^{xr - r^2}. \end{aligned}$$
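For integer arguments the q-binomial coefficient is easy to compute directly from the defining product. The sketch below does so and compares it with a brute-force count of lines in \({\mathbf {F}}_2^n\); the function names are hypothetical.

```python
from itertools import product

def q_binomial(n, r, q):
    """Number of r-dimensional subspaces of F_q^n (n a nonnegative integer)."""
    num = den = 1
    for i in range(r):
        num *= q ** n - q ** i
        den *= q ** r - q ** i
    return num // den  # the quotient is an integer

def count_lines_f2(n):
    """Brute-force count of 1-dimensional subspaces of F_2^n (nonzero vectors)."""
    return sum(1 for v in product(range(2), repeat=n) if any(v))
```

The ratio `q_binomial(x, r, q) / q ** (x * r - r * r)` stays within constant factors, in line with the estimate \(\asymp q^{xr - r^2}\) above.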

The following theorem will be used for an unspecified, but fixed, \(\delta > 0\).

Theorem 4.2

There are constants \(c, C>0\) such that the following holds for all \(\delta > 0\). Let \(G = {\text {Cl}}_n(q)\) be a classical group of dimension n, and let \(w \in F_k\) be a nontrivial word of reduced length \(\ell < \delta ^2 n / 20\). Assume \(q^{\delta n} > C\). Then

$$\begin{aligned} {\mathbf {P}}\left( {\text {supp}}{{\overline{w}}} \le (1-\delta )n\right) \le |G|^{-c \delta ^2/\ell }. \end{aligned}$$

Proof

Let \(x_1, \ldots , x_k\) be chosen independently and uniformly from G. Suppose some eigenspace \(V_\lambda \le \overline{{\mathbf {F}}_q}^n\) of \({{\overline{w}}}\) has dimension at least \(\delta n\). Let \(d = [{\mathbf {F}}_q(\lambda ):{\mathbf {F}}_q]\). Let \(\Lambda \) be the set of d Galois conjugates of \(\lambda \). Since \(\dim V_{\lambda '} = \dim V_\lambda \) for each \(\lambda ' \in \Lambda \), \(\dim V_\lambda \le n / d\), so \(d \le 1 / \delta \). Let \(W \le V_\lambda \) be an r-dimensional subspace defined over \({\mathbf {F}}_q(\lambda ) \cong {\mathbf {F}}_{q^d}\). Then there is a conjugate subspace \(W' \le V_{\lambda '}\) for each \(\lambda ' \in \Lambda \), and the sum \(U = \sum _{\lambda ' \in \Lambda } W'\) is dr-dimensional and \({\mathbf {F}}_q\)-rational, since it is fixed by the Galois group, so it may be identified with a dr-dimensional subspace of V. Since \(U \cap V_\lambda = W\), this correspondence \(W \mapsto U\) is injective. Hence the number of dr-dimensional subspaces of V preserved by \({{\overline{w}}}\) is at least \(\left( {\begin{array}{c}\delta n\\ r\end{array}}\right) _{q^d}\).

Since \(\ell d \le \ell / \delta < \delta n / 20\), we may choose an integer \(r > 0\) such that \(\ell d r \in [\delta n / 5, \delta n / 4]\). Now by the previous lemma and Markov’s inequality, the probability that the number of dr-dimensional subspaces of V preserved by \({{\overline{w}}}\) is at least \(\left( {\begin{array}{c}\delta n\\ r\end{array}}\right) _{q^d}\) is bounded by

$$\begin{aligned}&\frac{\left( {\begin{array}{c}n\\ d r\end{array}}\right) _q}{\left( {\begin{array}{c}\delta n\\ r\end{array}}\right) _{q^d}} \left( 3 \frac{q^{\ell d r}}{q^{n - \ell d r - 1} - q^{\ell d r} - q^{n/2}} \right) ^{dr} \\&\quad \asymp \frac{q^{drn - d^2 r^2}}{(q^d)^{\delta r n - r^2}}\left( 3 \frac{q^{\ell d r}}{q^{n - \ell d r - 1} - q^{\ell d r} - q^{n/2}} \right) ^{dr} \\&\quad \le O\left( q^{-\delta n + 2 \ell dr + r - dr + 1}\right) ^{dr} \\&\quad \le O(1)^{\delta n / 4 \ell } \left( q^{-\delta n + 2\delta n / 4 + \delta n / 4}\right) ^{\delta n/5\ell } \\&\quad = O(1)^{\delta n / \ell } q^{-\frac{1}{20} \delta ^2 n^2 / \ell }. \end{aligned}$$

Taking the sum over all \(d \le 1/ \delta \), it follows that

$$\begin{aligned} {\mathbf {P}}\left( {\text {supp}}{{\overline{w}}} \le (1- \delta )n \right) = {\mathbf {P}}\left( \max _{\lambda \in \overline{{\mathbf {F}}_q}} \dim V_\lambda \ge \delta n \right) \le \delta ^{-1} O(1)^{\delta n / \ell } q^{-\frac{1}{20} \delta ^2 n^2 / \ell }. \end{aligned}$$

Assuming \(q^{\delta n}\) is sufficiently large, the first two factors are negligible compared to the third. \(\square \)
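The choice of the integer r in the proof above relies on the fact that the interval \([\delta n / 5, \delta n / 4]\) has length \(\delta n / 20 > \ell d\), so it contains a multiple of \(\ell d\). A minimal sketch of this choice, with the hypothetical helper name `choose_r` and \(\delta \) passed as an exact fraction:

```python
from fractions import Fraction
from math import ceil

def choose_r(n, delta, length, d):
    """Smallest integer r > 0 with length*d*r >= delta*n/5.

    Requires length*d < delta*n/20, which forces length*d*r <= delta*n/4.
    """
    step = length * d
    lo, hi = delta * n / 5, delta * n / 4
    assert step < delta * n / 20
    r = ceil(lo / step)
    assert r > 0 and lo <= step * r <= hi
    return r
```

For instance, `choose_r(1000, Fraction(1, 2), 5, 2)` returns 10, for which \(\ell d r = 100 \in [100, 125]\).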

Remark 4.3

The restriction \(\ell < c \delta ^2 n\) in Theorem 4.2 is essential, and related to our reliance on linear algebra. For example, let \(G = {\text {SL}}_n(q)\), and suppose w is a word of length \(\ell \approx 10 n\). We do not know how to bound \({\mathbf {P}}({{\overline{w}}} = 1)\) satisfactorily. Is it true that \({\mathbf {P}}({{\overline{w}}} = 1) \le q^{-cn}\) for some \(c>0\)? Certainly w cannot be a law, because \({\text {SL}}_n(q)\) contains \({\text {SL}}_2(q^{\left\lfloor {n/2}\right\rfloor })\) and the shortest law in \({\text {SL}}_2(q^{\left\lfloor {n/2}\right\rfloor })\) has length at least \((q^{\left\lfloor {n/2}\right\rfloor } - 1)/3\) (see Hadad [21, Theorem 2]). The question is whether it can be an almost-law.

5 Expected values of characters

Throughout this section let \(G = {\text {Cl}}_n(q)\) be a classical group and \(\chi \in {\text {Irr}}G\) a nonlinear character. Our aim is to bound

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)}\right) \end{aligned}$$

when w is a fixed nontrivial word of length cn, evaluated at random \(x_1, \ldots , x_k \in G\). The proof consists of two steps:

  1.

    By the previous section, with high probability \({{\overline{w}}}\) has large support.

  2.

    By recent character bounds of Guralnick, Larsen, and Tiep [18, 19], if \({{\overline{w}}}\) has large support then \(|\chi ({{\overline{w}}})| \le \chi (1)^\epsilon \).

We first deal with elements of large support.

Lemma 5.1

For every \(\epsilon >0\) there is a \(\delta > 0\) such that the following holds. Let \(g \in G\) with \({\text {supp}}g \ge (1-\delta )n\). Then \(|\chi (g)| \le \chi (1)^\epsilon \).

Proof

By Lemma 2.3, \(|C_G(g)| \le q^{\delta n^2}\). Hence by the character bound [18, Theorem 1.3] we have \(|\chi (g)| \le \chi (1)^\epsilon \). \(\square \)

Theorem 5.2

There is a constant \(c > 0\) such that the following holds. Let \(w \in F_k\) be a fixed nontrivial word of reduced length less than cn. Then

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) < q^{-c n}. \end{aligned}$$

Proof

Let \(\delta \) be as in the previous lemma with \(\epsilon = 1/2\). By conditioning on whether or not \({\text {supp}}{{{\overline{w}}}} < (1-\delta )n\), we have

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)}\right)&\le {\mathbf {P}}_{x_1, \ldots , x_k}\left( {\text {supp}}{{{\overline{w}}}} < (1-\delta )n\right) \\&\qquad + \max _{x_1, \ldots , x_k : {\text {supp}}{{{\overline{w}}}} \ge (1-\delta )n} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) . \end{aligned}$$

It follows from Theorem 4.2 that

$$\begin{aligned} {\mathbf {P}}_{x_1, \ldots , x_k}({\text {supp}}{{{\overline{w}}}} < (1-\delta )n) \le q^{-c_1n} \end{aligned}$$

for some constant \(c_1 > 0\). The other summand is bounded by Lemma 5.1:

$$\begin{aligned} \max _{x_1, \ldots , x_k :{\text {supp}}{{{\overline{w}}}} \ge (1-\delta )n} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) \le \chi (1)^{-1/2} \le q^{-c_2 n} \end{aligned}$$

for some constant \(c_2 > 0\). (Here we used \(\chi (1) \ge q^{c_3 n}\): see [32].) \(\square \)

Our main interest is the case in which w is the result of a simple random walk in \(F_k\). With high probability the result of the random walk is nontrivial, so we can apply the above theorem.

Corollary 5.3

There is a constant \(c > 0\) such that the following holds. Let w be the result of a simple random walk of length \(\ell < cn\) in \(F_k\). Then

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k\in G, w} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) < q^{-c n} + k^{-c \ell }. \end{aligned}$$

Proof

By conditioning on whether or not the word w is trivial, we get

$$\begin{aligned} {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) \le \max _{0< |w| < cn} {\mathbf {E}}_{x_1, \ldots , x_k} \left( \frac{|\chi ({{\overline{w}}})|}{\chi (1)} \right) + {\mathbf {P}}_{w}(w = 1). \end{aligned}$$

The first term is bounded by Theorem 5.2. The second term is the return probability of a simple random walk on a 2k-regular tree, which is at most \(k^{-c \ell }\) for a constant \(c > 0\) (see [29, Theorem 3 and Lemma 2.2] or [15, Appendix B]). \(\square \)
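The return probability appearing in the proof can be computed exactly for small parameters by tracking only the distance of the walk from the identity in the 2k-regular tree: from the root every step moves away, and from any other vertex exactly one of the 2k steps moves back. The following sketch (hypothetical function name `return_probability`) implements this.

```python
from fractions import Fraction

def return_probability(k, length):
    """Exact P(w = 1) for a simple random walk of the given length in F_k,
    tracking only the distance from the identity in the 2k-regular tree."""
    dist = {0: Fraction(1)}
    for _ in range(length):
        new = {}
        for m, pr in dist.items():
            if m == 0:
                new[1] = new.get(1, 0) + pr
            else:
                new[m - 1] = new.get(m - 1, 0) + pr * Fraction(1, 2 * k)
                new[m + 1] = new.get(m + 1, 0) + pr * Fraction(2 * k - 1, 2 * k)
        dist = new
    return dist.get(0, Fraction(0))
```

For \(k = 2\) the values at lengths 2 and 4 are \(1/4\) and \(7/64\), consistent with exponential decay in \(\ell \).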

6 Reaching a normal subset: the xw(y, z) trick

In this section, something of an interlude, let G be any finite group, and let \({\mathfrak {C}}\) be a normal (i.e., conjugacy-closed) subset of G. We will develop a criterion ensuring that one can, with high probability as \(x,y,z \in G\) are chosen uniformly at random, find a word \(w \in F_2\) of at most a prescribed length such that \(x w(y, z) \in {\mathfrak {C}}\). The criterion applies to sets \({\mathfrak {C}}\) whose density is large compared to the expected values of characters. This is a variation of the technique used in [13, Sect. 4]; see also [14, Sect. 2].

The following theorem expresses the most general such estimate we will need, in which we further allow arbitrary weights to be attached to elements of \({\mathfrak {C}}\). We express the result in terms of a nonnegative conjugation-invariant function (class function) f on G. We define the \(L^p\) norm of f by

$$\begin{aligned} \Vert f\Vert _p^p = \frac{1}{|G|} \sum _{x \in G} |f(x)|^p \qquad (p \in \{1, 2\}), \end{aligned}$$

and we use the standard inner product on functions on G defined by

$$\begin{aligned} \langle f, g \rangle = \frac{1}{|G|} \sum _{x \in G} f(x) \overline{g(x)}. \end{aligned}$$

Theorem 6.1

Let f be a nonnegative and conjugation-invariant function on G, and let \(\ell \) be a positive integer. Let \(x_0, x_1, \ldots , x_k\) be elements of G chosen uniformly at random. Let E be the event that \(f(x_0 {{\overline{u}}}) = 0\) for every word \(u \in F_k\) of length at most \(\ell \). Let w be the result of a simple random walk of length \(2\ell \) in \(F_k\). Then (see Footnote 3)

$$\begin{aligned} {\mathbf {P}}_{x_0, \ldots , x_k} \left( E\right) \le \frac{1}{\Vert f\Vert _1^2} \sum _{1 \ne \chi \in {\text {Irr}}{G}} |\langle f, \chi \rangle |^2 {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

In particular,

$$\begin{aligned} {\mathbf {P}}_{x_0, \ldots , x_k} \left( E\right) \le \frac{\Vert f\Vert _2^2}{\Vert f\Vert _1^2} \max _{\begin{array}{c} 1 \ne \chi \in {\text {Irr}}{G} \\ \langle f, \chi \rangle \ne 0 \end{array}} {\mathbf {E}}_{x_1, \ldots , x_k,w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

Proof

Let \({\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k}\) be the adjacency operator defined in Sect. 2.6, and consider its natural action on \(L^2(G)\). Let \(X = {\mathcal {A}}^\ell f(x_0)\), regarded as a random variable dependent on \(x_0, x_1, \ldots , x_k\), and note that E is precisely the event \(X = 0\). By Chebyshev’s inequality,

$$\begin{aligned} {\mathbf {P}}(X = 0) \le \frac{{\text {Var}}X}{\left( {\mathbf {E}}X \right) ^2}. \end{aligned}\tag{4}$$

The first moment is

$$\begin{aligned} {\mathbf {E}}X = \Vert f\Vert _1. \end{aligned}$$

The second moment is

$$\begin{aligned} {\mathbf {E}}X^2 = {\mathbf {E}}_{x_1, \ldots , x_k} \Vert {\mathcal {A}}^\ell f\Vert _2^2 = {\mathbf {E}}_{x_1, \ldots , x_k} \langle {\mathcal {A}}^{2\ell } f, f\rangle . \end{aligned}\tag{5}$$

Since f is conjugation-invariant, we can expand this further in terms of characters. By orthogonality of characters, if \(\tau _x\) is the translation operator defined by \(\tau _x(h)(y) = h(x^{-1} y)\), we have

$$\begin{aligned} \langle \tau _x \chi , \psi \rangle = {\left\{ \begin{array}{ll} \chi (x^{-1})/\chi (1) &{}\text {if}~\chi =\psi , \\ 0 &{}\text {else}. \end{array}\right. } \end{aligned}$$

Hence

$$\begin{aligned} \langle {\mathcal {A}}^{2\ell } \chi , \psi \rangle = 0 \qquad (\chi \ne \psi ), \end{aligned}$$

and

$$\begin{aligned} \langle {\mathcal {A}}^{2\ell } \chi , \chi \rangle = {\mathbf {E}}_w \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) , \end{aligned}$$

where w is the result of a simple (symmetric) random walk of length \(2\ell \) in \(F_k\). Hence, from (5),

$$\begin{aligned} {\mathbf {E}}X^2 = \sum _{\chi \in {\text {Irr}}{G}} |\langle f, \chi \rangle |^2 {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

The \(\chi = 1\) term is \(\Vert f\Vert _1^2\), which is the same as \(({\mathbf {E}}X)^2\). Hence the first part of the theorem follows from (4). The second part holds because

$$\begin{aligned} \sum _{\chi \in {\text {Irr}}{G}} |\langle f, \chi \rangle |^2 = \Vert f\Vert _2^2. \end{aligned}$$
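The Parseval identity used at the end of the proof can be checked by hand in a small case. The sketch below does this for \(G = S_3\), using its standard character table; the choice of group, the test function, and all names are assumptions made purely for illustration.

```python
from fractions import Fraction

# Character table of S_3 (a standard fact), with classes ordered as:
# identity (size 1), transpositions (size 3), 3-cycles (size 2).
CLASS_SIZES = [1, 3, 2]
CHARACTERS = {"trivial": [1, 1, 1], "sign": [1, -1, 1], "standard": [2, 0, -1]}
ORDER = sum(CLASS_SIZES)

def inner(f, g):
    """<f, g> for real-valued class functions given by their values on classes."""
    return sum(Fraction(s * a * b, ORDER) for s, a, b in zip(CLASS_SIZES, f, g))

def norm_p(f, p):
    """||f||_p^p as defined in the text, for p in {1, 2}."""
    return sum(Fraction(s * abs(a) ** p, ORDER) for s, a in zip(CLASS_SIZES, f))

F = [0, 0, 1]  # indicator function of the 3-cycles
PARSEVAL_LHS = sum(inner(F, chi) ** 2 for chi in CHARACTERS.values())
```

Here both sides of the identity equal \(1/3\), and the trivial-character coefficient \(\langle f, 1\rangle \) coincides with \(\Vert f\Vert _1\) because f is nonnegative.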

Corollary 6.2

Let \({\mathfrak {C}}\) be a normal subset of G. Write

$$\begin{aligned} {\mathfrak {C}}= \bigcup _{\alpha \in G^\text {ab}} {\mathfrak {C}}_\alpha , \end{aligned}$$

where \({\mathfrak {C}}_\alpha = {\mathfrak {C}}\cap \alpha G'\) is the fibre of \({\mathfrak {C}}\) over \(\alpha \in G^\text {ab}\). Let \(\delta _\alpha = |{\mathfrak {C}}_\alpha | / |G'|\) be the fibre density, and let \(\delta = \min _{\alpha \in G^\text {ab}} \delta _\alpha \). Assume \(\delta > 0\).

Let \(x_0, x_1, \ldots , x_k \in G\) be chosen uniformly at random, and let E be the event that for every word \(u \in F_k\) of length at most \(\ell \) we have \(x_0 {{\overline{u}}} \notin {\mathfrak {C}}\). Let w be the result of a simple random walk of length \(2\ell \) in \(F_k\). Then

$$\begin{aligned} {\mathbf {P}}(E) \le \delta ^{-1} \max _{\begin{array}{c} \chi \in {\text {Irr}}{G} \\ \chi (1) > 1 \end{array}} {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

Proof

In the previous theorem, take

$$\begin{aligned} f = \sum _{\alpha \in G^\text {ab}} \frac{1_{{\mathfrak {C}}_\alpha }}{\delta _\alpha }. \end{aligned}$$

Then \(\Vert f\Vert _1 = 1\), and

$$\begin{aligned} \Vert f\Vert _2^2 = \frac{1}{|G^\text {ab}|} \sum _{\alpha \in G^\text {ab}} \delta _\alpha ^{-1} \le \delta ^{-1}. \end{aligned}$$

Thus

$$\begin{aligned} {\mathbf {P}}(E) \le \delta ^{-1} \max _{\begin{array}{c} 1 \ne \chi \in {\text {Irr}}{G} \\ \langle f, \chi \rangle \ne 0 \end{array}} {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi ({{\overline{w}}})}{\chi (1)}\right) . \end{aligned}$$

Now if \(\chi \ne 1\) is one-dimensional then \(\chi \) factors through \(G^\text {ab}\), so

$$\begin{aligned} \langle f, \chi \rangle = \frac{1}{|G^\text {ab}|} \sum _{\alpha \in G^\text {ab}} \chi (\alpha ) = 0. \end{aligned}$$

Hence

$$\begin{aligned} {\mathbf {P}}(E) \le \delta ^{-1} \max _{\begin{array}{c} 1 \ne \chi \in {\text {Irr}}{G} \\ \chi (1) > 1 \end{array}} {\mathbf {E}}_{x_1, \ldots , x_k, w} \left( \frac{\chi (\overline{w})}{\chi (1)}\right) . \end{aligned}$$
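The normalisation of f in the proof of Corollary 6.2 can be illustrated on a toy example. We take, purely as an assumption for illustration, \(G = S_3\) (so \(G' = A_3\) and \(|G^\text {ab}| = 2\)) and \({\mathfrak {C}}\) the set of non-identity elements; the names below are hypothetical.

```python
from fractions import Fraction

# Toy data (an assumption for illustration): G = S_3, so G' = A_3 has order 3
# and G^ab has order 2. C is the set of all non-identity elements of S_3.
G_ORDER, DERIVED_ORDER = 6, 3
FIBRE_SIZES = {"even": 2, "odd": 3}  # |C_alpha|: the 3-cycles, the transpositions

def fibre_densities(fibre_sizes, derived_order):
    """delta_alpha = |C_alpha| / |G'| for each fibre."""
    return {a: Fraction(s, derived_order) for a, s in fibre_sizes.items()}

def weighted_norms(fibre_sizes, derived_order, group_order):
    """Return (||f||_1, ||f||_2^2) for f = sum_alpha 1_{C_alpha} / delta_alpha."""
    dens = fibre_densities(fibre_sizes, derived_order)
    norm1 = sum(Fraction(s, group_order) / dens[a] for a, s in fibre_sizes.items())
    norm2 = sum(Fraction(s, group_order) / dens[a] ** 2 for a, s in fibre_sizes.items())
    return norm1, norm2

NORM1, NORM2_SQ = weighted_norms(FIBRE_SIZES, DERIVED_ORDER, G_ORDER)
DELTA = min(fibre_densities(FIBRE_SIZES, DERIVED_ORDER).values())
```

In this example \(\Vert f\Vert _1 = 1\) and \(\Vert f\Vert _2^2 = 5/4 \le \delta ^{-1} = 3/2\), as in the proof.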

7 Obtaining an element of minimal degree

Let \(G = {\text {GCl}}_n(q)\). Let s be the minimal degree of a nontrivial element of \({\text {SCl}}_n(q)\); thus \(s = 2\) in the orthogonal case and \(s = 1\) otherwise. Let

$$\begin{aligned} {\mathfrak {M}}= \{ g \in {\text {SCl}}_n(q) : \deg g = s \}. \end{aligned}$$

In this section we exhibit a large normal subset \({\mathfrak {C}}_d \subseteq G\), depending on an integer parameter d, such that the \(\kappa (q^d - 1)\)-th power of every element of \({\mathfrak {C}}_d\) lies in \({\mathfrak {M}}\) (here \(\kappa \in \{1, 2\}\) is specified in Proposition 7.1). We will use \({\mathfrak {C}}_d\) in combination with Corollaries 5.3 and 6.2 to obtain an element of minimal degree as a short word in random generators.

Proposition 7.1

There is a constant \(C > 0\) so that the following holds. Let \(d \in [2, n]\) be an integer parameter. Assume \(q^d > Cn\). Then there is a normal subset \({\mathfrak {C}}_d \subseteq G\) with the following properties.

  1. (1)

    For every \(\alpha \in G^\text {ab}\), if \({\mathfrak {C}}_{d;\alpha }\) is the fibre of \({\mathfrak {C}}_d\) over \(\alpha \), then

    $$\begin{aligned} \frac{|{\mathfrak {C}}_{d; \alpha }|}{|G|} \ge \exp \left( -O(d^2 \log q) - O(d^{-1} n \log n) \right) . \end{aligned}$$
  2. (2)

    For every \(g \in {\mathfrak {C}}_d\), we have

    $$\begin{aligned} g^{\kappa (q^d - 1)} \in {\mathfrak {M}}, \end{aligned}$$

    where \(\kappa = 2\) if G is orthogonal in even characteristic, and \(\kappa = 1\) otherwise.

The proof is split into cases depending on the type of G.

7.1 The linear case

Let \(G = {\text {GL}}_n(q)\). In this case \({\mathfrak {M}}\) is the set of transvections. Let V be the natural module for G. Write

$$\begin{aligned} n - 3 = kd + r, \qquad (0 \le r < d), \end{aligned}$$

i.e., let \(k = \left\lfloor {\frac{n-3}{d}}\right\rfloor \) and \(r = n - 3 - k d\). Decompose V as

$$\begin{aligned} V = L \oplus V_1 \oplus \cdots \oplus V_k \oplus R \oplus W, \end{aligned}$$

where \(\dim L = 2\), \(\dim V_i = d\), \(\dim R = 1\), and \(\dim W = r\). Fix a basis for each of the subspaces.

We now define a particular element \(g \in {\text {GL}}(V)\) respecting the above decomposition. We define g by its action on the chosen basis for each of the subspaces above.

Subspace L::

Let g act as a transvection on L, say \({\begin{matrix} 1 &{}\quad 1 \\ 0 &{}\quad 1 \end{matrix}}\). Note that \((g|_L)^{q^d-1} = (g|_L)^{-1}\) is also a transvection.

Subspace \(V_i\) ::

Let \(p_i\) be a monic irreducible polynomial of degree d over \({\mathbf {F}}_q\). Identify \(V_i\) with \({\mathbf {F}}_q[t]/(p_i(t))\). The variable t acts on the latter space by multiplication. Let g act on \(V_i\) as multiplication by t. Note that the minimal polynomial of this transformation is \(p_i\), and \((g|_{V_i})^{q^d-1} = 1\).

Subspace R::

Let \(\alpha \in G^\text {ab}\). Let g act on R as the scalar \(\det (\alpha )/ \prod _{i = 1}^k (-1)^d p_i(0)\).

Subspace W::

Let g act trivially on W.
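The two power identities noted in the construction above (for the blocks L and \(V_i\)) can be verified numerically in a small case. The sketch below assumes q = p is prime and uses \(q = 3\), \(d = 2\), \(p_i(t) = t^2 + 1\); the helper names are hypothetical.

```python
def mat_mul(a, b, p):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]

def mat_pow(a, e, p):
    """a^e over F_p by repeated squaring (e >= 0)."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            result = mat_mul(result, a, p)
        a = mat_mul(a, a, p)
        e >>= 1
    return result

def companion(coeffs, p):
    """Matrix of multiplication by t on F_p[t]/(f) in the basis 1, t, ..., t^(d-1),
    where f = t^d + coeffs[d-1] t^(d-1) + ... + coeffs[0] is monic."""
    d = len(coeffs)
    m = [[0] * d for _ in range(d)]
    for j in range(d - 1):
        m[j + 1][j] = 1                 # t * t^j = t^(j+1)
    for i in range(d):
        m[i][d - 1] = (-coeffs[i]) % p  # t * t^(d-1) = -(lower-order terms of f)
    return m

Q, D = 3, 2
EXP = Q ** D - 1
TRANSVECTION = [[1, 1], [0, 1]]
BLOCK = companion([1, 0], Q)            # f = t^2 + 1, irreducible over F_3
```

Raising `BLOCK` to the power `EXP` gives the identity, while raising `TRANSVECTION` to the power `EXP` gives its inverse, which is again a transvection.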

Let \({\mathfrak {I}}_d\) denote the set of monic irreducible polynomials of degree d over \({\mathbf {F}}_q\). For every tuple \(p_1, \ldots , p_k \in {\mathfrak {I}}_d\) with \(p_{i} \ne p_{i'}\) for \(i \ne i'\) and \(\alpha \in G^\text {ab}\) we thus have an element \(g = g_{p_1, \ldots , p_k; \alpha } \in G\). Let \(g_{p_1, \ldots , p_k; \alpha }^G\) denote the conjugacy class of \(g_{p_1, \ldots , p_k; \alpha }\) (this class does not depend on the order of \(p_1, \ldots , p_k\)). Let

$$\begin{aligned} {\mathfrak {C}}_{d; \alpha } = \bigcup _{ \{ p_1, \ldots , p_k \} \in \left( {\begin{array}{c}{\mathfrak {I}}_d\\ k\end{array}}\right) } g_{p_1, \ldots , p_k; \alpha }^G. \end{aligned}$$

The union is disjoint, because the minimal polynomial of each element of \(g_{p_1, \ldots , p_k; \alpha }^G\) is divisible by \(p_1(t) \cdots p_k(t)\) (the other factors are \((t-1)^2\) and \((t-\lambda )\) for \(\lambda = \det (\alpha ) / \prod _{i=1}^k (-1)^d p_i(0)\) if \(\lambda \ne 1\)). Finally let

$$\begin{aligned} {\mathfrak {C}}_d = \bigcup _{\alpha \in G^\text {ab}} {\mathfrak {C}}_{d; \alpha }. \end{aligned}$$

Remark 7.2

This is a variation of the construction in [14, Sect. 3.2].

Proof of Proposition 7.1 for \({\text {GL}}\). By construction, \({\mathfrak {C}}_{d;\alpha }\) is the fibre of \({\mathfrak {C}}_d\) over \(\alpha \), and for every \(p_1, \ldots , p_k,\alpha \) we have \(g_{p_1, \ldots , p_k; \alpha }^{q^d-1} \in {\mathfrak {M}}\). It remains only to estimate the density of \({\mathfrak {C}}_{d;\alpha }\).

For \(g = g_{p_1, \ldots , p_k; \alpha }\), we have (as in the proof of Lemma 2.3)

$$\begin{aligned} |C_G(g)| \le |C_{{\text {M}}_n({\mathbf {F}}_q)}(g)| = q^{n + r^2 + O(r)}. \end{aligned}$$

Therefore

$$\begin{aligned} |{\mathfrak {C}}_{d; \alpha }| \ge \left( {\begin{array}{c}|{\mathfrak {I}}_d|\\ k\end{array}}\right) \cdot \frac{|G|}{q^{n + r^2 + O(r)}}. \end{aligned}\tag{6}$$

Recall that

$$\begin{aligned} |{\mathfrak {I}}_d| = q^d / d - O(q^{d/2} / d). \end{aligned}$$

In particular, by the hypothesis \(q^d > Cn\) we have \(|{\mathfrak {I}}_d| > k\), and in fact

$$\begin{aligned} \left( {\begin{array}{c}|{\mathfrak {I}}_d|\\ k\end{array}}\right)&= \left( {\begin{array}{c}q^d/d - O(q^{d/2}/d)\\ k\end{array}}\right) \\&\ge \left( \frac{ q^d/d - O(q^{d/2}/d) }{k} \right) ^k \\&= \left( \frac{ q^d (1 - O(q^{-d/2})) }{dk} \right) ^k \\&\ge \frac{q^{n-3-r}}{ n^{n/d} } \cdot e^{- O \left( n/ d \cdot q^{-d/2} \right) }. \end{aligned}$$

Hence, from (6), since \(r < d\),

$$\begin{aligned} \frac{|{\mathfrak {C}}_{d; \alpha }|}{|G|} \ge \exp \left( - (d^2 + O(d)) \log q - \frac{n}{d} \log n - O(n/d \cdot q^{-d/2}) \right) . \end{aligned}$$

This proves the proposition. \(\square \)
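The count \(|{\mathfrak {I}}_d|\) recalled in the proof above can be computed exactly from the classical formula \(|{\mathfrak {I}}_d| = \frac{1}{d} \sum _{e \mid d} \mu (e) q^{d/e}\) (a standard fact not stated in the text). A minimal sketch, with hypothetical function names:

```python
def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def num_irreducible(q, d):
    """Number of monic irreducible polynomials of degree d over F_q."""
    total = sum(mobius(e) * q ** (d // e) for e in range(1, d + 1) if d % e == 0)
    return total // d
```

For example there are 9 monic irreducible polynomials of degree 6 over \({\mathbf {F}}_2\), compared with the main term \(2^6/6 \approx 10.7\).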

7.2 Other classical groups

Let \(G = {\text {GCl}}_n(q)\), where \({\text {GCl}}\ne {\text {GL}}\). Let V be the natural module for G equipped with a nondegenerate binary form f and possibly a quadratic form Q. By Witt’s decomposition theorem, there is an orthogonal decomposition of V of the form

$$\begin{aligned} V = H \perp V_\text {an}, \end{aligned}$$

where H is an orthogonal direct sum of hyperbolic planes and \(V_{\text {an}}\) is anisotropic, and \(\dim V_\text {an}\le 2\) by the Chevalley–Warning theorem. Let \(\delta = \dim V_\text {an}+ 4 + 2 \kappa \), where \(\kappa = 2\) if G is orthogonal in even characteristic, and \(\kappa = 1\) otherwise. Let \(D = 2d\) and write

$$\begin{aligned} n - \delta = kD + r, \qquad (0 \le r < D), \end{aligned}$$

i.e., let \(k = \left\lfloor {(n-\delta )/D}\right\rfloor \) and \(r = n - \delta - k D\). Write the hyperbolic space H as

$$\begin{aligned} H = L \perp V_1 \perp \cdots \perp V_k \perp R \perp W', \end{aligned}$$

where each constituent is an orthogonal direct sum of hyperbolic planes with \(\dim L = 2 \kappa + 2\), \(\dim V_i = D\), \(\dim R = 2\), and \(\dim W' = r\). Let \(W = W' \perp V_{\text {an}}\). Thus we have the following orthogonal decomposition of V:

$$\begin{aligned} V = L \perp V_1 \perp \cdots \perp V_k \perp R \perp W. \end{aligned}\tag{7}$$

Fix a hyperbolic basis for each of the hyperbolic spaces, and fix a basis for W.

We now define a particular element \(g \in {\text {GCl}}(V)\) respecting the decomposition (7). As before we will define g by its action on the chosen bases.

Subspace L::

Let \(v_1, \ldots , v_{\kappa + 1}, w_1, \ldots , w_{\kappa + 1}\) be the chosen hyperbolic basis for L, i.e., such that \(L_1 = \langle v_1, \ldots , v_{\kappa + 1} \rangle \) and \(L_2 = \langle w_1, \ldots , w_{\kappa + 1} \rangle \) are totally singular subspaces, and f is represented with respect to \(v_1, \ldots , v_{\kappa + 1}, w_1, \ldots , w_{\kappa + 1}\) by

$$\begin{aligned} \begin{pmatrix} 0 &{}\quad I \\ \pm I &{}\quad 0 \end{pmatrix}. \end{aligned}$$
Symplectic case::

Let g act on L as the transvection

$$\begin{aligned} \begin{pmatrix} 1 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \\ \end{pmatrix}. \end{aligned}$$
Unitary case::

Pick a nonzero \(\lambda \in {\mathbf {F}}_q\) such that \(\lambda + \lambda ^\theta = 0\) (where \(\theta \) is the field automorphism) and let g act on L as the transvection

$$\begin{aligned} \begin{pmatrix} 1 &{}\quad 0 &{}\quad \lambda &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 0 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 1 \\ \end{pmatrix}. \end{aligned}$$
Orthogonal case::

Let \(g|_L\) be represented by the matrix

$$\begin{aligned} \begin{pmatrix} A &{}\quad 0 \\ 0 &{}\quad A^{-T} \end{pmatrix}, \end{aligned}$$

where in odd characteristic

$$\begin{aligned} A = \begin{pmatrix} 1 &{}\quad 1 \\ 0 &{}\quad 1 \end{pmatrix} \end{aligned}$$

and in even characteristic

$$\begin{aligned} A = \begin{pmatrix} 1 &{}\quad 1 &{}\quad 0 \\ 0 &{}\quad 1 &{}\quad 1 \\ 0 &{}\quad 0 &{}\quad 1 \end{pmatrix}. \end{aligned}$$

In all cases we have \(g|_L \in {\text {SCl}}(L)\) and \((g|_L)^{\kappa (q^d-1)} \ne 1\).

Subspace \(V_i\)::

Fix a monic irreducible polynomial \(p_i \in {\mathbf {F}}_q[t]\) of degree d. Let \(v_1, \ldots , v_d, w_1, \ldots , w_d\) be the chosen hyperbolic basis for \(V_i\). Thus there is a decomposition

$$\begin{aligned} V_i = V_{i,1} \oplus V_{i,2} \end{aligned}$$

into totally singular subspaces \(V_{i,1} = \langle v_1, \ldots , v_d \rangle \) and \(V_{i,2} = \langle w_1, \ldots , w_d \rangle \) with \(f(v_a, w_b) = \delta _{ab}\). Identify \(V_{i,1}\) with \({\mathbf {F}}_q[t]/(p_i(t))\). The variable t acts on the latter space by multiplication. By Witt’s lemma, this action extends to the space \(V_i\). This extension is moreover unique provided we demand that it preserves the decomposition of \(V_i\) (see [28, Hilfssatz 3.1]). Let \(g|_{V_i}\) be defined by this unique extension. The minimal polynomial of this transformation can be determined as follows (see [41]). In the symplectic and orthogonal cases, let \(p^*(t) = p(0)^{-1} t^d p(t^{-1})\). In the unitary case, let \(p^*(t) = p^\theta (0)^{-1} t^d p^\theta (t^{-1})\), where \(\theta \) acts on the coefficients. The minimal polynomial of g acting on \(V_i\) is \(*\)-symmetric, divisible by \(p_i\) (since \(p_i\) is irreducible), and hence also divisible by \(p_i^*\). Under the assumption that \(p_i \ne p_i^*\), the minimal polynomial of \(g|_{V_i}\) must therefore be equal to \(p_i p_i^*\). If \(p_i = p_i^*\) then the minimal polynomial is \(p_i\).

Subspace R::

Let \(\alpha \in G^\text {ab}\).

Symplectic case::

Let g act trivially on R. (Note \(G^\text {ab}\) is trivial.)

Unitary case::

Let g act as the matrix

$$\begin{aligned} \begin{pmatrix} a &{} 0 \\ 0 &{} a^{-\theta } \end{pmatrix}, \end{aligned}$$

where \(a \in {\mathbf {F}}_q\) satisfies \(a^{1-\theta } \prod _{i=1}^k p_i(0)^{1-\theta } = \det \alpha \). Such an element always exists since \(\det \alpha \) has norm 1.

Orthogonal case::

The natural map \({\text {GO}}(R)^\text {ab}\rightarrow G^\text {ab}\) is bijective (see Footnote 4). Let g act on R so that for every linear character \(\lambda \) of G we have

$$\begin{aligned} \lambda (g|_R) \prod _{i=1}^k \lambda (g|_{V_i}) = \lambda (\alpha ). \end{aligned}$$

In all cases note that \((g|_R)^{\kappa (q^d-1)}\) is trivial (see Footnote 5).

Subspace W::

Let g act trivially on W.
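The polynomial \(p^*\) used for the subspaces \(V_i\) above is easy to compute in the symplectic and orthogonal cases. The following sketch assumes q is prime and represents a polynomial by its list of coefficients \([c_0, c_1, \ldots , c_d]\); the function name `star` is hypothetical, and the unitary case (which also applies \(\theta \) to the coefficients) is not covered.

```python
def star(coeffs, prime):
    """p*(t) = p(0)^(-1) t^d p(1/t) over F_prime, for a polynomial given by its
    coefficients [c_0, c_1, ..., c_d] with c_0 != 0 (symplectic/orthogonal case)."""
    inv_c0 = pow(coeffs[0], -1, prime)  # modular inverse (Python 3.8+)
    return [(inv_c0 * c) % prime for c in reversed(coeffs)]
```

Over \({\mathbf {F}}_3\), for instance, \(p = t^2 + t + 2\) has \(p^* = t^2 + 2t + 2 \ne p\), whereas \(p = t^2 + 1\) satisfies \(p = p^*\); only polynomials of the first kind contribute to \({\mathfrak {I}}_d'\).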

For every k-tuple \(p_1, \ldots , p_k \in {\mathfrak {I}}_d\) and every \(\alpha \in G^\text {ab}\), we thus have an element \(g = g_{p_1, \ldots , p_k ; \alpha } \in G\). The conjugacy class \(g_{p_1, \ldots , p_k ; \alpha }^G\) is invariant under reordering \(p_1, \ldots , p_k\), and under replacing any \(p_i\) by \(p_i^*\). Conversely, \(g_{p_1, \ldots , p_k ; \alpha }^G\) is determined by \(p_1(t) p_1^*(t) \cdots p_k(t) p_k^*(t)\) and \(\alpha \). Let \({\mathfrak {I}}_d'\) be the set of unordered pairs \(\{p, p^*\}\) of monic irreducible polynomials \(p, p^* \in {\mathfrak {I}}_d\) with \(p \ne p^*\). Let

$$\begin{aligned} {\mathfrak {C}}_{d ; \alpha } = \bigcup \left\{ g_{p_1, \ldots , p_k ; \alpha }^G : \{ \{p_1, p_1^*\}, \ldots , \{p_k, p_k^*\} \} \in \left( {\begin{array}{c}{\mathfrak {I}}_d'\\ k\end{array}}\right) \right\} . \end{aligned}$$

The union is disjoint, because the minimal polynomial of every element of \(g_{p_1, \ldots , p_k ; \alpha }^G\) is divisible by \(p_1(t) p_1^*(t) \cdots p_k(t) p_k^*(t)\) and has no other nonlinear factors. Finally let

$$\begin{aligned} {\mathfrak {C}}_d = \bigcup _{\alpha \in G^\text {ab}} {\mathfrak {C}}_{d ; \alpha }. \end{aligned}$$
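The involution \(p \mapsto p^*\) that defines \({\mathfrak {I}}_d'\) is easy to experiment with. The following is a minimal sketch for the symplectic and orthogonal cases only, under the assumption that q is prime (so that arithmetic in \({\mathbf {F}}_q\) is arithmetic modulo q); the unitary twist by \(\theta \) is not modelled. The function names `star` and `is_star_symmetric` are ours, not the paper's.

```python
def star(p, q):
    """Return p*(t) = p(0)^{-1} t^d p(1/t) over F_q, for q prime (assumption).

    p is a coefficient list [p_0, p_1, ..., p_d], lowest degree first,
    with p_0 != 0 and p_d != 0.
    """
    p = [c % q for c in p]
    if p[0] == 0 or p[-1] == 0:
        raise ValueError("need p(0) != 0 and a nonzero leading coefficient")
    inv = pow(p[0], -1, q)  # p(0)^{-1} in F_q (requires Python 3.8+)
    # t^d p(1/t) has the coefficients of p in reverse order.
    return [(inv * c) % q for c in reversed(p)]


def is_star_symmetric(p, q):
    """True if p = p*."""
    return star(p, q) == [c % q for c in p]
```

For instance, over \({\mathbf {F}}_3\) the monic irreducible polynomial \(t^2 + t + 2\) has \(p^* = t^2 + 2t + 2 \ne p\), while \(t^2 + 1\) is \(*\)-symmetric; for monic p the map is an involution.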

Proof of Proposition 7.1 for other classical groups

By construction, \(g_{p_{1},\ldots ,p_{k} ; \alpha }\) lies over \(\alpha \) and \(g_{p_1, \ldots , p_k ; \alpha }^{\kappa (q^d-1)} \in {\mathfrak {M}}\). We must estimate the density of \({\mathfrak {C}}_{d;\alpha }\).

Consider \(g = g_{p_1, \ldots , p_k ; \alpha }\) for some \(p_1, \ldots , p_k \in {\mathfrak {I}}_d'\) with \(p_i \ne p_{i'}, p_{i'}^*\) for \(i \ne i'\). Let \(h \in C_G(g)\). Then h preserves each \(V_{i,1}\) and \(V_{i,2}\), those being the \(p_i\)- and \(p_i^*\)-primary subspaces of g. The restrictions of h to \(V_{i,1}\) and \(V_{i,2}\) determine one another, and there are at most \(q^d\) possibilities for \(h|_{V_{i,1}}\) (as in Lemma 2.3). Hence, since \(\delta = O(1)\),

$$\begin{aligned} |C_G(g)| \le (q^d)^k |{\text {M}}_{r+\delta }({\mathbf {F}}_q)| \le q^{dk + r^2 + O(r)+O(1)}. \end{aligned}$$

Therefore

$$\begin{aligned} |{\mathfrak {C}}_{d; \alpha }| \ge \left( {\begin{array}{c}|{\mathfrak {I}}_d'|\\ k\end{array}}\right) \cdot \frac{|G|}{q^{dk + r^2 + O(r) + O(1)}}. \end{aligned}$$

The number of monic irreducible polynomials of degree d over \({\mathbf {F}}_q\) is \(q^d/d - O(q^{d/2}/d)\), while the number of \(*\)-symmetric polynomials of degree d is at most \(q^{d/2}\), so

$$\begin{aligned} |{\mathfrak {I}}_d'| = (q^d / d - O(q^{d/2})) / 2 \ge c q^d/d. \end{aligned}$$

By the hypothesis \(q^d > Cn\) this is at least k, and in fact

$$\begin{aligned} \left( {\begin{array}{c}|{\mathfrak {I}}_d'|\\ k\end{array}}\right) \ge \left( \frac{cq^d}{dk}\right) ^k \ge q^{dk} \exp \left( -O(k \log n)\right) , \end{aligned}$$

so

$$\begin{aligned} \frac{|{\mathfrak {C}}_{d; \alpha }|}{|G|} \ge \exp \left( -O(d^2 \log q) - O(d^{-1} n \log n)\right) . \end{aligned}$$

This proves the proposition. \(\square \)
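The count of monic irreducible polynomials used in the proof above can be checked numerically. The sketch below uses the classical Gauss formula \(\frac{1}{d} \sum _{e \mid d} \mu (d/e) q^e\), which is not stated in the text and is brought in here only for illustration; the helper names are ours.

```python
def mobius(m):
    """Moebius function of m >= 1, by trial division."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # m had a square factor
            result = -result
        p += 1
    if m > 1:
        result = -result  # one remaining prime factor
    return result


def count_monic_irreducibles(q, d):
    """Number of monic irreducible polynomials of degree d over F_q (Gauss formula)."""
    total = sum(mobius(d // e) * q**e for e in range(1, d + 1) if d % e == 0)
    return total // d
```

In small cases the output agrees with \(q^d/d - O(q^{d/2}/d)\): for example there are 3 monic irreducible quadratics over \({\mathbf {F}}_3\), of which \(t^2+1\) is \(*\)-symmetric and the other two form a single pair \(\{p, p^*\}\).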

7.3 Collecting results

We now collect the results from the previous sections to conclude that, when three elements of G are chosen uniformly at random, with high probability there is a short word in these elements that belongs to \({\mathfrak {M}}\).

Theorem 7.3

There are constants \(c, C > 0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(\log q < c n \log ^{-2} n\). Let \(x, y, z\) be elements of G chosen uniformly at random. Let M be the event that there exists a word \(w \in F_3\) of length at most \(n^{C \log q}\) such that \(w(x,y,z) \in {\mathfrak {M}}\). Then

$$\begin{aligned} {\mathbf {P}}_{x,y,z}(M) \ge 1 - e^{- c n}. \end{aligned}$$

Proof

By Corollaries 5.3 and 6.2 there are constants \(c_1, c_2 > 0\) and \(C_1, C_2\) such that the following holds. Let \(\ell = \left\lfloor {c_1 n / 2}\right\rfloor \), and let E be the event that every word \(u \in F_2\) of length at most \(\ell \) satisfies \(x u(y, z) \notin {\mathfrak {C}}_d\). Then

$$\begin{aligned} {\mathbf {P}}(E)&\le \max _{\alpha \in G^\text {ab}} \frac{|G'|}{|{\mathfrak {C}}_{d;\alpha }|} (q^{-c_1 n} + 2^{-c_1 2 \ell })\nonumber \\&\le \exp (C_1 d^2 \log q + C_1 d^{-1} n \log n - c_2 n), \end{aligned}$$
(8)

provided \(q^d > C_2 n\). Take \(d \sim C_3 \log n\) for a constant \(C_3\). If \(\log q < cn / \log ^2 n\) for a sufficiently small constant c so that \(c, C_3\) satisfy \(C_1 C_3^2 c + C_1/C_3 - c_2 < - c\), then \({\mathbf {P}}(E) \le e^{-cn}\).

On the other hand suppose E fails, i.e., suppose there is a word u of length at most \(\ell \le c_1 n / 2\) such that \(x u(y, z) \in {\mathfrak {C}}_d\). Let \(w \in F_3\) be the word

$$\begin{aligned} w = (\xi _1 u(\xi _2, \xi _3))^{\kappa (q^d-1)}. \end{aligned}$$

The length of w is at most

$$\begin{aligned} \kappa (q^d - 1) (1 + c_1 n / 2) \le n^{C \log q}, \end{aligned}$$

and

$$\begin{aligned} w(x, y, z) = (x u(y, z))^{\kappa (q^d - 1)} \in {\mathfrak {M}}. \end{aligned}$$

Hence \(E^c \subseteq M\). This completes the proof. \(\square \)

This completes the proof of Theorem 1.1.

If we are allowed \(q^C\) random generators, we can reach the set \({\mathfrak {M}}\) using shorter words.

Theorem 7.4

There are constants \(c, C > 0\) so that the following holds. Let \(G = {\text {Cl}}_n(q)\), where \(n > C\). Let \(x_0, x_1, \ldots , x_k\) be elements of G chosen uniformly at random, where \(k > q^C\). Let M be the event that there exists a word \(w \in F_{k+1}\) of length at most \(q^2 n^C\) such that \(w(x_0, \ldots , x_k) \in {\mathfrak {M}}\). Then

$$\begin{aligned} {\mathbf {P}}_{x_0, \ldots , x_k}(M) \ge 1 - q^{- c n}. \end{aligned}$$

Proof

Follow the proof of the previous theorem, replacing \(u \in F_2\) with \(u \in F_k\). Since \(\log k > C \log q\), we can replace (8) with the bound

$$\begin{aligned} {\mathbf {P}}(E)&\le \max _{\alpha \in G^\text {ab}} \frac{|G'|}{|{\mathfrak {C}}_{d;\alpha }|} q^{- c_2 n} \\&\le \exp (C_1 d^2 \log q + C_1 d^{-1} n \log n - c_2 n \log q), \end{aligned}$$

provided \(q^d > C_2 n\). Take \(d = \max (\left\lceil {C_3 \log n / \log q}\right\rceil , 2)\) for sufficiently large \(C_3\). As long as \(n > C\) we find \({\mathbf {P}}(E) \le q^{-cn}\). Note that \(q^d \le q^2 n^C\) in this case. The rest of the argument is the same. \(\square \)

8 Closed trajectories with only one coincidence

A trajectory is closed if \(v^\ell = v^0\). In Sect. 9 we will need to understand the structure of closed trajectories with only one coincidence. More generally the joint trajectory of an r-tuple \((v_1, \ldots , v_r)\) is called closed if each individual trajectory is closed, and we will need to understand the structure of closed joint trajectories with only one coincidence in each individual trajectory. We begin with the single-trajectory case, for motivation.

Lemma 8.1

Assume w is nontrivial and cyclically reduced. Suppose the trajectory \(v^0, \ldots , v^\ell \) is closed, and suppose there is only one coincidence, at step t say. Then

$$\begin{aligned} w = (w_d \cdots w_1)^{\ell / d}, \qquad \text {where}~d = \gcd (t, \ell ). \end{aligned}$$

In particular if w is not a proper power then \(t = \ell \).

Proof

Let

$$\begin{aligned} {{\widetilde{w}}} = \cdots w_1 w_\ell \cdots w_1 \end{aligned}$$

be the left-infinite \(\ell \)-periodic extension of w. Since \(v^\ell = v^0\), the trajectory of v under \({{\widetilde{w}}}\) (defined in the obvious way) is just the \(\ell \)-periodic extension of \(v^0, \ldots , v^\ell \), and still there is only one coincidence, at step t. The choices at steps \(1, \ldots , t\) are free and all subsequent choices are forced (as in the proof of Lemma 3.5). We claim that \({{\widetilde{w}}}\) is in fact \(\gcd (t, \ell )\)-periodic, and it suffices to prove that it is t-periodic.

Since the choices at steps \(1, \ldots , t - 1\) are free and not coincidences, the choice at step t is a coincidence, and all subsequent choices are forced, the vectors \(v^0, \ldots , v^{t-1}\) are linearly independent and the whole trajectory is contained in their span. In particular

$$\begin{aligned} v^t = a_0 v^0 + \cdots + a_{t-1} v^{t-1} \qquad (a_0, \ldots , a_{t-1} \in {\mathbf {F}}_q). \end{aligned}$$
(9)

Given that step \(t+1\) is forced, we must have \(v^i \in D_{w_{t+1}}^{t+1}\) for each i such that \(a_i \ne 0\). Thus either \(w_{t+1} = w_{i+1}\) or \(w_{t+1} = w_i^{-1}\) (\(i > 0\)). Similarly,

$$\begin{aligned} v^{\ell - 1} = b_0 v^0 + \cdots + b_{t-1} v^{t-1} \qquad (b_0, \ldots , b_{t-1} \in {\mathbf {F}}_q), \end{aligned}$$

and \(v^\ell = v^0\) is forced. Since \(w_\ell \ne w_1^{-1}\), we must have \(w_\ell = w_t\) and \(a_0 \ne 0\) (see Remark 8.2 for more details). Therefore

$$\begin{aligned} w_{t+1} = w_1. \end{aligned}$$

Consider now the trajectory of \(v^1\) under

$$\begin{aligned} {{\widetilde{w}}}' = {{\widetilde{w}}} w_1^{-1} = \cdots w_3 w_2. \end{aligned}$$

The trajectory is just \(v^1, v^2, \ldots , v^\ell , v^0, v^1, \ldots \). By (9) and \(a_0 \ne 0\), \(v^1, \ldots , v^t\) are linearly independent, and, for every letter \(\xi \),

$$\begin{aligned} {\text {span}}\{ v^i \mid 0< i \le t, v^i \in D_\xi ^{t+1} \} = {\text {span}}\{ v^i \mid 0 \le i < t, v^i \in D_\xi ^{t} \}. \end{aligned}$$

Therefore the trajectory of \(v^1\) also has just one coincidence, again at step t (when \(v^{t+1}\) is chosen). Therefore by the same argument we must have \(w'_{t+1} = w'_1\), or

$$\begin{aligned} w_{t+2} = w_2. \end{aligned}$$

Repeating this argument as many times as necessary proves that \({{\widetilde{w}}}\) is t-periodic, as claimed. \(\square \)
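The reduction from "t-periodic" to "\(\gcd (t, \ell )\)-periodic" in the proof above is a purely combinatorial fact about \(\ell \)-periodic sequences, and can be checked by brute force. The sketch below treats letters as plain symbols (inverses and reducedness play no role here) and stores the word as \(w_1, \ldots , w_\ell \) in index order; the function names are ours.

```python
from math import gcd
from itertools import product


def is_cyclically_periodic(word, t):
    """True if the l-periodic extension of `word` (l = len(word)) is also t-periodic."""
    l = len(word)
    return all(word[i] == word[(i + t) % l] for i in range(l))


def root_length(word):
    """Smallest d dividing len(word) such that word is word[:d] repeated len(word)/d times."""
    l = len(word)
    return next(d for d in range(1, l + 1)
                if l % d == 0 and is_cyclically_periodic(word, d))
```

A word is a proper power exactly when `root_length(word) < len(word)`, which matches the conclusion \(w = (w_d \cdots w_1)^{\ell /d}\) of the lemma.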

Remark 8.2

If \(t = \ell \), we must have \(a_0 = 1\) and all other \(a_i = 0\). The general case \(t < \ell \) is more complicated, but we can still describe the possibilities. From (9), because step \(t+1\) is forced we must have

$$\begin{aligned} v^{t+1} = \overline{w_{t+1}} v^t = \sum _{i=0}^{t-1} a_i v^{i \pm 1}, \end{aligned}$$
(10)

the signs depending on whether \(w_{t+1} = w_{i+1}\) or \(w_{t+1} = w_i^{-1}\) (\(a_i \ne 0\)). At the next step,

$$\begin{aligned} v^{t+2} = \sum _{i=0}^{t-1} a_i v^{i \pm 1 \pm 1}, \end{aligned}$$

and so on. We make a few observations:

  1. The vectors \(v^{i \pm 1}\), etc., obey a no-crossing rule: we cannot have

    $$\begin{aligned}&v^i \xrightarrow {w_{s+1}} v^{i+1}, \\&v^{i+1} \xrightarrow {w_{s+1}} v^i, \end{aligned}$$

    as then we would have both \(w_{s+1} = w_{i+1}\) and \(w_{s+1} = w_{i+1}^{-1}\), for some i.

  2. Similarly, there is a no-meeting rule: we cannot have

    $$\begin{aligned} v^i&\xrightarrow {w_{s+1}} v^{i+1}, \\ v^{i+2}&\xrightarrow {w_{s+1}} v^{i+1}, \end{aligned}$$

    as then we would have both \(w_{s+1} = w_{i+1}\) and \(w_{s+1} = w_{i+2}^{-1}\), but the expression for w is supposed to be reduced.

  3. Finally, there is a time-consistency rule: we cannot have

    $$\begin{aligned} v^i \xrightarrow {w_{s+1}} v^{i+1} \xrightarrow {w_{s+2}} v^i, \end{aligned}$$

    as then we would have \(w_{s+1} = w_{i+1}\) and \(w_{s+2} = w_{i+1}^{-1}\), but again the expression for w is supposed to be reduced; nor could we have

    $$\begin{aligned} v^i \xrightarrow {w_{s+1}} v^{i-1} \xrightarrow {w_{s+2}} v^i, \end{aligned}$$

    as then we would have \(w_{s+1} = w_i^{-1}\) and \(w_{s+2} = w_i\).

Since \(a_0 \ne 0\) and \(w_{t+1} v^0 = v^1\), the only resolution is that

$$\begin{aligned} v^{t+s} = \sum _{i=0}^{t-1} a_i v^{i + s} \end{aligned}$$

for all \(s \ge 0\) (extending \(\ell \)-periodically). In other words, the sequence \((v^s)\) in \({\text {span}}\{v^0, \ldots , v^{t-1}\}\) corresponds with the sequence \((X^s)\) in \({\mathbf {F}}_q[X] / (f)\), where

$$\begin{aligned} f = X^t - a_{t-1} X^{t-1} - \cdots - a_0 X^0. \end{aligned}$$

Since \(v^\ell = v^0\) we must have

$$\begin{aligned} f \mid X^\ell - 1. \end{aligned}$$

Conversely, if f is a divisor of \(X^\ell - 1\), and if the period of w divides t and \(i - i'\) whenever \(a_i \ne 0\) and \(a_{i'} \ne 0\), then a one-coincidence trajectory of this type exists.
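The correspondence between \((v^s)\) and \((X^s)\) in \({\mathbf {F}}_q[X]/(f)\) means that the condition \(v^\ell = v^0\) becomes \(X^\ell \equiv 1 \pmod f\). The following sketch computes the least such \(\ell \) by repeated multiplication by X, assuming q is prime and \(a_0 \ne 0\). It models only the divisibility condition \(f \mid X^\ell - 1\), not the constraints on the word w; the names `order_of_X` and `closes_at` are ours.

```python
def order_of_X(a, q, max_steps=10**6):
    """Least l >= 1 with X^l = 1 in F_q[X]/(f), where
    f = X^t - a[t-1] X^(t-1) - ... - a[0], q is prime and a[0] != 0 (assumptions).
    """
    t = len(a)
    a = [c % q for c in a]
    if a[0] == 0:
        raise ValueError("need a_0 != 0 so that X is invertible mod f")
    one = [1] + [0] * (t - 1)
    cur = one[:]  # coefficients of X^s mod f, lowest degree first
    for l in range(1, max_steps + 1):
        # Multiply by X, then reduce with X^t = a[0] + a[1] X + ... + a[t-1] X^(t-1).
        top = cur[-1]
        cur = [(top * a[0]) % q] + [(cur[i - 1] + top * a[i]) % q for i in range(1, t)]
        if cur == one:
            return l
    raise RuntimeError("no period found within max_steps")


def closes_at(a, q, l):
    """True if f divides X^l - 1, i.e. the recurrence returns to its start at step l."""
    return l % order_of_X(a, q) == 0
```

For example, with \(q = 3\) and \(a_0 = a_1 = 1\) (so \(f = X^2 - X - 1\)) the order is 8, the period of the Fibonacci sequence modulo 3.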

We now consider closed joint trajectories with only one coincidence in each individual trajectory. The following lemma generalizes Lemma 8.1.

Lemma 8.3

Assume w is nontrivial and cyclically reduced. Let \(v_1, \ldots , v_r \in V\) be linearly independent. Suppose the joint trajectory of \(v_1, \ldots , v_r\) is closed. Suppose there is just one coincidence in each individual trajectory, and suppose the coincidence in the trajectory of \(v_i\) occurs at step \((t_i, i)\). Then

$$\begin{aligned} w = (w_d \cdots w_1)^{\ell / d}, \qquad \text {where}~d = \gcd (t_1, \ldots , t_r, \ell ). \end{aligned}$$

In particular if w is not a proper power then \(t_i = \ell \) for each i.

Proof

As in the proof of Lemma 8.1, let \({{\widetilde{w}}}\) be the left-infinite \(\ell \)-periodic extension of w, and note that the trajectory of \(v_1, \ldots , v_r\) under \({{\widetilde{w}}}\) is just the \(\ell \)-periodic extension of the trajectory under w, and there are no further free choices.

The choice at step \((t, i)\) must be free for \(t \le t_i\) and forced for \(t > t_i\). Therefore the vectors \((v_i^t)_{1 \le i \le r, 0 \le t < t_i}\) are linearly independent and the whole trajectory is contained in their span. Since there is a coincidence at step \((t_i, i)\), we have

$$\begin{aligned} v_i^{t_i} = \sum _{(t, j) \prec (t_i, i)} a_{itj} v_j^t \qquad (a_{itj} \in {\mathbf {F}}_q), \end{aligned}$$
(11)

where \(a_{itj} = 0\) whenever \(t \ge t_j\) (and \((t, j) \prec (t_i, i)\) means \(t < t_i\) or \(t = t_i\) and \(j < i\), as in Sect. 3.3). Let \(A_0\) be the \(r \times r\) matrix

$$\begin{aligned} A_0 = (a_{i0j} : 1 \le i,j \le r). \end{aligned}$$

The matrix \(A_0\) must be nonsingular, for otherwise we could not have \((v_1^\ell , \ldots , v_r^\ell ) = (v_1^0, \ldots , v_r^0)\). In particular, for each i there is some j such that \(a_{i0j} \ne 0\). Since step \((t_i+1, i)\) is forced, the value of \(w_{t_i+1} v_j^0\) must be known; hence

$$\begin{aligned} w_{t_i+1} = w_1. \end{aligned}$$

Consider the joint trajectory of \((v_1^1, \ldots , v_r^1)\) under

$$\begin{aligned} {{\widetilde{w}}}' = {{\widetilde{w}}} w_1^{-1} = \cdots w_3 w_2, \end{aligned}$$

which is just \((v_i^t)_{1 \le i \le r, t \ge 1}\). Since \(A_0\) is nonsingular, we have

$$\begin{aligned} {\text {span}}\{ v_i^t : 1 \le i \le r, 1 \le t \le t_i\} = {\text {span}}\{ v_i^t : 1 \le i \le r, 0 \le t \le t_i - 1\}. \end{aligned}$$

Therefore the vectors \((v_i^t)_{1 \le i \le r, 1 \le t \le t_i}\) are linearly independent, and the joint trajectory of \((v_1^1, \ldots , v_r^1)\) under \({{\widetilde{w}}}'\) has the same behaviour as that of \((v_1, \ldots , v_r)\) under \({{\widetilde{w}}}\): the trajectory of \(v_i^1\) has just one coincidence, at step \((t_i, i)\) (when \(v_i^{t_i+1}\) is chosen). Therefore by the same argument \(w'_{t_i+1} = w'_1\), or

$$\begin{aligned} w_{t_i+2} = w_2. \end{aligned}$$

Repeating the argument as many times as necessary, we conclude that the period of \({{\widetilde{w}}}\) divides \(t_i\) for each i. \(\square \)

Remark 8.4

The discussion in Remark 8.2 generalizes too. From (11) and forcedness, we have

$$\begin{aligned} v_i^{t_i + 1} = \sum _{(t, j) \prec (t_i, i)} a_{itj} v_j^{t \pm 1} \qquad (i \in \{1, \ldots , r\}), \end{aligned}$$
(12)

where the signs are chosen depending on whether \(w_{t+1} = w_1\) or \(w_t = w_1^{-1}\). The latter case can arise only for \(t > 0\), so no \(v_j^0\) can appear in this expression. Hence (12) is the analogue of (10) for the joint trajectory of \((v_1^1, \ldots , v_r^1)\). As before there are no-crossing, no-meeting, and time-consistency rules for the indices t such that \(a_{itj}\ne 0\) for some \(i, j\), so in fact we can never have \(v_j^{t-1}\).

We conclude that

$$\begin{aligned} v_i^{t_i+s} = \sum _{(t, j) \prec (t_i, i)} a_{itj} v_j^{t+s} \end{aligned}$$

for all \(s \ge 0\), and hence the trajectory of \((v_1^s, \ldots , v_r^s)\) corresponds with the trajectory of \((Z^s X_1, \ldots , Z^s X_r)\) in the \({\mathbf {F}}_q[Z]\)-module \(({\mathbf {F}}_q[Z] X_1 \oplus \cdots \oplus {\mathbf {F}}_q[Z] X_r) / \langle f_1, \ldots , f_r \rangle \), where

$$\begin{aligned} f_i = Z^{t_i} X_i - \sum _{(t, j) \prec (t_i, i)} a_{itj} Z^t X_j, \end{aligned}$$

and we must have

$$\begin{aligned} \langle (Z^\ell - 1) X_1, \ldots , (Z^\ell -1) X_r \rangle \subseteq \langle f_1, \ldots , f_r \rangle . \end{aligned}$$

Write \(f_i = \sum _j p_{ij} X_j\) for some \(p_{ij} \in {\mathbf {F}}_q[Z]\) and let \(F = (p_{ij} : 1 \le i, j \le r)\). Then there must exist a matrix \(E \in {\text {M}}_r({\mathbf {F}}_q[Z])\) with

$$\begin{aligned} (Z^\ell - 1) I = E F. \end{aligned}$$

This is possible if and only if \(\det F\) divides \(Z^\ell - 1\).

9 Expansion in low-degree representations

We turn now to the proof of Theorem 1.3. We again consider the action of \(G = {\text {Cl}}_n(q)\) on linearly independent r-tuples of vectors, and we again consider trajectories under the action of a fixed word \(w \in F_k\), much as in Sect. 4. The difference is mainly one of parameter regime. In Sect. 4 we considered r-tuples with r as large as cn for constant c, and we were satisfied with somewhat crude bounds. In this section we consider \(r = O(1)\), and we seek sharper bounds. Our aim is to show that, in an orbit of G of size N, the probability that a trajectory under a given word closes is close to 1/N, with a small relative error; if we can do this it follows that there is a uniform spectral gap. We begin with the case of \(r=1\), which contains most of the key ideas.

9.1 The defining representation

Now let \(x_1, \ldots , x_k \in G\) be chosen uniformly at random. Let \({{\overline{w}}} = w(x_1, \ldots , x_k)\). Let \(v \in V \setminus \{0\}\). Let \(N = |G v|\). By Witt’s lemma (Lemma 2.2), N is the number of \(u \in V \setminus \{0\}\) such that \(Q(u) = Q(v)\). Thus, by Lemma 2.1, \(N = q^n/q_0 + O(q^{n/2})\). More generally, if \(U \le V\) is a subspace of dimension d then

$$\begin{aligned} |Gv \cap U| = q^d/q_0 + O(q^{n/2}). \end{aligned}$$
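The shape of this estimate can be seen by brute force in a small orthogonal example. The sketch below counts the level sets of the hyperbolic quadratic form \(Q = x_1 x_2 + \cdots + x_{2m-1} x_{2m}\) over a prime field. Two assumptions are made for illustration: that this form is a fair stand-in for the orthogonal case, and that \(q_0 = q\) there (the value of \(q_0\) is fixed in Lemma 2.1, outside this section). The zero vector is included in the count for \(Q = 0\).

```python
from itertools import product


def level_set_sizes(q, m):
    """sizes[c] = #{u in F_q^(2m) : Q(u) = c} for Q = x1*x2 + ... + x_{2m-1}*x_{2m}.

    q is assumed prime; this is a brute-force enumeration, so keep q^(2m) small.
    """
    sizes = [0] * q
    for u in product(range(q), repeat=2 * m):
        value = sum(u[2 * i] * u[2 * i + 1] for i in range(m)) % q
        sizes[value] += 1
    return sizes
```

With \(q = 3\) and \(n = 2m = 4\) the level sets have sizes 33, 24, 24, each within \(q^{n/2} = 9\) of the main term \(q^{n-1} = 27\).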

Lemma 9.1

Assume w is nontrivial and not a proper power. Assume \(\ell < n/4\). Then

$$\begin{aligned} {\mathbf {P}}({{\overline{w}}} v = v) \le \frac{1}{N} \left( 1 + O(q^{2\ell - n/2}) \right) . \end{aligned}$$

Proof

By Lemma 3.1 we may also assume that w is cyclically reduced, as replacing w by its cyclic reduction can only decrease its length. In this case Lemma 8.1 implies that the event that \({{\overline{w}}} v = v\) is contained in the union of the following two events:

\(E_1\)::

the trajectory \(v^0, \ldots , v^\ell \) has exactly one coincidence, occurring at step \(\ell \), and \(v^\ell = v^0\),

\(E_2\)::

the trajectory \(v^0, \ldots , v^\ell \) has at least two coincidences.

We can bound the probability of \(E_2\) using Lemma 3.3. Suppose there is a free choice at step \(t \le \ell \). There are t previous vectors, so the probability of a coincidence, conditional on previous steps, is bounded by

$$\begin{aligned} \frac{q^t}{q^{n-t} - q^{t-1} - q^{n/2}}. \end{aligned}$$

Similarly, the conditional probability of a coincidence at a later step \(t'\) is bounded by

$$\begin{aligned} \frac{q^{t'-1}}{q^{n-t'} - q^{t'-1} - q^{n/2}}. \end{aligned}$$

Summing over \(t < t' \le \ell \), we find, using \(\ell < n/4\),

$$\begin{aligned} {\mathbf {P}}(E_2)\le & {} \sum _{1\le t< t' \le \ell } \frac{q^{t+t'-1}}{(q^{n-\ell } - q^{\ell - 1} - q^{n/2})^2} \\\ll & {} q^{4\ell - 2n} \le q^{4 \ell - n} / N < q^{2 \ell - n/2} / N. \end{aligned}$$

Hence we may focus on the event \(E_1\). In the linear case (\({\text {Cl}}={\text {SL}}\)), \(v^\ell \) is chosen uniformly at random outside a linear subspace of dimension at most \(\ell -1\), so the probability of \(E_1\) is bounded by

$$\begin{aligned} \frac{1}{q^n - q^{\ell - 1}} = \frac{1}{N} \left( 1 + O(q^{\ell - n}) \right) . \end{aligned}$$

This completes the proof in this case.

In general, the situation is complicated by form conditions, as previous choices may significantly impact the probability that \(v^\ell = v^0\), even if there were no previous coincidences.

Let \(\xi = w_\ell \). The choice of \(v^\ell \) is subject to one linear constraint for every occurrence of \(\xi = w_\ell \) as \(w_t\) or \(w_{t+1}^{-1}\) for some \(t < \ell \). Each such occurrence is the end of a maximal subword matching a prefix \(u = w_\ell \cdots w_{\ell -s+1}\) of w, forward in the case \(\xi = w_t\) and backward in the case \(\xi = w_{t+1}^{-1}\) (see Fig. 1). Write \(s = s(t)\) and \(u = u(t)\). Define

$$\begin{aligned} T_1&= \{t< \ell : \xi = w_{t+1}^{-1}\}, \\ T_2&= \{t< \ell : \xi = w_t, \ell - s(t) > t\}, \\ T_3&= \{t < \ell : \xi = w_t, \ell - s(t) \le t\}. \end{aligned}$$

Note that, for \(t \in T_1\), we must have \(t + s < \ell - s\), because \(w_\ell \cdots w_{t+1}\) is reduced. In the \(\xi = w_t\) case it is possible that the subword overlaps (or is adjacent to) the matching prefix, and the division into \(T_2\) and \(T_3\) reflects this possibility.

Fig. 1. The word w and one of its maximal subwords matching a prefix u. Each occurrence of the letter \(w_\ell \) or \(w_\ell ^{-1}\) is the end of one such subword. In the \(w_\ell = w_{t+1}^{-1}\) case we must have \(t + s < \ell - s\)

The choice of \(v^\ell \) at step \(\ell \) is constrained by the linear conditions

$$\begin{aligned} f(v^\ell , v^t)&= f(v^{\ell -s}, v^{t+s})&(t \in T_1) \\ f(v^\ell , v^t)&= f(v^{\ell -s}, v^{t-s})&(t \in T_2 \cup T_3) \end{aligned}$$

(where \(s = s(t)\)). We need to determine whether \(v^0\) is in this affine subspace. Obviously this is the case if and only if

$$\begin{aligned} f(v^0, v^t)&= f(v^{\ell -s}, v^{t+s})&(t \in T_1) \\ f(v^0, v^t)&= f(v^{\ell -s}, v^{t-s})&(t \in T_2 \cup T_3). \end{aligned}$$

Write \(C_t\) for this condition. For \(t \in T_1 \cup T_2\), the truth or falsity of \(C_t\) is determined at step \(\ell - s\), because \(\ell - s > t + s\) in the \(t \in T_1\) case and \(\ell - s > t\) in the \(t \in T_2\) case. The condition is not determined before step \(\ell - s\) by maximality of u(t). For \(t \in T_3\), \(C_t\) is settled at step t, because \(t \ge \ell - s\). The condition is not settled before step t because \(w_t = w_\ell \ne w_1^{-1}\) (since w is cyclically reduced).

Note that we may have \(\ell - s = t\) for \(t \in T_3\): this is the case in which the subword is adjacent to the prefix (see Fig. 2). In this case the condition \(C_t\) is

$$\begin{aligned} f(v^0, v^t) = f(v^t, v^{t-s}). \end{aligned}$$

However, we cannot also have \(t - s = 0\), for then we would have \(w = u^2\). Hence, by linear independence of \(v^0, \ldots , v^{t-1}\), the condition \(C_t\) is still settled at step t and not before. Note, however, that if G is unitary then \(C_t\) is linear only over \({\mathbf {F}}_{q_0}\) (because the form f is only sesquilinear).

Fig. 2. The case \(t = \ell - s \in T_3\). In this case we must have \(t - s > 0\), or else \(w = u^2\)

There is a case that may arise in which the various conditions \(C_{t'}\) settled at a given step t are not independent. This is the case in which \(t \in T_3\) and \(t = \ell - s'\) for some \(t' \in T_2\), where \(s' = s(t')\), and \(t' - s' = 0\) (see Fig. 3). Let \(T_4\) be the set of such steps t and let \(T_3' = T_3 \setminus T_4\). If \(t \in T_4\) then we have an overdetermined pair of conditions

$$\begin{aligned} f(v^0, v^{t'})&= f(v^t, v^0)&(C_{t'})\\ f(v^0, v^t)&= f(v^{\ell -s}, v^{t-s})&(C_t). \end{aligned}$$

This system is consistent if and only if

$$\begin{aligned} f(v^{t'}, v^0) = f(v^{\ell -s}, v^{t-s}). \end{aligned}$$

For \(t \in T_4\) let us redefine \(C_t\) to be this reduced condition. Certainly \(t - s < \ell - s\), and if \(t' = \ell - s\) then \(wu' = u'w\), so w is a proper power, contrary to hypothesis. Hence \(C_t\) is settled at step \(\ell - s \le t\).

Fig. 3. The case \(t \in T_4 \subseteq T_3\). Here \(t = \ell - s'\) for some \(t' \in T_2\) with \(t' - s' = 0\). If \(t' = \ell - s\) then w must be a proper power

Now consider any step \(t \in \{1, \ldots , \ell -1\}\), and consider all those conditions \(C_{t'}\) which are settled at step t. These conditions are \(C_{t'}\) for \(t' \in T_1 \cup T_2 \cup T_4\) such that \(\ell - s' = t\), as well as \(C_t\) if \(t \in T_3'\), i.e.,

$$\begin{aligned} f(v^t, v^{t' + s'})&= f(v^0, v^{t'})&(t' \in T_1, \ell - s' = t) \\ f(v^t, v^{t' - s'})&= f(v^0, v^{t'})&(t' \in T_2, \ell - s' = t) \\ f(v^t, v^{t' - s'})&= f(v^{t''}, v^0)&(t' \in T_4, \ell - s' = t) \\ f(v^t, v^0)&= f(v^{t-s}, v^{\ell -s})&(\text {if}~t \in T_3'.) \end{aligned}$$

We claim that these affine conditions for \(v^t\) are independent, and it suffices to demonstrate that the indices \(t' + s'\) (\(t' \in T_1, \ell - s' = t\)), \(t' - s'\) (\(t' \in T_2 \cup T_4\), \(\ell - s' = t\)), and 0 if \(t \in T_3'\) are all distinct. Since \(s' = \ell - t\) is a constant, the indices \(t'+s'\) are all distinct for \(t' \in T_1\), as are the indices \(t' - s'\) for \(t' \in T_2 \cup T_4\). Moreover we cannot have \(t_1 + s_1 = t_2 - s_2\) for \(t_1 \in T_1\) and \(t_2 \in T_2 \cup T_4\) with \(\ell - s_1 = \ell - s_2 = t\), because then we would have \(w_{t_1+s_1} = w_{t_2-s_2+1}^{-1} = w_{t_1+s_1+1}^{-1}\), in contradiction with the reducedness of w. If \(t' - s' = 0\) for some \(t' \in T_2\) then \(t \in T_4\) by definition, so \(t \notin T_3'\). Finally, if \(t' \in T_4\) then we cannot have \(t' - s' = 0\) unless w is a proper power, as discussed.

Hence, by linear independence of \(v^0, \ldots , v^{t-1}\), the h (say) conditions \(C_{t'}\) settled at step t consist of h independent affine linear conditions for \(v^t\), or, in the unitary case, if \(t = \ell - s \in T_3\), 2h independent affine linear conditions over \({\mathbf {F}}_{q_0}\). Suppose \(v^t\) is drawn from a subspace of codimension d (d is the number of previous occurrences of \(w_t\) or \(w_{t}^{-1}\)). Then, by Lemma 2.1 and Lemma 3.3, the probability that all these conditions are satisfied, conditional on the past trajectory \(v^0, \ldots , v^{t-1}\), is

$$\begin{aligned} \frac{q^{n-d-h}/q_0 + O(q^d + q^{n/2})}{q^{n-d}/q_0 + O(q^d + q^{n/2})}&= q^{-h} \left( 1 + O(q^{h+d-n/2} q_0)\right) \nonumber \\&= q^{-h} \left( 1 + O(q^{\ell + t - n/2})\right) \end{aligned}$$
(13)

(in the second line we used \(h < \ell \), \(d < t\), and \(q_0 \le q\)).

Let \(H = |T_1| + |T_2| + |T_3'| + |T_4|\) (i.e., let \(H+1\) be the number of appearances of \(w_\ell \) or \(w_\ell ^{-1}\) in w). Taking the product of (13) over all t, the probability that \(C_{t'}\) is satisfied for every \(t' \in T_1 \cup T_2 \cup T_3' \cup T_4\) is

$$\begin{aligned} q^{-H}\left( 1 + O(q^{2\ell -n/2}) \right) . \end{aligned}$$

The conditions \(C_t\) are prerequisite to the event \(v^\ell = v^0\). If all these conditions are satisfied, then at step \(\ell \) the vector \(v^\ell \) is drawn from an affine subspace of codimension H which includes \(v^0\). Note also that \(Q(v^{\ell -1}) = Q(v^0)\). Hence, from Lemma 3.3,

$$\begin{aligned} {\mathbf {P}}(v^\ell = v^0 \mid v^0, \ldots , v^{\ell -1}) = \frac{1}{q^{n-H} / q_0 - O(q^H) - O(q^{n/2})}. \end{aligned}$$

Hence the overall probability of \(E_1\) is bounded by

$$\begin{aligned} \frac{q^{-H}\left( 1 + O(q^{2\ell - n/2})\right) }{q^{n-H} / q_0 - O(q^H) - O(q^{n/2})}&= \frac{1}{q^n/q_0} \left( 1 + O(q^{2\ell - n/2}) \right) \\&= \frac{1}{N} \left( 1 + O(q^{2\ell - n/2}) \right) . \end{aligned}$$

Thus in all cases the error is bounded as claimed. \(\square \)

Remark 9.2

In the linear case, the hypothesis that w is not a proper power is needed only to ensure that the event \(v^\ell = v^0\) is contained in \(E_1 \cup E_2\); we do not need the hypothesis in order to bound \({\mathbf {P}}(E_1)\) or \({\mathbf {P}}(E_2)\). By contrast, at least in the orthogonal case, we do need this hypothesis in order to bound \({\mathbf {P}}(E_1)\) satisfactorily, so at least some of the complexity of the above proof is necessary. Suppose \(G = {\text {GO}}_n(q)\) and \(w = u^2\) for some word u of length \(\ell / 2\). Then the choice of \(v^\ell \) is constrained by

$$\begin{aligned} f(v^\ell , v^{\ell /2}) = f(u v^{\ell /2}, u v^0) = f(v^{\ell /2}, v^0) = f(v^0, v^{\ell /2}). \end{aligned}$$

Hence \(v^\ell \) is always restricted to an affine hyperplane that includes \(v^0\), so the probability that \({{\overline{w}}} v = v\) will be at least approximately q/N, even conditionally on there being only one coincidence.

Remark 9.3

On the other hand, it is usually possible to cyclically rotate w so that much of the complexity in the previous proof disappears. For example, if w can be cyclically rotated so that it has no square prefix, then, after such a rotation, \(T_3 = \emptyset \). Not every non-proper-power has this property (see Footnote 6), but almost all words do.

We can now prove that the permutation action of uniformly random \(x_1, \ldots , x_k \in G\) on an orbit \(Gv \subseteq V\) has a uniform spectral gap. Assume \(v \ne 0\). As usual let \({\mathcal {A}}\) be the normalized adjacency operator

$$\begin{aligned} {\mathcal {A}}= \frac{1}{2k} \sum _{i=1}^k (x_i + x_i^{-1}) \end{aligned}$$

acting on \({\mathbf {C}}[Gv]\), and let \(1 = \lambda _1 \ge \lambda _2 \ge \cdots \ge \lambda _N\) be the spectrum. Let \(\lambda = \max (\lambda _2, -\lambda _N)\). Then, for even \(\ell \),

$$\begin{aligned} 1 + \lambda ^\ell \le {\text {tr}}{\mathcal {A}}^\ell = {\mathbf {E}}_w |\{u \in Gv : {{\overline{w}}} u = u\}|, \end{aligned}$$

where w is the result of a simple random walk of length \(\ell \) in \(F_k\). Let \({\mathcal {P}}\subseteq F_k\) be the set of proper powers \(w^m\) (\(w \in F_k, m \ge 2\)). Then

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell&\le {\mathbf {E}}_{x_1, \ldots , x_k} {\mathbf {E}}_w |\{ u \in Gv : {{\overline{w}}} u = u \}| - 1\\&= {\mathbf {E}}_w \left( {\mathbf {P}}({{\overline{w}}} v = v) - \frac{1}{N} \right) N\\&\le {\mathbf {P}}(w\in {\mathcal {P}}) N + \max _{w \notin {\mathcal {P}}, |w| \le \ell } \left( {\mathbf {P}}({{\overline{w}}} v = v) - \frac{1}{N} \right) N. \end{aligned}$$

By [15, Lemma 2.6],

$$\begin{aligned} {\mathbf {P}}(w \in {\mathcal {P}}) \ll \ell \left( \frac{2k - 1}{k^2}\right) ^{\ell / 2} \ll k^{-c\ell }. \end{aligned}$$

By Lemma 9.1,

$$\begin{aligned} \max _{w \notin {\mathcal {P}}, |w| \le \ell } {\mathbf {P}}({{\overline{w}}} v = v) \le \frac{1}{N}\left( 1 + O(q^{2\ell - n/2})\right) , \end{aligned}$$

provided \(\ell < n/4\). Hence

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \ll k^{-c\ell } q^n + q^{2\ell - n/2}. \end{aligned}$$

Take \(\ell \sim n/5\). If \(\log k / \log q\) is sufficiently large then

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \le q^{-c' \ell }. \end{aligned}$$

Hence, by Markov’s inequality,

$$\begin{aligned} {\mathbf {P}}(\lambda \ge q^{-c'/2}) = {\mathbf {P}}(\lambda ^\ell \ge q^{-c'\ell /2}) \le q^{c'\ell /2} {\mathbf {E}}\lambda ^\ell \le q^{-c'\ell /2} \le q^{-c''n}, \end{aligned}$$

so almost surely \(\lambda < q^{-c'/2}\).
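The identity \({\text {tr}}{\mathcal {A}}^\ell = {\mathbf {E}}_w |\{u : {{\overline{w}}} u = u\}|\) underlying the argument above holds for any permutation action, and can be verified exactly in a toy case. In the sketch below the orbit Gv is replaced by a small set \(\{0, \ldots , n-1\}\) and the \(x_i\) by arbitrary permutations given as lists; this stand-in, and the function names, are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product


def trace_of_power(perms, length):
    """tr(A^length) for A = (1/2k) * sum_i (x_i + x_i^{-1}), in exact arithmetic."""
    n, k = len(perms[0]), len(perms)
    A = [[Fraction(0)] * n for _ in range(n)]
    for p in perms:
        for u in range(n):
            A[p[u]][u] += Fraction(1, 2 * k)  # x_i sends u to p[u]
            A[u][p[u]] += Fraction(1, 2 * k)  # x_i^{-1} sends p[u] to u
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(length):
        M = [[sum(M[i][m] * A[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return sum(M[i][i] for i in range(n))


def mean_fixed_points(perms, length):
    """Average number of fixed points over all (2k)^length walks in the letters x_i, x_i^{-1}."""
    n = len(perms[0])
    inverses = []
    for p in perms:
        inv = [0] * n
        for u in range(n):
            inv[p[u]] = u
        inverses.append(inv)
    letters = list(perms) + inverses
    total = 0
    for walk in product(letters, repeat=length):
        for u in range(n):
            v = u
            for g in walk:  # apply the letters of the walk one at a time
                v = g[v]
            total += (v == u)
    return Fraction(total, len(letters) ** length)
```

The two functions agree exactly, and for even length the trace is at least 1, in line with \(1 + \lambda ^\ell \le {\text {tr}}{\mathcal {A}}^\ell \).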

9.2 The action on r-tuples

We now generalize the argument of the previous subsection to r-tuples of vectors, where r is bounded. It will be convenient to use the following notation. For \(v, v' \in V^r\), let \(f(v, v')\) denote the \(r \times r\) matrix

$$\begin{aligned} f(v, v')_{ij} = f(v_i, v'_j). \end{aligned}$$

Define also

$$\begin{aligned} Q(v)_i = Q(v_i). \end{aligned}$$

Let \(v = (v_1, \ldots , v_r) \in V^r\), where \(v_1, \ldots , v_r \in V\) are linearly independent. Let \(N = |Gv|\). By Witt’s lemma, N is the number of \(v' \in V^r\) with \(v'_1, \ldots , v'_r\) linearly independent such that \(f(v, v) = f(v', v')\) and \(Q(v) = Q(v')\). In the linear case,

$$\begin{aligned} N&= (q^n - 1) (q^n - q) \cdots (q^n - q^{r-1})\\&= q^{rn} \left( 1 - O(q^{-n+r-1})\right) . \end{aligned}$$
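The linear-case count can be confirmed by enumeration for small parameters. The sketch below compares the product formula with a brute-force count of linearly independent r-tuples in \({\mathbf {F}}_q^n\), assuming q is prime; the helper names are ours.

```python
from itertools import product


def rank_mod_p(rows, p):
    """Rank over F_p (p prime) of a list of vectors, by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # requires Python 3.8+
        rows[rank] = [(inv * x) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                factor = rows[i][col]
                rows[i] = [(x - factor * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def count_independent_tuples(q, n, r):
    """Brute-force count of linearly independent r-tuples in F_q^n."""
    vectors = list(product(range(q), repeat=n))
    return sum(1 for tup in product(vectors, repeat=r) if rank_mod_p(tup, q) == r)


def orbit_size_linear(q, n, r):
    """(q^n - 1)(q^n - q) ... (q^n - q^(r-1))."""
    result = 1
    for i in range(r):
        result *= q**n - q**i
    return result
```

For \(q = 2\), \(n = 3\), \(r = 2\) both counts give \(7 \cdot 6 = 42\), slightly below \(q^{rn} = 64\).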

In the other cases we have, inductively, using Lemma 2.1,

$$\begin{aligned} N&= |G(v_1, \ldots , v_{r-1})| (q^{n-r+1} / q_0 + O(q^{n/2})) \nonumber \\&= q^{rn - r(r-1)/2} / q_0^r \left( 1+ O(q^{-n/2+r-1} q_0)\right) . \end{aligned}$$
(14)

Lemma 9.4

Assume w is nontrivial and not a proper power. Assume \(\ell r^2 < n/4\). Then

$$\begin{aligned} {\mathbf {P}}({{\overline{w}}} v = v) \le \frac{1}{N} \left( 1 + O( q^{2 \ell r - n/2} ) \right) . \end{aligned}$$

Proof

Again we may assume w is cyclically reduced. In this case Lemma 8.3 implies that the event that \({{\overline{w}}} v = v\) is contained in the union of the following two events:

\(E_1\)::

the joint trajectory \((v_i^t)\) has exactly one coincidence in each individual trajectory, each occurring at the final step \(t=\ell \), and \(v_i^\ell = v_i^0\) for each i,

\(E_2\)::

the joint trajectory \((v_i^t)\) has at least \(r+1\) coincidences.

Again we can bound the probability of \(E_2\) using Lemma 3.3. Suppose there is a free choice at step (ti). There are at most \(t r + i - 1 \le \ell r\) previous vectors, so the conditional probability of a coincidence is bounded by

$$\begin{aligned} \frac{q^{t r + i - 1}}{q^{n - \ell r} - q^{\ell r - 1} - q^{n/2}} = q^{tr + i - 1 + \ell r - n} \left( 1 + O(q^{\ell r - n/2})\right) . \end{aligned}$$

Hence the probability of \(E_2\) is bounded by (summing over all possibilities for \(r+1\) coincidences)

$$\begin{aligned} q^{(2 \ell r - n)(r+1)} \left( 1 + O(r q^{\ell r - n/2})\right) \ll q^{(2 \ell r - n)(r+1)}. \end{aligned}$$

Using \(N \le q^{rn}\), this is at most

$$\begin{aligned} q^{2 \ell r (r+1) - n} / N \le q^{2 \ell r - n/2} / N. \end{aligned}$$
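To spell out the last inequality (a routine check, recorded only for convenience): by the hypothesis \(\ell r^2 < n/4\) of the lemma,

$$\begin{aligned} 2 \ell r (r+1) - n = 2 \ell r^2 + 2 \ell r - n < n/2 + 2 \ell r - n = 2 \ell r - n/2. \end{aligned}$$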

Hence we may focus on the event \(E_1\). In the linear case, for each i the vector \(v_i^\ell \) is chosen uniformly at random outside a linear subspace of dimension at most \(\ell r\), so the probability of \(E_1\) is bounded by

$$\begin{aligned} \left( \frac{1}{q^n - q^{\ell r}}\right) ^r&= q^{-rn} \left( 1 + O(r q^{\ell r - n}) \right) \\&= \frac{1}{N} \left( 1 + O(r q^{\ell r - n}) \right) . \end{aligned}$$

This completes the proof in this case.

As in the previous subsection, the general situation is complicated by form conditions, but fortunately few changes are necessary in the \(r > 1\) case. Let \(\xi = w_\ell \). Assume there are \(H+1\) occurrences of \(\xi \) or \(\xi ^{-1}\) in w, and consider the H maximal subwords u ending with \(\xi \) or \(\xi ^{-1}\) and matching a proper prefix of w, as in Fig. 1. Define \(T_1\), \(T_2\), and \(T_3 = T'_3 \cup T_4\) as before.

The choice of \(v^\ell \) at step \(\ell \) is constrained by the linear conditions

$$\begin{aligned} f(v^\ell , v^t)&= f(v^{\ell -s}, v^{t+s})&(t \in T_1) \\ f(v^\ell , v^t)&= f(v^{\ell -s}, v^{t-s})&(t \in T_2 \cup T_3) \end{aligned}$$

(where \(s = s(t)\)). For \(t \in T_1 \cup T_2 \cup T_3'\) we have a condition \(C_t\) defined by

$$\begin{aligned} f(v^0, v^t)&= f(v^{\ell -s}, v^{t+s})&(t \in T_1) \\ f(v^0, v^t)&= f(v^{\ell -s}, v^{t-s})&(t \in T_2 \cup T_3'). \end{aligned}$$

For \(t \in T_4\) the condition \(C_t\) is the reduced condition

$$\begin{aligned} f(v^{t'}, v^0) = f(v^{\ell - s}, v^{t-s}). \end{aligned}$$

Conditional on linear independence of \(v_i^t\) for \(1 \le i \le r\) and \(t < \ell \), it can be verified exactly as in the \(r = 1\) case that the conditions settled at any given step \(t < \ell \) are precisely \(C_{t'}\) for \(t' \in T_1 \cup T_2 \cup T_4\) and \(\ell - s' = t\), as well as \(C_t\) if \(t \in T_3'\), and these conditions are linearly independent.

Suppose at step \(t < \ell \) there are h conditions \(C_{t'}\) to be settled. Assume first that we are not in the case \(t = \ell - s \in T_3'\) (the case in which the subword is adjacent to the prefix, as in Fig. 2). Let d be the number of previous occurrences of \(w_t\) or \(w_t^{-1}\). Then, by Lemma 3.3, at step (ti) the vector \(v_i^t\) is drawn from an affine subspace of codimension \(d' = dr + i-1\), less a subspace of dimension \(d'\), subject to the quadratic condition \(Q(v_i^t) = Q(v_i^{t-1})\). Hence, using Lemma 2.1, the probability that the ji-component of each \(C_{t'}\) is satisfied for each \(j \in \{1, \ldots , r\}\) is

$$\begin{aligned} \frac{q^{n - d' - hr}/q_0 + O(q^{d'} + q^{n/2})}{q^{n-d'}/q_0 + O(q^{d'} + q^{n/2})}&= q^{-hr} \left( 1 + O( q^{hr+d'-n/2} q_0) \right) \nonumber \\&= q^{-hr} \left( 1 + O( q^{\ell r + (t-1) r + i-1 -n/2}) \right) \end{aligned}$$
(15)

(using \(h < \ell \), \(d' \le (t-1)r + i-1\), and \(q_0 \le q\)). Taking the product over all i, the probability that each \(C_{t'}\) is satisfied after step t is

$$\begin{aligned} q^{-hr^2} \left( 1 + O(q^{\ell r + tr - n/2}) \right) . \end{aligned}$$
(16)

The case \(t = \ell - s \in T_3'\) is slightly different. In this case the ji-component of \(C_t\) is

$$\begin{aligned} f(v_j^0, v_i^t) = f(v_j^t, v_i^{t-s}). \end{aligned}$$

This condition is settled at step (tk), where \(k = \max (i, j)\). Hence \(2k-1\) components of \(C_t\) are settled at step (tk). Therefore, in this case, (15) must be replaced with

$$\begin{aligned} q^{-(h-1)r - (2i - 1)} \left( 1 + O( q^{\ell r + (t-1) r + i-1 -n/2}) \right) . \end{aligned}$$

Taking the product over all i again gives (16).
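The bookkeeping behind this product can be written out as follows; this is a routine expansion rather than an additional claim. Since \(\sum _{i=1}^r (2i-1) = r^2\),

$$\begin{aligned} \prod _{i=1}^r q^{-(h-1)r - (2i-1)} = q^{-(h-1)r^2 - r^2} = q^{-hr^2}, \end{aligned}$$

while each of the r error factors is \(1 + O(q^{\ell r + tr - n/2})\) because \(i - 1 < r\), and r is bounded.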

Taking the product of (16) over all t, the probability that \(C_{t'}\) is satisfied for every \(t' \in T_1 \cup T_2 \cup T_3' \cup T_4\) is

$$\begin{aligned} q^{-Hr^2}\left( 1 + O(q^{2\ell r - n/2}) \right) . \end{aligned}$$
(17)

Finally, if all the conditions \(C_t\) are satisfied, then for each i the vector \(v_i^\ell \) is drawn from an affine subspace of codimension \(Hr+i-1\) which includes \(v_i^0\), less a subspace of dimension \(Hr+i-1\), subject to the quadratic condition \(Q(v_i^\ell ) = Q(v_i^{\ell -1}) = Q(v_i^0)\). Hence

$$\begin{aligned}&{\mathbf {P}}(v_i^\ell = v_i^0 \mid (v_j^t, (t, j) \prec (\ell , i)))\\&\quad = \frac{1}{q^{n-Hr-i+1} / q_0 - O(q^{Hr+i-1}) - O(q^{n/2})} \\&\quad = (q^{n-Hr-i+1}/q_0)^{-1} \left( 1 + O(q^{Hr+i-1-n/2} q_0) \right) . \end{aligned}$$

Hence the conditional probability that \(v^\ell = v^0\) is

$$\begin{aligned} (q^{nr-Hr^2-r(r-1)/2} / q_0^r)^{-1} \left( 1 + O(q^{(H+1)r - n/2}) \right) . \end{aligned}$$

Hence the overall probability of \(E_1\) is, multiplying the previous line by (17),

$$\begin{aligned} (q^{nr - r(r-1)/2} / q_0^r)^{-1} \left( 1 + O(q^{2\ell r - n/2}) \right) . \end{aligned}$$

Comparing with (14), this is

$$\begin{aligned} N^{-1} \left( 1 + O(q^{2\ell r - n/2}) \right) . \end{aligned}$$

Thus in all cases the error is bounded as claimed. \(\square \)
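For convenience, the exponent bookkeeping in the last steps of the proof can be checked as follows (a routine verification, not an additional claim):

$$\begin{aligned} \sum _{i=1}^r (n - Hr - i + 1)&= nr - Hr^2 - r(r-1)/2, \\ q^{-Hr^2} \cdot (q^{nr - Hr^2 - r(r-1)/2} / q_0^r)^{-1}&= (q^{nr - r(r-1)/2} / q_0^r)^{-1}. \end{aligned}$$

The error factors \(1 + O(q^{(H+1)r - n/2})\) and \(1 + O(q^{2\ell r - n/2})\) multiply to \(1 + O(q^{2\ell r - n/2})\), since \(H + 1 \le \ell \).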

We can now prove that the permutation action of uniformly random \(x_1, \ldots , x_k \in G\) on an orbit \(Gv \subseteq V^r\) has a uniform spectral gap. The argument is little different from that in the previous subsection. We may assume \(v_1, \ldots , v_r\) are linearly independent, by reducing r if necessary. Suppose the adjacency operator \({\mathcal {A}}\) acting on \({\mathbf {C}}[Gv]\) has spectrum \(1 = \lambda _1 \ge \cdots \ge \lambda _N\). Let \(\lambda = \max (\lambda _2, -\lambda _N)\). For even \(\ell \), let w be the result of a simple random walk of length \(\ell \) in \(F_k\). Then

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \le {\mathbf {P}}(w \in {\mathcal {P}}) N + \max _{w \notin {\mathcal {P}}, |w|\le \ell } \left( {\mathbf {P}}({{\overline{w}}} v = v) - \frac{1}{N}\right) N. \end{aligned}$$

We bound \({\mathbf {P}}(w \in {\mathcal {P}})\) as before, while by Lemma 9.4 we have

$$\begin{aligned} \max _{w \notin {\mathcal {P}}, |w| \le \ell } {\mathbf {P}}({{\overline{w}}} v = v) \le \frac{1}{N} \left( 1 + O(q^{2\ell r - n/2}) \right) , \end{aligned}$$

provided \(\ell r^2 < n/4\). Hence

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \ll k^{-c \ell } q^{rn} + q^{2 \ell r - n/2}. \end{aligned}$$

Take \(\ell \sim n/(5 r^2)\). If \(\log k / \log q \ge C r^3\), for a sufficiently large constant C, then

$$\begin{aligned} {\mathbf {E}}\lambda ^\ell \le q^{-c' \ell }. \end{aligned}$$

Hence, by Markov’s inequality,

$$\begin{aligned} {\mathbf {P}}(\lambda \ge q^{-c'/2}) \le q^{c' \ell / 2} {\mathbf {E}}\lambda ^\ell \le q^{-c'\ell / 2} < q^{-c''n/r^2}, \end{aligned}$$

so almost surely \(\lambda < q^{-c'/2}\), as before.
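To illustrate the choice of parameters, suppose for simplicity that \(\ell = n/(5r^2)\) exactly (ignoring the rounding needed to make \(\ell \) an even integer) and that \(k \ge q^{Cr^3}\). Then the two terms in the bound for \({\mathbf {E}}\lambda ^\ell \) satisfy

$$\begin{aligned} k^{-c\ell } q^{rn}&\le q^{-cCr^3 \ell + rn} = q^{-(cC/5 - 1) rn}, \\ q^{2\ell r - n/2}&= q^{2n/(5r) - n/2} \le q^{-n/10}. \end{aligned}$$

Both are at most \(q^{-c'\ell } = q^{-c'n/(5r^2)}\) provided, say, \(cC \ge 10\) and \(c' \le 1/2\); the implied constant in \(\ll \) can then be absorbed by shrinking \(c'\) slightly.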

9.3 Other low-degree representations

The result of the final argument of the previous subsection can be expressed as follows.

Theorem 9.5

Let \({\mathbf {C}}[V^r]_0\) be the orthogonal complement of \({\mathbf {C}}[V^r]^G\) in \({\mathbf {C}}[V^r]\). Let \(x_1, \ldots , x_k \in G\) be uniform and independent, where \(k \ge q^{Cr^3}\) and \(r < cn^{1/4}\). Let \(\rho = \rho ({\mathcal {A}}, {\mathbf {C}}[V^r]_0)\) be the spectral radius of \({\mathcal {A}}= {\mathcal {A}}_{x_1, \ldots , x_k}\) acting on \({\mathbf {C}}[V^r]_0\). Then

$$\begin{aligned} {\mathbf {P}}(\rho > q^{-c}) < q^{-cn/r^2}. \end{aligned}$$

Proof

By Witt’s lemma, there are \(O(q^{r^2})\) orbits of G on \(V^r\). Let \(Gv_1, \ldots , Gv_s\) be a decomposition of \(V^r\) into G-orbits, where \(s \ll q^{r^2}\). Then

$$\begin{aligned} {\mathbf {C}}[V^r]_0 = {\mathbf {C}}[Gv_1]_0 \oplus \cdots \oplus {\mathbf {C}}[Gv_s]_0. \end{aligned}$$

Let \(\rho _i = \rho ({\mathcal {A}}, {\mathbf {C}}[Gv_i]_0)\) be the spectral radius of \({\mathcal {A}}\) on \({\mathbf {C}}[Gv_i]_0\). Then

$$\begin{aligned} \rho = \max _{1\le i \le s} \rho _i. \end{aligned}$$

From the previous subsection (possibly with a smaller r, if the components of \(v_i\) are not linearly independent), for each i we have

$$\begin{aligned} {\mathbf {P}}(\rho _i > q^{-c}) < q^{-c'n/r^2}. \end{aligned}$$

Hence

$$\begin{aligned} {\mathbf {P}}(\rho > q^{-c}) \ll q^{r^2 - c'n/r^2} < q^{-c'' n/r^2}. \end{aligned}$$

\(\square \)

Our main interest is the conjugation action of G on a conjugacy class \({\mathfrak {C}}\subseteq {\text {SCl}}_n(q)\) of elements of degree \(s = O(1)\), which is actually a quotient of an orbit of G on \(V^s \oplus (V^*)^s\), where \(V^*\) is the dual space. It is possible to repeat the analysis of the previous subsection allowing also r factors of \(V^*\), but in fact this generalization follows formally, since \({\mathbf {C}}[V^*] \cong {\mathbf {C}}[V]\) (as both have character \(\chi (g) = q^{\dim \ker (g-1)}\)), so

$$\begin{aligned} {\mathbf {C}}[V^r \oplus (V^*)^r] \cong {\mathbf {C}}[V]^{\otimes r} \otimes {\mathbf {C}}[V^*]^{\otimes r} \cong {\mathbf {C}}[V]^{\otimes 2r} \cong {\mathbf {C}}[V^{2r}]. \end{aligned}$$

Corollary 9.6

(the conjugation action on \({\mathfrak {M}}\) is expanding) Let \(x_1, \ldots , x_k \in G\) be independent and uniformly random, where \(k > q^C\) and \(n > C\). Let \(\rho = \rho ({\mathcal {A}}, {\mathbf {C}}[{\mathfrak {M}}]_0)\) be the spectral radius of \({\mathcal {A}}\) acting on \({\mathbf {C}}[{\mathfrak {M}}]_0\). Then

$$\begin{aligned} {\mathbf {P}}(\rho > q^{-c}) \le q^{-cn}. \end{aligned}$$

Proof

We claim that \({\mathbf {C}}[{\mathfrak {M}}]\) is contained in \({\mathbf {C}}[V^{2s}]\). The map

$$\begin{aligned} V^s \oplus (V^*)^s&\rightarrow {\text {M}}_n({\mathbf {F}}_q) \\ (v_i, \phi _i)&\mapsto 1 + \sum _{i=1}^s v_i \otimes \phi _i \end{aligned}$$

is a map of permutation representations (where G acts by conjugation on \({\text {M}}_n({\mathbf {F}}_q)\)), and hence induces a map of \({\mathbf {C}}[G]\)-modules \({\mathbf {C}}[V^s \oplus (V^*)^s] \rightarrow {\mathbf {C}}[{\text {M}}_n({\mathbf {F}}_q)]\). The module \({\mathbf {C}}[{\mathfrak {M}}]\) is contained in the image, so it is isomorphic to a submodule of \({\mathbf {C}}[V^s \oplus (V^*)^s] \cong {\mathbf {C}}[V^{2s}]\) by complete reducibility. Hence the result follows from the previous theorem with \(r = 2s\). \(\square \)

10 Diameter of the Cayley graph

We now collect results from the previous sections and bound the diameter of the Cayley graph of the subgroup of \({\text {Cl}}_n(q)\) generated by random elements.

10.1 \({\text {GL}}_n(p)\) and 3 random elements

In this subsection we prove Theorem 1.2. Recall that \({\text {SL}}_n(p) \le G \le {\text {GL}}_n(p)\), where p is prime and \(\log p < cn / \log ^2 n\), the elements \(x, y, z \in G\) are chosen uniformly at random, and \(S = \{x^{\pm 1}, y^{\pm 1}, z^{\pm 1}\}\). We claim that with probability \(1-e^{-cn}\) we have

$$\begin{aligned}&\langle S \rangle \ge {\text {SL}}_n(p),~\text {and} \\&{\text {diam}}{\text {Cay}}(\langle S \rangle , S) \le n^{O(\log p)}. \end{aligned}$$

First we show that \(\langle S \rangle \ge {\text {SL}}_n(p)\) with high probability. The argument is a slight modification of [14, Sect. 5].

Let \({\mathfrak {C}}_1\) be the set of all irreducible \(g \in {\text {GL}}_n(p)\) of order \(d(p^n-1)/(p-1)\) for some \(d \mid (p-1)\). Each such g is equivalent to the multiplication action of some \(x \in {\mathbf {F}}_{p^n}\) of the same order, and \(\det g = N(x)\). Therefore, for each \(\alpha \in G^\text {ab}\cong {\mathbf {F}}_p^\times \), the \({\text {GL}}_n(p)\)-classes in \({\mathfrak {C}}_{1;\alpha } = {\mathfrak {C}}_1 \cap \alpha G'\) are in bijection with elements of \({\mathbf {F}}_{p^n}\), up to Galois conjugacy, of order \(d(p^n-1)/(p-1)\) and norm \(\alpha \), where d is the order of \(\alpha \). Note there are \(\phi (d)\) elements \(\alpha \) of order d. Moreover, each such \(g \in G\) has centralizer isomorphic to \({\mathbf {F}}_{p^n}^\times \). Hence

$$\begin{aligned} \frac{|{\mathfrak {C}}_{1;\alpha }|}{|{\text {GL}}_n(p)|} = \frac{\phi (d(p^n-1) / (p-1)) / \phi (d)}{n (p^n-1)} > e^{-o(n)}. \end{aligned}$$

Here we used the standard estimate \(\phi (m) \gg m / \log \log m\).
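The density formula above can be checked by brute force in the smallest case \(n = 2\). The sketch below is an illustration under that assumption only: it counts irreducible elements of \({\text {GL}}_2(p)\) with prescribed determinant \(\alpha \) and order \(d(p^2-1)/(p-1)\), and compares with \(\phi (d(p^2-1)/(p-1)) / \phi (d)\) divided by \(2(p^2-1)\). All function names are hypothetical, and matrices are stored as 4-tuples (a, b, c, d).

```python
from itertools import product
from math import gcd


def phi(m):
    """Euler's totient, computed naively."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)


def mat_mul(a, b, p):
    """Product of 2x2 matrices over F_p, stored row-major as 4-tuples."""
    return ((a[0] * b[0] + a[1] * b[2]) % p, (a[0] * b[1] + a[1] * b[3]) % p,
            (a[2] * b[0] + a[3] * b[2]) % p, (a[2] * b[1] + a[3] * b[3]) % p)


def det(a, p):
    return (a[0] * a[3] - a[1] * a[2]) % p


def order(a, p):
    """Multiplicative order of an invertible 2x2 matrix over F_p."""
    power, k = a, 1
    while power != (1, 0, 0, 1):
        power = mat_mul(power, a, p)
        k += 1
    return k


def is_irreducible(a, p):
    """For n = 2: irreducible iff the characteristic polynomial has no root in F_p."""
    tr, dt = (a[0] + a[3]) % p, det(a, p)
    return all((x * x - tr * x + dt) % p for x in range(p))


def mult_order(alpha, p):
    """Order of alpha in the multiplicative group of F_p (alpha nonzero mod p)."""
    k, x = 1, alpha % p
    while x != 1:
        x = (x * alpha) % p
        k += 1
    return k


def class_count_bruteforce(p, alpha):
    """Return (|C_{1;alpha}|, |GL_2(p)|) by enumeration."""
    d = mult_order(alpha, p)
    target = d * (p**2 - 1) // (p - 1)
    gl = [a for a in product(range(p), repeat=4) if det(a, p)]
    hits = [a for a in gl
            if det(a, p) == alpha % p and is_irreducible(a, p) and order(a, p) == target]
    return len(hits), len(gl)


def density_formula(p, alpha):
    """Return (numerator, denominator) of the claimed density for n = 2."""
    d = mult_order(alpha, p)
    m = d * (p**2 - 1) // (p - 1)
    return phi(m) // phi(d), 2 * (p**2 - 1)
```

Comparing `class_count_bruteforce(3, 1)` with `density_formula(3, 1)` by cross-multiplication checks the identity for \(p = 3\), \(\alpha = 1\). The sketch says nothing about the asymptotic lower bound \(e^{-o(n)}\), which depends on the estimate for \(\phi \).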

Let \({\mathfrak {C}}_2\) be the set of all \(g \in {\text {GL}}_n(p)\) of order \(p^{n-1}-1\) splitting V as \(\ell \oplus W\) for some \(\ell , W\) with \(\dim \ell = 1\), \(\dim W = n-1\). A similar calculation shows that

$$\begin{aligned} \frac{|{\mathfrak {C}}_{2;\alpha }|}{|{\text {GL}}_n(p)|} > e^{-o(n)} \end{aligned}$$

for each \(\alpha \in {\mathbf {F}}_p^\times \) in this case as well. (In fact, \({\mathfrak {C}}_2\) is uniform over \(\det \) fibres.)

Hence, by Corollaries 5.3 and 6.2 as in the proof of Theorem 7.3, with probability at least \(1 - e^{- c n}\) there are words \(w_1, w_2\) such that

$$\begin{aligned} w_i(x,y,z) \in {\mathfrak {C}}_i \qquad (i \in \{1, 2\}). \end{aligned}$$

By a straightforward adaptation of [14, Lemma 5.2] (assuming \(n > 6\), say),

$$\begin{aligned} \langle w_1(x, y, z), w_2(x, y, z) \rangle \ge {\text {SL}}_n(p). \end{aligned}$$

Hence indeed \(\langle S \rangle \ge {\text {SL}}_n(p)\).

In particular, using Schreier generators, there is a symmetric set \(S' \subseteq S^{2p} \cap {\text {SL}}_n(p)\) such that \(\langle S'\rangle = {\text {SL}}_n(p)\).

Meanwhile, by Theorem 1.1, with probability \(1 - e^{-cn}\) there is another word w of length \(n^{O(\log p)}\) such that

$$\begin{aligned} w(x, y, z) \in {\mathfrak {M}}. \end{aligned}$$

Let \(X = S' \cup \{w(x, y, z)^{\pm 1}\}\). By [22, Theorem 1.5] we have

$$\begin{aligned} {\text {diam}}{\text {Cay}}({\text {SL}}_n(p), X) \ll p n^{12}. \end{aligned}$$

As \(|\langle S \rangle /{\text {SL}}_n(p)| < p\), we thus have

$$\begin{aligned} {\text {diam}}{\text {Cay}}(\langle S \rangle , S) \ll p^2 n^{12 + C \log p} = n^{O(\log p)}. \end{aligned}$$

This completes the proof.
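One way to account for the exponents in the final bound, under our reading of the preceding steps and writing the word length \(n^{O(\log p)}\) as \(n^{C \log p}\): each element of \({\text {SL}}_n(p)\) is a product of \(\ll p n^{12}\) elements of X, each element of X is a word in S of length at most \(\max (2p, n^{C \log p}) \le 2p\, n^{C \log p}\), and a coset representative of \({\text {SL}}_n(p)\) in \(\langle S \rangle \) costs fewer than p further letters. Hence

$$\begin{aligned} {\text {diam}}{\text {Cay}}(\langle S \rangle , S)&\ll p n^{12} \cdot 2p\, n^{C \log p} + p \ll p^2 n^{12 + C \log p} \\&= n^{2 \log p / \log n + 12 + C \log p} = n^{O(\log p)}. \end{aligned}$$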

10.2 Classical groups and \(q^C\) random elements

In this subsection we prove Theorem 1.4. Recall that \(G = {\text {Cl}}_n(q)\), where \(n > C\), elements \(x_1, \ldots , x_k \in G\) are chosen uniformly at random where \(k > q^C\), and \(S = \{x_1^{\pm 1}, \ldots , x_k^{\pm 1}\}\). We claim that with probability \(1-q^{-cn}\) we have

$$\begin{aligned}&\langle S \rangle \ge {\text {SCl}}_n(q),~\text {and} \\&{\text {diam}}{\text {Cay}}(\langle S \rangle , S) \le q^2 n^C. \end{aligned}$$

By Theorem 7.4, with probability at least \(1 - q^{-c_1 n}\) there is a word w of length at most \(q^2 n^{C_1}\) so that

$$\begin{aligned} w(x_1, \ldots , x_k) \in {\mathfrak {M}}. \end{aligned}$$

Let \({\mathfrak {C}}\) be the conjugacy class of \(w(x_1, \ldots , x_k)\) in G. Note that \({\mathfrak {C}}\subseteq {\text {SCl}}_n(q)\). It follows from Corollary 9.6 that, with probability at least \(1 - q^{-c_2 n}\), the conjugation action of G on \({\mathfrak {C}}\) is expanding with spectral gap bounded away from zero. Hence (see, e.g., [31, Proposition 3.1.5 and Proposition 3.3.6])

$$\begin{aligned} {\text {diam}}{\text {Sch}}(G, S, {\mathfrak {C}}) \ll \log |{\mathfrak {C}}|. \end{aligned}$$

It follows that with probability at least \(1 - q^{-c_3 n}\), every element of \({\mathfrak {C}}\) is a word in S of length at most

$$\begin{aligned} q^2 n^{C_1} + O(\log |{\mathfrak {C}}|) \ll q^2 n^{C_2}. \end{aligned}$$

This already proves that \(\langle S \rangle \ge {\text {SCl}}_n(q)\). It follows from [33] that

$$\begin{aligned} {\text {diam}}{\text {Cay}}({\text {SCl}}_n(q), {\mathfrak {C}}) \ll \log |{\text {SCl}}_n(q)| / \log |{\mathfrak {C}}| \ll n. \end{aligned}$$

Hence

$$\begin{aligned} {\text {diam}}{\text {Cay}}(\langle S \rangle , S) \ll q^2 n^{C_2 + 1}. \end{aligned}$$

This completes the proof.

Corollary 1.5(2) follows immediately for \(q < n^{O(1)}\), since \(\log |G| \asymp n^2 \log q\). If q is larger then the claim follows from Alon–Roichman [1], which implies that the Cayley graph on \(C n^2 \log q\) random generators is almost surely an expander.